Sensitiveness analysis: Sample sizes for t-tests for paired samples

Sensitiveness analysis: Sample sizes for t-tests for paired samples (J. D. Perezgonzalez, 2016, Massey University, New Zealand, doi: 10.13140/RG.2.2.32249.47203)

Table 1 shows the sample sizes required for obtaining a statistically significant result for a desired minimum effect size (MES) when carrying out Fisher's tests of significance (e.g., Fisher, 1954) to assess mean differences between paired observations (dependent means) using t-tests.

Table 1. Sample sizes for paired-sample t-tests

                     sig = 0.001            sig = 0.01             sig = 0.05
MES (d_z, d_4)    2-tailed   1-tailed    2-tailed   1-tailed    2-tailed   1-tailed
0.05                  4329       3818        2653       2164        1536       1082
0.10                  1000        960         667        544         387        273
0.15                   487        430         299        244         174        123
0.20                   277        244         170        139          99         70
0.25                   180        158         110         90          64         46
0.30                   127        112          78         64          46         32
0.35                    95         84          58         48          34         24
0.40                    74         65          46         38          27         19
0.45                    60         53          37         30          22         16
0.50                    50         44          31         25          18         13
0.55                    42         37          26         22          16         11
0.60                    36         32          23         19          14         10
0.65                    32         28          20         17          12          9
0.70                    28         25          18         15          11          8
0.75                    25         23          16         13          10          7
0.80                    23         21          15         12           9          7
0.85                    21         19          13         11           8          6
0.90                    19         17          12         10           8          6
0.95                    18         16          12         10           7          6
1.00                    17         15          11          9           7          5

Notes: Sample sizes capture the MES up to four decimal places.
Main source: Perezgonzalez, J. D. (2016). Statistical sensitiveness for science. arXiv:1604.01844 (retrievable from http://arxiv.org/abs/1604.01844)
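
As a quick check of the table (a minimal sketch, assuming Python 3 with SciPy; the chosen cell and the variable names are illustrative, not part of the original), the entry for MES = 0.80, sig = 0.05, two-tailed (n = 9) can be reproduced from the criterion described in the notes below, namely that the critical t converted to d_z = t/√n must not exceed the MES:

```python
# Minimal sketch (assumes Python 3 with SciPy installed).
# Reproduces the Table 1 cell MES = 0.80, sig = 0.05, two-tailed.
from scipy import stats

mes, sig = 0.80, 0.05
for n in (8, 9):                              # candidate sample sizes around the tabled value
    cv = stats.t.ppf(1 - sig / 2, n - 1)      # two-tailed critical t with n - 1 df
    d_min = cv / n ** 0.5                     # smallest capturable d_z (see notes: d_z = t / sqrt(n))
    print(n, round(cv, 3), round(d_min, 4))
# n = 8 gives d_min ~ 0.836 (> MES); n = 9 gives d_min ~ 0.769 (<= MES), matching Table 1.
```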

Notes.

# A minimum effect size (MES) is the minimum standardized difference between the mean of the null hypothesis and the level of significance that is of interest to the research project at hand. (It would correspond to Cohen's d_z and d_4 (Cohen, 1988) if the latter were found to be the mean of the population effect size.) Unlike Cohen's effect sizes, an MES does not make a claim about the (unknown) population effect size but is independent of it. Instead, an MES sets an a priori standard of importance, asking: "How small ought a difference to be for me to consider it of importance (a.k.a. of practical significance)?" (That is, once estimated, the real effect may turn out larger or smaller than the MES, although this should have no retroactive impact on the initial decision of importance for the research project.) Because an MES does not make a claim about population effect sizes, any decision about importance is made before knowing the real effect of the research treatment in the population. This makes the MES a good construct for those situations in which population effect sizes are unknown (so that a power analysis is not possible) as well as when Fisher's tests of significance are used (the latter because these tests effectively ignore any knowledge about the population effect size and the Type II error).

# A sensitiveness analysis provides the sample size required for capturing the desired MES (or a larger effect) as a statistically significant result. The probability of capturing such an effect, however, depends on the unknown population effect size: the probability is greater when the population effect size is larger than the MES and smaller when the population effect size is smaller than the MES. Because we do not actually know the population effect size, it is not possible to predict this probability (otherwise known as power).

# Sensitiveness and power share a common background insofar as a power analysis is a sensitiveness analysis with the MES calculated from known information about the population effect size (e.g., a power analysis based on a one-tailed paired-sample t-test, ES = 0.50, α = 0.01, and power = 0.80 implies an MES = 0.37, and thus requires the same sample size as a sensitiveness analysis based on a one-tailed paired-sample t-test, MES = 0.37, and sig = 0.01; both call for the same critical value, CV_t(42) = 2.418). However, although a power analysis is a sensitiveness analysis, the opposite is not true: we cannot know the power of a test without prior knowledge of the population effect size.
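
The equivalence in the last note can be checked numerically. The sketch below (a minimal illustration, assuming Python 3 with SciPy; the variable names are not from the original) finds the sample size for the one-tailed power analysis (ES = 0.50, α = 0.01, power = 0.80) via the noncentral t distribution and then recovers the implied MES from the critical value:

```python
# Minimal sketch (assumes Python 3 with SciPy): a power analysis recovered as a
# special case of sensitiveness analysis, per the one-tailed example above.
from scipy import stats

es, alpha, target_power = 0.50, 0.01, 0.80          # assumed population ES, alpha, desired power

n = 2
while True:
    df = n - 1
    cv = stats.t.ppf(1 - alpha, df)                  # one-tailed critical t
    power = stats.nct.sf(cv, df, es * n ** 0.5)      # P(noncentral t exceeds the critical value)
    if power >= target_power:
        break
    n += 1

mes = cv / n ** 0.5                                  # MES implied by the power analysis
print(n, round(cv, 3), round(mes, 2))                # expected: n = 43, CV_t(42) ~ 2.418, MES ~ 0.37
```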

Table 2 shows the ranges of effect sizes that will not be captured under the alternative hypothesis (i.e., flagged as significant) by Neyman-Pearson's tests (Neyman & Pearson, 1933); the effect sizes at the boundaries effectively become the MES of the corresponding power analyses.

Table 2. Effect sizes under the alternative hypothesis that will not be captured as significant via power analysis

ES (d_z, d_4)   pwr = 0.90, α = 0.01        pwr = 0.90, α = 0.05        pwr = 0.80, α = 0.01        pwr = 0.80, α = 0.05
                2-tailed       1-tailed     2-tailed       1-tailed     2-tailed       1-tailed     2-tailed       1-tailed
0.20            [-0.13, 0.13]  [-, 0.12]    [-0.12, 0.12]  [-, 0.11]    [-0.15, 0.15]  [-, 0.15]    [-0.14, 0.14]  [-, 0.13]
0.50            [-0.33, 0.33]  [-, 0.32]    [-0.30, 0.30]  [-, 0.28]    [-0.38, 0.38]  [-, 0.37]    [-0.35, 0.35]  [-, 0.33]
0.80            [-0.53, 0.53]  [-, 0.51]    [-0.48, 0.48]  [-, 0.45]    [-0.60, 0.60]  [-, 0.59]    [-0.55, 0.55]  [-, 0.52]

# Minimum effect sizes have the same definition as Cohen's effect sizes, so that MES = 0.20 may be considered small, MES = 0.50 medium, and MES = 0.80 large. Although Table 1 provides sample sizes for an MES as large as one standard deviation, the researcher ought to consider the implications of choosing a particularly large MES. Indeed, reproducible results will only occur when the population effect size is larger than the MES (the larger the better), so a large MES implies a population effect size so large that it may be plainly visible even before starting the research, something not too common in science.

# Table 1 also provides sample sizes for the conventional significance levels of 5%, 1%, and 0.1%. The typical (mis)use of tests of significance as tests of hypotheses calls for a level of significance of 1% or lower as a more appropriate standard for better science than larger levels, such as the popular 5% (e.g., Sellke, Bayarri, & Berger, 2001).

# A procedure for calculating sample sizes for a desired MES is given in Perezgonzalez (2016). A simpler procedure can be set up in Excel, as follows:

      A            B
1     MES = d_z    0.37      Input the desired MES here
2     sig          0.01      Input the desired level of significance here
3     n            43        Use this cell to increase the sample size iteratively
4     df           42        Set up a formula that automatically subtracts 1 (degree of freedom) from n above (i.e., [ =B3-1 ])
5     CV(t)        2.70      Set up a t-test function, either [ =T.INV.2T(B2,B4) ] for a two-tailed test or [ =T.INV(B2,B4)*(-1) ] for a one-tailed test
6     d =          0.4115    Set up a formula that automatically calculates Cohen's d_z from CV(t) (i.e., [ =B5/SQRT(B3) ]). Compare the result against the MES: if larger, increase n; if smaller, decrease n.

# The formula for calculating Cohen's d_z (or d_4) from a paired-sample t-test is the following: d_z = t / √n
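
The spreadsheet procedure above can also be automated (a minimal sketch, assuming Python 3 with SciPy; the function name sensitiveness_n is illustrative, not from the original). It increases n until the critical value, converted to d_z = t/√n, no longer exceeds the desired MES, mirroring cells B3 to B6:

```python
# Minimal sketch (assumes Python 3 with SciPy): automates the Excel procedure
# above by increasing n until d_z = CV(t) / sqrt(n) drops to the desired MES.
from scipy import stats

def sensitiveness_n(mes, sig, tails=2):
    """Smallest paired-sample n whose critical t captures the desired MES."""
    n = 2
    while True:
        df = n - 1                                   # as in cell B4
        q = 1 - sig / 2 if tails == 2 else 1 - sig
        cv = stats.t.ppf(q, df)                      # critical value CV(t), as in cell B5
        d_min = cv / n ** 0.5                        # Cohen's d_z from CV(t), as in cell B6
        if d_min <= mes:
            return n, round(cv, 3), round(d_min, 4)
        n += 1

print(sensitiveness_n(0.37, 0.01, tails=1))   # expected: n = 43, CV(t) ~ 2.418, as in the notes
print(sensitiveness_n(0.50, 0.05, tails=2))   # expected: n = 18, matching Table 1
```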

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd Edn. New York, NY: Psychology Press. doi:10.4324/9780203771587

Fisher, R. A. (1954). Statistical Methods for Research Workers, 12th Edn. Edinburgh, UK: Oliver and Boyd.

Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289-337. doi:10.1098/rsta.1933.0009

Perezgonzalez, J. D. (2016). Statistical sensitiveness for science. arXiv:1604.01844 (retrievable from http://arxiv.org/abs/1604.01844)

Sellke, T., Bayarri, M. J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55(1), 62-71.

[Figure: taxonomy of methods across science, proto-science, and pseudo-science (replication, description, data testing, hypothesis testing; frequentist and Bayesian approaches, including NHST and Bayes factors).]

Sensitiveness analysis provides a methodological tool for sample size calculation appropriate for Fisher's tests of significance (akin to what power analysis does for Neyman-Pearson's tests of acceptance). It also helps put importance (i.e., practical significance) at the forefront of research goals.