Chapter 7 Randomization

Stat 312/612, version: 16 Oct 2016. © David Pollard.

Contents
1 Fisher on randomization
2 Shoes: a paired comparison
3 The randomization distribution
4 Theoretical justification of t-approximation

1 Fisher on randomization

Fisher (1966, Chapter III) discussed in detail the analysis of an experiment on plant growth made by Charles Darwin. After mentioning a number of possible systematic effects due to the placement of plants within pots, Fisher commented (page 44):

    Randomisation properly carried out, in which each pair of plants are assigned their positions independently at random, ensures that the estimates of error will take proper care of all such causes of different growth rates, and relieves the experimenter from the anxiety of considering and estimating the innumerable causes by which his data may be disturbed. The one flaw in Darwin's procedure was the absence of randomisation.

Later in the same chapter (page 45) he asserted that

    ... the physical act of randomisation, which, as has been shown, is necessary for the validity of any test of significance, affords the means, in respect of any particular body of data, of examining the wider hypothesis in which no normality of distribution is implied.

He then proceeded to describe what is now often called a Fisher randomization test, which uses only the probabilities supplied by the randomization. After some tedious calculations he commented that the tail probability he had obtained was very nearly equivalent to that obtained using the t-test under the hypothesis of a normally distributed population. In other words, the randomization is needed to turn some systematic effects into random noise and, apparently, the probabilistic analysis based only on the randomization leads to conclusions similar to those obtained under a model with normal errors.

Fisher's view has not been accepted by all statisticians. Debabrata Basu was a notably strong critic. For his insightful critique read Chapters XIV and XV of Ghosh (1988), a collection of some of his papers and conference talks.

2 Shoes: a paired comparison

Shoes data from Box et al. (1978, Section 4.2):

    ... measurements of the amount of wear of the soles of shoes worn by 10 boys. The shoe soles were made of two different synthetic materials, A and B. ... the experiments were run in pairs. Each boy wore a special pair of shoes, the sole of one shoe having been made with A and the sole of the other with B. The decision as to whether the left or the right sole was made with A or B was determined by the flip of a coin. ... the variability among the boys has been eliminated. ... by working with the 10 differences B - A most of this boy-to-boy variation could be eliminated. An experimental design of this kind is called a randomized paired comparison design.

I have coded the coin tosses as $\pm 1$, with $+1$ meaning apply treatment A to the left sole and $-1$ meaning apply treatment A to the right sole. With that convention, the difference between the B and A treatments for boy $i$ equals $y_i = x_i d_i$, where $d_i$ denotes the difference in wear, right foot minus left foot.

    shoes <- read.table("shoes.data", sep = "\t", header = TRUE)
    shoes$coin <- 2 * (shoes$foot == "L") - 1    # +1 for left
    shoes$BA.diff <- shoes$B - shoes$A
    shoes$RL.diff <- shoes$BA.diff * shoes$coin
    shoes

          A    B foot.A coin BA.diff RL.diff
    1  13.2 14.0      L    1     0.8     0.8
    2   8.2  8.8      L    1     0.6     0.6
    3  10.9 11.2      R   -1     0.3    -0.3
    4  14.3 14.2      L    1    -0.1    -0.1
    5  10.7 11.8      R   -1     1.1    -1.1
    6   6.6  6.4      L    1    -0.2    -0.2
    7   9.5  9.8      L    1     0.3     0.3
    8  10.8 11.3      L    1     0.5     0.5
    9   8.8  9.3      R   -1     0.5    -0.5
    10 13.3 13.6      L    1     0.3     0.3

Presumably the experimenter was seeking to learn which treatment was better (= less wear). Box, Hunter, and Hunter (BHH) later revealed that material B was cheaper. So perhaps the experimenter was looking for evidence that material A did produce significantly less wear. That hypothesis suggests the use of a one-sided t-test: is the difference shoes$BA.diff significantly large (positive)?

Let me start with the analysis based on the assumption that the differences $y_i$ are independent $N(\delta, \sigma^2)$ random variables. The estimate of $\sigma^2$ is $\hat\sigma^2 = \sum_i (y_i - \bar y)^2 / 9$ and the t-statistic is tstat $= \sqrt{10}\,\bar y / \hat\sigma$. As I am feeling lazy I'll let R do all the work:

    summary(lm(shoes$BA.diff ~ 1))

    Call:
    lm(formula = shoes$BA.diff ~ 1)

    Residuals:
       Min     1Q Median     3Q    Max
    -0.610 -0.110 -0.010  0.165  0.690

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.4100     0.1224   3.349  0.00854 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.3872 on 9 degrees of freedom

The observed value of the t-statistic has a one-sided p-value of about 0.004, half of the two-sided 0.00854. Material A really does seem to be doing better.
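As a quick cross-check, the same one-sided p-value can be read straight from R's t.test; this convenience check is my addition, not part of the original analysis:

    # One-sided one-sample t-test of whether mean(BA.diff) > 0.
    # The reported p-value should be about 0.004, half of the
    # two-sided 0.00854 in the lm() summary above.
    t.test(shoes$BA.diff, alternative = "greater", mu = 0)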

3 The randomization distribution

The randomization is represented by a vector $x$ in $\{-1, +1\}^{10}$, which, by construction, takes each of the 1024 possible values with probability $2^{-10}$. Under the null hypothesis that the two treatments actually have the same (random?) effect on the amount of wear, the random variable $x_i$ should have no influence on $d_i$ = shoes$RL.diff[i]. That is, $x$ and $d$ should be independent. Under the null hypothesis, the observed value of $y_i$ is just $\pm d_i$, where $d_i$ is the random value generated by the peculiarities of each boy's behavior and the sign is attached independently of $d_i$.

    # Create a 10 by 2^10 matrix representing all possible
    # outcomes for tosses of 10 coins
    rand <- coin(10)                 # defined in cointoss.r
    yrand <- rand * shoes$RL.diff    # 1024 values for different x's
    # My function tstat() calculates the t-statistic
    # for each possible y:
    Tstats <- apply(yrand, 2, tstat)

According to Fisher, the spread in the 1024 t-statistics (one for each possible realization of $x$) should look like the spread in 1024 observations from the $t_9$ distribution. A plot of the sorted values of Tstats against the corresponding quantiles of the $t_9$ distribution,

    # quantiles at 1024 p's of t_9 (the variable name was lost
    # in transcription; t9.q is a placeholder):
    t9.q <- qt(ppoints(1024), df = 9)

would be almost a straight line if the values in Tstats were spread out like a $t_9$ distribution.
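The helper functions from cointoss.r are not reproduced in these notes. Here is a minimal sketch consistent with how coin() and tstat() are used above; the definitions are my reconstruction under those assumptions, not necessarily Pollard's actual code:

    # Sketch: coin(n) as an n by 2^n matrix whose columns run through
    # all 2^n patterns of +1/-1 coin tosses.
    coin <- function(n) {
        t(as.matrix(expand.grid(rep(list(c(-1, 1)), n))))
    }

    # Sketch: tstat(y) as the one-sample t-statistic
    # sqrt(n) * mean(y) / sigma.hat, matching the formula in Section 2.
    tstat <- function(y) sqrt(length(y)) * mean(y) / sd(y)

With these definitions, rand * shoes$RL.diff recycles the 10 differences down each of the 1024 columns, so column j of yrand holds the y vector for the j-th coin-toss pattern.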

For the sake of comparison, I include some quantile plots for samples taken from the $t_9$ distribution (at least according to what R regards as a random sample):

    old.par <- par(no.readonly = TRUE)
    par(mfrow = c(3, 3))
    t9 <- function(p) qt(p, df = 9)
    for (ii in 1:8) {
        y <- rt(1024, df = 9)    # sample of size 1024 from t_9 (y: placeholder name)
        qqplot(t9.q, y, pch = 18)
        qqline(y, distribution = t9, col = "red")
    }
    qqplot(t9.q, Tstats, pch = 18)
    qqline(Tstats, distribution = t9, col = "red")

[Figure: a 3 by 3 grid of quantile plots, eight panels for random $t_9$ samples and a final panel for Tstats, each against $t_9$ quantiles with a red reference line.]

    par(old.par)

Are you impressed?
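The visual check can be supplemented with a small numerical one; this comparison of tail quantiles is my addition, not in the original notes:

    # Tail quantiles matter most for p-values: compare empirical
    # quantiles of the 1024 randomization t-statistics with t_9.
    p <- c(0.005, 0.025, 0.975, 0.995)
    rbind(randomization = quantile(Tstats, p),
          t9 = qt(p, df = 9))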

The randomization test looks at where the observed value of $\bar y$ (namely, 0.41) sits amongst the set of all possible values of $\bar y$ generated by different realizations of $x$:

    ymeans <- sort(apply(yrand, 2, mean))
    print(ymeans[1015:1024])
    [1] 0.39 0.39 0.39 0.41 0.41 0.41 0.41 0.43 0.45 0.47
    plot(ymeans, (1:1024)/1024, pch = ".", xlim = c(0, 0.5),
         ylab = "empirical distn. function of the ymeans")

[Figure: empirical distribution function of the ymeans.]

The interpretation is complicated slightly by the ties in the ymeans values. BHH (page 100) decided to count half the ties at 0.41 as being greater than the observed value, which led to a p-value of $5/1024 \approx 0.005$, impressively close to the p-value calculated via the normal approximation.
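That count is easy to reproduce from ymeans. The following sketch is my addition; it recovers the 5/1024 figure:

    # Randomization p-value with BHH's half-tie convention: ymeans
    # strictly above the observed 0.41, plus half of those tied with it.
    # A small tolerance guards the floating-point comparisons.
    ybar.obs <- mean(shoes$BA.diff)               # 0.41
    above <- sum(ymeans > ybar.obs + 1e-8)        # 3
    ties <- sum(abs(ymeans - ybar.obs) < 1e-8)    # 4
    (above + ties/2) / 1024                       # 5/1024, about 0.005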

4 Theoretical justification of t-approximation

Several authors have tried to justify Fisher's assertion about the approximate $t_{n-1}$-distribution under the null hypothesis for

$$T = \frac{\sqrt{n}\,\bar y}{\sqrt{\sum_{i \le n} (y_i - \bar y)^2 / (n-1)}}$$

when the $y_i$'s are generated by a randomization: $y_i = x_i d_i$. I have found the paper by Box and Andersen (1955) the most helpful. The idea is that the random variable

$$B = \frac{T^2}{n - 1 + T^2} = \frac{n \bar y^2}{\sum_{i \le n} y_i^2}$$

would have a beta$(1/2, (n-1)/2)$ distribution if the $y_i$'s were independent $N(0, \sigma^2)$'s. That beta distribution has expected value $1/n$ and second moment $3/(n^2 + 2n)$. If $y_i = x_i d_i$, with the $d_i$'s being treated as constants, then

$$B = n^{-1} \Big( \sum_{i \le n} x_i f_i \Big)^2 \qquad \text{where } f_i = d_i \Big/ \sqrt{\sum_{j \le n} d_j^2},$$

which (under the randomization distribution) has expected value $1/n$, because $E\big(\sum_i x_i f_i\big)^2 = \sum_i f_i^2 = 1$, and a second moment that is close to $3/(n^2 + 2n)$ if the $f_i$'s are not too extreme.

References

Box, G. E. P. and S. L. Andersen (1955). Permutation theory in the derivation of robust criteria and the study of departures from assumption. Journal of the Royal Statistical Society, Series B 17, 1-34.

Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: Wiley.

Fisher, R. A. (1966). The Design of Experiments (8th ed.). Hafner. First edition 1935. Republished in 1991 by Oxford as part of a collection of three of Fisher's books, under the title Statistical Methods, Experimental Design, and Scientific Inference.

Ghosh, J. K. (Ed.) (1988). Statistical Information and Likelihood: A Collection of Critical Essays by Dr. D. Basu, Volume 45 of Lecture Notes in Statistics. Springer-Verlag.