L11 Hypothesis testing


Monte-Carlo and Empirical Methods for Statistical Inference, FMS091/MASM11
David Bolin - bolin@maths.lth.se

Outline: statistical hypotheses; testing hypotheses; simple hypotheses; composite hypotheses; pivotal tests; conditional tests; bootstrap tests.

Statistical hypotheses

A statistical hypothesis is a statement about the distributional properties of data. The goal of a hypothesis test is to see whether the data agree with the statistical hypothesis. Rejection of a hypothesis indicates that there is sufficient evidence in the data to make the hypothesis unlikely. Strictly speaking, a hypothesis test does not accept a hypothesis; it fails to reject it.

Testing hypotheses

The basis of a hypothesis test consists of:
- A null hypothesis that we wish to test.
- A test statistic t(y), i.e. a function of the data.
- A rejection or critical region R.
If the test statistic falls in the rejection region, we reject the null hypothesis.

Important concepts

- Significance: the probability (risk) of the test incorrectly rejecting the null hypothesis.
- Power: the probability of the test correctly rejecting the null hypothesis; it is a function of the true, unknown parameter.
- p-value: the probability, given the null hypothesis, of observing a result at least as extreme as the test statistic.
- Type I error: incorrectly rejecting the null hypothesis.
- Type II error: failing to reject the null hypothesis when it is false.

Simple and composite hypotheses

We divide hypotheses into:
- Simple: specifies a single distribution for the data, e.g. y ~ N(θ, σ²) with H_0: θ = 0 and σ² known.
- Composite: specifies more than one distribution for the data, e.g. y ~ N(θ, σ²) with H_0: θ = 0 and σ² unknown, or y ~ N(θ, σ²) with H_0: θ ≤ 0 and σ² known.

Testing a simple hypothesis

For a simple hypothesis, the null hypothesis H_0 completely specifies a distribution. We construct a test statistic t(y) such that large values of t(y) indicate evidence against H_0. The p-value of the test is
p(y) = P(t(Y) ≥ t(y) | H_0),
and the rejection region is R = {y : p(y) ≤ α}. Thus, to find the rejection region we need the distribution of t(Y) for Y distributed according to H_0.

Monte Carlo test of a simple hypothesis

1. Draw N samples y^(1), ..., y^(N) from the distribution specified by H_0.
2. Calculate the test statistic t^(i) = t(y^(i)) for each sample.
3. Estimate the p-value using Monte Carlo integration as
   p̂(y) = (1/N) Σ_{i=1}^N 1{t^(i) ≥ t(y)}.
4. If p̂(y) ≤ α, reject H_0.
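As an illustration (a minimal sketch, not from the original slides), suppose the data vector y is observed and we test the simple hypothesis H_0: y_1, ..., y_n are i.i.d. N(0, 1), using the absolute sample mean as test statistic:

% Minimal sketch of a Monte Carlo test of a simple hypothesis.
% Assumed setup: y is the observed data vector, H0: y_i ~ N(0,1) i.i.d.
n = length(y);
t = abs(mean(y));                   % observed test statistic
N = 1e5;
t_star = zeros(N,1);
for i = 1:N
    y_star = randn(n,1);            % draw a sample from the H0 distribution
    t_star(i) = abs(mean(y_star));  % test statistic for the simulated sample
end
phat = mean(t_star >= t)            % Monte Carlo estimate of the p-value
% Reject H0 if phat <= alpha.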

Testing composite hypotheses

For a composite hypothesis, H_0 defines a set of distributions, so we cannot simply simulate from H_0. Three alternatives exist:
- Pivotal tests: find a pivotal test statistic such that t(Y) has the same distribution for all distributions in H_0.
- Conditional tests: convert the composite hypothesis to a simple one by conditioning on a sufficient statistic.
- Bootstrap tests: replace the composite hypothesis by a simple one based on an estimate from the data.

Pivotal tests

In a pivotal test we construct the test statistic so that t(Y) has the same distribution for all distributions specified by H_0.

Example: Assume we have two samples x_1, ..., x_n ~ N(μ_x, 1) and y_1, ..., y_m ~ N(μ_y, 1) and want to test whether the means are equal,
H_0: μ_x = μ_y against H_1: μ_x ≠ μ_y.
The test statistic t(x, y) = x̄ − ȳ is pivotal, since its distribution under H_0 does not depend on the unknown common mean μ. Thus, we may use any value of μ in the simulation step of the Monte Carlo test.
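A minimal MATLAB sketch of this Monte Carlo pivotal test (not from the original slides; x and y are assumed to hold the two observed samples, and the absolute difference |x̄ − ȳ| is used so that large values indicate evidence against H_0):

% Sketch: Monte Carlo pivotal test of H0: mu_x = mu_y for two N(mu,1) samples.
n = length(x); m = length(y);
t = abs(mean(x) - mean(y));          % observed test statistic |xbar - ybar|
N = 1e5;
t_star = zeros(N,1);
for i = 1:N
    % t is pivotal, so any common mean works in the simulation; use mu = 0.
    x_star = randn(n,1);
    y_star = randn(m,1);
    t_star(i) = abs(mean(x_star) - mean(y_star));
end
phat = mean(t_star >= t)             % Monte Carlo p-value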

Under H 0,thevectorofranksr = (r 1,...,r 273 ) is a draw from the uniform distribution on the set of permutations of the vector (1,...,273). Thus, any statistic based on r is pivotal. If the ph-levels had increased, current measurements would have larger ranks, and we choose t(y) as the average difference in ranks. This gives a p-value of 0.0131. Note: If there values in y are not unique we have to account for this by calculating what is called tied ranks. n1 = length(ph1); n2 = length(ph2); %calculate ranks R = tiedrank([ph1;ph2]); t = mean(r(n1+1:end))-mean(r(1:n1)); %simulate to get p value t perm = zeros(1e5,1); for i=1:numel(t perm) %create vector of ranks R = randperm(n1+n2); %calculate U value t perm(i) = mean(r(n1+1:end))-mean(r(1:n1)); end %p-value phat = mean(t perm>=t) David Bolin - bolin@maths.lth.se MC Based Statistical Methods, L11 13/24 Now use Wilcoxon rank-sum test to test if the observations are equally large in both samples. The test statistic is: ( t = min r1,i n 1(n 1 + 1), r 2,i n ) 2(n 2 + 1), 2 2 where n 1 and n 2 are the number of observations in the two samples, and r 1,i is the rank of samples in the first dataset when the observations are sorted together. If all observations in one set is smaller than in the other t = 0, thus small values of t indicate evidence against H 0 : Samples are from the same distribution. The p-value is 0.0251, whichcanbecomparedwiththe approximate p-value of 0.0257. Conditional tests David Bolin - bolin@maths.lth.se MC Based Statistical Methods, L11 14/24 In a conditional test we condition on a sufficient statistic to simplify the composite hypothesis. A sufficient statistic completely summarizes the information contained in the data about unknown parameters It can be shown that t(y) is a sufficient statistic if and only if the density can be factored as f θ (y) = h(y)g θ (t(y)). The trick is to find a sufficient statistic that does not reduce the distribution of the test statistic to a point mass. The p-value of the conditional test is now p(y) = P(t(Y) t(y) s(y) = s(y), H 0 ). David Bolin - bolin@maths.lth.se MC Based Statistical Methods, L11 15/24 David Bolin - bolin@maths.lth.se MC Based Statistical Methods, L11 16/24

Conditional tests

In a conditional test we condition on a sufficient statistic to simplify the composite hypothesis. A sufficient statistic completely summarizes the information contained in the data about the unknown parameters. It can be shown that s(y) is a sufficient statistic if and only if the density can be factored as f_θ(y) = h(y) g_θ(s(y)). The trick is to find a sufficient statistic that does not reduce the distribution of the test statistic to a point mass. The p-value of the conditional test is
p(y) = P(t(Y) ≥ t(y) | s(Y) = s(y), H_0).

Example: Coal mining data

Recall the coal mining data. From the example in the book we see a clear breakpoint around 1890. In 1947 the National Coal Board was created to run the nationalized coal mining industry, and we want to test whether this affected the number of accidents.
H_0: X_t is a Poisson process with intensity λ.
H_1: X_t is a Poisson process with intensity λ_1 before 1947 and λ_2 after.
A sufficient statistic for estimating λ is the total number of disasters, N. Given the number of events in an interval [T_1, T_2], the event times of a Poisson process are i.i.d. uniform on (T_1, T_2). The estimated intensities are λ̂_1 = 1.04 and λ̂_2 = 0.50, and the conditional test gives p = 0.054.

%pick out accidents from 1891 to 1963
T_low = 1891; T_break = 1947; T_high = 1963;
tau = coal_mine(coal_mine > T_low & coal_mine < T_high);
%test statistic: absolute difference of the estimated intensities
l1 = sum(tau <= T_break)/(T_break - T_low);
l2 = sum(tau > T_break)/(T_high - T_break);
t = abs(l1 - l2);
%simulate event times conditionally on N: uniform on (T_low, T_high)
tau_star = rand(length(tau), 1e5)*(T_high - T_low) + T_low;
t_star = abs(sum(tau_star <= T_break)/(T_break - T_low) - ...
             sum(tau_star > T_break)/(T_high - T_break));
%p-value
p = mean(t_star >= t)

Permutation tests

A set of random variables is said to be exchangeable if it has the same distribution under all permutations. If the random variables are exchangeable, then the ordered sample is a sufficient statistic, and the distribution of y given the ordered sample is the uniform distribution on the set of all permutations of y. Conditioning on the ordered sample therefore leads to permutation tests. Permutation tests can be very efficient for testing an exchangeable null hypothesis against a non-exchangeable alternative, i.e. for testing whether two samples differ in some way.

Monte Carlo permutation test

1. Draw N permutations y^(1), ..., y^(N) of the vector y.
2. Calculate the test statistic t^(i) = t(y^(i)) for each permutation.
3. Estimate the p-value using Monte Carlo integration as
   p̂(y) = (1/N) Σ_{i=1}^N 1{t^(i) ≥ t(y)}.
4. If p̂(y) ≤ α, reject H_0.
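A minimal MATLAB sketch of a two-sample Monte Carlo permutation test using the difference in sample means (not from the original slides; x and y are assumed to hold the two samples, and large positive values of the statistic indicate evidence against H_0):

% Sketch: two-sample permutation test of H0: both samples have the same
% distribution, using the difference in sample means as test statistic.
z = [x(:); y(:)];                     % pooled data
n = length(x); m = length(y);
t = mean(y) - mean(x);                % observed statistic
N = 1e5;
t_perm = zeros(N,1);
for i = 1:N
    zp = z(randperm(n+m));            % random permutation of the pooled data
    t_perm(i) = mean(zp(n+1:end)) - mean(zp(1:n));
end
phat = mean(t_perm >= t)              % one-sided Monte Carlo p-value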

Example: pH measurements (cont.)

Assume that the distribution of the current data can be written as G_0(y) = F_0(y − θ); that is, the mean of the current data is the mean of the historical data plus θ. We now want to test H_0: θ = 0 against H_1: θ > 0. Under H_0, all data are i.i.d. and thus exchangeable. We use the difference in sample means as test statistic; a permutation test gives p = 0.0198.

Bootstrap tests

Remember the relation between hypothesis tests and confidence intervals: knowing how to calculate bootstrap confidence intervals, we directly have a tool for tests of the form H_0: θ = θ_0 or H_0: θ > θ_0. In general, the idea of a bootstrap test is to estimate P_0 from the data with an estimate P̂. Given P̂, we revert to a simple Monte Carlo test, simulating from P̂. Note that this is similar to the ordinary bootstrap, with the exception that we require P̂ to fulfil the null hypothesis. As an example, consider y ~ N(θ, σ²) with H_0: θ = 0 and σ² unknown; a sketch of such a test is given after the goodness-of-fit example below.

Example: Testing goodness of fit

Assume that we have some data and want to test whether it comes from a Gaussian distribution; that is, we test H_0: the data are i.i.d. Gaussian variables. A natural statistic is the sum of squared differences between the fitted Gaussian cdf and the empirical distribution function,
t(y) = Σ_{i=1}^n ( i/n − Φ((y_(i) − μ̂)/σ̂) )²,
where y_(i) is the i-th order statistic. In a bootstrap test we sample from the fitted Gaussian distribution to estimate the p-value.

%Estimate parameters:
muhat = mean(y); sigmahat = std(y);
n = length(y);
t = sum(((1:n)/n - normcdf(sort(y), muhat, sigmahat)).^2);
%Bootstrap p-value:
N = 1e5;
t_star = zeros(N,1);
for i = 1:N
    %Sample from P hat:
    y_star = sort(muhat + sigmahat*randn(1,n));
    m = mean(y_star); s = std(y_star);
    t_star(i) = sum(((1:n)/n - normcdf(y_star, m, s)).^2);
end
phat = mean(t_star >= t)
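For the earlier example y ~ N(θ, σ²) with H_0: θ = 0 and σ² unknown, a minimal parametric bootstrap test could look as follows (a sketch, not from the original slides; P̂ is taken as N(0, σ̂²), i.e. the fitted Gaussian constrained to fulfil H_0, with |ȳ| as test statistic):

% Sketch: bootstrap test of H0: theta = 0 for y ~ N(theta, sigma^2), sigma unknown.
n = length(y);
sigmahat = std(y);                    % estimate of the nuisance parameter
t = abs(mean(y));                     % test statistic |ybar|
N = 1e5;
t_star = zeros(N,1);
for i = 1:N
    y_star = sigmahat*randn(n,1);     % simulate from P hat = N(0, sigmahat^2)
    t_star(i) = abs(mean(y_star));
end
phat = mean(t_star >= t)              % bootstrap p-value; reject H0 if phat <= alpha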