Lecture 8 Hypothesis Testing


Taylor Ch. 6 and 10.6

Introduction
- The goal of hypothesis testing is to set up a procedure that allows us to decide whether a mathematical model ("theory") is acceptable in light of our experimental observations.
- Examples:
  - Sometimes it is easy to tell whether the observations agree or disagree with the theory:
    - A certain theory says that Columbus will be destroyed by an earthquake in May 1992.
    - A certain theory says the sun goes around the earth.
    - A certain theory says that anti-particles (e.g. the positron) should exist.
  - Often it is not obvious whether the outcome of an experiment agrees or disagrees with the expectations:
    - A theory predicts that a proton should weigh 1.67×10⁻²⁷ kg; you measure 1.65×10⁻²⁷ kg.
    - A theory predicts that a material should become a superconductor at 300 K; you measure 80 K.
  - Often we want to compare the outcomes of two experiments to check whether they are consistent:
    - Experiment 1 measures the proton mass to be 1.67×10⁻²⁷ kg; experiment 2 measures 1.6×10⁻²⁷ kg.

Types of Tests
- Parametric tests: compare the values of parameters.
  - Example: does the mass of the proton equal the mass of the electron?
- Non-parametric tests: compare the "shapes" of distributions.
  - Example: consider the decay of a neutron. Suppose we have two theories that predict the energy spectrum of the electron emitted in the decay of the neutron (beta decay):

K.K. Gan L8: Hypothesis Testing 1

Theory 1: n → p + e
Theory 2: n → p + e + ν   (ν: neutrino)

[Figure: predicted electron energy spectra for the two theories; horizontal axes in Energy (MeV).]

- Both theories might predict the same average energy for the electron.
  ⇒ A parametric test might not be sufficient to distinguish between the two theories.
- The shapes of their energy spectra are quite different:
  - Theory 1: the spectrum for a neutron decaying into two particles (p + e).
  - Theory 2: the spectrum for a neutron decaying into three particles (p + e + ν).
  ⇒ We would like a test that uses our data to differentiate between these two theories.
- We can calculate the χ² of the distribution to see whether our data are described by a certain theory:

  χ² = Σ_{i=1}^{n} [y_i − f(x_i; a, b, …)]² / σ_i²

  - (y_i ± σ_i, x_i) are the data points (n of them).
  - f(x_i; a, b, …) is a function that relates x and y.
  ⇒ Accept or reject the theory based on the probability of observing a χ² larger than the calculated one for the given number of degrees of freedom.
- Example: we measure a set of data points (y_i ± σ_i, x_i) and believe there is a linear relationship between x and y:

  y = a + bx
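The χ² sum can be evaluated directly from the data points. A minimal Python sketch, with made-up data (x_i, y_i, σ_i) and an assumed straight line y = a + bx (all the numbers here are illustrative, not from the lecture):

```python
import math

# Hypothetical data points (x_i, y_i, sigma_i) -- purely illustrative values
data = [(1.0, 2.1, 0.2),
        (2.0, 2.9, 0.2),
        (3.0, 4.2, 0.3)]
a, b = 1.0, 1.0  # assumed line parameters for y = a + b*x

# chi^2 = sum over points of (residual / uncertainty)^2
chi2 = sum(((y - (a + b * x)) / sig) ** 2 for x, y, sig in data)
print(chi2)
```

A small χ² per degree of freedom indicates the line describes these points well; the same sum with real data and fitted (a, b) is what gets compared against the χ² distribution below.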

- If the y's are described by a Gaussian PDF, then minimizing the χ² function (or using the LSQ or MLM method) gives an estimate for a and b.
- As an illustration, assume we have 6 data points. Since we extracted a and b from the data, we have 6 − 2 = 4 degrees of freedom (DOF). We further assume:

  χ² = Σ_{i=1}^{6} [y_i − (a + b x_i)]² / σ_i² = 15

- What can we say about our hypothesis that the data are described by a straight line?
  ⇒ Look up the probability of getting χ² ≥ 15 by chance: P(χ² ≥ 15; 4) ≈ 0.005.
  ⇒ In only about 5 of 1000 experiments would we expect to get this result (χ² ≥ 15) by chance.
  ⇒ Since this is such a small probability, we could reject the above hypothesis, or we could accept the hypothesis and rationalize it by saying that we were unlucky.
  ⇒ It is up to you to decide at what probability level you will accept or reject the hypothesis.

Confidence Levels (CL)
- An informal definition of a confidence level (CL):

  CL = 100 × [probability of the event happening by chance]

  - The factor of 100 allows CLs to be expressed as percentages (%).
- We can write formally, for a continuous probability distribution P:

  CL = 100 × prob(x₁ ≤ X ≤ x₂) = 100 × ∫_{x₁}^{x₂} P(x) dx

- Example: suppose we measure some quantity X, and we know that X is described by a Gaussian distribution with mean μ = 0 and standard deviation σ = 1.
  - For a CL, we know P(x), x₁, and x₂.
  - What is the CL for measuring x ≥ 2 (i.e. ≥ 2σ from the mean)?

  CL = 100 × prob(x ≥ 2) = 100 × ∫_{2}^{∞} [1/(σ√(2π))] e^{−(x−μ)²/2σ²} dx = 100 × ∫_{2}^{∞} (1/√(2π)) e^{−x²/2} dx ≈ 2.5%
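Rather than looking the probability up in a table, P(χ² ≥ x; DOF) can be computed directly. A small sketch using the closed form of the χ² survival function, which holds when the number of degrees of freedom is even (the 4-DOF case above qualifies):

```python
import math

def chi2_sf(x, dof):
    """P(chi^2 >= x) for an even number of degrees of freedom.

    For even dof the survival function has the closed form
    exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!.
    """
    assert dof % 2 == 0 and dof > 0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

# chi^2 = 15 with 4 degrees of freedom, as in the straight-line example
p = chi2_sf(15.0, 4)
print(p)  # a few parts in a thousand: rarely this large by chance
```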

- To do this problem we needed to know the underlying probability distribution P.
- If the probability distribution were not Gaussian (e.g. binomial), we could get a very different CL.
- If you don't know P, you are out of luck!
- Interpretation of the CL can easily be abused.
  - Example: we have a scale of known accuracy (Gaussian with σ = 10 gm).
    - We weigh something to be 20 gm.
    - Is there really a 2.5% chance that our object really weighs ≤ 0 gm?
    ⇒ The probability distribution must be defined in the region where we are trying to extract information.

Confidence Intervals (CI)
- For a given confidence level, the confidence interval is the range [x₁, x₂] that gives that confidence level.
  - Confidence intervals are not always uniquely defined.
  - For a CI, we know P(x) and the CL, and wish to determine x₁ and x₂.
  - We usually seek the minimum or a symmetric interval.
- Example: suppose we have a Gaussian distribution with μ = 3 and σ = 1.
  - What is the 68% CI for an observation?
  - We need to find the limits of the integral [x₁, x₂] that satisfy:

  0.68 = ∫_{x₁}^{x₂} P(x) dx

  - For a Gaussian distribution, the area enclosed by ±1σ is 0.68:
    x₁ = μ − 1σ = 2
    x₂ = μ + 1σ = 4
  ⇒ The confidence interval is [2, 4].

Upper/Lower Limits
- Example: suppose an experiment observed no events.
  - What is the 90% CL upper limit on the expected number of events?
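Both Gaussian calculations above (the one-sided tail beyond 2σ, and the 68% interval for μ = 3, σ = 1) can be checked with the Gaussian CDF, written here via the error function; a minimal sketch:

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# One-sided tail beyond 2 sigma for a unit Gaussian
tail = 1.0 - gauss_cdf(2.0)   # ~0.023 (the slides round this to ~2.5%)

# 68% CI for mu = 3, sigma = 1: area between mu - 1*sigma and mu + 1*sigma
area = gauss_cdf(4.0, 3.0, 1.0) - gauss_cdf(2.0, 3.0, 1.0)  # ~0.68
print(tail, area)
```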

  CL = 0.90 = Σ_{n=1}^{∞} e^{−λ} λⁿ/n!

  1 − CL = 0.10 = 1 − Σ_{n=1}^{∞} e^{−λ} λⁿ/n! = Σ_{n=0}^{0} e^{−λ} λⁿ/n! = e^{−λ}

  λ = 2.3

  - If λ = 2.3, then 10% of the time we expect to observe zero events even though there is nothing wrong with the experiment!
  - If the expected number of events is greater than 2.3,
    ⇒ the probability of observing one or more events is greater than 90%.
- Example: suppose an experiment observed one event.
  - What is the 95% CL upper limit on the expected number of events?

  CL = 0.95 = Σ_{n=2}^{∞} e^{−λ} λⁿ/n!

  1 − CL = 0.05 = Σ_{n=0}^{1} e^{−λ} λⁿ/n! = e^{−λ} + λe^{−λ}

  λ = 4.74

Procedure for Hypothesis Testing
a) Measure something.
b) Get a hypothesis (sometimes a theory) to test against your measurement.
c) Calculate the CL that the measurement is from the theory.
d) Accept or reject the hypothesis (or measurement) depending on some minimum acceptable CL.
- Problem: how do we decide what is an acceptable CL?
  - Example: what is an acceptable definition that the space shuttle is safe? One explosion per 10 launches, or per 1000 launches, or ...?
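The upper limits above come from solving 1 − CL = Σ_{n=0}^{n_obs} e^{−λ} λⁿ/n! for λ. A small sketch that finds λ numerically by bisection (the bracket and iteration count are arbitrary choices):

```python
import math

def poisson_cdf(n_obs, lam):
    """P(N <= n_obs) for a Poisson distribution with mean lam."""
    term = math.exp(-lam)   # n = 0 term
    total = term
    for n in range(1, n_obs + 1):
        term *= lam / n     # build lam^n / n! iteratively
        total += term
    return total

def upper_limit(n_obs, cl):
    """lam such that P(N <= n_obs) = 1 - cl, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid        # lam too small: cdf still above target
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(upper_limit(0, 0.90))  # zero events observed, 90% CL
print(upper_limit(1, 0.95))  # one event observed, 95% CL
```

With zero observed events this reproduces λ = 2.3, and with one observed event λ ≈ 4.74, as derived above.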

Hypothesis Testing for Gaussian Variables
- If we want to test whether the mean of some quantity we have measured (x̄ = average of n measurements) is consistent with a known mean μ₀, we have the following two tests:

  Test      Condition    Test statistic            Test distribution
  μ = μ₀    σ known      z = (x̄ − μ₀)/(σ/√n)      Gaussian, μ = 0, σ = 1
  μ = μ₀    σ unknown    t = (x̄ − μ₀)/(s/√n)      t(n − 1)

  - s: standard deviation extracted from the n measurements.
  - t(n − 1): Student's t-distribution with n − 1 degrees of freedom.
    - "Student" is the pseudonym of the statistician W.S. Gosset, who was employed by a famous English brewery.
- Example: do free quarks exist? Quarks are nature's fundamental building blocks and are thought to have electric charge q of either (1/3)e or (2/3)e (e = charge of the electron). Suppose we do an experiment to look for q = 1/3 quarks.
  - Measure: q = 0.90 ± 0.2 = μ ± σ
  - Quark theory: q = 0.33 = μ₀
  - Test the hypothesis μ = μ₀ when σ is known:
    ⇒ Use the first line of the table:

    z = (x̄ − μ₀)/(σ/√n) = (0.90 − 0.33)/(0.2/√1) = 2.85

  - Assuming a Gaussian distribution, the probability of getting z ≥ 2.85:

    prob(z ≥ 2.85) = ∫_{2.85}^{∞} P(0, 1, x) dx = ∫_{2.85}^{∞} (1/√(2π)) e^{−x²/2} dx ≈ 0.002

- If we repeated our experiment 1000 times,
  ⇒ only two experiments would measure a value q ≥ 0.9 if the true mean were q = 1/3.
  ⇒ This is not strong evidence for q = 1/3 quarks!
- If instead of q = 1/3 quarks we tested for q = 2/3, what would we get for the CL?
  - μ = 0.9 and σ = 0.2 as before, but μ₀ = 2/3.
  ⇒ z = 1.17
  ⇒ prob(z ≥ 1.17) ≈ 0.12, so CL = 12%.
  ⇒ Quarks are starting to get believable!
- Consider another variation of the q = 1/3 problem. Suppose we have 3 measurements of the charge q:
  q₁ = 1.1, q₂ = 0.7, q₃ = 0.9
  - We don't know the variance beforehand, so we must determine it from our data.
    ⇒ Use the second test in the table:

    μ = (1/3)(q₁ + q₂ + q₃) = 0.9

    s² = Σ_{i=1}^{n} (q_i − μ)²/(n − 1) = [0.2² + (−0.2)² + 0²]/2 = 0.04,  so s = 0.2

    z = (x̄ − μ₀)/(s/√n) = (0.9 − 0.33)/(0.2/√3) = 4.94

  - From Table 7.2 of Barlow: prob(z ≥ 4.94) ≈ 0.02 for n − 1 = 2 degrees of freedom.
    ⇒ 10× greater than in the first part of this example, where we knew the variance ahead of time.
- Consider the situation where we have several independent experiments that measure the same quantity:
  - We do not know the true value of the quantity being measured.
  - We wish to know whether the experiments are consistent with each other.
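Both quark tests can be reproduced numerically. A sketch of the σ-known z-test and the σ-unknown t-test; for the t tail probability it uses the closed form P(T ≥ t) = 1/2 − t/(2√(2 + t²)), which is exact for 2 degrees of freedom:

```python
import math
import statistics

# --- sigma known: single measurement q = 0.90 +/- 0.2 against mu0 = 0.33 ---
z = (0.90 - 0.33) / (0.2 / math.sqrt(1))      # 2.85
p_z = 0.5 * math.erfc(z / math.sqrt(2.0))     # one-sided Gaussian tail, ~0.002

# --- sigma unknown: three measurements, Student's t with n - 1 = 2 dof ---
q = [1.1, 0.7, 0.9]
m = statistics.mean(q)                        # 0.9
s = statistics.stdev(q)                       # 0.2
t = (m - 0.33) / (s / math.sqrt(len(q)))      # ~4.94
# closed-form tail probability of the t-distribution, valid for 2 dof only
p_t = 0.5 - t / (2.0 * math.sqrt(2.0 + t * t))  # ~0.02
print(p_z, p_t)
```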

  Test          Conditions              Test statistic                        Test distribution
  μ₁ − μ₂ = 0   σ₁ and σ₂ known         z = (x̄₁ − x̄₂)/√(σ₁²/n + σ₂²/m)      Gaussian, μ = 0, σ = 1
  μ₁ − μ₂ = 0   σ₁ = σ₂ = σ unknown     t = (x̄₁ − x̄₂)/[Q√(1/n + 1/m)]       t(n + m − 2)
  μ₁ − μ₂ = 0   σ₁ ≠ σ₂ unknown         (x̄₁ − x̄₂)/√(s₁²/n + s₂²/m)          approx. Gaussian, μ = 0, σ = 1

  with Q² = [(n − 1)s₁² + (m − 1)s₂²]/(n + m − 2)

- Example: we compare the results of two independent experiments to see whether they agree with each other.
  Exp. 1: 1.00 ± 0.01
  Exp. 2: 1.04 ± 0.02
  - Use the first line of the table and set n = m = 1:

    z = (x̄₁ − x̄₂)/√(σ₁²/n + σ₂²/m) = (1.04 − 1.00)/√(0.01² + 0.02²) = 1.79

  - z is distributed according to a Gaussian with μ = 0, σ = 1.
  - Probability for the two experiments to disagree by ≥ 0.04:

    prob(|z| ≥ 1.79) = 1 − ∫_{−1.79}^{1.79} (1/√(2π)) e^{−x²/2} dx = 0.07

    - We don't care which experiment has the larger result, so we use ±z.
  ⇒ 7% of the time we should expect the experiments to disagree at this level.
  ⇒ Is this acceptable agreement?
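The two-experiment comparison can be checked the same way; a minimal sketch (two-sided tail, since we do not care which experiment has the larger result):

```python
import math

# Exp. 1: 1.00 +/- 0.01, Exp. 2: 1.04 +/- 0.02, with n = m = 1
x1, s1 = 1.00, 0.01
x2, s2 = 1.04, 0.02

z = abs(x1 - x2) / math.sqrt(s1**2 + s2**2)   # ~1.79
# two-sided probability: P(|Z| >= z) = erfc(z / sqrt(2))
p = math.erfc(z / math.sqrt(2.0))             # ~0.07
print(z, p)
```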