
Hypothesis Testing

Suppose you are investigating extra sensory perception (ESP). You give someone a test where they guess the color of a card 100 times, and they are correct 90 times. Guessing at random, you would expect about 10 correct. IS THIS EVIDENCE FOR THE EXISTENCE OF ESP???

NO! This is not evidence in favor of ESP. We are rejecting the (null) hypothesis that the results are consistent with chance. Other possible hypotheses that could fit the data: 1) the person cheated, 2) the person has ESP, 3) the person was lucky, 4) something else we haven't thought of. We have not tested 1) - 4).
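Just to quantify "not consistent with chance": a minimal sketch (Python with scipy, which the notes do not use; the 1-in-10 chance per guess is read off the slide's "expect 10 correct out of 100") of the tail probability of 90 or more correct answers under the chance-only hypothesis.

```python
# Tail probability of >= 90 correct out of 100 guesses when each guess
# succeeds by chance with probability 0.1 (the slide's expectation of 10).
from scipy.stats import binom

n_guesses, n_correct, p_chance = 100, 90, 0.10
p_value = binom.sf(n_correct - 1, n_guesses, p_chance)  # P(X >= 90), since sf(k) = P(X > k)
print(f"P(>= {n_correct} correct by chance) = {p_value:.3g}")
# The tiny probability only rejects "consistent with chance"; it says nothing
# about which of the alternatives (cheating, luck, ESP, ...) is responsible.
```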

Hypothesis Testing (read Taylor Ch 6 and Section 10.8)

Introduction: The goal of hypothesis testing is to set up a procedure (or procedures) that allows us to decide whether a mathematical model ("theory") is acceptable in light of our experimental observations.

Examples: Sometimes it is easy to tell whether the observations agree or disagree with the theory.
A certain theory says that Columbus, Ohio will be destroyed by an earthquake in May 1992.
A certain theory says the sun goes around the earth.
A certain theory says that anti-particles (e.g. the positron) should exist.

Often it is not obvious whether the outcome of an experiment agrees or disagrees with the expectations.
A theory predicts that a proton should weigh 1.67x10^-27 kg; you measure 1.65x10^-27 kg.
A theory predicts that a material should become a superconductor at 300 K; you measure 280 K.

Often we want to compare the outcomes of two experiments to check if they are consistent.
Experiment 1 measures the proton mass to be 1.67x10^-27 kg; experiment 2 measures 1.62x10^-27 kg.

Types of Tests

Parametric Tests: compare the values of parameters.
Example: Does the mass of the proton equal the mass of the electron?

Non-Parametric Tests: compare the "shapes" of distributions.
Example: Consider the decay of a neutron. Suppose we have two theories that predict the energy spectrum of the electron emitted in the decay of the neutron (beta decay):
Theory 1 predicts n -> p e (decays to two particles)
Theory 2 predicts n -> p e ν (decays to three particles, ν = neutrino)

[Figure: electron energy spectra for the two-body decay n -> p e and the three-body decay n -> p e ν]

Both theories might predict the same average energy for the electron, so a parametric test might not be sufficient to distinguish between them. The shapes of their energy spectra, however, are quite different:
Theory 1: the spectrum for a neutron decaying into two particles (n -> p + e).
Theory 2: the spectrum for a neutron decaying into three particles (n -> p + e + ν).
We would like a test that uses our data to differentiate between these two theories.

In previous lectures we ran across the chi-square (χ²) probability distribution and saw that we could use it to decide (subjectively) whether our data are described by a certain model:

\chi^2 = \sum_{i=1}^{n} (y_i - f(x_i, a, b, \ldots))^2 / \sigma_i^2

Here (y_i ± σ_i, x_i) are the data points (n of them) and f(x_i, a, b, ...) is a function ("model") that relates x and y.

Example: We measure a set of data points (x, y ± σ) and we believe there is a linear relationship between x and y: y = a + bx. If the y's are described by a Gaussian pdf, then we saw previously that minimizing the χ² function (or the LSQ or MLM methods) gives us estimates for a and b.

Assume we have 6 data points. Since we used the 6 data points to determine 2 quantities (a, b), we have 4 degrees of freedom (dof). Further, assume that

\chi^2 = \sum_{i=1}^{6} (y_i - (a + b x_i))^2 / \sigma_i^2 = 15

What can we say about our hypothesis that the data are described by a straight line? To answer this question we find (look up) the probability of getting χ² ≥ 15 for 4 degrees of freedom:

P(χ² ≥ 15, 4 dof) ≈ 0.005

Thus in only about 5 of 1000 experiments would we expect a χ² this large by chance. Since this is such a small probability, we could reject the above hypothesis, or we could accept the hypothesis and rationalize it by saying that we were unlucky. It is up to you to decide at what probability level you will accept/reject the hypothesis. In high energy physics the standard is 5 sigma, or about 6x10^-7.
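The table look-up can be reproduced numerically. A minimal sketch (Python with scipy, a tooling choice of mine rather than part of the notes):

```python
# Probability of chi^2 >= 15 with 4 degrees of freedom,
# via the chi-square survival function (upper-tail probability).
from scipy.stats import chi2

chisq, dof = 15.0, 4
p_value = chi2.sf(chisq, dof)
print(f"P(chi^2 >= {chisq}, {dof} dof) = {p_value:.4f}")  # ~0.005
```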

Confidence Levels (CL)

An informal definition of a confidence level (CL):

CL = 100 × [probability of the event happening by chance]

The factor of 100 allows CLs to be expressed as a percent (%). We can write this formally for a continuous probability distribution p:

CL = 100 \times \mathrm{prob}(x_1 \le X \le x_2) = 100 \times \int_{x_1}^{x_2} p(x)\, dx

For a CL we know p(x), x_1, and x_2.

Example: Suppose we measure some quantity X and we know that X is described by a Gaussian pdf with mean μ = 0 and standard deviation σ = 1. What is the CL for measuring x ≥ 2 (2σ above the mean)?

CL = 100 \times \mathrm{prob}(x \ge 2) = 100 \times \frac{1}{\sqrt{2\pi}} \int_{2}^{\infty} e^{-x^2/2}\, dx \approx 2.3\%

To do this problem we needed to know the underlying probability distribution function p. If the pdf were not Gaussian (e.g. binomial) we could get a very different CL. If you don't know the pdf you are out of luck!

Interpretation of the CL can easily be abused. Example: We have a scale of known accuracy (Gaussian with σ = 10 gm) and we weigh something to be 20 gm. Is there really a 2.3% chance that our object really weighs ≤ 0 gm? The probability distribution must be defined in the region where we are trying to extract information. Interpretation of the meaning of a CL also depends on the Classical or Bayesian viewpoint; Bayesian and Classical are two schools of thought on probability and its applications.
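A minimal numerical check of the Gaussian tail integral above (again assuming scipy):

```python
# CL for observing x >= 2 when x follows a unit Gaussian (mu = 0, sigma = 1).
from scipy.stats import norm

cl = 100 * norm.sf(2.0)  # 100 * P(x >= 2)
print(f"CL = {cl:.1f}%")  # ~2.3%
```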

Confidence Intervals (CI)

For a given confidence level, the confidence interval (CI) is the range [x_1, x_2]. Confidence intervals are not always uniquely defined; we usually seek the minimum or a symmetric interval.

Example: Suppose we have a Gaussian distribution with μ = 3 and σ = 1. What is the 68% CI for an observation? We need to find the limits of integration [x_1, x_2] that satisfy

0.68 = \int_{x_1}^{x_2} p(x)\, dx

For a CI we know p(x) and the CL; we want to determine x_1 and x_2. For a Gaussian distribution the area enclosed by ±1σ is 0.68, so

x_1 = μ - 1σ = 2 and x_2 = μ + 1σ = 4.

The confidence interval is [2, 4].

Upper Limits / Lower Limits

If n events from a Poisson process are observed, we can calculate upper and lower limits on the average λ:

CL_{\mathrm{upper}} = \sum_{r=n+1}^{\infty} e^{-\lambda_{\mathrm{up}}} \lambda_{\mathrm{up}}^{r} / r! \qquad CL_{\mathrm{lower}} = \sum_{r=0}^{n-1} e^{-\lambda_{\mathrm{lo}}} \lambda_{\mathrm{lo}}^{r} / r!

Example: Suppose an experiment observed no events of the type it was looking for. What is the 90% CL upper limit on the expected number of events (λ)? With n = 0,

1 - CL = 0.10 = \sum_{r=0}^{0} e^{-\lambda} \lambda^{r} / r! = e^{-\lambda}

which gives λ = 2.3. In other words, if λ = 2.3 then 10% of the time we expect to observe zero events even though there is nothing wrong with the experiment!
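The upper-limit condition can be inverted numerically for any observed count. A minimal sketch (my own numerical inversion, assuming scipy; not a procedure given in the notes):

```python
# Find the Poisson mean lambda for which observing <= n_obs events
# has probability 1 - cl (the classical upper limit).
from scipy.optimize import brentq
from scipy.stats import poisson

def poisson_upper_limit(n_obs, cl):
    return brentq(lambda lam: poisson.cdf(n_obs, lam) - (1.0 - cl), 1e-9, 100.0)

print(poisson_upper_limit(0, 0.90))  # ~2.30 for zero observed events at 90% CL
```

The same function reproduces the one-event example that follows.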

Example: Suppose an experiment observed one event. What is the 95% CL upper limit on the expected number of events (λ)? With n = 1,

1 - CL = 0.05 = \sum_{r=0}^{1} e^{-\lambda} \lambda^{r} / r! = e^{-\lambda}(1 + \lambda)

which gives λ = 4.74.

Hypothesis Testing for Gaussian Variables

If we want to test whether the mean of some quantity we have measured (x̄ = average of n measurements) is consistent with a known mean (μ₀), we have the following two tests:

Test       Conditions    Test statistic        Test distribution
μ = μ₀     σ known       (x̄ - μ₀)/(σ/√n)      Gaussian with μ = 0, σ = 1
μ = μ₀     σ unknown     (x̄ - μ₀)/(s/√n)      t(n - 1)

Here s is the standard deviation extracted from the n measurements, and t(n - 1) is Student's t-distribution with n - 1 degrees of freedom. "Student" is the pseudonym of the statistician W.S. Gosset, who was employed by a famous English brewery.

Procedure for Hypothesis Testing

a) Measure something.
b) Get a hypothesis (sometimes a theory) to test against your measurement (the "null hypothesis", H₀).
c) Calculate the CL that the measurement comes from the theory.
d) Accept or reject the hypothesis (or measurement) depending on some minimum acceptable CL.

Problem: How do we decide what is an acceptable CL? Example: What is an acceptable definition that the space shuttle is safe? One explosion per 10 launches, or per 1000 launches, or ...?

             H₀ is True      H₀ is False
Reject H₀    Type I error    OK
Accept H₀    OK              Type II error

In hypothesis testing we are assuming that H₀ is true; we never disprove H₀. If the CL is low, all we can say is that our data do not support H₀. If our CL is α% (e.g. 5%), then we make a Type I error α% of the time if H₀ is true! In a trial, H₀ = innocent: convicting an innocent person is a Type I error, while letting a guilty person go free is a Type II error.

Example: Do free quarks exist? (No, they don't!) Quarks are nature's fundamental building blocks and are thought to have electric charge |q| of either (1/3)e or (2/3)e (e = charge of the electron). Suppose we do an experiment to look for |q| = 1/3 quarks.

Measurement: q = 0.90 ± 0.2 (this gives x̄ and σ)
Quark theory: q = 0.33

(Aside: there are 2 types of quark bound states: BARYONS (protons, neutrons), made of 3 quarks, e.g. proton = uud, neutron = udd; and MESONS, made of a quark and an anti-quark, e.g. π⁺ = ud̄, K⁺ = us̄.)

Test the hypothesis μ = μ₀ when σ is known, using the first line in the table:

z = (x̄ - μ₀)/(σ/√n) = (0.90 - 0.33)/(0.2/√1) = 2.85

Assuming a Gaussian distribution, the probability of getting z ≥ 2.85 is

\mathrm{prob}(z \ge 2.85) = \int_{2.85}^{\infty} P(\mu, \sigma, x)\, dx = \int_{2.85}^{\infty} P(0, 1, x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{2.85}^{\infty} e^{-x^2/2}\, dx \approx 0.002

The CL is just 0.2%! If we repeated our experiment 1000 times, only two experiments would measure a value q ≥ 0.9 if the true mean were q = 1/3. This is not strong evidence for |q| = 1/3 quarks! (We would make a Type I error 0.2% of the time IF the hypothesis were actually true.)

If instead of |q| = 1/3 quarks we tested for |q| = 2/3, what would we get for the CL? With x̄ = 0.9 and σ = 0.2 as before, but μ₀ = 2/3: z = 1.17, prob(z ≥ 1.17) ≈ 0.12, and the CL is ≈ 12%. Quarks are starting to get believable! (But don't trust this experiment.)
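Both versions of this known-σ test are easy to reproduce. A minimal sketch (Python with scipy; the helper name and structure are mine, not from the notes):

```python
# One-sample z test with known sigma: z = (xbar - mu0) / (sigma / sqrt(n)),
# compared against a unit Gaussian (one-sided upper tail).
from math import sqrt
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n=1):
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, norm.sf(z)  # one-sided P(Z >= z)

for mu0 in (0.33, 2/3):
    z, p = z_test(0.90, mu0, 0.2)
    print(f"mu0 = {mu0:.2f}: z = {z:.2f}, P(Z >= z) = {p:.3f}")
# mu0 = 0.33 gives z ~ 2.85 and P ~ 0.002; mu0 = 2/3 gives z ~ 1.17 and P ~ 0.12.
```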

Consider another variation of the |q| = 1/3 problem. Suppose we have 3 measurements of the charge q: q₁ = 1.1, q₂ = 0.7, and q₃ = 0.9. We don't know the variance beforehand, so we must determine it from our data and use the second test in the table:

x̄ = (q₁ + q₂ + q₃)/3 = 0.9

s^2 = \sum_{i=1}^{n} (q_i - \bar{x})^2 / (n - 1) = [(0.2)^2 + (-0.2)^2 + (0)^2]/2 = 0.04, so s = 0.2

z = (x̄ - μ₀)/(s/√n) = (0.9 - 0.33)/(0.2/√3) = 4.94

We now need a t-distribution table (Table 7.2 of Barlow): prob(z ≥ 4.94) ≈ 0.02 for n - 1 = 2 degrees of freedom. This is 10X greater than in the first part of this example, where we knew the variance ahead of time.

Tests to see if two means are consistent with each other:

Test           Conditions              Test statistic                    Test distribution
μ₁ - μ₂ = 0    σ₁ and σ₂ known         (x̄₁ - x̄₂)/√(σ₁²/n + σ₂²/m)      Gaussian with μ = 0, σ = 1
μ₁ - μ₂ = 0    σ₁ = σ₂ = σ unknown     (x̄₁ - x̄₂)/(Q√(1/n + 1/m))        t(n + m - 2)
μ₁ - μ₂ = 0    σ₁ ≠ σ₂ unknown         (x̄₁ - x̄₂)/√(s₁²/n + s₂²/m)      approx. Gaussian with μ = 0, σ = 1

where Q^2 = [(n - 1)s_1^2 + (m - 1)s_2^2] / (n + m - 2).
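The unknown-σ case above is exactly a one-sample t-test. A minimal sketch (Python with scipy/numpy; illustrative, not part of the notes):

```python
# One-sample t-test of the three charge measurements against mu0 = 0.33.
import numpy as np
from scipy.stats import ttest_1samp

q = np.array([1.1, 0.7, 0.9])
t_stat, p_two_sided = ttest_1samp(q, popmean=0.33)  # two-sided p-value by default
print(f"t = {t_stat:.2f}, one-sided P = {p_two_sided / 2:.3f}")
# t ~ 4.94 with 2 dof; halving the two-sided p-value gives ~0.02, matching the table look-up.
```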

Example: We compare the results of two independent experiments to see if they agree with each other.

Exp. 1: 1.00 ± 0.01
Exp. 2: 1.04 ± 0.02

Use the first line of the table and set n = m = 1:

z = (x̄₁ - x̄₂)/√(σ₁²/n + σ₂²/m) = (1.04 - 1.00)/√((0.01)² + (0.02)²) = 1.79

z is distributed according to a Gaussian with μ = 0, σ = 1. The probability for the two experiments to disagree by ≥ |0.04| is

\mathrm{prob}(|z| \ge 1.79) = 1 - \int_{-1.79}^{1.79} P(0, 1, x)\, dx = 1 - \frac{1}{\sqrt{2\pi}} \int_{-1.79}^{1.79} e^{-x^2/2}\, dx \approx 0.07

We don't care which experiment has the larger result, so we use ±z. Thus 7% of the time we should expect the experiments to disagree at this level. Is this acceptable agreement? If we reject the hypothesis (that the two results are consistent) and it is actually true, we make a Type I error 7% of the time.
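A minimal sketch of this two-sided comparison (Python with scipy; variable names are mine):

```python
# Two-sided Gaussian test of whether two measurements with known
# uncertainties are consistent with each other.
from math import hypot
from scipy.stats import norm

x1, s1 = 1.00, 0.01
x2, s2 = 1.04, 0.02
z = abs(x1 - x2) / hypot(s1, s2)  # |difference| / combined uncertainty
p_disagree = 2 * norm.sf(z)       # both signs of the difference count
print(f"z = {z:.2f}, P(|Z| >= z) = {p_disagree:.2f}")  # z ~ 1.79, P ~ 0.07
```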