Hypothesis Testing

Suppose you are investigating extrasensory perception (ESP). You give someone a test in which they guess the color of a card 100 times. They are correct 90 times. For guessing at random you would expect 10 correct guesses.

IS THIS EVIDENCE FOR THE EXISTENCE OF ESP???

NO! This is not evidence in favor of ESP. We are only rejecting the (null) hypothesis that the results are consistent with chance.

Other possible hypotheses that could fit the data:
1) the person cheated
2) the person has ESP
3) the person was lucky
4) something else we haven't thought of

We have not tested 1) - 4).
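As a rough check on how strongly the chance hypothesis is rejected, the sketch below computes the binomial probability of 90 or more correct guesses in 100 trials. It assumes independent guesses with a per-guess success probability of 0.10, which is inferred from the stated expectation of 10 correct out of 100; the function name `binomial_tail` is illustrative only.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials,
    each with success probability p."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))

n_trials = 100
p_chance = 0.10   # assumption: 10 expected correct guesses out of 100
p_value = binomial_tail(n_trials, 90, p_chance)
print(f"P(>= 90 correct | pure chance) = {p_value:.2e}")
```

The probability is vanishingly small, so the chance hypothesis is rejected. The calculation says nothing about which of the remaining explanations is correct.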
Read Taylor Ch 6 and Section 10.8

Introduction
The goal of hypothesis testing is to set up a procedure(s) to allow us to decide if a mathematical model ("theory") is acceptable in light of our experimental observations.

Examples:
Sometimes it's easy to tell if the observations agree or disagree with the theory.
  A certain theory says that Columbus, Ohio will be destroyed by an earthquake in May 1992.
  A certain theory says the sun goes around the earth.
  A certain theory says that anti-particles (e.g. positron) should exist.
Often it's not obvious if the outcome of an experiment agrees or disagrees with the expectations.
  A theory predicts that a proton should weigh $1.67 \times 10^{-27}$ kg, you measure $1.65 \times 10^{-27}$ kg.
  A theory predicts that a material should become a superconductor at 300K, you measure 80K.
Often we want to compare the outcomes of two experiments to check if they are consistent.
  Experiment 1 measures proton mass to be $1.67 \times 10^{-27}$ kg, experiment 2 measures $1.6 \times 10^{-27}$ kg.

Types of Tests
Parametric Tests: compare the values of parameters.
  Example: Does the mass of the proton = mass of the electron?
Non-Parametric Tests: compare the "shapes" of distributions.
  Example: Consider the decay of a neutron. Suppose we have two theories that predict the energy spectrum of the electron emitted in the decay of the neutron (beta decay):
  Theory 1 predicts n -> pe (decays to two particles)
  Theory 2 predicts n -> pev (decays to three particles, v = neutrino)
[Figure: electron energy spectra for the two decay modes, panels labeled pe and pev]

Both theories might predict the same average energy for the electron. A parametric test might not be sufficient to distinguish between the two theories. The shapes of their energy spectra are quite different:
  Theory 1: the spectrum for a neutron decaying into two particles (e.g. n -> p + e).
  Theory 2: the spectrum for a neutron decaying into three particles (n -> p + e + ??).
We would like a test that uses our data to differentiate between these two theories.

In previous lectures we have run across the chi-square ($\chi^2$) probability distribution and saw that we could use it to decide (subjectively) if our data was described by a certain model.

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - f(x_i, a, b, \ldots)\right)^2}{\sigma_i^2}$$

$(y_i \pm \sigma_i, x_i)$ are the data points ($n$ of them).
$f(x_i, a, b, \ldots)$ is a function ("model") that relates x and y.
Example: We measure a bunch of data points $(x, y \pm \sigma)$ and we believe there is a linear relationship between x and y: $y = a + bx$.
If the y's are described by a Gaussian pdf then we saw previously that minimizing the $\chi^2$ function (or LSQ or MLM methods) gives us an estimate for a and b.

Assume: We have 6 data points. Since we used the 6 data points to find 2 quantities (a, b) we have 4 degrees of freedom (dof). Further, assume that:

$$\chi^2 = \sum_{i=1}^{6} \frac{\left(y_i - (a + b x_i)\right)^2}{\sigma_i^2} = 15$$

What can we say about our hypothesis, that the data are described by a straight line?
To answer this question we find (look up) the probability to get $\chi^2 \ge 15$ for 4 degrees of freedom:

$$P(\chi^2 \ge 15,\ 4\ \text{dof}) \approx 0.005$$

Thus in only about 5 of 1000 experiments would we expect to get this result by chance. Since this is such a small probability we could reject the above hypothesis, or we could accept the hypothesis and rationalize it by saying that we were unlucky.
It is up to you to decide at what probability level you will accept/reject the hypothesis. In high energy physics the standard is 5 sigma or $\sim 6 \times 10^{-7}$.
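The table lookup in the example above can be cross-checked numerically. The sketch below uses a standard closed form for the chi-square tail probability that is valid only for an even number of degrees of freedom, $P(\chi^2 \ge x) = e^{-x/2} \sum_{k=0}^{\nu/2-1} (x/2)^k / k!$. This identity and the function name `chi2_tail_even_dof` are not from the lecture; they are used here only as an illustration.

```python
from math import exp, factorial

def chi2_tail_even_dof(chi2, dof):
    """P(chi-square >= chi2) for an EVEN number of degrees of freedom,
    using the closed-form sum exp(-x/2) * sum_k (x/2)^k / k!."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only holds for even, positive dof")
    half = chi2 / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(dof // 2))

p = chi2_tail_even_dof(15.0, 4)
print(f"P(chi2 >= 15, 4 dof) = {p:.4f}")
```

For $\chi^2 = 15$ and 4 dof this gives roughly 0.005, the same order as the value quoted in the example. For odd dof a general routine (for example an incomplete gamma function) would be needed instead.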
Confidence Levels (CL)
An informal definition of a confidence level (CL):
  CL = 100 x [probability of the event happening by chance]
The 100 in the above formula allows CL's to be expressed as a percent (%).
We can formally write for a continuous probability distribution p:

$$CL = 100 \times \text{prob}(x_1 \le X \le x_2) = 100 \times \int_{x_1}^{x_2} p(x)\,dx$$

For a CL we know p(x), $x_1$, and $x_2$.

Example: Suppose we measure some quantity (X) and we know that X is described by a Gaussian pdf with mean μ = 0 and standard deviation σ = 1. What is the CL for measuring $x \ge 2$ (2σ above the mean)?

$$CL = 100 \times \text{prob}(x \ge 2) = \frac{100}{\sigma\sqrt{2\pi}} \int_{2}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = \frac{100}{\sqrt{2\pi}} \int_{2}^{\infty} e^{-x^2/2}\,dx \approx 2.3\%$$

To do this problem we needed to know the underlying probability distribution function p. If the pdf was not Gaussian (e.g. binomial) we could have a very different CL. If you don't know the pdf you are out of luck!

Interpretation of the CL can be easily abused.
Example: We have a scale of known accuracy (Gaussian with σ = 10 gm). We weigh something to be 20 gm. Is there really a 2.3% chance that our object really weighs ≤ 0 gm?? The probability distribution must be defined in the region where we are trying to extract information.

Interpretation of the meaning of a CL depends on Classical or Bayesian viewpoints. Bayesian and Classical are two schools of thought on probability and its applications.
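The Gaussian tail integral in the example above has no elementary closed form, but it can be written in terms of the complementary error function: $\text{prob}(x \ge x_0) = \tfrac{1}{2}\,\text{erfc}\!\left((x_0 - \mu)/(\sigma\sqrt{2})\right)$. A minimal sketch, with the helper name `gaussian_upper_tail` chosen here for illustration:

```python
from math import erfc, sqrt

def gaussian_upper_tail(x0, mu=0.0, sigma=1.0):
    """prob(x >= x0) for a Gaussian with mean mu and standard deviation sigma."""
    z = (x0 - mu) / sigma
    return 0.5 * erfc(z / sqrt(2.0))

cl_percent = 100.0 * gaussian_upper_tail(2.0)   # 2 sigma above the mean
print(f"CL for x >= 2: {cl_percent:.2f}%")
```

For a point 2σ above the mean this evaluates to about 2.3%.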
Confidence Intervals (CI)
For a given Confidence Level, the confidence interval (CI) is the range $[x_1, x_2]$.
Confidence intervals are not always uniquely defined. We usually seek the minimum or symmetric interval.

Example: Suppose we have a Gaussian distribution with μ = 3 and σ = 1. What is the 68% CI for an observation?
We need to find the limits of the integral $[x_1, x_2]$ that satisfy:

$$0.68 = \int_{x_1}^{x_2} p(x)\,dx$$

For a CI we know p(x) and CL. We want to determine $x_1$ and $x_2$.
For a Gaussian distribution the area enclosed by ±1σ is 0.68:
  $x_1 = \mu - 1\sigma = 2$
  $x_2 = \mu + 1\sigma = 4$
The Confidence Interval is [2, 4].

Upper Limits/Lower Limits
If n events from a Poisson process are observed we can calculate upper and lower limits on the average, λ:

$$CL_{upper} = \sum_{r=n+1}^{\infty} \frac{e^{-\lambda}\lambda^{r}}{r!} \qquad CL_{lower} = \sum_{r=0}^{n-1} \frac{e^{-\lambda}\lambda^{r}}{r!}$$

Example: Suppose an experiment observed no events of a certain type they were looking for. What is the 90% CL upper limit on the expected number of events (λ)?

$$CL = 0.90 = \sum_{n=1}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!}$$

$$1 - CL = 0.10 = 1 - \sum_{n=1}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!} = \frac{e^{-\lambda}\lambda^{0}}{0!} = e^{-\lambda} \quad\Rightarrow\quad \lambda = 2.3$$

If λ = 2.3 then 10% of the time we expect to observe zero events even though there is nothing wrong with the experiment!
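The upper-limit condition can be solved numerically for any observed count. The sketch below finds the λ for which the probability of observing `n_observed` or fewer events equals 1 − CL, using simple bisection. The function names, the bracketing interval [0, 100], and the tolerance are illustrative choices, not part of the lecture.

```python
from math import exp, factorial

def poisson_cdf(n, lam):
    """P(r <= n) for a Poisson distribution with mean lam."""
    return exp(-lam) * sum(lam**r / factorial(r) for r in range(n + 1))

def poisson_upper_limit(n_observed, cl, lo=0.0, hi=100.0, tol=1e-10):
    """Upper limit on lambda: solve P(r <= n_observed | lambda) = 1 - cl.
    The CDF decreases monotonically with lambda, so bisection is enough."""
    target = 1.0 - cl
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_observed, mid) > target:
            lo = mid      # lambda too small: still too likely to see <= n events
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"0 events, 90% CL: lambda < {poisson_upper_limit(0, 0.90):.2f}")
```

For zero observed events at 90% CL this reproduces λ = 2.3 (exactly ln 10). The same function can be called with other observed counts and confidence levels.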
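Example: Suppose an experiment observed one event. What is the 95% CL upper limit on the expected number of events (λ)?

$$CL = 0.95 = \sum_{n=2}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!}$$

$$1 - CL = 0.05 = 1 - \sum_{n=2}^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!} = \sum_{n=0}^{1} \frac{e^{-\lambda}\lambda^{n}}{n!} = e^{-\lambda} + \lambda e^{-\lambda} \quad\Rightarrow\quad \lambda = 4.74$$

Hypothesis Testing for Gaussian Variables
If we want to test whether the mean of some quantity we have measured ($\bar{x}$ = average from n measurements) is consistent with a known mean ($\mu_0$) we have the following two tests:

| Test | Condition | Test Statistic | Test Distribution |
|---|---|---|---|
| $\mu = \mu_0$ | $\sigma^2$ known | $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ | Gaussian, μ = 0, σ = 1 |
| $\mu = \mu_0$ | $\sigma^2$ unknown | $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | $t(n-1)$ |

s = standard deviation extracted from the n measurements.
$t(n-1)$: Student's t-distribution with n − 1 degrees of freedom.
"Student" is the pseudonym of statistician W.S. Gosset, who was employed by a famous Irish brewery (Guinness).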
Procedure for Hypothesis Testing
a) Measure something.
b) Get a hypothesis (sometimes a theory) to test against your measurement ("null hypothesis", $H_0$).
c) Calculate the CL that the measurement is from the theory.
d) Accept or reject the hypothesis (or measurement) depending on some minimum acceptable CL.

Problem: How do we decide what is an acceptable CL?
Example: What is an acceptable definition that the space shuttle is safe? One explosion per 10 launches or per 1000 launches or?

| | $H_0$ is True | $H_0$ is False |
|---|---|---|
| Reject $H_0$ | Type 1 error | OK |
| Accept $H_0$ | OK | Type 2 error |

In hypothesis testing we are assuming that $H_0$ is true. We never disprove $H_0$. If the CL is low all we can say is that our data do not support $H_0$.
If our CL is α% (e.g. 5%) then we make a type 1 error α% of the time if $H_0$ is true!
In a trial $H_0$ = innocent. Convicting an innocent person is a type 1 error while letting a guilty person go free is a type 2 error.
Example: Do free quarks exist? (NO they don't!)
Quarks are nature's fundamental building blocks and are thought to have electric charge (|q|) of either (1/3)e or (2/3)e (e = charge of electron).
There are 2 types of quark bound states:
  BARYONS (protons, neutrons) = 3 quarks: proton = uud, neutron = udd
  MESONS = (quark, anti-quark): $\pi^+ = u\bar{d}$, $K^+ = u\bar{s}$

Suppose we do an experiment to look for |q| = 1/3 quarks.
Measure: q = 0.90 ± 0.2 (This gives μ and σ)
Quark theory: q = 0.33 = $\mu_0$
Test the hypothesis $\mu = \mu_0$ when σ is known. Use the first line in the table:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.9 - 0.33}{0.2/\sqrt{1}} = 2.85$$

Assuming a Gaussian distribution, the probability for getting a $z \ge 2.85$ is

$$\text{prob}(z \ge 2.85) = \int_{2.85}^{\infty} P(\mu, \sigma, x)\,dx = \int_{2.85}^{\infty} P(0, 1, x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{2.85}^{\infty} e^{-x^2/2}\,dx = 0.002$$

CL is just 0.2%! If we repeated our experiment 1000 times, two experiments would measure a value $q \ge 0.9$ if the true mean was q = 1/3. This is not strong evidence for q = 1/3 quarks!
(We make a type I error 0.2% of the time IF the hypothesis were actually true.)

If instead of q = 1/3 quarks we tested for q = 2/3, what would we get for the CL?
μ = 0.9 and σ = 0.2 as before but $\mu_0$ = 2/3.
z = 1.17, prob($z \ge 1.17$) = 0.12 and CL = 12%.
Quarks are starting to get believable! (BUT don't trust the experiment.)
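Both quark-charge tests above follow the same recipe: form z from the measurement, the hypothesized mean, and the known σ, then integrate the unit Gaussian from z to infinity. A minimal sketch of that recipe, using the complementary error function for the tail integral (the function name `one_sided_z_test` is illustrative):

```python
from math import erfc, sqrt

def one_sided_z_test(x_bar, mu0, sigma, n=1):
    """Return (z, prob(Z >= z)) for a measured mean x_bar, hypothesized
    mean mu0, known standard deviation sigma and n measurements."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return z, 0.5 * erfc(z / sqrt(2.0))

z_third, p_third = one_sided_z_test(0.9, 0.33, 0.2)          # |q| = 1/3 hypothesis
z_two_thirds, p_two_thirds = one_sided_z_test(0.9, 2 / 3, 0.2)  # |q| = 2/3 hypothesis
print(f"q = 1/3: z = {z_third:.2f}, CL = {100 * p_third:.1f}%")
print(f"q = 2/3: z = {z_two_thirds:.2f}, CL = {100 * p_two_thirds:.0f}%")
```

This gives a CL of about 0.2% for the 1/3 hypothesis and roughly 12% for the 2/3 hypothesis.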
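Consider another variation of the |q| = 1/3 problem. Suppose we have 3 measurements of the charge q:
  $q_1 = 1.1$, $q_2 = 0.7$, and $q_3 = 0.9$
We don't know the variance beforehand so we must determine the variance from our data. Use the second test in the table:

$$\mu = \frac{1}{3}(q_1 + q_2 + q_3) = 0.9$$

$$s^2 = \frac{\sum_{i=1}^{n} (q_i - \mu)^2}{n - 1} = \frac{0.2^2 + (-0.2)^2 + 0}{2} = 0.04$$

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.9 - 0.33}{0.2/\sqrt{3}} = 4.94$$

Need a t distribution table: Table 7.2 of Barlow gives prob($z \ge 4.94$) ≈ 0.02 for n − 1 = 2.
This is 10X greater than the first part of this example, where we knew the variance ahead of time.

Tests to see if two means are consistent with each other

| Test | Conditions | Test Statistic | Test Distribution |
|---|---|---|---|
| $\mu_1 - \mu_2 = 0$ | $\sigma_1^2$ and $\sigma_2^2$ known | $\dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}}$ | Gaussian, μ = 0, σ = 1 |
| $\mu_1 - \mu_2 = 0$ | $\sigma_1^2 = \sigma_2^2 = \sigma^2$ unknown | $\dfrac{\bar{x}_1 - \bar{x}_2}{Q\sqrt{1/n + 1/m}}$ | $t(n + m - 2)$ |
| $\mu_1 - \mu_2 = 0$ | $\sigma_1^2 \ne \sigma_2^2$ unknown | $\dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n + s_2^2/m}}$ | approx. Gaussian, μ = 0, σ = 1 |

$$Q^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n + m - 2}$$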
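The three-measurement example above can be reproduced without a t table, because for exactly 2 degrees of freedom the Student's t tail has a simple closed form, $\text{prob}(T \ge t) = \tfrac{1}{2} - t / \left(2\sqrt{t^2 + 2}\right)$. This identity and the helper name `t_tail_2dof` are not from the lecture; the sketch is limited to the 2-dof case.

```python
from math import sqrt
from statistics import mean, stdev

def t_tail_2dof(t):
    """prob(T >= t) for Student's t with exactly 2 degrees of freedom."""
    return 0.5 - t / (2.0 * sqrt(t * t + 2.0))

charges = [1.1, 0.7, 0.9]   # the three measurements of q
mu0 = 0.33                  # hypothesized mean (|q| = 1/3)

x_bar = mean(charges)
s = stdev(charges)          # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(len(charges)))
p = t_tail_2dof(t)
print(f"mean = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}, prob = {p:.3f}")
```

The result is t ≈ 4.94 with a tail probability of about 0.02, matching the table lookup. For other degrees of freedom a t table or a statistics library would be needed.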
Example: We compare results of two independent experiments to see if they agree with each other.
  Exp. 1: 1.00 ± 0.01
  Exp. 2: 1.04 ± 0.02
Use the first line of the table and set n = m = 1.

$$z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} = \frac{1.04 - 1.00}{\sqrt{(0.01)^2 + (0.02)^2}} = 1.79$$

z is distributed according to a Gaussian with μ = 0, σ = 1.
Probability for the two experiments to disagree by ≥ 0.04:

$$\text{prob}(|z| \ge 1.79) = 1 - \int_{-1.79}^{1.79} P(\mu, \sigma, x)\,dx = 1 - \int_{-1.79}^{1.79} P(0, 1, x)\,dx = 1 - \frac{1}{\sqrt{2\pi}} \int_{-1.79}^{1.79} e^{-x^2/2}\,dx = 0.07$$

We don't care which experiment has the larger result so we use ±z.
7% of the time we should expect the experiments to disagree at this level.
Is this acceptable agreement?
If we reject the hypothesis and it is actually true, we make a type I error 7% of the time.
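The comparison of the two experiments can be sketched the same way as the earlier Gaussian tests, except that the probability is two-sided: $\text{prob}(|Z| \ge z) = \text{erfc}(z/\sqrt{2})$. The function name `two_sided_consistency` is an illustrative choice.

```python
from math import erfc, sqrt

def two_sided_consistency(x1, sigma1, x2, sigma2):
    """Return (z, prob(|Z| >= z)) for two independent Gaussian measurements
    with known uncertainties, one measurement each (n = m = 1)."""
    z = abs(x1 - x2) / sqrt(sigma1**2 + sigma2**2)
    return z, erfc(z / sqrt(2.0))

z, p = two_sided_consistency(1.00, 0.01, 1.04, 0.02)
print(f"z = {z:.2f}, two-sided probability = {p:.2f}")
```

This reproduces z = 1.79 and a probability of about 7%.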