Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede


Hypothesis Testing: Suppose we have two or (in general) more simple hypotheses which can describe a set of data. "Simple" means explicitly defined, so if parameters have to be fitted, that has already been done. Let us (arbitrarily) call the most important hypothesis H₀, the null hypothesis. The other hypothesis is called H₁, the alternative hypothesis.

We want to make a rigorous, quantitative decision in terms of choosing between the two hypotheses H₀ and H₁, based on measurements (which may have already been performed). Let us call that set of measurements an experiment, and let W represent the space of all possible outcomes of that experiment. We can think of W as containing two interesting regions:

Region 1 is the so-called critical region (aka the rejection region) ω. If an experiment has its outcome in the critical region ω, then we reject the null hypothesis H₀.

Region 2 is the acceptance region W − ω. If an experiment has its outcome in the region W − ω, then we accept the null hypothesis H₀.

In general, we use a test statistic to define the two regions. Thus, hypothesis testing is reduced to studying the properties of test statistics. Let the random variable x be defined as the test statistic.

We define the level of significance α of the hypothesis test as the probability that the test statistic x will fall in the critical region (aka rejection region) when H₀ is true:

   α = P(x ∈ ω | H₀)

Thus, we see that α is the probability that we will reject the null hypothesis H₀ even though H₀ is true. This is bad. Physically, α represents a loss, and therefore hopefully α is small.

We define the power of the hypothesis test, 1 − β, as the probability that the test statistic x will fall in the critical region when the alternative hypothesis H₁ is true:

   1 − β = P(x ∈ ω | H₁)

Thus, we see that 1 − β is the probability that we will reject the null hypothesis H₀ when the alternative hypothesis H₁ is true. This is good. Therefore, hopefully 1 − β is large.

P598AEM Lecture Notes 5

Finally, we see that β must therefore be the probability that we will accept the null hypothesis H₀ when the alternative hypothesis H₁ is true. Physically, β represents a contamination. Thus:

   β = P(x ∈ W − ω | H₁)

This is bad; hopefully β is small.

A good hypothesis test will choose the test statistic x and the critical region ω such that both α (loss) and β (contamination) are small. This will minimize:

a) Errors of the 1st kind (i.e. loss), which occur when the null hypothesis H₀ is rejected even though the null hypothesis H₀ is true (occurs with probability α).

b) Errors of the 2nd kind (i.e. contamination), which occur when the null hypothesis H₀ is accepted when the alternative hypothesis H₁ is true (occurs with probability β).

A PDF f(x|H₀) exists for the test statistic x if the null hypothesis H₀ is true. A PDF f(x|H₁) exists for the test statistic x if the alternative hypothesis H₁ is true. Then:

   α = P(x ∈ ω | H₀) = ∫_ω f(x|H₀) dx

and:

   β = P(x ∈ W − ω | H₁) = ∫_{W−ω} f(x|H₁) dx

These relations are shown graphically in the figure below:

[Figure: the PDFs f(x|H₀) and f(x|H₁) vs. x, with the acceptance region W − ω and the critical region ω (aka rejection region) separated at the critical value x_c; the loss α is the tail of f(x|H₀) lying inside ω, and the contamination β is the tail of f(x|H₁) lying inside W − ω.]
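The α/β trade-off can be made concrete with a minimal numeric sketch. The test-statistic PDFs below, f(x|H₀) = N(0,1) and f(x|H₁) = N(2,1), and the one-sided critical region x > x_c are purely illustrative assumptions, not numbers from these notes:

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def loss_and_contamination(x_c, mu0=0.0, mu1=2.0, sigma=1.0):
    """For the critical region x > x_c:
    alpha = P(x > x_c  | H0)  -- loss, error of the 1st kind
    beta  = P(x <= x_c | H1)  -- contamination, error of the 2nd kind"""
    alpha = 1.0 - norm_cdf((x_c - mu0) / sigma)
    beta = norm_cdf((x_c - mu1) / sigma)
    return alpha, beta

alpha, beta = loss_and_contamination(1.645)
print(alpha)  # ~0.05: we wrongly reject H0 about 5% of the time
print(beta)   # ~0.36: H1 outcomes land in the acceptance region 36% of the time
```

Moving x_c to the right decreases α but increases β, which is exactly the compromise discussed above.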

An Example From Particle Physics: Suppose we are carrying out a proton-proton elastic scattering experiment: p + p → p + p. Define this to be the null hypothesis H₀. Assume the experiment uses particle detectors which are only sensitive to charged particles. Sometimes the inelastic reaction p + p → p + p + π⁰ also occurs. Define this to be the alternative hypothesis H₁. Now, the neutral pi-meson decays via the electromagnetic interaction to two gamma rays, i.e. π⁰ → γ + γ. But since the gamma rays are electrically neutral particles, they will not be detected in this experiment.

How do we separate H₀ from H₁, i.e. how do we classify a given event as an elastic scattering event vs. an inelastic scattering event? If the experiment measures the 3-D momentum components p⃗ = pₓx̂ + p_y ŷ + p_z ẑ of all of the final-state charged particles (and we are 100% certain that they are protons, i.e. other inelastic scattering processes such as p + p → Δ⁺⁺ + n have (somehow) been eliminated), then using the measured momenta of the two final-state protons and conservation of momentum and energy, we can calculate the missing mass squared M²_miss for each event. This quantity is just the square of the (relativistic) invariant mass of a hypothetical unobserved particle M, if the reaction is assumed to be p + p → p + p + M.

If the p + p scattering reaction is truly elastic and if the final-state charged particle momenta were measured perfectly, of course we would expect to find M²_miss = 0 for each event. However, if the reaction were truly the alternative hypothesis, that of the inelastic scattering reaction p + p → p + p + π⁰, we would expect M²_miss = M²_π⁰. Thus, here we choose M²_miss to be the test statistic. If the final-state charged particle momentum measurements had no uncertainties associated with them, we would simply have:

   If M²_miss = 0, then the null hypothesis H₀ is true (i.e. p + p elastic scattering).
   If M²_miss = M²_π⁰, then the alternative hypothesis H₁ is true (i.e. p + p inelastic scattering).

However, there are finite final-state momentum measurement uncertainties associated with each event. In fact, there are typically different uncertainties on the measurements of the different x, y, z components of p⃗ for each charged particle {typically σ_pₓ and σ_p_y are comparable, but σ_p_z is significantly different}.

Suppose (in a separate, modified experiment) that we can study known/pure samples of each kind of scattering, elastic and inelastic p + p scattering. The probability distributions associated with M²_miss for the two kinds of scattering might be similar to those shown in the two figures below:

[Figure: P(M²_miss) for pure elastic scattering, peaked at M²_miss = 0, and P(M²_miss) for pure inelastic scattering, peaked at M²_miss = M²_π⁰; the critical region ω lies above the critical value M²_c, with the loss α being the elastic tail inside ω and the contamination β being the inelastic tail below M²_c.]

We define the critical region ω as the region M²_miss > M²_c, where M²_c is some critical mass squared. Clearly, we would like to have both α and β be zero, or at least as small as possible. In this example, we see how compromises must be made. If we want the power of the hypothesis test, 1 − β, to be large (i.e. β small), so that our sample includes as little background as possible, then the level of significance α of the hypothesis test will be large and we will lose many real p + p → p + p elastic scattering events. On the other hand, if we prefer to minimize our losses of real events, we will have to settle for a large background contamination. One solution to this difficulty would be to design/build an experiment with better resolution (i.e. narrower M²_miss distributions). Monte Carlo simulations of the experiment should have been done before and during the design of the detector in order to discover problems like this!

A procedure known as the Neyman-Pearson Test will determine the optimum critical region for the test statistic x that maximizes the power of the hypothesis test, 1 − β, i.e. determines the smallest/least contamination β for a given level of significance α of the hypothesis test (i.e. a given loss/inefficiency of the signal). Formally, we want to maximize 1 − β = P(x ∈ ω | H₁) for a given value of α = P(x ∈ ω | H₀).

Let the PDF for measuring x be f(x|H₀) if the null hypothesis H₀ is true, and f(x|H₁) if the alternative hypothesis H₁ is true. Suppose that x corresponds to the measurement of a single random variable, x_c being the critical value (i.e. if x > x_c, we reject H₀ although it is true, a loss).

Then:

   α = P(x ∈ ω | H₀) = ∫_ω f(x|H₀) dx   and   1 − β = P(x ∈ ω | H₁) = ∫_ω f(x|H₁) dx

where the integrals are over the critical region ω. We can rewrite:

   1 − β = ∫_ω f(x|H₁) dx = ∫_ω [f(x|H₁)/f(x|H₀)] f(x|H₀) dx

Note that f(x|H₁)/f(x|H₀) is also the likelihood ratio of hypothesis H₁ relative to hypothesis H₀, i.e. L(x|H₁)/L(x|H₀). Thus:

   1 − β = ∫_ω [L(x|H₁)/L(x|H₀)] f(x|H₀) dx

i.e. the expectation value of the likelihood ratio over the critical region ω, weighted by f(x|H₀). Now the null hypothesis H₀ is true in the critical region ω with probability α. We know that numerically:

   1 − β = ∫_ω [f(x|H₁)/f(x|H₀)] f(x|H₀) dx = [f(η|H₁)/f(η|H₀)] · α

where η is some point in the critical region ω. In order to make 1 − β as large as possible for a given/fixed value of α, we must make the ratio f(x|H₁)/f(x|H₀) as large as possible for all points in the critical region ω.

Turning this around, we can use this to define the critical region ω. For example, if we decide that we want 1 − β to be e.g. k times as large as α, then we can define the critical region ω as the region where f(x|H₁)/f(x|H₀) > k. In principle, we can solve this and get a compact rule for defining the critical region ω. More generally, the critical region ω is defined such that:

   L(x|H₁)/L(x|H₀) > k

where L is a likelihood and k is a constant (i.e. just a number).

n.b. For composite hypotheses, where parameter estimation and hypothesis testing must be done simultaneously, general techniques for choosing the best test are not well developed. Often, one must e.g. resort to Monte Carlo and brute-force techniques.
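As a numeric sanity check on the Neyman-Pearson construction, the sketch below again assumes the illustrative Gaussian PDFs N(0,1) for H₀ and N(2,1) for H₁ (not anything from these notes). Here the likelihood ratio f(x|H₁)/f(x|H₀) = e^(2x−2) is monotonic in x, so the region LR > k is a one-sided cut x > x_c. Comparing it with a two-sided region of the same α = 5% shows the likelihood-ratio region has the larger power 1 − β:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu1 = 2.0  # alternative-hypothesis mean (illustrative assumption)

# Region A: likelihood-ratio cut x > 1.645 (alpha = 5% under H0)
alpha_A = 1.0 - norm_cdf(1.645)
power_A = 1.0 - norm_cdf(1.645 - mu1)   # P(x > 1.645 | H1)

# Region B: two-sided cut |x| > 1.96 (also alpha = 5% under H0)
alpha_B = 2.0 * (1.0 - norm_cdf(1.96))
power_B = (1.0 - norm_cdf(1.96 - mu1)) + norm_cdf(-1.96 - mu1)

print(round(power_A, 3), round(power_B, 3))  # the likelihood-ratio region wins
```

Both regions reject a true H₀ 5% of the time, but only the likelihood-ratio region spends that 5% where f(x|H₁)/f(x|H₀) is largest.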

Example: The lifetime of the Σ⁰ baryon. This is a very short-lived particle that decays Σ⁰ → Λ⁰ + γ via the electromagnetic interaction; the Λ⁰ baryon subsequently decays via the (charged) weak interaction, e.g. Λ⁰ → p + π⁻. We assume that the electromagnetic decays of the Σ⁰ baryon obey the exponential decay law and thus can be described by an exponential probability density for the time t at which the Σ⁰ decays:

   f(t; τ) = (1/τ) e^(−t/τ)

This normalized PDF gives the probability f(t; τ) dt that the Σ⁰ baryon will decay in an infinitesimal time interval between t and t + dt, with τ = the mean lifetime of the Σ⁰ baryon.

Suppose two theories predict the mean lifetime of the Σ⁰ baryon to be τ₀ and τ₁, respectively. Then let H₀ be the null hypothesis that τ = τ₀, and let H₁ be the alternative hypothesis that τ = τ₁. I.e. if H₀ is true, then f(t; τ₀) = (1/τ₀) e^(−t/τ₀), and if H₁ is true, then f(t; τ₁) = (1/τ₁) e^(−t/τ₁).

Suppose that we have measured the decay times tᵢ for N Σ⁰ baryon decays. The likelihood ratio for the two hypotheses is:

   L(H₁)/L(H₀) = ∏ᵢ₌₁ᴺ f(tᵢ; τ₁) / ∏ᵢ₌₁ᴺ f(tᵢ; τ₀) = (τ₀/τ₁)ᴺ exp[Σᵢ tᵢ (1/τ₀ − 1/τ₁)]

This ratio should exceed some factor k in order to make 1 − β more than k times as large as α:

   (τ₀/τ₁)ᴺ exp[Σᵢ tᵢ (1/τ₀ − 1/τ₁)] > k

Now the measured mean lifetime is the sample mean t̄ = (1/N) Σᵢ tᵢ. This can be solved, yielding:

   t̄ > [ln k + N ln(τ₁/τ₀)] / [N (1/τ₀ − 1/τ₁)] ≡ T

Summary: If we consider t̄ space, the critical region which can best reject H₀ relative to H₁ with a factor k is t̄ > T, with T = [ln k + N ln(τ₁/τ₀)] / [N (1/τ₀ − 1/τ₁)]. With this criterion, 1 − β will be at least k times as large as α, where β is the contamination and α is the loss. In order to obtain actual numbers, we must fix/specify α or β. Let us work out examples where we fix the loss α.

Here: H₀ is the null hypothesis that τ = τ₀, and H₁ is the alternative hypothesis that τ = τ₁ = 2τ₀. Then:

   α = probability that we will reject H₀ although it is true = ∫_{critical region in t̄} f(t̄ | τ₀) dt̄

We first examine the extreme case of N = 1, i.e. we make a single measurement of the lifetime. Then t̄ = t, i.e. the mean lifetime = the (one) lifetime measurement t. Then the two PDFs are:

   f(t | τ₀) = (1/τ₀) e^(−t/τ₀)   and   f(t | τ₁) = (1/(2τ₀)) e^(−t/(2τ₀))

Let us demand, for example, that α = 5%, i.e. we reject the true hypothesis H₀ 5% of the time. Then:

   0.05 = ∫_T^∞ (1/τ₀) e^(−t/τ₀) dt = e^(−T/τ₀)

defines the critical (aka rejection) region ω. Evaluating this yields T/τ₀ = 2.996, or T ≈ 3τ₀.

The value of β is also now determined:

   β = ∫_0^T (1/τ₁) e^(−t/τ₁) dt = 1 − e^(−T/τ₁) = 1 − e^(−3/2)

i.e. the contamination is β ≈ 78%.

Suppose that our single measurement t of the mean lifetime of the Σ⁰ baryon comes out to be t̄ = t < T = 3τ₀. Then we accept H₀, i.e. we accept the hypothesis that τ = τ₀. Only 5% of the time will we reject H₀, even though it is true, if we have chosen T = 3τ₀. Unfortunately, we will accept H₀ a fraction β ≈ 78% of the time even when H₁ is true, i.e. when τ = τ₁ = 2τ₀! Thus, here we commit a Type-II error. This is not a very statistically significant test {but our data only consists of one single event}. It is not a powerful test in the Neyman-Pearson sense: 1 − β = 0.223, i.e. only ≈ 4.45α; solving t̄ > T for k using N = 1 and T = 3τ₀ gives the corresponding k ≈ 2.24. We will indeed do better when N is large and we make many measurements of decay times tᵢ.

If we instead have a statistically significant sample of events, i.e. N is now very large, then in this regime the statistical uncertainty σ_t̄ = τ/√N on the experimental determination of the mean lifetime t̄ is Gaussian-distributed. The PDFs will then be:

   f(t̄ | τ₀) = [√N / (√(2π) τ₀)] e^(−N(t̄ − τ₀)²/(2τ₀²))

   f(t̄ | τ₁) = [√N / (√(2π) 2τ₀)] e^(−N(t̄ − 2τ₀)²/(8τ₀²))

Here, we obtain the lower boundary T of the critical region ω from

   α = ∫_T^∞ f(t̄ | τ₀) dt̄

For α = 0.05 and using the Gaussian single-sided upper 95% C.L. table on p. 8 of P598AEM Lect. Notes 8, we find T/τ₀ = 1 + 1.645/√N, or T = (1 + 1.645/√N) τ₀.

We then use all N of the events and calculate 1 − β = ∫_T^∞ f(t̄ | τ₁) dt̄.

For example, suppose N = 100. Then T = (1 + 1.645/10) τ₀ = 1.1645 τ₀ is the lower boundary of the critical region ω for a loss of α = 5%. We also obtain 1 − β ≈ 0.99999, i.e. a contamination of β ≈ 0.001%. This is fine!!! The two (Gaussian) probability distributions are now well separated from each other, as shown in the figure below:

[Figure: the two well-separated Gaussian PDFs f(t̄ | τ₀) and f(t̄ | τ₁ = 2τ₀) for N = 100, with the critical region ω beginning at T = 1.1645 τ₀.]
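The numbers in this lifetime example can be reproduced directly; the sketch below works in units of τ₀, with τ₁ = 2τ₀ as above:

```python
import math

tau0, tau1 = 1.0, 2.0   # lifetimes in units of tau0, with tau1 = 2*tau0

# N = 1 case: alpha = exp(-T/tau0) = 0.05 fixes the critical boundary T
alpha = 0.05
T1 = -tau0 * math.log(alpha)           # ~2.996 tau0, i.e. T ~ 3 tau0
beta1 = 1.0 - math.exp(-T1 / tau1)     # contamination ~0.777

# Large-N (Gaussian) regime, e.g. N = 100:
def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

N = 100
T100 = tau0 * (1.0 + 1.645 / math.sqrt(N))   # ~1.1645 tau0
sigma1 = tau1 / math.sqrt(N)                 # width of t-bar under H1
beta100 = norm_cdf((T100 - tau1) / sigma1)   # tiny contamination

print(T1, beta1, T100, beta100)
```

With a single event the contamination is a crippling ~78%; with N = 100 the same 5% loss leaves a contamination of order 10⁻⁵.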

Parametric Tests involve the parameters of a distribution, often the Gaussian distribution. An example from particle physics is the search for new particles. For each event of a particle physics reaction such as e⁺e⁻ → π⁺π⁻π⁺π⁻, we might calculate the invariant mass M(π⁺π⁻) of the π⁺π⁻ pairs (n.b. there are four possible opposite-charge-sign pion pair combinations for each event) and plot a histogram of the invariant mass M(π⁺π⁻) of the π⁺π⁻ pairs. A bump in the M(π⁺π⁻) mass distribution could indicate that we are observing the preferential production e⁺e⁻ → π⁺π⁻ + R, where the particle R subsequently decays (in ~10⁻²² s?) to π⁺π⁻.

[Figure: histogram of the number of π⁺π⁻ pairs vs. pair mass, with a bump at m_R between mass limits m_a and m_b, interpreted as R decay events (solid red line) sitting on a smooth background (dashed blue line).]

Even if we could pick out just those π⁺π⁻ pairs which truly came from R decay, and plotted m_R, we would still get a broad bump (and not a δ-function) for two reasons: experimental uncertainties on the measurements of the components of p⃗(π⁺) and p⃗(π⁻) (which are used in the calculation of m_R), and also the uncertainty-principle natural linewidth, ΔE·Δt ≳ ħ = 6.6 × 10⁻²² MeV·s, giving Γ = ħ/τ ≈ 6.6 MeV for τ = 10⁻²² s.

If we try to select π⁺π⁻ pairs which come from R decay by defining an upper and lower limit on m_R, we will get them and also:

a) π⁺π⁻ pairs which are not associated with R particles;
b) π⁺π⁻ pairs made of the π⁺ from an R and an unassociated π⁻ (and vice versa).

The invariant π⁺π⁻ pair masses of these background events need not peak, and even if they do, there is no reason to expect them to peak near m_R. We usually assume that the background is described by a smooth function of mass. (n.b. We can use a Monte Carlo program to generate such a background!) Then, we can interpret the histogram as a possible superposition of R decay events (solid red line) and background (dashed blue line) in the above figure. Of course, statistical fluctuations in the number of events per bin, σ_nᵢ ~ √nᵢ, are always present. It is possible that the entire bump near m_R could be due solely to statistical fluctuations.

Let us estimate the probability that, in the region m_a to m_b, the observed effect at m = m_R is entirely due to a statistical fluctuation of the background. Here, one simple procedure (there are other, more elaborate ones too) is:

Let N be the total number of events in the bump region m_a to m_b. Let B be the number of events in that region which are background. Let H₀ be the null hypothesis that N = B.

We assume that there is a way to estimate B and σ_B, which are assumed to be Gaussian-distributed, either from fitting the mass distribution away from the bump, or from some model, or from a Monte Carlo program, etc. If H₀ is true, then N − B is consistent with zero. If we assume that B and N are estimated independently, then σ² = σ_N² + σ_B². If N is large enough that it is (approximately) Gaussian, then

   Δ = (N − B) / √(σ_N² + σ_B²)

will be a Gaussian-distributed random variable with mean = 0 and unit standard deviation, i.e. distributed as N(0,1). In order to be able to decide whether or not the null hypothesis H₀ is true, we simply apply the rules for Gaussian-distributed random variables. Thus, Δ is the number of standard deviations away from the expected mean, which is zero. We can look in tables for the probability P(≥ Δ) that the given value of Δ will occur by chance. Finally, 1 − P(≥ Δ) is the probability that the alternative hypothesis H₁ is true, i.e. that the total N events are not simply a statistical fluctuation of the B background events.

In particle physics, it is currently not fashionable to quote even a 3 or 4 standard deviation bump as a new particle without a lot of confirming work. Although the probability of N − B > 4σ is very small (odds of roughly 16,000 : 1), there are so many mass plots studied in all high-energy experiments that we expect to occasionally see such large fluctuations. (In some years, it has been estimated that many thousands of mass plots were examined!)
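A minimal sketch of this recipe (the event counts below are invented purely for illustration): with N observed events in the mass window, an estimated background B ± σ_B, and σ_N = √N, the significance is Δ = (N − B)/√(σ_N² + σ_B²):

```python
import math

def bump_significance(n_obs, b_est, sigma_b):
    """Number of standard deviations of the excess over background.
    Assumes n_obs is large enough to be ~Gaussian, with sigma_N = sqrt(n_obs)."""
    sigma = math.sqrt(n_obs + sigma_b ** 2)
    return (n_obs - b_est) / sigma

def one_sided_p(delta):
    """P(a unit-Gaussian fluctuation >= delta)."""
    return 0.5 * (1.0 - math.erf(delta / math.sqrt(2.0)))

delta = bump_significance(1100, 1000.0, 30.0)   # invented counts
print(delta, one_sided_p(delta))
```

A ~2.2σ excess like this one occurs by chance more than 1% of the time, which is why it would not be quoted as a discovery.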
The Goodness-of-Fit Test usually describes a situation where we want to accept or reject a hypothesis H₀ without specifying an alternative. We can still have a PDF f(x|H₀) and a critical region ω, such that if x ∈ ω, then we reject the hypothesis H₀. The probability that we will reject the null hypothesis H₀, even though it is true, is still α = ∫_ω f(x|H₀) dx. However, since we have no alternative hypothesis H₁, we have no way to calculate the probability of accepting H₀ although it is false, i.e. that H₁ is true.

Thus, we can make a quantitative statement against H₀ (e.g. a LSQ fit of a theory prediction to the data is bad), but we have no quantitative evidence for H₁. This leads to the (imprecise) statement that one can never prove a theory, but one can disprove it. How does one choose tests in this case? Some turn out to be more sensitive than others to deviations from H₀.

There are other types of tests, called distribution-free methods, where a test statistic T(x) has a known distribution, independent of the distribution of x, if H₀ is true. In these cases (many such T's exist), one calculates T and then reads from a table the corresponding level of significance of the test.

One test-statistic example is χ², in the test of the goodness of fit of a theoretical probability distribution F(x) to observed data. Here, F(x) is typically proportional to a PDF, normalized so that F(x)Δx gives the number of measurements predicted to fall in the range x to x + Δx. The data is typically plotted in a histogram of bins with constant bin width Δx. Let us assume that the number of events predicted in the i-th histogram bin, nᵢ, is Poisson-distributed. Then the variance is σ²_nᵢ = nᵢ, or σ_nᵢ = √nᵢ. Let mᵢ be the number of experimentally-measured data events in the i-th histogram bin. The test statistic is:

   T = Σᵢ₌₁^{N_bins} (mᵢ − nᵢ)² / σ²_nᵢ = Σᵢ₌₁^{N_bins} (mᵢ − nᵢ)² / nᵢ

If the null hypothesis H₀ is true, i.e. if the theory does agree well with the measurements, then the test statistic T will be χ²-distributed, and we already know how to test for its significance.

Some detail: Let the total number of measurements be N (n.b. same N for theory and for data!).
Then N = Σᵢ₌₁^{N_bins} nᵢ, where N_bins is the number of histogram bins. Suppose that the PDF of the theory is such that nᵢ = N f(xᵢ). Then the test statistic is:

   T = Σᵢ₌₁^{N_bins} [mᵢ − N f(xᵢ)]² / [N f(xᵢ)]

Because of the normalization condition N = Σᵢ nᵢ, we see that not all of the terms in the test statistic T are independent. Once we know the contents of histogram bins 1, 2, …, N_bins − 1, the number of events in the last histogram bin is fixed by the requirement that the total number of events add up to N. Thus, there are N_bins − 1 degrees of freedom for the χ² distribution.
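The binned χ² statistic above is easy to compute; the survival probability P(χ² > T) even has a closed form for an even number of degrees of freedom, which is enough for a sketch (the bin contents below are invented for illustration):

```python
import math

def chi2_statistic(observed, predicted):
    """T = sum over bins of (m_i - n_i)^2 / n_i."""
    return sum((m - n) ** 2 / n for m, n in zip(observed, predicted))

def chi2_sf_even_dof(x, dof):
    """P(chi2 > x) for even dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!"""
    assert dof % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(dof // 2))

predicted = [10.0, 20.0, 40.0, 20.0, 10.0]   # theory bins, normalized to N = 100
observed = [8, 24, 36, 22, 10]               # invented data, same total N
T = chi2_statistic(observed, predicted)
dof = len(predicted) - 1                     # normalization removes one dof
print(T, chi2_sf_even_dof(T, dof))           # good fit: large p-value
```

Here T ≈ 1.8 on 4 degrees of freedom, a comfortably probable value, so H₀ would not be rejected.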

If, in addition, we were e.g. fitting for M λ-parameters, then we'd instead have N_bins − 1 − M degrees of freedom for χ².

The reason that we can use this χ² to reject H₀ if it is too large is that if H₀ is not true, then each term in the χ² sum (instead of being ~1 on average) will have a nonzero expectation value and contribute much more to the sum than it would if H₀ were true.

One drawback of the χ² test is that if the number of measurements N is small, the theory of χ² won't hold. In that case we must use other test statistics, e.g. those based on likelihood. This is very messy, and in this situation we'd prefer using the Kolmogorov-Smirnov Test.

The Kolmogorov-Smirnov (K-S) Test: The K-S test looks at the Cumulative Distribution Function F(x) rather than the Probability Density Function f(x), and is based on ordered statistics. No binning of the measurements is required here, as would be the case e.g. using a χ² test. Note that the K-S test also has many variations.

The Standard Two-Sided/Double-Sided K-S Test: Assume that we have N independent measurements x₁, x₂, …, x_N of a random variable that we want to compare to an (a priori known/assumed) theoretical prediction, with accompanying theory PDF f(x) and corresponding theory Cumulative Distribution Function (CDF):

   F(x) = ∫_{−∞}^{x} f(x′) dx′

We then form the so-called empirical sample/experimental Cumulative Distribution Function (CDF) S_N(x) as follows:

Define S_N(−∞) = 0. Order (i.e. sort) the measurements in increasing values of the random variable x. (After ordering/sorting, the measurements are located at x₁ ≤ x₂ ≤ x₃ ≤ … ≤ x_N.) Define S_N(x_k) = k/N, i.e. S_N(x) increases/increments by 1/N at each measurement x_k. Thus S_N(x < x₁) = 0 and S_N(x ≥ x_N) = 1, and we can define S_N(+∞) = +1.

For a continuous theory CDF F(x), the so-called two-sided (aka double-sided) K-S test statistic D_N is defined as the supremum of the absolute difference between the experiment and theory CDFs:

   D_N ≡ sup_x |S_N(x) − F(x)|

Since S_N(x_k) = k/N, the K-S test statistic D_N is computed as:

   D_N = max_{k=1:N} { max( |k/N − F(x_k)|, |(k−1)/N − F(x_k)| ) }

where the N independent experimental measurements x_k {k = 1:N} of the random variable x have been sorted into ascending values x₁ < x₂ < x₃ < … < x_N, and the theory CDF F(x) is correspondingly evaluated at x_k for each value of k.

[Figure: a typical double-sided K-S comparison of the staircase experiment CDF S_N(x_k) vs. the smooth theory CDF F(x), with the K-S test statistic D_N indicated (in green) at the point of maximum separation.]

If the null hypothesis H₀ is true (i.e. S_N(x) = F(x)) and provided that no parameter(s) of the theory have been estimated from the data, the double-sided K-S test statistic D_N is distributed in such a way that it is independent of the choice of the theory CDF F(x), i.e. the double-sided K-S test statistic D_N is distribution-free, if N is large enough!

If the experiment S_N(x) is repeated a gazillion times, it can be seen that the double-sided K-S test statistic D_N is itself a random variable. If the null hypothesis H₀ is true (i.e. S_N(x) = F(x)), then in the asymptotic/infinite-statistics limit, the double-sided K-S test statistic D_N converges to zero:

   lim_{N→∞} D_N = lim_{N→∞} sup_k |S_N(x_k) − F(x_k)| = 0
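The two-sided D_N formula can be sketched directly. The five data points and the uniform theory CDF F(x) = x on [0,1] below are invented for illustration:

```python
def ks_statistic(sample, theory_cdf):
    """Two-sided K-S statistic D_N = sup_x |S_N(x) - F(x)|.
    The supremum occurs at a data point, just before or just after
    each 1/N step of the empirical CDF S_N, so we check both sides."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = theory_cdf(x)
        d = max(d, abs(k / n - f), abs((k - 1) / n - f))
    return d

data = [0.9, 0.1, 0.5, 0.2, 0.6]         # invented sample, N = 5
D = ks_statistic(data, lambda x: x)      # theory: uniform on [0, 1]
print(D)  # 0.2
```

No binning was needed: the sorted sample is compared to F(x) point by point.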

However, if the null hypothesis H₀ is true (i.e. S_N(x) = F(x)), then in the asymptotic/infinite-statistics limit, the product z = √N·D_N is a random variable that statistically converges to the Kolmogorov-Smirnov Test Statistic Probability Distribution Function (PDF), f_KS(z) (valid for N > 80):

   lim_{N→∞} √N·D_N = lim_{N→∞} √N · sup_k |S_N(x_k) − F(x_k)| → distributed as f_KS(z)

n.b. f_KS(z) is the PDF of max_{t∈[0,1]} |B(t)|, the maximum of the absolute value of B(t), where B(t) is the so-called Brownian bridge.

The analytic form of the Kolmogorov-Smirnov Test Statistic Probability Distribution Function f_KS(z), the PDF associated with the asymptotic/infinite-statistics limit, is the infinite series expression (valid for N > 80):

   f_KS(z) = 8z Σ_{k=1}^{∞} (−1)^{k−1} k² e^{−2k²z²}

[Figure: linear and semi-log plots of the asymptotic Kolmogorov-Smirnov test statistic PDF f_KS(z) vs. z.]

The Cumulative Distribution Function (CDF) associated with the Kolmogorov-Smirnov Test Statistic Probability Distribution Function f_KS(z) is:

   F_KS(Z) = ∫_0^Z f_KS(z) dz

[Figure: linear and semi-log plots of the Kolmogorov-Smirnov test statistic CDF F_KS(Z) vs. Z.]

The analytic form of the Kolmogorov-Smirnov Test Statistic Cumulative Distribution Function F_KS(Z), in the asymptotic/infinite-statistics limit (valid for N > 80), is:

   F_KS(Z) = 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²Z²} = (√(2π)/Z) Σ_{k=1}^{∞} e^{−(2k−1)²π²/(8Z²)}

Note that since F_KS(Z) = ∫_0^Z f_KS(z) dz, we have f_KS(z) = dF_KS(z)/dz; hence, differentiating the above infinite series expression for the Kolmogorov-Smirnov test statistic CDF F_KS(Z) with respect to z = Z, we obtain the above infinite series expression for the Kolmogorov-Smirnov test statistic PDF f_KS(z).

From either of the above two plots of the Kolmogorov-Smirnov test statistic CDF F_KS(Z) vs. Z, since in the asymptotic/infinite-statistics limit Z = √N·D_N, if we e.g. choose a critical value F_KS(Z_α) = 1 − α = 0.8 (i.e. α = 0.2), there thus exists a corresponding Z = Z_α for this choice of α. Physically, this choice of critical value means that if the null hypothesis H₀ is true (i.e. S_N(x) = F(x)), then Prob_KS(z ≥ Z_α) = 1 − F_KS(Z_α) = α = 0.2, i.e. 20% of the time we will reject the null hypothesis H₀, and we thus correspondingly commit an error of the 1st kind.

The analytic form of the Kolmogorov-Smirnov test statistic probability function Prob_KS(z ≥ Z) = 1 − F_KS(Z), i.e. the p-value, in the asymptotic limit (valid for N > 80) is:

   p-value = Prob_KS(z ≥ Z) = 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²Z²}
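The alternating series for the asymptotic p-value converges after a handful of terms, so it is trivial to evaluate; a sketch:

```python
import math

def ks_pvalue(z, terms=100):
    """Asymptotic K-S p-value Prob(sqrt(N)*D_N >= z)
    = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2), clamped to [0, 1]."""
    if z <= 0.0:
        return 1.0
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, terms + 1))
    return max(0.0, min(1.0, s))

# Recover the standard asymptotic critical values:
# alpha = 5% at Z ~ 1.358 and alpha = 1% at Z ~ 1.628
print(ks_pvalue(1.3581), ks_pvalue(1.6276))
```

Evaluating the series at a proposed Z_α and checking that it returns α is a quick consistency test of any critical-value table.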

[Figure: linear and semi-log plots of the Kolmogorov-Smirnov test statistic probability function Prob_KS(z ≥ Z) = 1 − F_KS(Z) vs. Z, with several useful critical values α-Z_α marked as * points on the curves.]

The * points marked on the curves of either of the above Prob_KS(z ≥ Z) = 1 − F_KS(Z) vs. Z plots indicate several useful choices of critical values α-Z_α associated with the asymptotic (N > 80) limit of the Kolmogorov-Smirnov test statistic probability function. The standard asymptotic values are summarized in the table below:

Asymptotic (N > 80) Critical Values of the Kolmogorov-Smirnov Test Statistic, Z_α = √N·D_α:

   α (%)     Z_α
   20%       1.0728
   10%       1.2238
    5%       1.3581
    1%       1.6276
    0.1%     1.9495

Typically (by convention), the critical value α = 0.05 = 5% is most frequently chosen.

The p-value = Prob_KS(z ≥ Z) = 1 − F_KS(Z) associated with the double-sided single-sample K-S test statistic z = √N·D_N in the asymptotic limit (valid for N > 80) is:

   p-value = 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²z²} = 1 − (√(2π)/z) Σ_{k=1}^{∞} e^{−(2k−1)²π²/(8z²)}

When the p-value = 1, then D_N = 0, i.e. S_N(x) perfectly matches F(x). When the p-value = 0, then S_N(x) is not at all statistically compatible with F(x). The null hypothesis H₀, namely that S_N(x) = F(x) for all x (or, equivalently, that the corresponding experimental vs. theoretical PDFs s_N(x) = f(x) for all x), is formally rejected if √N·D_N > Z_α, or equivalently if D_N > D_α ≡ Z_α/√N, or equivalently if the p-value < α, when using the above table (valid for N > 80).

However, for experimental samples with N < 80, using the above table of asymptotic/infinite-statistics limit results is increasingly inaccurate as N decreases. For decent accuracy, we must instead use a table of critical values D_{N,α} of the double-sided single-sample K-S test for specified values of N, such that Prob_KS(D_N > D_{N,α}) = α.

Critical Values of D_{N,α} for the Double-Sided Single-Sample Kolmogorov-Smirnov Test (tabulated for α = 0.20, 0.10, 0.05, 0.02, 0.01 at various N).

For experimental samples with N < 80, the following parameterization can alternatively be used:

   D_{N,α} ≈ Z_α / (√N + 0.12 + 0.11/√N)

M.A. Stephens, Journal of the Royal Statistical Society, Series B, Vol. 32, pp. 115-122, 1970.

An example of the use of the table of critical values of K-S statistics for N < 80: Suppose we have N = 7 measurements of the random variable x. From the above table, with Prob_KS(D_N > D_{N,α}) = α, for N = 7 measurements:

   α = 0.20:   D_{7,α} = 0.385
   α = 0.10:   D_{7,α} = 0.436
   α = 0.05:   D_{7,α} = 0.4834
   α = 0.02:   D_{7,α} = 0.5384
   α = 0.01:   D_{7,α} = 0.5758

We compare the value of D_N obtained from our double-sided single-sample K-S test with D_{7,α} from the above table. If the null hypothesis H₀ (S_N(x) = F(x)) is true, then for N = 7 events:

   D₇ ≤ 0.385 80% of the time (and 20% of the time we reject the null hypothesis)
   D₇ ≤ 0.436 90% of the time (and 10% of the time we reject the null hypothesis)
   D₇ ≤ 0.4834 95% of the time (and 5% of the time we reject the null hypothesis)
   D₇ ≤ 0.5384 98% of the time (and 2% of the time we reject the null hypothesis)
   D₇ ≤ 0.5758 99% of the time (and 1% of the time we reject the null hypothesis)

If our D₇ is larger than one of these values, then we can cast doubt on the truth of the null hypothesis, at the level indicated.

K-S Test Confidence Intervals: Under the assumption that the null hypothesis H₀, S_N(x) = F(x) (or, equivalently, that the corresponding experimental vs. theoretical PDFs s_N(x) = f(x) for all x), is true, the K-S test statistic D_N is itself a random variable, and has a PDF that is universal and independent of the choice of the theoretical CDF F(x), and furthermore is known for all N. Thus, one may use the K-S test statistic D_N to construct confidence intervals for any continuous theory CDF F(x).

Thus, for any such F(x), we can write the K-S test probability statement:

   Prob_KS(D_N > D_{N,α}) = α

We can invert this probability statement to obtain a confidence interval statement about the theory CDF F(x), valid for all x:

   Prob_KS( S_N(x) − D_{N,α} < F(x) < S_N(x) + D_{N,α} ) = 1 − α

Physically, this statement means that, for any point x, the theory CDF F(x) has a probability 1 − α of being larger than S_N(x) − D_{N,α} but smaller than S_N(x) + D_{N,α}. Hence, one can construct a confidence interval (band) of width ±D_{N,α} centered on the experimental CDF S_N(x); the probability that the true theory CDF F(x) lies within this confidence interval is 1 − α.

If one or more parameters of the theory have been estimated from the data (e.g. the sample mean and/or the sample variance), the critical values in the above table are invalid for use in this situation.
However, in this situation, tables of critical values for these kinds of K-S tests have been prepared/do exist for certain specific cases, such as the Gaussian/normal distribution and the exponential distribution. See e.g. Table 54 of the Biometrika Tables for Statisticians, edited by E.S. Pearson and H.O. Hartley (1972). See also M.A. Stephens, "EDF Statistics for Goodness of Fit and Some Comparisons," Journal of the American Statistical Association, Vol. 69, No. 347, pp. 730-737 (Sept. 1974).
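The small-N critical values tabulated above can be reproduced quite well with the Stephens (1970) parameterization quoted earlier; a sketch, using the standard asymptotic Z_α values:

```python
import math

Z_ALPHA = {0.20: 1.0728, 0.10: 1.2238, 0.05: 1.3581, 0.01: 1.6276}

def ks_critical_d(n, alpha):
    """Approximate finite-N critical value D_{N,alpha} for the double-sided
    one-sample K-S test (Stephens 1970 parameterization):
    D_{N,alpha} ~ Z_alpha / (sqrt(N) + 0.12 + 0.11/sqrt(N))."""
    z = Z_ALPHA[alpha]
    return z / (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n))

print(ks_critical_d(7, 0.05))   # ~0.484, vs. the tabulated 0.4834 for N = 7
```

For N = 7 the parameterization reproduces the 5% and 10% table entries to about three decimal places, which is ample for practical use.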

The K-S Test for Two Experimental CDFs: If we wish to compare two independent experimental samples, with CDFs S_{N₁}(x) and S_{N₂}(x) respectively, for the null hypothesis H₀ that they are the same/identical, a modified version of the above K-S test also exists for use in this situation. Sample S_{N₁}(x) has N₁ entries and sample S_{N₂}(x) has N₂ entries. Again, this is a distribution-free test, one which works for any pair of empirical experimental sample CDFs S_{N₁}(x), S_{N₂}(x). Here, the two-experiment double-sided K-S test statistic D_{N₁,N₂} is defined as:

   D_{N₁,N₂} ≡ sup_x |S_{N₁}(x) − S_{N₂}(x)|

[Figure: the two staircase empirical sample CDFs S_{N₁}(x_k) and S_{N₂}(x_j) vs. x, with the two-sample K-S statistic D_{N₁,N₂} indicated at the point of maximum separation.]

Since the two experimental samples both have finite statistics, the ability to reject the null hypothesis H₀ will be correspondingly weaker. The null hypothesis H₀, S_{N₁}(x) = S_{N₂}(x) (or equivalently their corresponding PDFs s₁(x) = s₂(x) for all x), is formally rejected if:

   √(N_eff) · D_{N₁,N₂} > Z_α,   with N_eff ≡ N₁N₂/(N₁ + N₂)
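A minimal sketch of the two-sample statistic just defined (the two small samples are invented; with such small N₁ and N₂ the asymptotic p-value is only indicative):

```python
import bisect
import math

def ks_two_sample(sample1, sample2):
    """D = sup_x |S_N1(x) - S_N2(x)|, evaluated at the pooled data points
    (the supremum of a difference of two staircases occurs at a step)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for x in s1 + s2:
        cdf1 = bisect.bisect_right(s1, x) / n1
        cdf2 = bisect.bisect_right(s2, x) / n2
        d = max(d, abs(cdf1 - cdf2))
    return d

def ks_pvalue(z, terms=100):
    """Asymptotic K-S p-value series (clamped to [0, 1])."""
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, terms + 1))
    return max(0.0, min(1.0, s))

a, b = [1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]   # invented samples
D = ks_two_sample(a, b)
n_eff = len(a) * len(b) / (len(a) + len(b))
print(D, ks_pvalue(math.sqrt(n_eff) * D))
```

With larger samples the same recipe applies unchanged, and the asymptotic p-value from z = √(N_eff)·D becomes trustworthy.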

Or: D_{N₁,N₂} > D^α_{N_eff}, using the above table of critical values, or equivalently:

  p-value = 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²z²} = (√(2π)/z) Σ_{k=1}^∞ e^{−(2k−1)²π²/(8z²)} < α,   with z ≡ √N_eff · D_{N₁,N₂}

The Smirnov–Cramér–von Mises Test: The Smirnov–Cramér–von Mises Test is somewhat similar to the Kolmogorov–Smirnov Test; however, the S-C-vM test statistic W²_SCvM uses the entirety of the {square} of the difference between the two CDFs, not just the maximum {absolute} separation distance D_N between the two CDFs. The S-C-vM test statistic W²_SCvM is defined as:

  W²_SCvM ≡ N ∫_{−∞}^{+∞} [S_N(x|X) − F(x|X)]² f(x) dx = N · E[(S_N(x|X) − F(x|X))²]

Using dF(x|X) = f(x|X) dx, we can write:

  W²_SCvM = N ∫₀¹ [S_N(x|X) − F(x|X)]² dF

When written this way, it is obvious that the S-C-vM test statistic W²_SCvM is independent of F! Thus, the S-C-vM Test is distribution-free for all N, not just large N. Since the sorted empirical experimental sample CDF is S_N(x_k|X) = k/N, the S-C-vM test statistic W²_SCvM is numerically computed as:

  W²_SCvM = 1/(12N) + Σ_{k=1}^N [F(x_k) − (2k−1)/(2N)]²

Once the number W²_SCvM has been computed for a given N from the ordered/sorted empirical experimental sample CDF S_N(x_k|X) = k/N vs. the x_k-sorted theory CDF F(x_k), we can then refer to a table of critical values for the S-C-vM Test in order to determine whether to accept/reject the null hypothesis H₀. For the S-C-vM Test, in the asymptotic limit N → ∞, the CDF associated with the random variable W²_SCvM is:

  F_{W²_SCvM}(z) = (1/(π√z)) Σ_{k=0}^∞ [Γ(k+½)/(Γ(½) k!)] √(4k+1) e^{−(4k+1)²/(16z)} K_{1/4}((4k+1)²/(16z))

where K_{1/4} is the Bessel function of order ¼; recall that a Bessel function of order ν has the series expansion:

  J_ν(x) = Σ_{m=0}^∞ [(−1)^m / (m! Γ(m+ν+1))] (x/2)^{2m+ν}

The convergence of this expression fortunately is extremely rapid. For N → ∞, the probability Prob{W²_SCvM > W²_α} = α (i.e. W²_SCvM exceeds W²_α (100·α)% of the time when the null hypothesis H₀ is true). The table below gives values of W²_α for several choices of α.

Critical Values of the Smirnov–Cramér–von Mises Test Statistic:

  W²_α     α
  0.347    10%
  0.461    5%
  0.743    1%
  1.168    0.1%

For finite statistics N, the probability Prob{(W²_SCvM − 0.4/N + 0.6/N²)(1 + 1/N) > W²_α} = α, i.e. this occurs (100·α)% of the time when the null hypothesis H₀: S_N(x|X) = F(x|X) is true. See M.A. Stephens, "EDF Statistics for Goodness of Fit and Some Comparisons", Journal of the American Statistical Association, Vol. 69, No. 347 (Sept. 1974) for additional details.

The Anderson–Darling Test: The Anderson–Darling Test is similar/related to the Smirnov–Cramér–von Mises Test, comparing the entirety of the {square} of the differences between the experiment vs. theory CDFs, for a test of the null hypothesis H₀: S_N(x|X) = F(x|X). If we consider the more general case of a generalized statistical test using the quadratic difference between experiment vs. theory CDFs, the generalized quadratic difference test statistic W² is:

  W² ≡ N ∫₀¹ [S_N(x|X) − F(x|X)]² ψ(F(x|X)) dF

As before, the continuous theory CDF is F(x|X), the empirical experiment sample CDF S_N(x|X) is defined from the sorted empirical experiment sample PDF s_N(x|X) as S_N(x_k|X) = k/N, and ψ(t) is some a priori pre-assigned, non-negative weight function that depends on the continuous theory CDF F(x|X) through t ≡ F(x|X). Here again, the generalized quadratic difference test statistic W² is said to be distribution-free, since any/all F(x|X)-dependence is integrated out in the above integral.
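The finite-N computational formula for W²_SCvM above is straightforward to verify numerically. A minimal Python sketch, not part of the notes — the uniform null hypothesis, sample size, seed, and the SciPy `cramervonmises` cross-check (available in SciPy ≥ 1.6) are my own choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N = 50
x = np.sort(rng.uniform(0.0, 1.0, N))  # pseudo-data; H0: uniform on [0,1]
F = stats.uniform.cdf(x)               # theory CDF at the sorted data points

# W^2 = 1/(12N) + sum_{k=1}^{N} [F(x_k) - (2k-1)/(2N)]^2
k = np.arange(1, N + 1)
W2 = 1.0 / (12.0 * N) + np.sum((F - (2.0 * k - 1.0) / (2.0 * N)) ** 2)

# Cross-check with SciPy's Cramer-von Mises test (same definition of the statistic)
res = stats.cramervonmises(x, 'uniform')
assert abs(W2 - res.statistic) < 1e-12

# Decision at the 5% level using the asymptotic critical value 0.461 from the table
print(f"W^2 = {W2:.4f}  ->  {'reject' if W2 > 0.461 else 'accept'} H0 at the 5% level")
```

Since H₀ is true by construction here, the test should accept in roughly 95% of repeated experiments at the 5% level.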

If the weight function ψ(t) is chosen to be a constant (i.e. unity), ψ(t) = 1, then the above generalized quadratic difference test statistic is none other than the S-C-vM test statistic W²_SCvM:

  W²_SCvM ≡ N ∫₀¹ [S_N(x|X) − F(x|X)]² dF

Now, if the experiment is repeated a gazillion times, it can then be seen that the {sorted} empirical experimental sample CDF S_N(x|X) is itself a random variable. The product N · S_N(x|X) is in fact binomially distributed with probability F(x|X). The expectation value of S_N(x|X) is E[S_N(x|X)] = F(x|X), with variance F(x|X)(1 − F(x|X))/N. Hence, if we instead choose the weight function ψ(t) for the generalized quadratic difference test statistic to be the inverse of this variance, i.e. ψ(t) = 1/[t(1−t)]:

  ψ(F(x|X)) = 1/[F(x|X)(1 − F(x|X))]

then, for a specified X, the quantity

  √N [S_N(x|X) − F(x|X)] √ψ(F(x|X)) = √N [S_N(x|X) − F(x|X)] / √(F(x|X)(1 − F(x|X)))

in the asymptotic limit N → ∞ has true mean 0 and unit variance when the null hypothesis H₀ is true (i.e. S_N(x|X) = F(x|X))!!! Thus, for the specific choice of weight function ψ(t) = 1/[t(1−t)], where t ≡ F(x|X), the generalized quadratic difference test is then known as the Anderson–Darling Test:

  W²_A-D ≡ N ∫₀¹ [S_N(x|X) − F(x|X)]² / [F(x|X)(1 − F(x|X))] dF

Since the sorted empirical experimental sample CDF is S_N(x_k|X) = k/N, the Anderson–Darling test statistic W²_A-D is numerically computed as:

  W²_A-D = −N − (1/N) Σ_{k=1}^N (2k−1) {log F(x_k) + log[1 − F(x_{N+1−k})]}

Again, the Anderson–Darling test statistic W²_A-D is itself a random variable, and in the asymptotic limit N → ∞ the probability Prob{W²_A-D > W²_α} = α, i.e. W²_A-D exceeds W²_α (100·α)% of the time when the null hypothesis H₀ is true (i.e. S_N(x|X) = F(x|X), or equivalently their corresponding PDFs s(x) = f(x), for all x). Thus, for the Anderson–Darling Test, in the asymptotic limit, the CDF associated with the A-D test statistic is:

  F_{W²_A-D}(z) = (√(2π)/z) Σ_{k=0}^∞ (−1)^k [Γ(k+½)(4k+1)/k!] e^{−(4k+1)²π²/(8z)} ∫₀^∞ e^{z/[8(w²+1)] − (4k+1)²π²w²/(8z)} dw

The convergence of this expression is also extremely rapid, and is such that for N ≥ 5, to 3 decimal places of accuracy, the table of critical values of W²_α for various choices of α is:

Critical Values of the Anderson–Darling Test Statistic:

  W²_α     α
  1.933    10%
  2.492    5%
  3.857    1%
  5.969    0.1%
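The Anderson–Darling sum formula above can be checked directly against its defining weighted integral. The following Python sketch is mine, not part of the notes: the fully specified Gaussian null hypothesis (no fitted parameters, so the asymptotic critical values above apply), the sample size, seed, and the numerical-integration cross-check are all assumptions for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(3)
N = 20
x = np.sort(rng.normal(0.0, 1.0, N))  # pseudo-data; H0: standard Gaussian (fully specified)
F = stats.norm.cdf(x)

# Sum formula: A^2 = -N - (1/N) sum_k (2k-1) [ln F(x_k) + ln(1 - F(x_{N+1-k}))]
k = np.arange(1, N + 1)
A2 = -N - np.sum((2 * k - 1) * (np.log(F) + np.log(1.0 - F[::-1]))) / N

# Direct check against the defining integral: A^2 = N * int_0^1 (S_N - t)^2 / (t(1-t)) dt,
# where t = F(x) and S_N, viewed as a function of t, steps at the values F(x_k).
# The integrand tends to 0 at both endpoints, so only the jump points need care.
def integrand(t):
    S = np.searchsorted(F, t, side='right') / N  # empirical CDF as a function of t
    return N * (S - t) ** 2 / (t * (1.0 - t))

A2_num, _ = quad(integrand, 0.0, 1.0, points=list(F), limit=200)
assert abs(A2 - A2_num) < 1e-6

# Decision at the 5% level using the asymptotic critical value 2.492 from the table
print(f"A^2 = {A2:.4f}  ->  {'reject' if A2 > 2.492 else 'accept'} H0 at the 5% level")
```

Passing the `points=list(F)` breakpoints to `quad` tells the integrator where the step-function discontinuities lie, so the piecewise-smooth integrand is handled accurately.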


More information

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis - WS 07/08 Modern Methods of Data Analysis Lecture V (12.11.07) Contents: Central Limit Theorem Uncertainties: concepts, propagation and properties Central Limit Theorem Consider the sum X of n independent variables,

More information

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009 Statistics for Particle Physics Kyle Cranmer New York University 91 Remaining Lectures Lecture 3:! Compound hypotheses, nuisance parameters, & similar tests! The Neyman-Construction (illustrated)! Inverted

More information

Analysis of Z ee with the ATLAS-Detector

Analysis of Z ee with the ATLAS-Detector DESY Summer Student Programme 2008 Hamburg Analysis of Z ee with the ATLAS-Detector Maximilian Schlupp ATLAS Group 11.09.2008 maximilian.schlupp@desy.de maximilian.schlupp@tu-dortmund.de Supervisor: Karsten

More information

Institute for the Advancement of University Learning & Department of Statistics

Institute for the Advancement of University Learning & Department of Statistics Institute for the Advancement of University Learning & Department of Statistics Descriptive Statistics for Research (Hilary Term, 00) Lecture 7: Hypothesis Testing (I.) Introduction An important area of

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Statistical Inference. Hypothesis Testing

Statistical Inference. Hypothesis Testing Statistical Inference Hypothesis Testing Previously, we introduced the point and interval estimation of an unknown parameter(s), say µ and σ 2. However, in practice, the problem confronting the scientist

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Modeling the Goodness-of-Fit Test Based on the Interval Estimation of the Probability Distribution Function

Modeling the Goodness-of-Fit Test Based on the Interval Estimation of the Probability Distribution Function Modeling the Goodness-of-Fit Test Based on the Interval Estimation of the Probability Distribution Function E. L. Kuleshov, K. A. Petrov, T. S. Kirillova Far Eastern Federal University, Sukhanova st.,

More information

1; (f) H 0 : = 55 db, H 1 : < 55.

1; (f) H 0 : = 55 db, H 1 : < 55. Reference: Chapter 8 of J. L. Devore s 8 th Edition By S. Maghsoodloo TESTING a STATISTICAL HYPOTHESIS A statistical hypothesis is an assumption about the frequency function(s) (i.e., pmf or pdf) of one

More information

Maximum-Likelihood fitting

Maximum-Likelihood fitting CMP 0b Lecture F. Sigworth Maximum-Likelihood fitting One of the issues I want to address in this lecture is the fitting of distributions dwell times. We want to find the best curve to draw over a histogram,

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study MATEMATIKA, 2012, Volume 28, Number 1, 35 48 c Department of Mathematics, UTM. The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study 1 Nahdiya Zainal Abidin, 2 Mohd Bakri Adam and 3 Habshah

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Negative binomial distribution and multiplicities in p p( p) collisions

Negative binomial distribution and multiplicities in p p( p) collisions Negative binomial distribution and multiplicities in p p( p) collisions Institute of Theoretical Physics University of Wroc law Zakopane June 12, 2011 Summary s are performed for the hypothesis that charged-particle

More information

14.3 Are Two Distributions Different?

14.3 Are Two Distributions Different? 614 Chapter 14. Statistical Description of Data 14.3 Are Two Distributions Different? Given two sets of data, we can generalize the questions asked in the previous section and ask the single question:

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

Statistics Challenges in High Energy Physics Search Experiments

Statistics Challenges in High Energy Physics Search Experiments Statistics Challenges in High Energy Physics Search Experiments The Weizmann Institute of Science, Rehovot, Israel E-mail: eilam.gross@weizmann.ac.il Ofer Vitells The Weizmann Institute of Science, Rehovot,

More information

STATISTICS OF OBSERVATIONS & SAMPLING THEORY. Parent Distributions

STATISTICS OF OBSERVATIONS & SAMPLING THEORY. Parent Distributions ASTR 511/O Connell Lec 6 1 STATISTICS OF OBSERVATIONS & SAMPLING THEORY References: Bevington Data Reduction & Error Analysis for the Physical Sciences LLM: Appendix B Warning: the introductory literature

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information