Goodness-of-Fit Considerations and Comparisons Example: Testing Consistency of Two Histograms

1 Goodness-of-Fit Considerations and Comparisons Example: Testing Consistency of Two Histograms
Sometimes we have two histograms and are faced with the question: Are they consistent? That is, are our two histograms consistent with having been sampled from the same parent distribution? Each histogram represents a sampling from a multivariate Poisson distribution.
[Figure: the two example histograms, bin contents (u or v) versus bin index.]
1 Frank Porter, Caltech Seminar, February 26, 2008

2 Testing Consistency of Two Histograms
There are two variants of interest to this question:
1. We wish to test the hypothesis (absolute equality): H0: The means of the two histograms are bin-by-bin equal, against H1: The means of the two histograms are not bin-by-bin equal.
2. We wish to test the hypothesis (shape equality): H0: The densities of the two histograms are bin-by-bin equal, against H1: The densities of the two histograms are not bin-by-bin equal.
2 Frank Porter, Caltech Seminar, February 26, 2008

3 Testing Consistency of Two Histograms Some Context
There are many tests to address whether a dataset is consistent with having been drawn from some continuous distribution. Histograms are built from such data, so why not work directly with the data? Good question! Maybe the original data are not available. We may also be able to use features of the binned data, e.g., the normal approximation behind the χ² test. These tests may often be adapted to address whether two datasets have been drawn from the same continuous distribution, often called two-sample tests. These tests may then be further adapted to the present problem, of determining whether two histograms have the same shape. The problem is also discussed as comparing whether two (or more) rows of a table are consistent.
3 Frank Porter, Caltech Seminar, February 26, 2008

4 Notation, Conventions
We assume that we have formed our two histograms with the same number of bins, k, with identical bin boundaries. The bin contents of the first histogram are given by realization u of random variable U, and of the second by realization v of random variable V. Thus, the sampling distributions are:
$P(U=u) = \prod_{i=1}^{k} \frac{\mu_i^{u_i}}{u_i!}\, e^{-\mu_i}, \qquad P(V=v) = \prod_{i=1}^{k} \frac{\nu_i^{v_i}}{v_i!}\, e^{-\nu_i},$
where the vectors μ and ν are the mean bin contents of the respective histograms.
4 Frank Porter, Caltech Seminar, February 26, 2008

5 Notation, Conventions (continued)
We define:
$N_u \equiv \sum_{i=1}^{k} U_i$ (total contents of first histogram),
$N_v \equiv \sum_{i=1}^{k} V_i$ (total contents of second histogram),
$\mu_T \equiv \langle N_u \rangle = \sum_{i=1}^{k} \mu_i, \qquad \nu_T \equiv \langle N_v \rangle = \sum_{i=1}^{k} \nu_i,$
$t_i \equiv u_i + v_i, \quad i = 1, \ldots, k.$
Will drop distinction between random variable and realization.
5 Frank Porter, Caltech Seminar, February 26, 2008

6 Power, Confidence Level, P-value
Interested in the power of a test, for given confidence level. The power is the probability that the null hypothesis is rejected when it is false. Power depends on the true sampling distribution. Power = 1 - probability of a Type II error. The confidence level is the probability that the null hypothesis is accepted, if the null hypothesis is correct. Confidence level = 1 - probability of a Type I error. In physics, we don't specify the confidence level of a test in advance, at least not formally. Instead, we quote the P-value for our result. This is the probability, under the null hypothesis, of obtaining a result as bad or worse than our observed value. This would be the probability of a Type I error if our observation were used to define the critical region of the test.
6 Frank Porter, Caltech Seminar, February 26, 2008

7 Large Statistics Case
If all of the bin contents of both histograms are large, we use the approximation that the bin contents are normally distributed. Under H0, $\langle u_i \rangle = \langle v_i \rangle \equiv \mu_i$, i = 1, ..., k. More properly, it is $\langle U_i \rangle = \mu_i$, etc., but we are permitting u_i to stand for the random variable as well as its realization, as noted above. Let the difference for the contents of bin i between the two histograms be $\Delta_i \equiv u_i - v_i$, and let the standard deviation for $\Delta_i$ be denoted $\sigma_i$. Then the sampling distribution of the difference between the two histograms is:
$P(\Delta) = \frac{1}{(2\pi)^{k/2} \prod_{i=1}^{k} \sigma_i} \exp\left( -\frac{1}{2} \sum_{i=1}^{k} \frac{\Delta_i^2}{\sigma_i^2} \right).$
7 Frank Porter, Caltech Seminar, February 26, 2008

8 Large Statistics Case Test Statistic
This suggests the test statistic:
$T = \sum_{i=1}^{k} \frac{\Delta_i^2}{\sigma_i^2}.$
If the $\sigma_i$ were known, this would simply be distributed according to the chi-square distribution with k degrees of freedom. The maximum-likelihood estimator for the mean of a Poisson is just the sampled number. The mean of the Poisson is also its variance, and we will use the sampled number also as the estimate of the variance in the normal approximation. We'll refer to this approach as a χ² test.
8 Frank Porter, Caltech Seminar, February 26, 2008

9 Large Statistics Algorithm absolute
We suggest the following algorithm for this test:
1. For $\sigma_i^2$ form the estimate $\hat\sigma_i^2 = u_i + v_i$.
2. Statistic T is thus evaluated according to:
$T = \sum_{i=1}^{k} \frac{(u_i - v_i)^2}{u_i + v_i}.$
If $u_i = v_i = 0$ for bin i, the contribution to the sum from that bin is zero.
3. Estimate the P-value according to a chi-square with k degrees of freedom. Note that this is not an exact result.
9 Frank Porter, Caltech Seminar, February 26, 2008
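As a concrete illustration of the algorithm above, here is a minimal Python/NumPy sketch (not from the talk; the function name and the use of scipy.stats.chi2 for the approximate P-value are choices of this illustration):

    import numpy as np
    from scipy.stats import chi2

    def chi2_absolute(u, v):
        """Absolute (bin-by-bin equality) chi-square comparison of two histograms."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        t = u + v
        nonzero = t > 0                          # bins with u_i = v_i = 0 contribute zero
        T = np.sum((u[nonzero] - v[nonzero])**2 / t[nonzero])
        ndof = len(u)
        p = chi2.sf(T, ndof)                     # approximate P-value; not an exact result
        return T, ndof, p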

10 Large Statistics Algorithm shapes
If only comparing shapes, then scale both histogram bin contents:
1. Let $N = 0.5 (N_u + N_v)$. Scale u and v according to $u_i \to u_i' = u_i (N/N_u)$ and $v_i \to v_i' = v_i (N/N_v)$.
2. Estimate the variance of $u_i' - v_i'$ with:
$\hat\sigma_i^2 = \left( \frac{N}{N_u} \right)^2 u_i + \left( \frac{N}{N_v} \right)^2 v_i.$
3. Statistic T is thus evaluated according to:
$T = \sum_{i=1}^{k} \frac{\left( u_i/N_u - v_i/N_v \right)^2}{u_i/N_u^2 + v_i/N_v^2}.$
4. Estimate the P-value according to a chi-square with k - 1 degrees of freedom. Note that this is not an exact result.
10 Frank Porter, Caltech Seminar, February 26, 2008
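The corresponding shape comparison, sketched in the same hedged Python style (the function name chi2_shape is illustrative):

    import numpy as np
    from scipy.stats import chi2

    def chi2_shape(u, v):
        """Shape (density) chi-square comparison of two histograms."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        Nu, Nv = u.sum(), v.sum()
        mask = (u + v) > 0                       # empty bins contribute zero
        num = (u[mask]/Nu - v[mask]/Nv)**2
        den = u[mask]/Nu**2 + v[mask]/Nv**2
        T = np.sum(num / den)
        ndof = len(u) - 1
        p = chi2.sf(T, ndof)                     # approximate P-value; not an exact result
        return T, ndof, p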

11 Application to Example
With some small bin counts, we might not expect this method to be especially good for our example, but we can try it anyway:
Type of test                         | T | NDOF | P(χ² > T) | P-value
χ² absolute comparison               |   |      |           |
χ² shape comparison                  |   |      |           |
Likelihood ratio shape comparison    |   |      |           |
Kolmogorov-Smirnov shape comparison  |   | NA   |           | 0.22
Bhattacharyya shape comparison       |   | NA   |           | 0.32
Cramér-von-Mises shape comparison    |   | NA   |           | 0.29
Anderson-Darling shape comparison    |   | NA   |           | 0.42
Likelihood value shape comparison    |   | NA   |           | 0.29
[Entries left blank were not preserved in the transcription.]
The P-value column attempts a more reliable estimate of the probability, under the null hypothesis, that a value for T will be as large as observed. Compare with the P(χ² > T) column, the probability assuming T follows a χ² distribution with NDOF degrees of freedom.
11 Frank Porter, Caltech Seminar, February 26, 2008

12 Application to Example (continued)
[Figure: the example histograms, u or v versus bin index.]
The absolute comparison yields much poorer agreement than the shape comparison. This is readily understood: The total number of counts in one dataset is 783; in the other it is 633. Treating these as samplings from a normal distribution with variances 783 and 633, we find a difference of 3.99 standard deviations, corresponding to a two-tailed P-value below $10^{-4}$. This low probability is diluted by the bin-by-bin test, but is still reflected in the poor agreement. [Auxiliary lesson: The more you know about what you want to test, the better (more powerful) the test you can make.]
12 Frank Porter, Caltech Seminar, February 26, 2008

13 An Issue: We don't know H0!
Evaluation of the probability under the null hypothesis is in fact problematic, since the null hypothesis actually isn't completely specified. The problem is the dependence of Poisson probabilities on the absolute numbers of counts. Probabilities for differences in Poisson counts are not invariant under the total number of counts. Unfortunately, we don't know the true mean numbers of counts in each bin. Thus, we must estimate these means. The procedure adopted here has been to use the maximum likelihood estimators (see later) for the mean numbers, under the null hypothesis. We'll have further discussion of this procedure below; it does not always yield valid results.
13 Frank Porter, Caltech Seminar, February 26, 2008

14 Use of chi-square Probability Distribution
In our example, the probabilities estimated according to our simulation and the χ² distribution probabilities are fairly close to each other. This suggests the possibility of using the χ² probabilities: if we can do this, the problem that we haven't completely specified the null hypothesis is avoided.
Conjecture: Let T be the χ² test statistic, for either the absolute or the shape comparison, as desired. Let T_c be a possible value of T (perhaps the critical value to be used in a hypothesis test). Then, for large values of T_c, under H0 (of either variant):
$P(T < T_c) \ge P\left( T < T_c \mid \chi^2(\mathrm{ndof}) \right),$
where $P(T < T_c \mid \chi^2(\mathrm{ndof}))$ is the probability that $T < T_c$ according to a χ² distribution with ndof degrees of freedom (either k or k - 1, depending on which test is being performed).
14 Frank Porter, Caltech Seminar, February 26, 2008

15 Is it true?
As we'll see, not really worth proving, but here is the idea: Consider the case of one bin. Suppose that ν = μ and that μ is small (of order 1, say). Since we are interested in large values of the statistic, we are interested in the situation where one of u, v is large, and the other small (since μ is small). Suppose it is u that is large. Then
$T = \frac{(u - v)^2}{u + v}.$
For given v (0, say), T = u and the probability of T is thus
$P(T) = \frac{\mu^T}{T!} e^{-\mu}.$
15 Frank Porter, Caltech Seminar, February 26, 2008

16 Is it true? (continued)
This may be compared with the chi-square probability distribution for one degree of freedom:
$P(T = \chi^2) = \frac{1}{\sqrt{2\pi T}} e^{-T/2}.$
The ratio is, dropping the constants:
$\frac{P(T)}{P(T = \chi^2)} \propto \frac{\mu^T e^{T/2} \sqrt{T}}{T!} = \frac{\exp\left[ T\left( \frac{1}{2} + \ln\mu \right) \right]}{\sqrt{T}\, \Gamma(T)},$
which approaches zero for large T, for any given μ. We conclude that the conjecture is valid in the case of one bin, and strongly suspect that the argument generalizes to multiple bins.
16 Frank Porter, Caltech Seminar, February 26, 2008

17 Is it Useful?
If true, the conjecture tells us that if we use the probabilities from a χ² distribution in our test, the error that we make is in the conservative direction. That is, we'll reject the null hypothesis less often than we would with the correct probability. This conjecture is independent of the statistics of the sample; bins with zero counts are fine. In the limit of large statistics, the inequality approaches equality. Lest we conclude that it is acceptable to use this great simplification in all situations, we hasten to point out that it isn't as nice as it sounds. The problem is that, in low statistics situations, the power of the test according to this approach can be dismal. We might not reject the null hypothesis in situations where it is obviously implausible.
17 Frank Porter, Caltech Seminar, February 26, 2008

18 Is it Useful? (continued)
[Figure: comparison of the actual (cumulative) probability distribution P(T < T_c) for T (solid blue curves) with the chi-square distribution (dashed red curves), as a function of T_c. All plots are for 100-bin histograms under the null hypothesis: (a) each bin has mean 100; (b) each bin has mean 1; (c) bin j has mean 30/j.]
18 Frank Porter, Caltech Seminar, February 26, 2008

19 General (Including Small Statistics) Case
If the bin contents are not large, then the normal approximation may not be good enough and the χ² statistic may not follow a χ² distribution. A simple approach is to combine bins until the normal approximation is valid. In many cases this doesn't lose too much statistical power. A common rule-of-thumb is to combine until each bin has at least 7 counts. Try this on our example, as a function of the minimum number of events per bin. The algorithm is to combine corresponding bins in both histograms until both have at least minbin counts in each bin.
19 Frank Porter, Caltech Seminar, February 26, 2008
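One way to realize that rebinning rule is sketched below in Python (an illustration only; folding any leftover counts into the last combined bin is my own choice and is not specified in the talk):

    import numpy as np

    def combine_bins(u, v, minbin):
        """Merge adjacent bins (identically in both histograms) until every
        combined bin of each histogram holds at least `minbin` counts."""
        new_u, new_v = [], []
        acc_u = acc_v = 0
        for ui, vi in zip(u, v):
            acc_u += ui
            acc_v += vi
            if acc_u >= minbin and acc_v >= minbin:
                new_u.append(acc_u)
                new_v.append(acc_v)
                acc_u = acc_v = 0
        if acc_u or acc_v:                       # fold any remainder into the last bin
            if new_u:
                new_u[-1] += acc_u
                new_v[-1] += acc_v
            else:
                new_u, new_v = [acc_u], [acc_v]
        return np.array(new_u), np.array(new_v)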

20 Combining bins for χ² test
[Figures] Left: the example pair of histograms. Middle: the solid curve shows the value of the test statistic T, and the dashed curve shows the number of histogram bins for the data in the example, as a function of the minimum number of counts per bin (minbin). Right: the P-value for consistency of the two datasets in the example; the solid curve shows the probability for a chi-square distribution, and the dashed curve shows the probability for the actual distribution, with an estimated null hypothesis.
20 Frank Porter, Caltech Seminar, February 26, 2008

21 Working with the Poissons - Normalization Test
Alternative: Work with the Poisson distribution. Separate the problem of the shape from that of the overall normalization. To test normalization, compare totals over all bins between the histograms. The distribution is
$P(N_u, N_v) = \frac{\mu_T^{N_u} \nu_T^{N_v}}{N_u!\, N_v!}\, e^{-(\mu_T + \nu_T)}.$
The null hypothesis is H0: $\mu_T = \nu_T$, to be tested against alternative H1: $\mu_T \ne \nu_T$. We are interested in the difference between the two means; the sum is a nuisance parameter. Hence, consider:
$P(N_v \mid N_u + N_v = N) = \frac{P(N - N_v)\, P(N_v)}{P(N)} = \frac{\mu_T^{N - N_v} e^{-\mu_T}}{(N - N_v)!}\, \frac{\nu_T^{N_v} e^{-\nu_T}}{N_v!} \Big/ \frac{(\mu_T + \nu_T)^N e^{-(\mu_T + \nu_T)}}{N!} = \binom{N}{N_v} \left( \frac{\nu_T}{\mu_T + \nu_T} \right)^{N_v} \left( \frac{\mu_T}{\mu_T + \nu_T} \right)^{N - N_v}.$
Note! Example of use of conditional likelihood.
21 Frank Porter, Caltech Seminar, February 26, 2008

22 Normalization test (general case)
This probability permits us to construct a uniformly most powerful test of our hypothesis.* It is a binomial distribution, for given N. The uniformly most powerful property holds independently of N, although the probabilities cannot be computed without N. The null hypothesis corresponds to $\mu_T = \nu_T$, that is:
$P(N_v \mid N_u + N_v = N) = \binom{N}{N_v} \left( \frac{1}{2} \right)^N.$
For our example, with N = 1416 and N_v = 633, the resulting two-tailed P-value is small, comparable with our earlier estimate in the normal approximation. For our binomial test we have conservatively included the endpoints (633 and 783). Mimicking more closely the normal estimate by including one-half the probability at the endpoints, we obtain a value very close to the normal number.
* E. L. Lehmann and Joseph P. Romano, Testing Statistical Hypotheses, 3rd ed., Springer, NY (2005), Thm 4.4.1.
22 Frank Porter, Caltech Seminar, February 26, 2008
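A hedged sketch of this conditional binomial normalization test in Python (the two-tailed convention of twice the smaller tail, with the endpoint included, is one possible choice; the function name is illustrative):

    from scipy.stats import binom

    def normalization_pvalue(Nu, Nv):
        """Two-tailed P-value for H0: mu_T = nu_T, conditioning on N = Nu + Nv."""
        N = Nu + Nv
        lo = min(Nu, Nv)
        p_tail = binom.cdf(lo, N, 0.5)           # P(smaller total <= observed) under H0
        return min(1.0, 2.0 * p_tail)

    # For the example totals, normalization_pvalue(783, 633) gives a very small probability.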

23 Testing the Shape Catalog of Tests
Many possible tests. We'll consider 7:
- Chi-square test (χ²)
- Bhattacharyya distance measure (BDM)
- Kolmogorov-Smirnov test (KS)
- Cramér-von-Mises test (CVM)
- Anderson-Darling test (AD)
- Likelihood ratio test (ln λ)
- Likelihood value test (ln L)
23 Frank Porter, Caltech Seminar, February 26, 2008

24 Chi-square test for shape
Even though we don't expect it to follow a χ² distribution, we may evaluate the test statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{\left( u_i/N_u - v_i/N_v \right)^2}{u_i/N_u^2 + v_i/N_v^2}.$
If $u_i = v_i = 0$, the contribution to the sum from that bin is zero.
24 Frank Porter, Caltech Seminar, February 26, 2008

25 Geometric (BDM) test for shape
Geometric motivation: Let the bin contents of a histogram define a vector in a k-dimensional space. If two vectors are drawn from the same distribution (null hypothesis), they will tend to point in the same direction (we are not interested in the lengths of the vectors here). If we represent each histogram as a unit vector with components $\{\sqrt{u_1/N_u}, \ldots, \sqrt{u_k/N_u}\}$ and $\{\sqrt{v_1/N_v}, \ldots, \sqrt{v_k/N_v}\}$, we may form the dot-product test statistic:
$T_{BDM} = \hat u \cdot \hat v = \sum_{i=1}^{k} \left( \frac{u_i v_i}{N_u N_v} \right)^{1/2}.$
This is known as the Bhattacharyya distance measure (BDM). This statistic is related to the χ² statistic: the dot product is close to the cross term in the χ² expression.
25 Frank Porter, Caltech Seminar, February 26, 2008
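A minimal sketch of the BDM statistic in Python (illustrative only; its distribution under H0 must still be estimated, e.g. by simulation as discussed later):

    import numpy as np

    def bdm_statistic(u, v):
        """Bhattacharyya distance measure between two histograms."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        return np.sum(np.sqrt(u * v)) / np.sqrt(u.sum() * v.sum())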

26 Sample application of BDM test
Apply this formalism to our example. According to our estimated distribution of this statistic under the null hypothesis, the resulting sum over bins gives a P-value of 0.32, similar to the χ² test result (0.28).
[Figures: bin-by-bin contributions to the BDM test statistic for the example (BDM per bin versus bin index); estimated distribution (frequency versus BDM statistic) of the BDM statistic for the null hypothesis in the example.]
26 Frank Porter, Caltech Seminar, February 26, 2008

27 Kolmogorov-Smirnov test
Another approach to a shape test may be based on the Kolmogorov-Smirnov (KS) idea: Estimate the maximum difference between observed and predicted cumulative distribution functions and compare with expectations. Modify the KS statistic to apply to comparison of histograms as follows. Assume neither histogram is empty. Form the cumulative distribution histograms according to:
$u_{ci} = \sum_{j=1}^{i} u_j/N_u, \qquad v_{ci} = \sum_{j=1}^{i} v_j/N_v.$
Then compute the test statistic:
$T_{KS} = \max_i \left| u_{ci} - v_{ci} \right|.$
(We consider only the two-tail test here.)
27 Frank Porter, Caltech Seminar, February 26, 2008
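A corresponding sketch in the same hedged Python style (the null distribution is again left to simulation):

    import numpy as np

    def ks_statistic(u, v):
        """Histogram-adapted KS statistic: maximum |difference of cumulative fractions|."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        cu = np.cumsum(u) / u.sum()
        cv = np.cumsum(v) / v.sum()
        return np.max(np.abs(cu - cv))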

28 Sample application of KS test
Apply this formalism to our example. Estimating the distribution of the maximum over bins under H0 gives a P-value of 0.22, somewhat smaller than the χ² test result, but indicating consistency of the histograms. The smaller KS P-value is presumably a reflection of the emphasis this test gives to the region near the peak of the distribution (i.e., the region where the largest fluctuations occur in Poisson statistics), where we indeed have a largish fluctuation.
[Figures: bin-by-bin distances for the KS test statistic for the example (KS distance versus bin index); estimated PDF (frequency versus KS statistic) of the KS distance under H0 in the example.]
28 Frank Porter, Caltech Seminar, February 26, 2008

29 Cramér-von-Mises test
The idea of the Cramér-von-Mises (CVM) test is to add up the squared differences between the cumulative distributions being compared. Used to compare an observed distribution with a presumed parent continuous probability distribution. The algorithm is adaptable to the two-sample comparison, and to the case of comparing two histograms. The test statistic for comparing the two samples $x_1, x_2, \ldots, x_N$ and $y_1, y_2, \ldots, y_M$ is*:
$T = \frac{NM}{(N + M)^2} \left\{ \sum_{i=1}^{N} \left[ E_x(x_i) - E_y(x_i) \right]^2 + \sum_{j=1}^{M} \left[ E_x(y_j) - E_y(y_j) \right]^2 \right\},$
where $E_x$ is the empirical cumulative distribution for sampling x. That is, $E_x(x) = n/N$ if n of the sampled $x_i$ are less than or equal to x.
* T. W. Anderson, On the Distribution of the Two-Sample Cramér-von Mises Criterion, Ann. Math. Stat. 33 (1962).
29 Frank Porter, Caltech Seminar, February 26, 2008

30 Cramér-von-Mises test Adaptation to comparing histograms
Adapt this for the present application of comparing histograms with bin contents $u_1, u_2, \ldots, u_k$ and $v_1, v_2, \ldots, v_k$ with identical bin boundaries: Let z be a point in bin i, and define the empirical cumulative distribution function for histogram u as:
$E_u(z) = \sum_{j=1}^{i} u_j/N_u.$
Then the test statistic is:
$T_{CVM} = \frac{N_u N_v}{(N_u + N_v)^2} \sum_{j=1}^{k} (u_j + v_j) \left[ E_u(z_j) - E_v(z_j) \right]^2.$
30 Frank Porter, Caltech Seminar, February 26, 2008
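In the same hedged Python style, this adaptation might look like:

    import numpy as np

    def cvm_statistic(u, v):
        """Histogram-adapted Cramer-von Mises statistic."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        Nu, Nv = u.sum(), v.sum()
        Eu = np.cumsum(u) / Nu                   # empirical CDF of histogram u
        Ev = np.cumsum(v) / Nv                   # empirical CDF of histogram v
        return Nu * Nv / (Nu + Nv)**2 * np.sum((u + v) * (Eu - Ev)**2)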

31 Sample application of CVM test
Apply this formalism to our example, evaluating $T_{CVM}$. The resulting estimated distribution under the null hypothesis is shown below. According to our estimated distribution of this statistic under the null hypothesis, this gives a P-value of 0.29, similar to the χ² test result.
[Figures: the example histograms (u or v versus bin index); estimated PDF (frequency versus CVM statistic) of the CVM statistic under H0 for the example.]
31 Frank Porter, Caltech Seminar, February 26, 2008

32 Anderson-Darling (AD) test for shape
The Anderson-Darling (AD) test is another non-parametric comparison of cumulative distributions. It is similar to the Cramér-von-Mises statistic, but is designed to be sensitive to the tails of the CDF. The original statistic was designed to compare a dataset drawn from a continuous distribution with CDF $F_0(x)$ under the null hypothesis:
$A_m^2 = m \int \frac{\left[ F_m(x) - F_0(x) \right]^2}{F_0(x)\left[ 1 - F_0(x) \right]}\, dF_0(x),$
where $F_m(x)$ is the empirical CDF of dataset $x_1, \ldots, x_m$.
32 Frank Porter, Caltech Seminar, February 26, 2008

33 Anderson-Darling (AD) test, adaptation to comparing histograms
Scholz and Stephens* provide a form of this statistic for a k-sample test on grouped data (e.g., as might be used to compare k histograms). The expression of interest for two histograms is:
$T_{AD} = \frac{1}{N_u + N_v} \sum_{j=k_{min}}^{k_{max}-1} \frac{t_j}{\Sigma_j \left( N_u + N_v - \Sigma_j \right)} \left\{ \frac{\left[ (N_u + N_v)\Sigma_{uj} - N_u \Sigma_j \right]^2}{N_u} + \frac{\left[ (N_u + N_v)\Sigma_{vj} - N_v \Sigma_j \right]^2}{N_v} \right\},$
where $k_{min}$ is the first bin where either histogram has non-zero counts, $k_{max}$ is the number of bins counting up to the last bin where either histogram has non-zero counts, and
$\Sigma_{uj} \equiv \sum_{i=1}^{j} u_i, \qquad \Sigma_{vj} \equiv \sum_{i=1}^{j} v_i, \qquad \Sigma_j \equiv \sum_{i=1}^{j} t_i = \Sigma_{uj} + \Sigma_{vj}.$
* F. W. Scholz and M. A. Stephens, K-Sample Anderson-Darling Tests, J. Amer. Stat. Assoc. 82 (1987).
33 Frank Porter, Caltech Seminar, February 26, 2008
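A sketch of this binned two-sample AD statistic in Python (an illustration under my own assumptions: the mask below skips bins whose term would be 0/0, i.e. cumulative sums of 0 or N, consistent with the sum limits quoted above):

    import numpy as np

    def ad_statistic(u, v):
        """Two-sample Anderson-Darling statistic for binned data (Scholz-Stephens form)."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        Nu, Nv = u.sum(), v.sum()
        N = Nu + Nv
        t = u + v
        Su, Sv, S = np.cumsum(u), np.cumsum(v), np.cumsum(t)
        mask = (S > 0) & (S < N)                 # exclude leading empty bins and the last bin
        num = ((N * Su[mask] - Nu * S[mask])**2 / Nu
               + (N * Sv[mask] - Nv * S[mask])**2 / Nv)
        return np.sum(t[mask] * num / (S[mask] * (N - S[mask]))) / N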

34 Sample application of AD test for shape
We apply this formalism to our example. According to our estimated distribution of this statistic under the null hypothesis, the sum over bins gives a P-value of 0.42, somewhat larger than the χ² test result. The fact that it is larger is presumably due to the greater focus of this test on the tails, with less emphasis on the peak region (where the large fluctuation is seen in the example).
[Figures: the example histograms (u or v versus bin index); estimated distribution (frequency versus Anderson-Darling statistic) of the AD test statistic for the null hypothesis for the example.]
34 Frank Porter, Caltech Seminar, February 26, 2008

35 Likelihood ratio test for shape
Base a shape test on the same conditional likelihood idea as for the normalization test. Now there is a binomial associated with each bin. Start with the null hypothesis, that the two histograms are sampled from the joint distribution:
$P(u, v) = \prod_{i=1}^{k} \frac{\mu_i^{u_i}}{u_i!} e^{-\mu_i}\, \frac{\nu_i^{v_i}}{v_i!} e^{-\nu_i},$
where $\nu_i = a \mu_i$ for $i = 1, 2, \ldots, k$. That is, the shapes of the two histograms are the same, although the total contents may differ. With $t_i = u_i + v_i$, and fixing the $t_i$ at the observed values*, we have the multi-binomial form:
$P(v \mid u + v = t) = \prod_{i=1}^{k} \binom{t_i}{v_i} \left( \frac{\nu_i}{\nu_i + \mu_i} \right)^{v_i} \left( \frac{\mu_i}{\nu_i + \mu_i} \right)^{t_i - v_i}.$
35 Frank Porter, Caltech Seminar, February 26, 2008

36 Likelihood ratio test for shape (continued)
The null hypothesis is $\nu_i = a \mu_i$, $i = 1, \ldots, k$. We want to test this, but there are two complications: 1. The value of a is not specified; 2. We still have a multivariate distribution. For a, we will substitute an estimate from the data, namely the maximum likelihood estimator: $\hat a = N_v/N_u$. We use a likelihood ratio statistic to reduce the problem to a single variable. This will be the likelihood under the null hypothesis (with a given by its maximum likelihood estimator), divided by the maximum of the likelihood under the alternative hypothesis.
36 Frank Porter, Caltech Seminar, February 26, 2008

37 Likelihood ratio test for shape (continued)
Thus, we form the ratio:
$\lambda = \frac{\max_{H_0} L(a \mid v;\, u + v = t)}{\max_{H_1} L(\{a_i \equiv \nu_i/\mu_i\} \mid v;\, u + v = t)} = \prod_{i=1}^{k} \frac{\left( \frac{\hat a}{1 + \hat a} \right)^{v_i} \left( \frac{1}{1 + \hat a} \right)^{t_i - v_i}}{\left( \frac{\hat a_i}{1 + \hat a_i} \right)^{v_i} \left( \frac{1}{1 + \hat a_i} \right)^{t_i - v_i}}.$
The maximum likelihood estimator, under H1, for $a_i$ is just $\hat a_i = v_i/u_i$. Thus, we rewrite our test statistic as:
$\lambda = \prod_{i=1}^{k} \left( \frac{1 + v_i/u_i}{1 + N_v/N_u} \right)^{t_i} \left( \frac{N_v u_i}{N_u v_i} \right)^{v_i}.$
In practice, we'll work with
$-2\ln\lambda = -2 \sum_{i=1}^{k} \left[ t_i \ln\left( \frac{1 + v_i/u_i}{1 + N_v/N_u} \right) + v_i \ln\left( \frac{N_v u_i}{N_u v_i} \right) \right].$
If $u_i = v_i = 0$, the bin contributes zero. If $v_i = 0$, the contribution is $-2\ln\lambda_i = -2 t_i \ln\left( \frac{N_u}{N_u + N_v} \right)$. If $u_i = 0$, the contribution is $-2\ln\lambda_i = -2 t_i \ln\left( \frac{N_v}{N_u + N_v} \right)$.
37 Frank Porter, Caltech Seminar, February 26, 2008
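An equivalent way to evaluate $-2\ln\lambda$ is through the per-bin binomial form directly, with H0 probability $N_v/(N_u+N_v)$ and H1 probability $v_i/t_i$; a hedged Python sketch (the special cases listed above fall out of the zero guards):

    import numpy as np

    def minus_two_ln_lambda(u, v):
        """Conditional likelihood-ratio shape statistic -2 ln(lambda)."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        Nu, Nv = u.sum(), v.sum()
        p0 = Nv / (Nu + Nv)                      # H0 binomial probability, common to all bins
        T = 0.0
        for ui, vi in zip(u, v):
            ti = ui + vi
            if ti == 0:
                continue                         # empty bin contributes zero
            term = 0.0
            if vi > 0:
                term += vi * np.log((vi / ti) / p0)
            if ui > 0:
                term += ui * np.log((ui / ti) / (1.0 - p0))
            T += 2.0 * term
        return T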

38 Sample application of ln λ test
Apply this test to the example, obtaining $-2\ln\lambda = \sum_{i=1}^{k} (-2\ln\lambda_i)$. Asymptotically, $-2\ln\lambda$ should be distributed as a χ² with $N_{DOF} = k - 1$, or $N_{DOF} = 35$. If valid, this gives a P-value of 0.31, to be compared with a probability of 0.35 according to the estimated actual distribution. We obtain nearly the same answer as the naive application of the chi-square calculation with no bins combined, a result of nearly bin-by-bin equality of the two statistics.
[Figure: value of $-2\ln\lambda_i$ (circles) or $\chi_i^2$ versus bin index for the example.]
38 Frank Porter, Caltech Seminar, February 26, 2008

39 Comparison of ln λ and χ²
To investigate when this might hold more generally, we compare the values of $-2\ln\lambda_i$ and $\chi_i^2$ as a function of $u_i$ and $v_i$, below. The two statistics agree when $u_i = v_i$, with increasing difference away from that point. This agreement holds even for low statistics. However, we shouldn't conclude that the chi-square approximation may be used for low statistics: fluctuations away from equal numbers lead to quite different results when we get into the tails at low statistics. Our example doesn't really sample these tails; the only bin with a large difference is already high statistics.
[Figure: value of $-2\ln\lambda_i$ or $\chi_i^2$ as a function of the $u_i$ and $v_i$ bin contents, for u = 0, 5, 10; the plot assumes $N_u = N_v$.]
39 Frank Porter, Caltech Seminar, February 26, 2008

40 Likelihood value test
An often-used but controversial goodness-of-fit statistic is the value of the likelihood at its maximum under the null hypothesis. It can be demonstrated that this statistic carries little or no information in some situations. However, in the limit of large statistics it is essentially the chi-square statistic, so there are known situations where it is a plausible statistic to use. We look at it here. Using the results in the previous section, the test statistic is:
$\ln L = \sum_{i=1}^{k} \left[ \ln \binom{t_i}{v_i} + t_i \ln\left( \frac{N_u}{N_u + N_v} \right) + v_i \ln\left( \frac{N_v}{N_u} \right) \right].$
If either $N_u = 0$ or $N_v = 0$, then $\ln L = 0$.
40 Frank Porter, Caltech Seminar, February 26, 2008
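A sketch of this statistic in Python, using the log-gamma function for the binomial coefficient (illustrative; the function name is my own):

    import numpy as np
    from scipy.special import gammaln

    def lnL_statistic(u, v):
        """Likelihood value ln L under H0, in the conditional binomial form above."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        Nu, Nv = u.sum(), v.sum()
        if Nu == 0 or Nv == 0:
            return 0.0
        t = u + v
        log_binom = gammaln(t + 1) - gammaln(v + 1) - gammaln(u + 1)
        return np.sum(log_binom + t * np.log(Nu / (Nu + Nv)) + v * np.log(Nv / Nu))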

41 Sample application of the ln L test
We apply this test to the example. The sum over bins gives 90. According to our estimated distribution of this statistic under the null hypothesis, this gives a P-value of 0.29, similar to the χ² test result. The fact that it is similar may be expected from the fact that our example is reasonably well-approximated by the large statistics limit.
[Figure: estimated distribution (frequency versus ln Likelihood) of the ln L test statistic for the null hypothesis in the opening example.]
41 Frank Porter, Caltech Seminar, February 26, 2008

42 We have not been exhaustive...
There are many other possible tests that could be considered, for example, schemes that partition the χ² to select sensitivity to different characteristics [D. J. Best, Nonparametric Comparison of Two Histograms, Biometrics 50 (1994) 538].
42 Frank Porter, Caltech Seminar, February 26, 2008

43 Distributions Under the Null Hypothesis
Our example may be too easy... When the asymptotic distribution may not be good enough, we would like to know the probability distribution of our test statistic under the null hypothesis. Difficulty: our null hypothesis is not completely specified! The problem is that the distribution depends on the values of $\nu_i = a \mu_i$. Our null hypothesis only says $\nu_i = a \mu_i$, but says nothing about what $\mu_i$ might be. [It also doesn't specify a, but we have already discussed that complication, which appears manageable (although in extreme situations one might need to check for dependence on a).]
43 Frank Porter, Caltech Seminar, February 26, 2008

44 Estimating the null hypothesis
We turn again to the data to make an estimate for $\mu_i$, to be used in estimating the distribution of our test statistics. A straightforward approach is to use the ML parameter estimators (under H0):
$\hat\mu_i = \frac{1}{1 + \hat a} (u_i + v_i), \qquad \hat\nu_i = \frac{\hat a}{1 + \hat a} (u_i + v_i), \qquad \text{where } \hat a = N_v/N_u.$
The data are repeatedly simulated using these values for the parameters of the sampling distribution. For each simulation, a value of the test statistic is obtained. The distribution so obtained is then an estimate of the distribution of the test statistic under H0, and P-values may be computed from this. Variations in the estimates for $\hat\mu_i$ and $\hat a$ may be used to check robustness of the probability estimates obtained in this way.
44 Frank Porter, Caltech Seminar, February 26, 2008
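A minimal sketch of this toy-simulation procedure in Python (illustrative; `statistic` stands for any of the test-statistic functions sketched earlier, and the "at least as large" tail convention assumes large values indicate disagreement, so for statistics like BDM or ln L, where large values mean agreement, the comparison would be reversed):

    import numpy as np

    def estimated_pvalue(u, v, statistic, ntoys=10000, rng=None):
        """Estimate the P-value of statistic(u, v) by simulating the estimated H0."""
        rng = np.random.default_rng() if rng is None else rng
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        a_hat = v.sum() / u.sum()
        mu_hat = (u + v) / (1.0 + a_hat)         # ML estimates of the H0 bin means
        nu_hat = a_hat * mu_hat
        t_obs = statistic(u, v)
        t_toys = np.array([statistic(rng.poisson(mu_hat), rng.poisson(nu_hat))
                           for _ in range(ntoys)])
        return np.mean(t_toys >= t_obs)          # fraction of toys at least as extreme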

45 Trouble in River City... We have just described the approach that was used to compute the estimated probabilities for the opening example. The bin contents are reasonably large, and this approach works well enough for this case. Unfortunately, this approach does very poorly in the low-statistics realm. We consider a simple test case: Suppose our data is sampled from a flat distribution with a mean of 1 count in each of 100 bins. 45 Frank Porter, Caltech Seminar, February 26, 2008

46 Algorithm to check estimated null hypothesis
We test how well our estimated null hypothesis works for any given test statistic, T, as follows:
1. Generate a pair of histograms according to the distribution above.
2. Compute T for this pair of histograms.
3. Given the pair of histograms, compute the estimated null hypothesis according to the specified prescription above.
4. Generate many pairs of histograms according to the estimated null hypothesis in order to obtain an estimated distribution for T.
5. Using the estimated distribution for T, determine the estimated P-value for the value of T found in step 2.
6. Repeat steps 1-5 many times and make a histogram of the estimated P-values.
This histogram should be uniform if the estimated P-values are good estimates. A sketch of this check appears below.
46 Frank Porter, Caltech Seminar, February 26, 2008
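The check itself, in the same hedged Python style (reusing the estimated_pvalue sketch from the earlier slide; the loop counts are illustrative):

    import numpy as np

    def check_null_estimate(true_means, statistic, npairs=200, ntoys=2000, rng=None):
        """Generate pairs from the true H0 and collect their estimated P-values;
        the returned array should look uniform if the H0 estimate is adequate."""
        rng = np.random.default_rng() if rng is None else rng
        pvals = []
        for _ in range(npairs):
            u = rng.poisson(true_means)          # step 1: a pair from the true H0
            v = rng.poisson(true_means)
            pvals.append(estimated_pvalue(u, v, statistic, ntoys=ntoys, rng=rng))
        return np.array(pvals)                   # inspect, e.g., as a histogram

    # Example: check_null_estimate(np.ones(100), ks_statistic)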

47 Checking the estimated null distribution
The next two slides show tests of the estimated null distribution for each of the 7 test statistics. Shown are distributions of the simulated P-values. The data are generated according to H0, consisting of 100-bin histograms for: a mean of 100 counts/bin (left column), or a mean of 1 count/bin (other 3 columns). The estimates of H0 are: weighted bin-by-bin average (left two columns), each bin mean given by the average bin contents in each histogram (third column), estimated with a Gaussian kernel estimator (right column) based on the contents of both histograms. The χ² is computed without combining bins.
47 Frank Porter, Caltech Seminar, February 26, 2008

48 [Figure: distributions of estimated P-values (frequency versus estimated probability) for the χ², BDM, KS, and CVM statistics. Columns: <counts/bin> = 100 with H0 from the weighted bin-by-bin average; <counts/bin> = 1 with H0 from the weighted bin-by-bin average; H0 from the overall average; H0 from the Gaussian kernel estimator.]
48 Frank Porter, Caltech Seminar, February 26, 2008

49 [Figure: distributions of estimated P-values (frequency versus estimated probability) for the AD, ln λ, and ln L statistics, with the same columns as the previous slide: <counts/bin> = 100 with H0 from the weighted bin-by-bin average; <counts/bin> = 1 with H0 from the weighted bin-by-bin average; H0 from the overall average; H0 from the Gaussian kernel estimator.]
49 Frank Porter, Caltech Seminar, February 26, 2008

50 Summary of tests of null distribution estimates
Probability that the null hypothesis will be rejected with a cut at 1% on the estimated distribution. H0 is estimated with the bin-by-bin algorithm in the first two columns, by the uniform histogram algorithm in the third column, and by Gaussian kernel estimation in the fourth column.
Test statistic | Probability (%) | Probability (%) | Probability (%) | Probability (%)
Bin mean =     | 100             | 1               | 1               | 1
H0 estimate    | bin-by-bin      | bin-by-bin      | uniform         | kernel
χ²             |        ±        |        ±        |        ±        |        ± 0.28
BDM            | 0.91 ±          |        ±        |        ±        |        ± 0.22
KS             | 1.12 ±          |        ±        |        ±        |        ± 0.27
CVM            | 1.09 ±          |        ±        |        ±        |        ± 0.28
AD             | 1.15 ±          |        ±        |        ±        |        ± 0.29
ln λ           | 0.97 ±          |        ±        |        ±        |        ± 0.34
ln L           | 0.97 ±          |        ±        |        ±        |        ± 0.75
[Entries left blank were not preserved in the transcription.]
50 Frank Porter, Caltech Seminar, February 26, 2008

51 Conclusions from tests of null distribution estimates
In the small statistics case, if the null hypothesis were to be rejected at the estimated 0.01 probability, the bin-by-bin algorithm for estimating the bin means would actually reject H0: 19% of the time for the χ² statistic, 16% of the time for the BDM statistic, 24% of the time for the ln λ statistic, and 29% of the time for the ln L statistic, all unacceptably larger than the desired 1%. The KS, CVM, and AD statistics are all consistent with the desired 1%. In the large statistics case, where sampling is from histograms with a mean of 100 counts in each bin, all test statistics display the desired flat distribution. The χ², ln λ, and ln L statistics perform essentially identically at high statistics, as expected, since in the normal approximation they are equivalent.
51 Frank Porter, Caltech Seminar, February 26, 2008

52 Problem appears for low statistics
The issue for the bin-by-bin approach appears at low statistics. Intuition: Consider the likely scenario at low statistics that some bins will have zero counts in both histograms. In this case our algorithm for the estimated null hypothesis yields a zero mean for these bins. The simulation used to determine the probability distribution for the test statistic will always have zero counts in these bins, that is, there will always be agreement between the two histograms. Thus, the simulation will find that low values of the test statistic are more probable than they should be. In the same study with a mean of 100 counts per bin, we find that the probability estimates are valid, at least this far into the tails. The AD, CVM, and KS tests are more robust under our estimates of H0 than the others, as they tend to emphasize the largest differences and are not so sensitive to bins that always agree. For these statistics, our bin-by-bin procedure for estimating H0 does well even for low statistics, although we caution again that we are not examining the far tails of the distribution.
52 Frank Porter, Caltech Seminar, February 26, 2008

53 Obtaining better null distribution estimates
A simple approach to salvaging the situation in the low statistics regime involves relying on the often valid assumption that the underlying H0 distribution is smooth. Then we only need to estimate a few parameters to describe the smooth distribution, and effectively more statistics are available. Assuming a smooth background represented by a uniform distribution yields correct results. This is cheating a bit, since we perhaps aren't supposed to know that this is really what we are sampling from. It may be remarked that the ln L and, perhaps to a much lesser extent, the BDM statistic do not give the desired 1% result, but now err on the conservative side. It may be possible to mitigate this with a different algorithm. We may expect the power of these statistics to suffer under the approach taken here.
53 Frank Porter, Caltech Seminar, February 26, 2008

54 More honest Try a kernel estimator
Since we aren't supposed to know that our null distribution is uniform, we also try a kernel estimator for the null distribution, using the sum of the observed histograms as input. We try a Gaussian kernel, with a standard deviation of 2. In general, this works pretty well, though there is room for improvement. The bandwidth was chosen here to be rather small for technical reasons; a larger bandwidth would presumably improve the results.
[Figure: sample Gaussian kernel density estimate of the null hypothesis (density versus bin).]
54 Frank Porter, Caltech Seminar, February 26, 2008
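One plausible realization of this kernel estimate (a sketch, not the talk's code; the use of scipy.ndimage.gaussian_filter1d on the summed histogram and the boundary treatment are my own choices):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def kernel_null_estimate(u, v, sigma_bins=2.0):
        """Estimate H0 bin means by Gaussian-kernel smoothing of the summed histogram."""
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)
        t = u + v
        shape = gaussian_filter1d(t, sigma=sigma_bins, mode='nearest')
        shape *= t.sum() / shape.sum()           # keep the total number of counts
        a_hat = v.sum() / u.sum()
        mu_hat = shape / (1.0 + a_hat)           # split between the two histograms
        nu_hat = a_hat * mu_hat
        return mu_hat, nu_hat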

55 Comparison of Power of Tests The power depends on what the alternative hypothesis is. We mostly investigate adding a Gaussian component on top of a uniform background distribution. Motivated by the scenario where one distribution appears to show some peaking structure, while the other does not. We also look at a different extreme, a rapidly varying alternative. 55 Frank Porter, Caltech Seminar, February 26, 2008

56 Gaussian alternative
The Gaussian alternative data are generated as follows: The background (null distribution) has a mean of one event per histogram bin. The Gaussian has a mean of 50 and a standard deviation of 5, in units of bin number. We vary the amplitude of the Gaussian and count how often the null hypothesis is rejected at the 1% confidence level. The amplitude is measured in percent; for example, a 25% Gaussian has a total amplitude corresponding to an average of 25% of the total counts in the histogram, including the (small) tails extending beyond the histogram boundaries. The Gaussian counts are added to the counts from the null distribution.
56 Frank Porter, Caltech Seminar, February 26, 2008

57 [Figure: Left: the mean bin contents for a 25% Gaussian on a flat background of one count/bin (note the suppressed zero). Right: example sampling from the 25% Gaussian (filled blue dots) and from the uniform background (open red squares).]
The distribution of P-values is shown for seven different test statistics. Three different magnitudes of the Gaussian amplitude are displayed. The power of the tests to reject the null hypothesis at the 99% confidence level is summarized in the table and figure below for several different alternative hypothesis amplitudes.
57 Frank Porter, Caltech Seminar, February 26, 2008

58 [Figure: distribution of estimated probability, under H0, that the test statistic is worse than that observed, for seven different test statistics. The data are generated according to a uniform distribution, consisting of 100-bin histograms with a mean of 1 count, for one histogram, and for the other histogram with a uniform distribution plus a Gaussian of strength 12.5% (left column), 25% (middle column), and 50% (right column). The χ² is computed without combining bins.]
[Figure: summary of the power (%) of seven different test statistics versus Gaussian amplitude (%), for the alternative hypothesis with a Gaussian bump. Left: linear vertical scale; Right: logarithmic vertical scale.]
58 Frank Porter, Caltech Seminar, February 26, 2008

59 In the table below we take a look at the performance of our seven statistics for histograms with large bin contents. It is interesting that in this large-statistics case, for the χ² and similar tests, the power to reject a dip is greater than the power to reject a bump of the same area. This is presumably because the error estimates for the χ² are based on the square root of the observed counts, and hence give smaller errors for smaller bin contents. We also observe that the comparative strength of the KS, CVM, and AD tests versus the χ², BDM, ln λ, and ln L tests in the small statistics situation is largely reversed in the large statistics case.
Estimates of power for seven different test statistics, as a function of H1. The comparison histogram (H0) is generated with all k = 100 bins Poisson of mean 100. The selection is at the 99% confidence level.
59 Frank Porter, Caltech Seminar, February 26, 2008

60 Statistic | H0 (%) | 5% (%) | -5% (%)
χ²   |        ±       |       ±       |       ± 0.7
BDM  | 0.97 ±         |       ±       |       ± 0.7
KS   | 1.03 ±         |       ±       |       ± 1.0
CVM  | 0.91 ±         |       ±       |       ± 1.2
AD   | 0.91 ±         |       ±       |       ± 1.2
ln λ | 0.91 ±         |       ±       |       ± 0.7
ln L | 0.97 ±         |       ±       |       ± 0.7
[Entries left blank were not preserved in the transcription.]
60 Frank Porter, Caltech Seminar, February 26, 2008

61 Power study for a sawtooth alternative
To get an idea of what happens for a radically different alternative to the null distribution, we consider sensitivity to sampling from the sawtooth distribution as shown in the figure. This is to be compared once again to samplings from the uniform histogram. The results are tabulated. The percentage sawtooth here refers to the fraction of the null hypothesis mean. That is, a 100% sawtooth on a 1 count/bin background oscillates between a mean of 0 counts/bin and 2 counts/bin. The period of the sawtooth is always two bins.
[Figure: H0 or H1 mean counts/bin versus bin number, and an example sampling (counts/bin versus bin number); see the caption on the next slide.]
61 Frank Porter, Caltech Seminar, February 26, 2008

62 Left: The mean bin contents for a 50% sawtooth on a flat background of one count/bin (blue), compared with the flat background means (red). Right: Example sampling from the 50% sawtooth (filled blue dots) and from the uniform background (open red squares). In this example, the χ 2 and likelihood ratio tests are the clear winners, with BDM next. The KS, CVM, and AD tests reject the null hypothesis with the same probability as for sampling from a true null distribution. This very poor performance for these tests is readily understood, as these tests are all based on the cumulative distributions, which average out local oscillations. 62 Frank Porter, Caltech Seminar, February 26, 2008

63 Power summary for sawtooth alternative
Estimates of power for seven different test statistics, for a sawtooth alternative distribution.
Statistic | Power (%) | Power (%)
χ²   |        ±       |       ± 1.2
BDM  | 1.9 ±          |       ± 1.2
KS   | 0.85 ±         |       ± 0.2
CVM  | 0.91 ±         |       ± 0.2
AD   | 0.91 ±         |       ± 0.3
ln λ | 4.5 ±          |       ± 1.2
ln L | 0.30 ±         |       ± 0.06
[The column amplitudes and the entries left blank were not preserved in the transcription.]
63 Frank Porter, Caltech Seminar, February 26, 2008

64 Conclusions
These studies have demonstrated some important lessons in goodness-of-fit testing:
1. There is no single best test for all applications. Statements such as "test X is better than test Y" are empty without giving more context. For example, the Anderson-Darling test is often very powerful in testing normality of data against alternatives with non-normal tails (such as the Cauchy distribution)*. However, we have seen that it is not always especially powerful in other situations. The more we know about what we wish to test for, the more reliably we can choose a powerful test. Each of the tests investigated here may be reasonable to use, depending on the circumstance. Even the controversial L test works as well as the others sometimes. However, there is no known
* M. A. Stephens, EDF Statistics for Goodness of Fit and Some Comparisons, Jour. Amer. Stat. Assoc. 69 (1974).
64 Frank Porter, Caltech Seminar, February 26, 2008

65 situation where it actually performs better than all of the others, and indeed the situations where it is observed to perform as well are here limited to those where it is equivalent to another test. Thus, there seems to be no reason to advocate including it in the toolkit. 2. Computing probabilities via simulations is a very useful technique. However, it must be done with care. The issue of tests with an incompletely specified null hypothesis is particularly insidious. Simply generating a distribution according to some assumed null distribution can lead to badly wrong results. Where this could occur, it is important to verify the validity of the procedure. Note that we have only looked into the tails to the 1% level. The validity must be checked to whatever level of probability is needed for the results. Thus, we cannot blindly assume the results quoted here at the 1% level will still be true at, say, the 0.1% level. We have concentrated on the specific question of comparing two his- 65 Frank Porter, Caltech Seminar, February 26, 2008

66 tograms. However, the general considerations apply more generally, to testing whether two datasets are consistent with being drawn from the same distribution, and to testing whether a dataset is consistent with a predicted distribution. The KS, CVM, AD, ln λ, and ln L tests may all be constructed for these other situations (as well as the χ² and BDM, if we bin the data).
66 Frank Porter, Caltech Seminar, February 26, 2008


More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg Statistics for Data Analysis PSI Practical Course 2014 Niklaus Berger Physics Institute, University of Heidelberg Overview You are going to perform a data analysis: Compare measured distributions to theoretical

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Application of Homogeneity Tests: Problems and Solution

Application of Homogeneity Tests: Problems and Solution Application of Homogeneity Tests: Problems and Solution Boris Yu. Lemeshko (B), Irina V. Veretelnikova, Stanislav B. Lemeshko, and Alena Yu. Novikova Novosibirsk State Technical University, Novosibirsk,

More information

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015 STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis

More information

On the Triangle Test with Replications

On the Triangle Test with Replications On the Triangle Test with Replications Joachim Kunert and Michael Meyners Fachbereich Statistik, University of Dortmund, D-44221 Dortmund, Germany E-mail: kunert@statistik.uni-dortmund.de E-mail: meyners@statistik.uni-dortmund.de

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery Statistical Methods in Particle Physics Lecture 2: Limits and Discovery SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

Data modelling Parameter estimation

Data modelling Parameter estimation ASTR509-10 Data modelling Parameter estimation Pierre-Simon, Marquis de Laplace 1749-1827 Under Napoleon, Laplace was a member, then chancellor, of the Senate, and received the Legion of Honour in 1805.

More information

Confidence intervals

Confidence intervals Confidence intervals We now want to take what we ve learned about sampling distributions and standard errors and construct confidence intervals. What are confidence intervals? Simply an interval for which

More information

Are Declustered Earthquake Catalogs Poisson?

Are Declustered Earthquake Catalogs Poisson? Are Declustered Earthquake Catalogs Poisson? Philip B. Stark Department of Statistics, UC Berkeley Brad Luen Department of Mathematics, Reed College 14 October 2010 Department of Statistics, Penn State

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Extreme Value Theory.

Extreme Value Theory. Bank of England Centre for Central Banking Studies CEMLA 2013 Extreme Value Theory. David G. Barr November 21, 2013 Any views expressed are those of the author and not necessarily those of the Bank of

More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulation APPM 7400 Lesson 3: Testing Random Number Generators Part II: Uniformity September 5, 2018 Lesson 3: Testing Random Number GeneratorsPart II: Uniformity Stochastic Simulation September

More information

Discovery significance with statistical uncertainty in the background estimate

Discovery significance with statistical uncertainty in the background estimate Glen Cowan, Eilam Gross ATLAS Statistics Forum 8 May, 2008 Discovery significance with statistical uncertainty in the background estimate Introduction In a search for a new type of event, data samples

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Physics 6720 Introduction to Statistics April 4, 2017

Physics 6720 Introduction to Statistics April 4, 2017 Physics 6720 Introduction to Statistics April 4, 2017 1 Statistics of Counting Often an experiment yields a result that can be classified according to a set of discrete events, giving rise to an integer

More information

Fuzzy and Randomized Confidence Intervals and P-values

Fuzzy and Randomized Confidence Intervals and P-values Fuzzy and Randomized Confidence Intervals and P-values Charles J. Geyer and Glen D. Meeden June 15, 2004 Abstract. The optimal hypothesis tests for the binomial distribution and some other discrete distributions

More information

Semi-parametric predictive inference for bivariate data using copulas

Semi-parametric predictive inference for bivariate data using copulas Semi-parametric predictive inference for bivariate data using copulas Tahani Coolen-Maturi a, Frank P.A. Coolen b,, Noryanti Muhammad b a Durham University Business School, Durham University, Durham, DH1

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Statistics. Lecture 4 August 9, 2000 Frank Porter Caltech. 1. The Fundamentals; Point Estimation. 2. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 4 August 9, 2000 Frank Porter Caltech. 1. The Fundamentals; Point Estimation. 2. Maximum Likelihood, Least Squares and All That Statistics Lecture 4 August 9, 2000 Frank Porter Caltech The plan for these lectures: 1. The Fundamentals; Point Estimation 2. Maximum Likelihood, Least Squares and All That 3. What is a Confidence Interval?

More information

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are

More information

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories

Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories Entropy 2012, 14, 1784-1812; doi:10.3390/e14091784 Article OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Bivariate Rainfall and Runoff Analysis Using Entropy and Copula Theories Lan Zhang

More information

Statistical distributions: Synopsis

Statistical distributions: Synopsis Statistical distributions: Synopsis Basics of Distributions Special Distributions: Binomial, Exponential, Poisson, Gamma, Chi-Square, F, Extreme-value etc Uniform Distribution Empirical Distributions Quantile

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Chapter 11. Hypothesis Testing (II)

Chapter 11. Hypothesis Testing (II) Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between 7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

More information

Importance Sampling and. Radon-Nikodym Derivatives. Steven R. Dunbar. Sampling with respect to 2 distributions. Rare Event Simulation

Importance Sampling and. Radon-Nikodym Derivatives. Steven R. Dunbar. Sampling with respect to 2 distributions. Rare Event Simulation 1 / 33 Outline 1 2 3 4 5 2 / 33 More than one way to evaluate a statistic A statistic for X with pdf u(x) is A = E u [F (X)] = F (x)u(x) dx 3 / 33 Suppose v(x) is another probability density such that

More information

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that Lecture 28 28.1 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X 1,..., X n with some unknown distribution and we would like to test the hypothesis that is equal to a particular distribution

More information

Stat 451 Lecture Notes Simulating Random Variables

Stat 451 Lecture Notes Simulating Random Variables Stat 451 Lecture Notes 05 12 Simulating Random Variables Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella 2 Updated:

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

More information

Central limit theorem. Paninski, Intro. Math. Stats., October 5, probability, Z N P Z, if

Central limit theorem. Paninski, Intro. Math. Stats., October 5, probability, Z N P Z, if Paninski, Intro. Math. Stats., October 5, 2005 35 probability, Z P Z, if P ( Z Z > ɛ) 0 as. (The weak LL is called weak because it asserts convergence in probability, which turns out to be a somewhat weak

More information