CMS Internal Note. The content of this note is intended for CMS internal use and distribution only


Available on CMS information server
CMS IN 2003/xxxx

August 26, 2003

Expected signal observability at future experiments

V. Bartsch, G. Quast
Institut für Experimentelle Kernphysik, Universität Karlsruhe

Abstract

Several methods to quantify the significance of an expected signal at future experiments have been used or suggested in the literature. In this note, comparisons are presented with a method based on the likelihood ratio of the background hypothesis and the signal-plus-background hypothesis. A large number of Monte Carlo experiments are performed to investigate the properties of the various methods and to check whether the probability of a background fluctuation having produced the claimed significance of the discovery is properly described. In addition, the best possible separation between the two hypotheses should be provided; in other words, the discovery potential of a future experiment should be maximal. Finally, a practical method to apply a likelihood-based definition of the significance is suggested in this note. Signal and background contributions are determined from a likelihood fit based on shapes only, and the probability density distributions of the significance thus determined are found to be of a Gaussian shape even with small statistics.

1 Introduction

There are several methods to quantify the probability of a new physics discovery in a future experiment. These methods can be coarsely classified as event counting methods and likelihood methods. Event counting methods rely on a pre-specified definition of a signal region to determine the observed numbers of signal and background events. Likelihood methods usually allow for free parameters, such as the total numbers of signal and background events within a given range, and take into account the shapes of the distributions of signal and background events. Instead of using the fitted numbers of signal and background events, likelihood methods offer the possibility to directly use the values of the likelihood found at the best-fit point to determine the significance level of a signal.

In the following, some event counting and likelihood methods are investigated in a large number of simulated experiments ("toy experiments"). A relatively simple and practical likelihood method is proposed to determine the expected significance of a discovery in a future experiment, which still holds in the case of small statistics.

2 Overview of methods

The following subsections present an overview of some methods of estimating the significance, $S$, of an observed signal. In high-energy physics, significance is usually understood as the number of standard deviations an observed signal is above expected background fluctuations. Implicitly, it is understood that $S$ follows a standard Gaussian distribution with a mean of zero and a standard deviation of one. In the statistics literature, the significance level is the probability to find a value of a test statistic beyond a certain pre-specified critical value. For non-Gaussian distributions, the significance level has to be converted into an equivalent number of truly Gaussian sigmas to arrive at the common terminology of a high-energy physicist.
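The conversion between a number of Gaussian sigmas and a one-sided tail probability can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `significance_to_p` and `p_to_significance` are hypothetical and not part of the note.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def significance_to_p(s):
    """One-sided tail probability of a standard Gaussian above s sigmas."""
    return _STD_NORMAL.cdf(-s)


def p_to_significance(p):
    """Number of Gaussian sigmas whose one-sided tail probability equals p."""
    return -_STD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    for s in (2.0, 3.0, 5.0):
        print(f"S = {s:.0f}  ->  one-sided p = {significance_to_p(s):.3g}")
```

For a test statistic that is not Gaussian, the tail probability would first have to be obtained from its own pdf (for example from toy experiments) and only then be passed to `p_to_significance`.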
With this in mind, a given value of $S$ (the "number of sigmas") corresponds to the probability that the claimed signal is caused merely by fluctuations of the background, and this probability is obtained by performing the corresponding integrals of a standard Gaussian distribution. Since a signal is usually searched for in many bins of a distribution, and in many channels, a very high value of the significance must be required before an observed peak found somewhere in some distribution can be claimed to be an observation of a signal. An example is the case of a search for a signal with an expected mass resolution of 1 GeV in a mass range of 100 GeV. The probability of a two-sigma upward fluctuation of the background in a given 2 GeV wide region is only about 2 %, but a signal may appear in any one of 50 such regions, and therefore a pure background sample will, on average, result in finding one peak with a significance of two. The general agreement is that the value of $S$ of a signal should exceed five. (The significance level, or the corresponding one-sided Gaussian probability, in other words the integral of the standard Gaussian distribution from five to infinity, is about $2.9 \times 10^{-7}$.)

2.1 Event Counting

Counting methods use the number of signal events, $N_s$, and the number of background events, $N_b$, observed in some signal region to define the significance $S$. This procedure requires working with binned distributions, which in turn means that bin positions and bin widths have to be fixed; if these bin positions and widths are fixed after observation of an excess above the background somewhere, possibilities for subjective choices open up. Furthermore, the optimal choice of a signal region around the mean to maximise the sensitivity depends on the background level, which is another disadvantage of such methods. The definitions given below may be considered as statistical estimators of the significance of an observation from the events found in an experiment.
Two variants are frequently used; another one, more robust against downward fluctuations of the background, was recently suggested [1]:

$$S_{c1} = \frac{N_s}{\sqrt{N_b}}, \qquad (1)$$

$$S_{c2} = \frac{N_s}{\sqrt{N_s + N_b}}, \qquad (2)$$

$$S_{c12} = 2\left(\sqrt{N_s + N_b} - \sqrt{N_b}\right) \quad [1]. \qquad (3)$$

The formula for the third method is strictly only valid in the Gaussian limit of the Poisson distribution; for small statistics, the value of $S_{c12}$ is tabulated.
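As a quick numerical illustration of Eqs. (1) to (3), the three counting estimators can be evaluated side by side. The sketch below uses the closed-form expressions only (not the small-statistics tabulation of $S_{c12}$), and the function names are chosen here for illustration.

```python
import math


def s_c1(n_s, n_b):
    """Eq. (1): N_s / sqrt(N_b)."""
    return n_s / math.sqrt(n_b)


def s_c2(n_s, n_b):
    """Eq. (2): N_s / sqrt(N_s + N_b)."""
    return n_s / math.sqrt(n_s + n_b)


def s_c12(n_s, n_b):
    """Eq. (3): 2 (sqrt(N_s + N_b) - sqrt(N_b)), Gaussian-limit formula."""
    return 2.0 * (math.sqrt(n_s + n_b) - math.sqrt(n_b))


if __name__ == "__main__":
    # Five signal events over one background event
    n_s, n_b = 5.0, 1.0
    print(f"S_c1  = {s_c1(n_s, n_b):.2f}")
    print(f"S_c2  = {s_c2(n_s, n_b):.2f}")
    print(f"S_c12 = {s_c12(n_s, n_b):.2f}")
```

For five signal events over a background of one event, $S_{c1}$ already reaches five, while the other two estimators stay well below that value.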

2.2 Likelihood methods

Likelihood methods are frequently used in data analysis to extract the number of signal and background events from a fit to one or more distributions which discriminate signal from background. One considers two different hypotheses, the null hypothesis assuming that the observed distribution is formed by background only, and an alternative hypothesis assuming the presence of signal and background. The ratio of likelihoods of the null hypothesis and an alternative hypothesis has also proven to provide a powerful test statistic in high-energy physics applications (see, e.g., the searches for the Standard Model Higgs boson performed at LEP 2 [2]).

In the simplest case, a binned or unbinned likelihood fit is used to determine the contributions from signal and background events in the distribution of the invariant mass of observed final states. The number of signal events is taken to follow a signal distribution, whereas the total background results from the parametrisation of the background distribution. These distributions are motivated by theoretical expectations or Monte Carlo simulation, or may be determined from the data; the latter is often the case for background shapes. The observed distribution of invariant masses with known normalised signal and background distributions $f_s$ and $f_b$ may be described as

$$f(m;\, p^s_1, \ldots, p^s_n,\, p^b_1, \ldots, p^b_m) = N_s\, f_s(m;\, p^s_1, \ldots, p^s_n) + N_b\, f_b(m;\, p^s_1, \ldots, p^s_n,\, p^b_1, \ldots, p^b_m).$$

Here, $p^s_i$ and $p^b_i$ are parameters describing the signal and background distributions, e.g. a Breit-Wigner folded with a Gaussian distribution for the signal, and a second-order polynomial for the background. In cases of small statistics, the shapes of these distributions are usually fixed, and only $N_s$, $p^s_1 = m_0$ (the position of the peak) and $N_b$ are free parameters in the fit.
The background is determined from the signal-free regions, and therefore the statistical precision on the mean background level depends on the width of these effective side bands. In practice, a region as large as possible around the signal should be used in order to minimise the uncertainties arising from background fluctuations. The standard maximum likelihood method takes into account only the shape of the distribution. If the normalisation and its error are also of importance, the extended maximum likelihood method is usually applied, which has an extra factor in the likelihood function to account for Poisson fluctuations in the total number of events observed. Note that such likelihood methods are independent of any a-priori, or a-posteriori, specification of a signal region.

Since the fit to the distribution of observed events properly takes into account background fluctuations, these contribute to the uncertainty on the number of signal events. The relative error on $N_s$ thus obtained measures the distance to zero observed events in units of the error, and is sometimes directly quoted as a significance,

$$S_{L1} = \frac{N_s}{\Delta N_s}, \qquad (4)$$

where $\Delta N_s$ denotes the fitted uncertainty on $N_s$. As is shown below, this is questionable at large values of $S_{L1}$, because for non-Gaussian errors the five-sigma interval is not simply five times the one-sigma interval. When testing for the presence of a signal component in a distribution without precise information about its size, as discussed here, it may be more appropriate to use the relative error on the signal fraction, $f_s = N_s/(N_s + N_b)$, as an estimator of significance. Technically, this corresponds to performing a standard maximum-likelihood fit instead of applying the extended likelihood method.

A second fit with $N_s$ fixed to zero allows comparison of the likelihood obtained for this null hypothesis, in other words absence of a signal, $L_B$, with the likelihood from the full signal-plus-background fit, $L_{S+B}$.
The likelihood ratio $Q = L_{S+B}/L_B$ is used as a test statistic to distinguish the two hypotheses. For large statistics, which justify using the Gaussian distribution to describe the distributions of numbers of observed events in each bin of a (potentially multi-dimensional) histogram, $\ln Q$ is identical to half of the difference in $\chi^2$ of the signal-plus-background and the background-only hypotheses. The definition of significance based on the outlined method is

$$S_{L2} = \sqrt{2 \ln Q}. \qquad (5)$$

It is worth noting that the two likelihood-based estimators for the significance become equal if the distribution of $2 \ln Q(N_s)$ is a parabola, in other words in the Gaussian limit of infinitely large statistics. Usually, likelihood functions are approximated well by a parabola close to the best-fit point, but parabolic behaviour up to a distance of $5\sigma$ requires really large event numbers. The relation between $\ln Q$ and a fit for the number of signal events in a standard likelihood-based fit is illustrated below in the discussion of Fig. 9, which shows an example of the dependence of the likelihood on the assumed number of signal events.
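The equality of the two likelihood-based estimators in the parabolic limit can be made explicit with a short derivation. The notation is an assumption made here for illustration: $\hat{N}_s$ is the best-fit number of signal events, $\Delta N_s$ its fitted uncertainty, and $Q(N_s) = L(N_s)/L(0)$ is the likelihood ratio as a function of the assumed number of signal events, so that $Q(0) = 1$.

```latex
% Parabolic (Gaussian-limit) form of the log-likelihood ratio around the best fit:
2 \ln Q(N_s) \;=\; 2 \ln Q(\hat{N}_s) \;-\; \frac{(N_s - \hat{N}_s)^2}{(\Delta N_s)^2}

% Evaluate at N_s = 0, where Q(0) = 1 and hence 2 ln Q(0) = 0:
0 \;=\; 2 \ln Q(\hat{N}_s) \;-\; \frac{\hat{N}_s^{\,2}}{(\Delta N_s)^2}

% Therefore
\sqrt{2 \ln Q(\hat{N}_s)} \;=\; \frac{\hat{N}_s}{\Delta N_s}
```

The left-hand side is the likelihood-ratio significance and the right-hand side is the relative-error significance; the two differ as soon as the log-likelihood deviates from a parabola between $N_s = 0$ and $N_s = \hat{N}_s$.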

The distribution of $Q$ in a series of experiments, its probability density function ("pdf"), is of crucial importance for the calculation of discovery probabilities in the presence of a real signal, or of fake probabilities due to fluctuations of the background. Likelihood ratio tests have been studied extensively in the literature and good textbooks on the topic exist (e.g. [3]). In the large-statistics limit, the logarithm of the likelihood ratio, multiplied by two, is expected to follow a $\chi^2$ distribution with a number of degrees of freedom given by the difference in the number of free parameters between the alternative hypothesis and the null hypothesis [4]. In some cases studied in the literature the expected large-statistics distribution is exactly valid even with small statistics. When testing the presence of a signal on top of the background at a fixed peak position, $2 \ln Q = S_{L2}^2$ is expected to follow a $\chi^2$ distribution with one degree of freedom. The distribution of $S_{L2}$ is thus given by the positive half of a standard Gaussian distribution. This is the theoretical justification to call $\sqrt{2 \ln Q}$ a significance in the sense stated above.

2.3 Remarks on likelihoods and event counting

Applying the likelihood principle laid out above to one bin only, namely the signal region, allows a likelihood estimator to be used also for event counting methods. Assuming an observation of $N_{obs}$ events in the signal region with expected numbers of signal and background events of $N_s$ and $N_b$, respectively, and applying Poisson statistics leads to a likelihood ratio of

$$Q = \left(1 + \frac{N_s}{N_b}\right)^{N_{obs}} \exp(-N_s). \qquad (6)$$

Setting the expectation value of $N_{obs}$ to $N_s + N_b$, a likelihood estimator based on event counting, $S_{cl}$, might be defined. In the limit of large numbers $N_s$ and $N_b$, the Poisson distribution can be replaced by a Gaussian, and $S_{cl}$ becomes equivalent to $S_{c1}$. The performance of $S_{cl}$ is better than that of $S_{c1}$, but much inferior to $S_{c12}$ or $S_{L2}$, and it is therefore not studied further.
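A minimal sketch of the counting-based likelihood estimator follows. It assumes, by analogy with Eq. (5), that $S_{cl}$ is defined as $\sqrt{2 \ln Q}$ with $Q$ taken from Eq. (6) and $N_{obs}$ set to $N_s + N_b$; the note does not spell out this definition, so the function `s_cl` is an illustration only.

```python
import math


def s_cl(n_s, n_b):
    """sqrt(2 ln Q) with Q from Eq. (6) and N_obs set to N_s + N_b (assumed definition)."""
    n_obs = n_s + n_b
    ln_q = n_obs * math.log(1.0 + n_s / n_b) - n_s
    return math.sqrt(2.0 * ln_q)


def s_c1(n_s, n_b):
    """Eq. (1), for comparison."""
    return n_s / math.sqrt(n_b)


if __name__ == "__main__":
    for n_s, n_b in ((5.0, 1.0), (50.0, 10000.0)):
        print(f"N_s={n_s:g}, N_b={n_b:g}: "
              f"S_cl={s_cl(n_s, n_b):.3f}, S_c1={s_c1(n_s, n_b):.3f}")
```

At small event numbers the two values differ markedly, while for a small signal on a large background they nearly coincide, in line with the Gaussian limit described above.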
3 Monte Carlo Study of estimators for signal significance

If statistics are sufficiently large, all of the above estimators show a Gaussian distribution. When searching for something new, however, one is very often dealing with a small-statistics problem. When preparing an experiment, it is important to compare different methods and approaches on an equal footing to choose the best strategy to ensure the discovery of new phenomena. At the statistical limit, two important questions arise. First, what is the probability to actually observe an existing signal with a pre-specified significance with a given amount of data? Or, equivalently, what is the integrated luminosity needed to reach that pre-specified significance with a certain probability? Second, what is the probability that data with the observed or higher significance would be obtained if there is only background and no signal? The estimators for the significance defined above perform quite differently in the small-statistics regime.

It is important to note that the significance observed in a single experiment is a random number that depends on fluctuations of the signal and the background. To answer the questions raised above, Monte Carlo studies with a large number of toy experiments therefore have to be performed to find the probability density function ("pdf") of the various definitions of $S$ given above. The pdf may turn out to be far from Gaussian, because the common rules of statistics and many familiar theorems are no longer valid in the realm of small numbers. The expected significance may be defined as the mean or, better, the median of the pdf thus determined. It has become common practice to quote the expected significance together with the actually observed one. If the signal cross sections and the background level are well modelled, the two values should be rather close to one another.
Special care has to be taken when the background fluctuates downwards, in other words when the observed background level is smaller than expected. Although used frequently, method $S_{c1}$ is an obvious candidate to drastically overestimate the significance in such cases. On the other hand, when there is no signal, but background events cluster around one value, it is again method $S_{c1}$ which behaves badly, leading to a false announcement of a discovery more often than other methods.

A crucial quantity to look at is the so-called confidence level of the background-only hypothesis, $CL_b$, which is defined as the integral over the pdf of $S$ obtained for a pure background sample from minus infinity to a pre-specified critical value $S_{crit}$, beyond which an observation would be considered as incompatible with the background-only hypothesis. $(1 - CL_b)$, also called p-value in modern literature [5], describes the probability that the background mimics a signal with a value of $S$ larger than $S_{crit}$. Very often the determination of the pdf is not a trivial task, in particular if multi-dimensional distributions and many search channels are involved. Monte Carlo methods for its determination may therefore have to be complemented by other methods [6]. In many cases, the various definitions of $S$ themselves are not truly significances in the sense stated in the introduction, but merely constitute a test statistic, and discovery and fake probabilities must be calculated using the pdf of $S$ obtained from simulated samples of background and signal events.

The toy experiments in this study were motivated by the problems in defining the signal significance for the search for a Higgs boson decaying to four muons. The signal is very clean and sits on top of a small background, but particularly at low masses the event rate is small. The production cross section and the background rates have large uncertainties, and the signal is best searched for by only using the shapes of signal and background distributions. The number of observed signal events is best determined by using an unbinned likelihood fit. The number of observed signal events needed in order to reach the magical bound of $S \geq 5$, or the integrated luminosity required for a 50 % chance of a discovery in this channel, was found to depend strongly on the kind of estimator used; this strong dependence on the significance estimator is a consequence of the relatively small number of events. According to $S_{c1}$, an observation of five signal events over a background of one event should be enough, while the significance obtained from $S_{L1}$ for this case is almost a factor of two lower.
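The quantities introduced in this section (expected significance, discovery probability and $1 - CL_b$) can all be read off lists of toy-experiment results. The sketch below assumes such lists are already available; the function names and the dictionary keys are hypothetical.

```python
from statistics import median


def fraction_above(values, s_crit):
    """Fraction of toy experiments with a significance estimator above s_crit."""
    return sum(1 for s in values if s > s_crit) / len(values)


def summarise_toys(signal_toys, background_toys, s_crit=5.0):
    """Summary of toy results for one estimator of significance.

    signal_toys:     values of S from signal-plus-background toy experiments
    background_toys: values of S from background-only toy experiments
    """
    return {
        "expected_significance": median(signal_toys),
        "discovery_probability": fraction_above(signal_toys, s_crit),
        "one_minus_clb": fraction_above(background_toys, s_crit),
    }


if __name__ == "__main__":
    # Tiny hand-made lists, for illustration only (not results from the note)
    print(summarise_toys([4.0, 5.5, 6.0, 4.8, 5.2], [0.0, 0.3, 1.2, 5.1]))
```

An empirical estimate of $1 - CL_b$ at $S_{crit} = 5$ obtained this way is only meaningful if the number of background-only toys is far larger than the inverse of the probability being estimated.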
The toy experiments were tailored for the study of the Higgs boson search in the $H \to 4\mu$ channel, but then generalised to other, less favourable background situations. To cope with the large uncertainties of the theoretical predictions for the signal and background cross sections, $N_s$ and $N_b$ were treated as free parameters. In fits to the signal-plus-background samples the mass of the signal, $m_0$, was also left free. The likelihood ratio $Q$ was determined from the likelihood values at the best-fit points for the signal-plus-background and the background-only hypotheses. Here, the procedure adopted is different from other searches, e.g. the search for the Standard Model Higgs boson at LEP, with very precisely known production cross section and well-known background levels. What matters in this case is the difference in shape between signal and background: a peak with a fixed width and position on top of a background described by a straight line.

The signal was taken to be of a Gaussian shape with a radiative tail, with a resolution (in other words the standard deviation of the Gaussian) of 1.3 GeV; the background was taken to be constant between 110 and 150 GeV. The background fraction was varied to range from 15 % of the signal (the typical case for the $H \to 4\mu$ channel) up to 150 % of the signal. The total number of events in each sample was allowed to fluctuate according to the Poisson distribution, and the Poisson mean was chosen such that the mean of the likelihood estimator reached a significance of $S_{L2} = 5$.

An example of one toy experiment at small statistics is shown in Fig. 1. The symbols with error bars each indicate a mass value drawn randomly according to the mass distribution of the weighted sum of signal and background events. The curve shows the result of an unbinned likelihood fit to these mass values, with the extended likelihood method. In this way, the numbers of signal and background events, $N_s$ and $N_b$, respectively, are simultaneously determined.
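A single toy experiment of this kind can be sketched with the Python standard library. The sketch is deliberately simplified and is not the procedure used for the note: the signal is a pure Gaussian without radiative tail, the peak position is held fixed, only the signal fraction is fitted (a shape-only fit rather than the extended likelihood), and the maximisation is a plain ternary search. All function names are hypothetical.

```python
import math
import random

M_LO, M_HI = 110.0, 150.0        # mass window in GeV, as in the text
SIGMA = 1.3                      # Gaussian resolution in GeV, as in the text
BKG_PDF = 1.0 / (M_HI - M_LO)    # flat, normalised background density


def signal_pdf(m, m0):
    """Gaussian peak; the radiative tail mentioned in the text is omitted."""
    z = (m - m0) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))


def poisson(rng, mean):
    """Poisson random number (Knuth's method, adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_toy(rng, mean_sig, mean_bkg, m0):
    """Masses of one toy experiment with Poisson-fluctuating event numbers."""
    masses = [rng.gauss(m0, SIGMA) for _ in range(poisson(rng, mean_sig))]
    masses += [rng.uniform(M_LO, M_HI) for _ in range(poisson(rng, mean_bkg))]
    return masses


def log_likelihood(masses, frac, m0):
    """Unbinned log-likelihood for a signal fraction frac at peak position m0."""
    return sum(math.log(frac * signal_pdf(m, m0) + (1.0 - frac) * BKG_PDF)
               for m in masses)


def s_l2(masses, m0, steps=60):
    """sqrt(2 ln Q) from a fit of the signal fraction, constrained to [0, 1]."""
    if not masses:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(steps):       # ternary search: log L is concave in frac
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if log_likelihood(masses, a, m0) < log_likelihood(masses, b, m0):
            lo = a
        else:
            hi = b
    best = 0.5 * (lo + hi)
    ln_q = log_likelihood(masses, best, m0) - log_likelihood(masses, 0.0, m0)
    return math.sqrt(2.0 * max(ln_q, 0.0))


if __name__ == "__main__":
    rng = random.Random(2003)
    toy = generate_toy(rng, mean_sig=25.0, mean_bkg=10.0, m0=130.0)
    print(f"{len(toy)} events, S_L2 = {s_l2(toy, 130.0):.2f}")
```

Repeating `generate_toy` and `s_l2` many times, once with and once without signal, would give rough analogues of the pdfs discussed in this section; leaving $m_0$ free would require an additional scan over the peak position.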
The distribution of various definitions of significance from such toy experiments is shown in Fig. 2. The numbers of signal and background events needed for the estimators of significance derived from event counting methods, $S_{c1}$ and $S_{c12}$, are determined in a region of $\pm 1.5\sigma$ around the centre of the peak. The background level here is about 40 % of the signal, the mean number of signal events is 25, and the background is roughly two events per GeV, or seven events within the signal region defined above. The mean and r.m.s. of the various definitions of significance are shown in Table 1 below.

Table 1: Summary of the mean and r.m.s. of the various definitions of significance for 25 signal events and 40 % background level as discussed in the text.
[Table 1 layout: columns mean $S$ and r.m.s.; rows $S_{c1}$, $S_{c12}$, $S_{L2}$]

It can be seen that $S_{c1}$ gives much larger values than $S_{c12}$ or $S_{L2}$. Gaussian fits are overlaid in the figure, and the agreement is good for $S_{c12}$ and $S_{L2}$, while $S_{c1}$ is only poorly described by the Gaussian fit. The mean of the $S_{c12}$ or $S_{L2}$ distributions is very close to five, both distributions are rather symmetric, and therefore the median is very close to the mean. Thus the distributions imply that a signal will be found with a value of $S$ greater than five in 50 % of such experiments, or with 50 % probability in the one real experiment CMS will hopefully be able to perform after the LHC start-up.

[Figure 1 axes: number of events after $40\ \mathrm{fb}^{-1}$ versus $m_H$ [GeV]]
Figure 1: Example of an unbinned likelihood fit to a number of data points shown as symbols. The data points demonstrate what the real data could look like after the first $40\ \mathrm{fb}^{-1}$ at the LHC. The shaded histogram shows the distribution of signal plus background obtained from a large number of weighted simulated events. The fit to the background assumes a flat background.

Besides the distributions of Fig. 2, the reliability of an estimator of significance has to be checked. In other words: how often would a false discovery be claimed for a certain choice of the minimum required significance $S_{crit}$? Because of the very high values of the required significances, the pdf must be properly modelled for a pure background sample even in the tails of the distribution. The same distributions as above are shown in Fig. 3, but this time for toy experiments with background only. The mass, i.e. the position of the mean of the Gaussian, was fixed in the fits to correspond to the signal position of the previous example. The peak seen in bin zero of the distribution is of a technical origin, because the number of signal events was constrained to be greater than or equal to zero in the fit 1). The overlaid fit, which neglects the first bin, is of Gaussian shape. As can be seen, up to the highest observed significance, $S_{c12}$ and $S_{L2}$ are amazingly well described by a Gaussian distribution. It should, however, be noted that even with one million experiments, the distribution is only tested up to 4.5 standard deviations; the number of entries beyond 5 in one million experiments is expected to be only about 0.3 for a perfect Gaussian shape. The good agreement with the Gaussian shape suggests that it will probably hold to even larger values of $S$ than could be tested with one million samples.
$S_{c1}$, on the contrary, is not compatible with a standard Gaussian with mean around zero and standard deviation of one, and delivers significance beyond a value of five far too often. This means that $S_{c1}$ cannot be quoted directly as a significance in the sense defined in the introduction for the particular background-to-signal ratio of this example. The Gaussian nature of the pdf for the estimators of significance $S_{c12}$ and $S_{L2}$, as obtained from the background-only sample, makes the calculation of $(1 - CL_b)$ straightforward.

More examples with lower or higher background levels are shown in Fig. 4 and Fig. 5. Fig. 4 shows a small-statistics case, where $S_{c1}$ is very non-Gaussian, and too many background samples result in a value of $S_{c1}$ beyond five. This situation is typical of the searches in the $H \to 4\mu$ channel. With more background events, and hence more signal events needed to reach an average signal significance of five, $S_{c1}$ becomes more similar to $S_{L2}$. The agreement between $S_{c12}$ and $S_{L2}$ is always very good 2), and the Gaussian nature of these distributions obtained from background only is confirmed in all cases.

1) It was verified that relaxing this condition, i.e. allowing negative numbers of signal events for large downward fluctuations of the background, results in the expected unmodified positive half of a standard Gaussian.

[Figure 2 panels: s/sqrt(b) ($S_{c1}$), 2(sqrt(s+b)-sqrt(b)) ($S_{c12}$), sqrt(2lnQ) ($S_{L2}$); 9791 entries each. Legible fit values: $S_{c12}$ Gaussian mean 4.8; $S_{L2}$ histogram RMS 1.12, Gaussian mean 5.03]
Figure 2: Histograms of $S_{c1}$, $S_{c12}$ and $S_{L2}$ determined from successful fits to toy experiments, with $N_b$, $N_s$ and $m_0$, the particle mass, as free parameters. The curves show the results of a Gaussian fit to the distributions. In addition to the Mean and RMS of the histograms, the parameters of the fitted Gaussian (Constant, Mean and Sigma) are shown in the plots. The number of signal events is 25, and the background level amounts to 40 %.

[Figure 3 panels: s/sqrt(b) ($S_{c1}$), 2(sqrt(s+b)-sqrt(b)) ($S_{c12}$), sqrt(2lnQ) ($S_{L2}$)]
Figure 3: Probability density functions determined from toy experiments for background only, in the absence of a real signal, with $N_s$ and $N_b$ as free parameters and $m_0$ fixed. Again, the estimators of significance $S_{c1}$, $S_{c12}$ and $S_{L2}$ are shown. The fit parameters shown in each plot refer to the parameters of a Gaussian distribution. The distribution of $S_{c1}$ cannot be described well by a Gaussian centred around zero, the standard deviation is far above one, and there are too many events observed at a significance beyond five, which should only occur with a probability of about $3 \times 10^{-7}$, in other words 0.3 expected occurrences for a total of about one million samples. The estimators $S_{c12}$ and $S_{L2}$, on the contrary, are well described by a Gaussian with mean around zero and a standard deviation close to one.

[Figure 4 panels: s/sqrt(b) ($S_{c1}$), sqrt(2lnQ) ($S_{L2}$)]
Figure 4: Probability density functions of estimators of significance $S_{c1}$ and $S_{L2}$ for small statistics (11 signal events within $\pm 1.5\sigma$ over a background of 1.5 events). Filled: pure background sample, open histogram: background plus signal. The background distributions are based on toy experiments, the distributions for signal plus background on toy experiments. The Gaussian fit to the distribution of $S_{L2}$ for the background-only sample has a mean of and $\sigma$ is 1.0.

As a slight modification of the procedure followed above, the position of the searched signal may be left as a free parameter, in other words a peak of the fixed shape is allowed to show up anywhere in the mass range considered. An example is shown in Fig. 6, where the peak is searched for in the whole mass window (from 115 GeV to 145 GeV). The centre of the background distribution is now shifted, because it is expected, on average, to find at least one bump somewhere in the mass window. According to theory, the large-statistics limit for the pdf of $S_{L2}^2$ is a $\chi^2$ distribution with two degrees of freedom 3). Such a procedure might open up the possibility of getting more realistic significance levels than is possible with the five-sigma postulate for background fluctuations at the observed peak position only.

4 Practical method to determine the expected significance

Having shown above that the estimator of significance based on the likelihood ratio of the signal-plus-background and the background-only hypotheses, $S_{L2} = \sqrt{2 \ln Q}$, is very well described by a Gaussian shape and that the standard deviation of this Gaussian is one for a pure background sample in the absence of a signal, only one task remains to be done: to define a practical and simple method for estimating the expectation value of $S_{L2}$ for a given amount of data.
It is clear that the full procedure outlined above, in other words the generation of a large number of simulated background events, is not practical for each of the many phenomena searched for in many scenarios and over the wide energy range accessible at the LHC. It is worth noting that the situation may not always be as simple as in the examples considered here. The background may be perfectly known from studies performed on real data, or rather precise theoretical constraints on the expected signal rate may exist, or systematic errors on the theoretical signal and background rates may need to be included, or the signal may exist in a multi-dimensional parameter space, and possibly many other scenarios. Such complications can lead to a non-Gaussian shape of $S_{L2}$, and high-statistics Monte Carlo experiments with detailed simulations of the event properties and very refined numerical tools are needed to determine the probability $CL_b$ with which background can mimic a signal [6]. We think, however, that it must be possible to restrict such complete studies to only a few cases for each class of problems, and then use a simpler method to obtain the expected significance in different search channels and at different signal positions. This is demonstrated below for the example with large uncertainties on the signal and background rates and the detection of a signal depending on a comparison of the signal and background shapes.

2) $S_{c12}$ is not shown in the figures, but is included in the comparison shown in Table 2 below.
3) There are deviations from this shape predominantly at small values of $S_{L2}$, which possibly are a consequence of the adopted fitting procedure: the number of signal events is constrained to be $\geq 0$, and the fit does not search for the largest peak in the whole region.

[Figure 5 panels: s/sqrt(b) ($S_{c1}$), sqrt(2lnQ) ($S_{L2}$)]
Figure 5: Probability density functions for relatively large statistics for significance $S_{c1}$ and $S_{L2}$. Filled: pure background sample, open histogram: background plus signal, for large statistics (27 signal events within $\pm 1.5\sigma$ over a background of 21 events). The background distributions are based on toy experiments, the distributions for signal plus background on toy experiments. The Gaussian fit to the distribution of $S_{L2}$ for the background-only sample has a mean of and $\sigma$ is

[Figure 6 panels: s/sqrt(b) ($S_{c1}$), sqrt(2lnQ) ($S_{L2}$)]
Figure 6: Probability density functions determined from toy experiments for background only, in the absence of a real signal and with peak position as a free parameter. The mean of the distributions is now shifted, because it is expected to find some statistical cumulation somewhere. Again, fits of a Gaussian distribution are overlaid for illustration.

[Figure 7 axes: number of events after $20\ \mathrm{fb}^{-1}$ versus $m_H$ [GeV]]
Figure 7: Histogram of the reconstructed invariant mass of four muons, for an integrated luminosity of $20\ \mathrm{fb}^{-1}$. The histogram is the weighted sum of signal events from $H \to 4\mu$ (shown in red) and of various background samples (blue). The straight line represents a fit for the background-only hypothesis; in contrast to Fig. 1, a slope of the background is parametrised here. The curve is obtained under the assumption of the signal-plus-background hypothesis. The number of signal events is $5.2 \pm 2.8$. The difference in negative log-likelihood between the curve and the line is 5.1, corresponding to a signal significance of $S_{L2} = 3.2$.

The pdf of the significance definition based on the likelihood ratio, $S_{L2}$, was found to be of a Gaussian shape, both for pure background and for signal-plus-background samples, over a wide range of background-to-signal ratios. On a pure background sample, $S_{L2}$ directly provides the correct significance levels. Given these facts, what needs to be done is to estimate the median of the pdf of $S_{L2}$, which is equal to the expectation value for this case of a symmetric distribution. Therefore in this note a new approach is presented which evaluates the whole downweighted Monte Carlo statistics. Although there is no strict mathematical proof yet for this approach, evidence for its correctness from Monte Carlo is given below.
In the previous section, the mean value of $S$ was determined by performing many toy experiments, each using the available simulated events in quantities corresponding to one real experiment, and then taking the average of the individual fit results. The mean value can also be obtained by performing only one set of likelihood fits to the weighted sums of the signal and background events from Monte Carlo simulation. Instead of averaging fit results, we propose here to apply the corresponding fits to the average distributions. This is illustrated in Fig. 7, which shows the invariant mass distribution for the $H \to 4\mu$ channel over a rather small background. The curve and the straight line show the results of two binned log-likelihood fits: the curve uses a function describing the signal-plus-background shape, while the straight line assumes that there is only background. The bin size of the sample of weighted simulated events is small compared to the bin size that would be usable for the small number of expected signal events, and therefore the binned fit performed here is a good approximation to the unbinned fits on the small samples of expected real events. In the limit of zero bin size, binned and unbinned likelihood fits become equivalent. The distribution of $S_{L2}$ obtained from a series of toy experiments using independent fractions of the available Monte Carlo sample, such that each of them simulates a possible outcome of the real experiment, is shown in Fig. 8. The mean value of this distribution, $S_{L2} = 3.4$ with an r.m.s. of 1.0, agrees well with the value obtained from the difference in the negative logarithms of the likelihoods, i.e. from the logarithm of the likelihood ratio, found in the two fits to the histogram in Fig. 7, which is $S_{L2} = 3.2$. Here, one should note that the expectation values of other quantities may also be determined from these histograms, as is common practice. In particular, $N_s$ and $N_b$ and their uncertainties can be obtained from fits to the histograms, allowing other estimators of significance based on event counting to be calculated.

Figure 8: Distribution of $\sqrt{2\ln Q}$ from unbinned log-likelihood fits to 1000 simulated experiments (horizontal axis: $\sqrt{2\ln Q}$; vertical axis: number of Gedanken experiments), using the simulated events which entered into the histogram of Fig. 7. The mean value of this distribution is 3.4 and agrees well with the value found above from the histogram fit.

What was just illustrated is the proposed method to estimate the expectation value of the significance $S_{L2}$. The distributions of signal and background events obtained from the full samples of simulated events are weighted to represent a given integrated luminosity and then added. A binned log-likelihood fit assuming Poisson statistics in each bin is a good approximation of the unbinned likelihood fit to be performed later in the (low-statistics) real experiment. The values of the logarithms of the likelihoods observed in two fits, one assuming that there is background only and the other one also taking into account the signal shape, are then used to calculate (see footnote 4)

$$\sqrt{2\ln Q} = \sqrt{2\,(\ln L_{S+B} - \ln L_B)}. \qquad (7)$$

This is the desired expectation value of the significance for the future experiment. The agreement between the mean of the pdfs from the toy experiments and the value obtained from the histogram fits was verified for the various background situations already investigated in the previous section, and is quantified in Table 2 at the end of this section. Given that some kind of fit to determine the number of signal events above the background is usually performed anyway, the method suggested here to determine the significance according to $S_{L2}$ does not imply a large extra effort. If the distributions of signal and background events are also available in tabulated form, such a procedure is easy to apply to the results of any given analysis.
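The histogram method described above can be sketched in a few lines of Python (standard library only). This is an illustration under simplifying assumptions, not the implementation used for this note: the background is taken to be flat, the signal Gaussian, the shapes are fixed with only the normalisations free, and all numbers and function names are hypothetical. Under these assumptions the signal-plus-background fit reproduces the averaged histogram exactly, and the background-only fit has a closed-form solution, so no minimiser is needed.

```python
import math

def gaussian_bin_fractions(edges, mean, sigma):
    """Fraction of a Gaussian falling into each bin, from the error function."""
    cdf = [0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))
           for x in edges]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def expected_significance(n_sig, n_bkg, lo=110.0, hi=150.0, nbins=400,
                          mean=130.0, sigma=1.3):
    """sqrt(2 ln Q) from two fits to the averaged (non-integer) histogram.

    Assumes n_bkg > 0 so that every expected bin content is positive.
    All default values are illustrative, not taken from the note.
    """
    edges = [lo + (hi - lo) * i / nbins for i in range(nbins + 1)]
    sig = [n_sig * f for f in gaussian_bin_fractions(edges, mean, sigma)]
    bkg = [n_bkg / nbins] * nbins
    expected = [s + b for s, b in zip(sig, bkg)]

    # Signal-plus-background fit: the model can reproduce the averaged
    # histogram exactly, so the fitted expectation equals the bin content.
    # Background-only fit: flat shape with free normalisation; the Poisson
    # maximum-likelihood estimate is the mean bin content.
    b_hat = sum(expected) / nbins

    # ln Q = ln L_{S+B} - ln L_B; the ln Gamma(n+1) terms cancel.
    ln_q = sum(n * math.log(n / b_hat) - (n - b_hat) for n in expected)
    return math.sqrt(2.0 * max(ln_q, 0.0))

if __name__ == "__main__":
    for n_bkg in (20.0, 200.0):
        print(n_bkg, expected_significance(5.0, n_bkg))
```

With a more realistic background shape the background-only fit would need a numerical minimiser, but the structure (two fits to one averaged histogram, then the square root of twice the log-likelihood difference) stays the same.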
For a better understanding of the proposed method, it is instructive to investigate the behaviour of the likelihood between the two extremes of the signal-plus-background and the background-only hypotheses. A smooth change from one to the other can be obtained by varying the number of signal events, $N_s$, and determining for each value of $N_s$ the minimum of the negative log-likelihood with respect to the parameters describing the background, as is shown in Fig. 9. Such a global log-likelihood curve is very useful to determine the $n$-$\sigma$ error intervals of a fit parameter (see footnote 5). Near the minimum, the error on the number of signal events can be read off the curve at the point where the negative log-likelihood increases by 1/2, or, more generally, the value where $n$ standard deviations are reached corresponds to $\Delta(-\ln L) = n^2/2$. The number of signal events normalised to its error provides the log-likelihood estimator $S_{L1}$ defined in the introduction; with this definition of significance one only takes into account the likelihood around the minimum up to $\Delta(-\ln L) = 1/2$ and measures the distance to zero signal events in units of this one-sigma error. The significance $S_{L2}$, on the other hand, takes the difference of the negative log-likelihood at zero signal events and at the minimum, $\ln Q = (-\ln L(N_s = 0)) - (-\ln L_{min})$. In the example shown here, the value at $N_s = 0$ is 5.1 above the minimum, so that $n^2/2 = 5.1$ and hence $n = 3.2$, i.e. $S_{L2} = 3.2$, as already stated above. Note the asymmetry of the curve with respect to the minimum; the rise towards small values of $N_s$ is much steeper. This asymmetry is the reason why an estimate based on the likelihood values at the minimum and at zero, which goes into $S_{L2}$, is different from the estimator $S_{L1}$, which only uses values of the likelihood curve close to the minimum. This demonstrates that the three-sigma interval is not necessarily equal to three times the one-sigma interval!

Figure 9: Dependence of $-\ln L$ on $N_s$ for the histogram fit shown in Fig. 7. At each point on the curve, $N_s$ is fixed, and a new fit is performed to find the minimum w.r.t. all other parameters. From this curve, the $n$-$\sigma$ error bands for $N_s$ can be determined from the increase relative to the minimum value, as shown for the (in this case asymmetric) $\pm 1\sigma$ error interval. The value of the likelihood at the best-fit point corresponds to $-\ln L_{S+B}$, the likelihood value at zero signal events corresponds to $-\ln L_B$; the difference thus is $\ln Q$.

4) Note that, e.g. in the ROOT package, the factor of two under the square root is already included in the definition of the likelihood function!

5) The same procedure is used by MINOS in the fitting package MINUIT, and in the two-dimensional case for the calculation of contour lines.

There is a slight technical complication in the proposed procedure.
The most frequently used fitting tools, PAW or ROOT [7], assume integer bin contents in binned likelihood fits, as would be the case for distributions obtained in real experiments, where the number of events observed is always an integer. Here, however, we are dealing with the number of events expected in a future experiment, obtained from sums of weighted simulated events, and therefore the bin contents take non-integer values. These values are even small non-integer numbers if the bin size is small, because the latter is adjusted to match the available number of simulated events. Therefore, the conversion to an integer poses a problem. Here, we propose a generalisation of the usual Poisson distribution in which the factorial $n!$ is interpolated by the $\Gamma$ function, $n! = \Gamma(n + 1)$. This modified distribution interpolates the Poisson distribution quite nicely, but needs further checking for very small numbers. Now $n$ may be a non-integer bin entry, and the Poisson likelihood is defined for each bin. Such a modified fit function has been implemented as an addition to the ROOT framework, making its use technically quite easy. As a test, the number of events expected in each bin can be multiplied by a large number $N$, for example $N = 100$, allowing an unmodified Poisson likelihood to be used. The scaling of all errors from such fits was found to follow the $\sqrt{N}$-law, while all central values of the fit parameters remained the same.
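The generalised Poisson term described above can be written down directly. The following Python sketch uses `math.lgamma` from the standard library for $\ln\Gamma(n+1)$; the function names are hypothetical and this is not the ROOT extension mentioned in the text.

```python
import math

def log_poisson_gamma(n, mu):
    """ln of the Poisson probability with n! replaced by Gamma(n + 1).

    n  : bin content, may be non-integer (sum of weighted simulated events)
    mu : expected bin content from the fit function, must be positive
    """
    if mu <= 0.0:
        raise ValueError("expected bin content mu must be positive")
    return n * math.log(mu) - mu - math.lgamma(n + 1.0)

def binned_nll(contents, expected):
    """Negative log-likelihood summed over all bins of a histogram."""
    return -sum(log_poisson_gamma(n, mu) for n, mu in zip(contents, expected))

if __name__ == "__main__":
    # For integer n the usual Poisson probability is recovered.
    print(math.exp(log_poisson_gamma(3, 2.5)), 2.5**3 * math.exp(-2.5) / 6.0)
    # Non-integer contents are handled without rounding.
    print(binned_nll([0.3, 1.7], [0.3, 1.7]))
```

Since the $\ln\Gamma(n+1)$ term does not depend on the fit parameters, it cancels in likelihood differences such as $\ln Q$; it only matters when absolute likelihood values are compared.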

Estimators of significance other than $S_{L2}$ may also be determined from the histogram fit. A comparison of the histogram method with the results from the toy experiments for $S_{c1}$, $S_{c12}$ and $S_{L2}$ is shown in Table 2 for different event statistics and background levels. The event numbers were chosen such that the likelihood estimator of significance, $S_{L2}$, approaches a value of five. There is impressive agreement between the mean values from the toy experiments and the expected significance from the histogram fit. While this can be considered as a Monte Carlo proof of the procedure, a formal analytical proof still has to be done. The values of $S_{c12}$ and $S_{L2}$ agree very well with each other, although $S_{c12}$ is always slightly smaller than $S_{L2}$; this is understandable, because the counting methods use only the part of the available statistics within the signal region, while the likelihood uses the full information available. The significance $S_{c1}$ only becomes comparable to the other two for large statistics (40 signal events over a background of 55 events in the signal region, corresponding to the last line of the table); at small statistics (in other words, for a very clean signal near the discovery threshold) this simple method largely overestimates the significance.

Table 2: Comparison of methods to determine estimators of significance. Given are the total number of events, $N_{tot}$, in the mass range from 110 to 150 GeV (in other words, a range about 30 times larger than the assumed mass resolution), the number of signal events, $N_s$, the mean and r.m.s. of the distributions of significance from the toy experiments, and the expected significance from the histogram fit, for $S_{c1}$, $S_{c12}$ and $S_{L2}$, respectively. The signal region for the event counting methods was taken to be ±1.5 times the resolution around the mean peak position.
event sample          $S_{c1}$              $S_{c12}$             $S_{L2}$
$N_{tot}$   $N_s$     mean   rms   hist     mean   rms   hist     mean   rms   hist

Conclusion

Several estimators for the significance, $S$, of a discovery at a future experiment have been studied in this note. The significance $S$ is understood here in the sense of a Gaussian significance, which expresses the probability of a background fluctuation having produced the claimed signal in terms of the number of standard deviations of a Gaussian distribution. A large number of Monte Carlo experiments were performed to study the reliability and performance of such estimators. Methods based on counting signal and background events in a certain signal region were contrasted with likelihood-based methods exploiting the full shape of the invariant mass distribution. For sufficiently large statistics, all of the estimators are applicable, but in the case of low backgrounds the simplest one, $S_{c1} = N_s/\sqrt{N_b}$, leads to a large overestimation of the significance and, worse, often leads to high values of significance in cases where there is background only. A method based on the likelihood ratio of the signal-plus-background and the background-only hypotheses, named $S_{L2}$, gives the best performance. In addition, the distribution of significance for a pure background sample shows Gaussian behaviour; in other words, the significance from this method can be directly translated into a probability that the background may have mimicked the claimed effect. Among the event counting methods, $S_{c12} = 2\,(\sqrt{N_s + N_b} - \sqrt{N_b})$ reaches an almost equally good performance. Unbinned likelihood fits, however, offer the invaluable advantage of being independent of the choice of bin widths and bin positions and independent of the definition of a signal region.
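For reference, the two event counting estimators, $S_{c1} = N_s/\sqrt{N_b}$ and $S_{c12} = 2(\sqrt{N_s+N_b} - \sqrt{N_b})$, are simple enough to evaluate directly. The Python sketch below uses hypothetical function names; the first input pair is the large-statistics case quoted in the text (40 signal events over 55 background events), while the low-background pair is an invented illustration and not an entry of Table 2.

```python
import math

def s_c1(n_s, n_b):
    """Simple counting estimator: N_s / sqrt(N_b)."""
    return n_s / math.sqrt(n_b)

def s_c12(n_s, n_b):
    """Counting estimator: 2 * (sqrt(N_s + N_b) - sqrt(N_b))."""
    return 2.0 * (math.sqrt(n_s + n_b) - math.sqrt(n_b))

if __name__ == "__main__":
    # Large statistics (numbers quoted in the text): the two are comparable,
    # roughly 5.4 and 4.7.
    print(s_c1(40.0, 55.0), s_c12(40.0, 55.0))
    # Hypothetical low-background case: S_c1 is much larger than S_c12.
    print(s_c1(5.0, 1.0), s_c12(5.0, 1.0))
```

The growing gap between the two values as the background shrinks is the overestimation by $S_{c1}$ discussed above.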
For signal and background contributions determined from a likelihood fit based on shapes only, it has been shown with a large number of toy experiments that the probability density distributions of the values of significance obtained from pure background and from signal-plus-background samples follow Gaussian distributions. A practical method was suggested to determine the expected significance $S_{L2}$, in this case corresponding to the median of the distributions from the toy experiments. The suggested method to determine the expected significance of a future experiment is based on only two binned likelihood fits to the invariant mass distribution with a fine binning, as is obtained from weighted simulated events. An overview of the performance at the discovery limit, $S \approx 5$, of the method $S_{c1}$ and of the other two methods found favourable in this study is shown for various levels of background in Table 2. Figures 3, 4 and 5 show the significance distributions obtained on pure background samples; $S_{L2}$ and $S_{c12}$ are well compatible with being interpreted as Gaussian significance, whereas $S_{c1}$ largely underestimates the probability of background fluctuations.
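The translation of a Gaussian significance into the probability of a background fluctuation can be sketched as follows. The one-sided tail convention used here is an assumption made for illustration; the function name is hypothetical.

```python
import math

def background_fluctuation_probability(s):
    """One-sided Gaussian tail probability beyond s standard deviations.

    Assumption: the one-sided convention, i.e. only upward fluctuations
    of the background count as mimicking a signal.
    """
    return 0.5 * math.erfc(s / math.sqrt(2.0))

if __name__ == "__main__":
    for s in (3.2, 5.0):
        print(s, background_fluctuation_probability(s))
```

Under this convention a significance of five corresponds to a probability of about $2.9 \times 10^{-7}$, which is meaningful only for estimators whose distribution on pure background is indeed Gaussian.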

Acknowledgements

We wish to thank Bob Cousins and Louis Lyons for fruitful discussions and valuable input to this note.

References

[1] S.I. Bityukov, N.V. Krasnikov, On observability of signal over background, NIM A452 (2000)
[2] The LEP Collaborations ALEPH, DELPHI, L3 and OPAL, Search for the Standard Model Higgs Boson at LEP, Phys. Lett. B565 (2003) 61
[3] S. Brandt, Data Analysis, Springer
[4] S.S. Wilks, Ann. Math. Stat. 9 (1938) 60
[5] K. Hagiwara et al., Phys. Rev. D 66 (2002)
[6] K. Cranmer, UWStat Tools: Documentation & User Manual, University of Wisconsin, Madison
[7] ROOT, An Object Oriented Data Analysis Framework, User's Guide, available on the internet
