CMS Internal Note. The content of this note is intended for CMS internal use and distribution only


Available on CMS information server
CMS IN 2003/xxxx

August 26, 2003

Expected signal observability at future experiments

V. Bartsch, G. Quast
Institut für Experimentelle Kernphysik, Universität Karlsruhe

Abstract

Several methods to quantify the significance of an expected signal at future experiments have been used or suggested in the literature. In this note, comparisons are presented with a method based on the likelihood ratio of the background hypothesis and the signal-plus-background hypothesis. A large number of Monte Carlo experiments are performed to investigate the properties of the various methods and to check whether the probability of a background fluctuation having produced the claimed significance of the discovery is properly described. In addition, the best possible separation between the two hypotheses should be provided; in other words, the discovery potential of a future experiment should be maximal. Finally, a practical method to apply a likelihood-based definition of the significance is suggested in this note. Signal and background contributions are determined from a likelihood fit based on shapes only, and the probability density distributions of the significance thus determined are found to be of a Gaussian shape even with small statistics.

1 Introduction

There are several methods to quantify the probability of a new physics discovery in a future experiment. These methods can be coarsely classified as event counting methods and likelihood methods. Event counting methods rely on a pre-specified definition of a signal region to determine the observed numbers of signal and background events. Likelihood methods usually allow for free parameters, such as the total number of signal and background events within a given range, and take into account the shape of the distributions of signal and background events. Instead of using the fitted numbers of signal and background events, likelihood methods offer the possibility to directly use the values of the likelihood found at the best-fit point to determine the significance level of a signal.

In the following, some event counting and likelihood methods are investigated in a large number of simulated experiments ("toy experiments"). A relatively simple and practical likelihood method is proposed to determine the expected significance of a discovery in a future experiment, which still holds in the case of small statistics.

2 Overview of methods

The following subsections present an overview of some methods of estimating the significance, $S$, of an observed signal. In high-energy physics, significance is usually understood as the number of standard deviations by which an observed signal lies above expected background fluctuations. Implicitly, it is understood that $S$ follows a standard Gaussian distribution with a mean of zero and a standard deviation of one. In the statistics literature, the significance level is the probability to find a value of a test statistic beyond a certain pre-specified critical value. For non-Gaussian distributions, the significance level has to be converted into an equivalent number of truly Gaussian sigmas to arrive at the common terminology of a high-energy physicist.
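The conversion between a one-sided tail probability and an equivalent number of Gaussian sigmas can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `p_value_from_sigma` and `sigma_from_p_value` are hypothetical and do not come from the note.

```python
import math
from statistics import NormalDist


def p_value_from_sigma(s):
    """One-sided tail probability: integral of the standard Gaussian from s to infinity."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


def sigma_from_p_value(p):
    """Equivalent number of Gaussian sigmas for a one-sided tail probability p (0 < p < 1)."""
    # By symmetry of the standard Gaussian, the upper-tail quantile is minus the lower one.
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    print(p_value_from_sigma(2.0))   # roughly 0.023
    print(p_value_from_sigma(5.0))   # roughly 2.9e-7
    print(sigma_from_p_value(p_value_from_sigma(5.0)))  # round trip, close to 5
```

A test statistic whose distribution is not Gaussian would first be turned into a tail probability from its own pdf, and only then passed through `sigma_from_p_value`.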
With this in mind, a given value of $S$ (the number of "sigmas") corresponds to the probability that the claimed signal is caused merely by fluctuations of the background, and this probability is obtained by performing the corresponding integrals of a standard Gaussian distribution. Since a signal is usually searched for in many bins of a distribution, and in many channels, a very high value of the significance must be required before an observed peak found somewhere in some distribution can be claimed to be an observation of a signal. An example is the case of a search for a signal with an expected mass resolution of 1 GeV in a mass range of 100 GeV. The probability of a two-sigma upward fluctuation of the background in a given two-GeV-wide region is only about 2 %, but a signal may appear in any one of 50 such regions, and therefore a pure background sample will, on average, result in finding one peak with a significance of two. The general agreement is that the value of $S$ of a signal should exceed five. (The significance level, or the corresponding one-sided Gaussian probability, in other words the integral of the standard Gaussian distribution from five to infinity, is about $2.9 \times 10^{-7}$.)

2.1 Event Counting

Counting methods use the number of signal events, $N_s$, and the number of background events, $N_b$, observed in some signal region to define the significance $S$. This procedure requires working with binned distributions, which in turn means that bin positions and bin widths have to be fixed; if these bin positions and widths are fixed after observation of an excess above the background somewhere, possibilities for subjective choices open up. Furthermore, the optimal choice of a signal region around the mean to maximise the sensitivity depends on the background level, which is another disadvantage of such methods. The definitions given below may be considered as statistical estimators of the significance of an observation from the events found in an experiment.
Two variants are frequently used; another one, more robust against downward fluctuations of the background, was recently suggested [1]:

$$ S_{c1} = \frac{N_s}{\sqrt{N_b}}, \tag{1} $$

$$ S_{c2} = \frac{N_s}{\sqrt{N_s + N_b}}, \tag{2} $$

$$ S_{c12} = 2 \left( \sqrt{N_s + N_b} - \sqrt{N_b} \right) \quad [1]. \tag{3} $$

The formula for the third method is strictly valid only in the Gaussian limit of the Poisson distribution; for small statistics, the value of $S_{c12}$ is tabulated.
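The three counting estimators can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the third estimator uses the Gaussian-limit formula rather than the tabulated small-statistics values, and the example numbers (five signal events over one background event) are chosen to show how far the estimators drift apart at small statistics.

```python
import math


def s_c1(n_s, n_b):
    """Eq. (1): N_s / sqrt(N_b)."""
    return n_s / math.sqrt(n_b)


def s_c2(n_s, n_b):
    """Eq. (2): N_s / sqrt(N_s + N_b)."""
    return n_s / math.sqrt(n_s + n_b)


def s_c12(n_s, n_b):
    """Eq. (3), Gaussian-limit form: 2 * (sqrt(N_s + N_b) - sqrt(N_b))."""
    return 2.0 * (math.sqrt(n_s + n_b) - math.sqrt(n_b))


if __name__ == "__main__":
    n_s, n_b = 5.0, 1.0
    print(s_c1(n_s, n_b))    # 5.0
    print(s_c2(n_s, n_b))    # about 2.04
    print(s_c12(n_s, n_b))   # about 2.90
```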

2.2 Likelihood methods

Likelihood methods are frequently used in data analysis to extract the number of signal and background events from a fit to one or more distributions which discriminate signal from background. One considers two different hypotheses: the null hypothesis, assuming that the observed distribution is formed by background only, and an alternative hypothesis, assuming the presence of signal and background. The ratio of likelihoods of the null hypothesis and an alternative hypothesis has also proven to provide a powerful test statistic in high-energy physics applications (see, e.g., the searches for the Standard Model Higgs boson performed at LEP 2 [2]).

In the simplest case, a binned or unbinned likelihood fit is used to determine the contributions from signal and background events in the distribution of the invariant mass of observed final states. The number of signal events is taken to follow a signal distribution, whereas the total background results from the parametrisation of the background distribution. These distributions are motivated by theoretical expectations or Monte Carlo simulation, or may be determined from the data; the latter is often the case for background shapes. The observed distribution of invariant masses with known normalised signal and background distributions $f_s$ and $f_b$ may be described as

$$ f(m; p^s_1, \ldots, p^s_n, p^b_1, \ldots, p^b_m) = N_s \, f_s(m; p^s_1, \ldots, p^s_n) + N_b \, f_b(m; p^s_1, \ldots, p^s_n, p^b_1, \ldots, p^b_m). $$

Here, $p^s_i$ and $p^b_i$ are parameters describing the signal and background distributions, e.g. a Breit-Wigner folded with a Gaussian distribution for the signal, and a second-order polynomial for the background. In cases of small statistics, the shapes of these distributions are usually fixed, and only $N_s$, $p^s_1 = m_0$ (the position of the peak) and $N_b$ are free parameters in the fit.
The background is determined from the signal-free regions, and therefore the statistical precision on the mean background level depends on the width of these effective side bands. In practice, a region as large as possible around the signal should be used in order to minimise the uncertainties arising from background fluctuations.

The standard maximum likelihood method takes into account only the shape of the distribution. If the normalisation and its error are also of importance, the extended maximum likelihood method is usually applied, which has an extra factor in the likelihood function to account for Poisson fluctuations in the total number of events observed. Note that such likelihood methods are independent of any a-priori, or a-posteriori, specification of a signal region.

Since the fit to the distribution of observed events properly takes into account background fluctuations, these contribute to the uncertainty on the number of signal events. The inverse of the relative error on $N_s$ thus obtained measures the distance from zero signal events in units of the error, and is sometimes directly quoted as a significance,

$$ S_{L1} = \frac{N_s}{\Delta N_s}. \tag{4} $$

As is shown below, this is questionable at large values of $S_{L1}$, because for non-Gaussian errors the five-sigma interval is not simply five times the one-sigma interval. When testing for the presence of a signal component in a distribution without precise information about its size, as discussed here, it may be more appropriate to use the inverse of the relative error on the signal fraction, $f_s = N_s / (N_s + N_b)$, as an estimator of significance. Technically, this corresponds to performing a standard maximum-likelihood fit instead of applying the extended likelihood method.

A second fit with $N_s$ fixed to zero allows comparison of the likelihood obtained for this null hypothesis, in other words the absence of a signal, $L_B$, with the likelihood from the full signal-plus-background fit, $L_{S+B}$.
The likelihood ratio $Q = L_{S+B} / L_B$ is used as a test statistic to distinguish the two hypotheses. For large statistics, which justify using the Gaussian distribution to describe the numbers of observed events in each bin of a (potentially multi-dimensional) histogram, $\ln Q$ is identical to half of the difference in $\chi^2$ between the signal-plus-background and the background-only hypotheses. The definition of significance based on the outlined method is

$$ S_{L2} = \sqrt{2 \ln Q}. \tag{5} $$

It is worth noting that the two likelihood-based estimators for the significance become equal if $2 \ln Q$ as a function of $N_s$ is a parabola, in other words in the Gaussian limit of infinitely large statistics. Usually, likelihood functions are approximated well by a parabola close to the best-fit point, but parabolic behaviour up to a distance of $5\sigma$ requires very large event numbers. The relation between $\ln Q$ and a fit for the number of signal events in a standard likelihood-based fit is illustrated below in the discussion of Fig. 9, which shows an example of the dependence of the likelihood on the assumed number of signal events.

The distribution of $Q$ in a series of experiments, its probability density function ("pdf"), is of crucial importance for the calculation of discovery probabilities in the presence of a real signal, or of fake probabilities due to fluctuations of the background. Likelihood ratio tests have been studied extensively in the literature, and good textbooks on the topic exist (e.g. [3]). In the large-statistics limit, the logarithm of the likelihood ratio, multiplied by two, is expected to follow a $\chi^2$ distribution with a number of degrees of freedom given by the difference in the number of free parameters between the alternative hypothesis and the null hypothesis [4]. In some cases studied in the literature, the expected large-statistics distribution is exactly valid even with small statistics.

When testing for the presence of a signal on top of the background at a fixed peak position, $2 \ln Q = S_{L2}^2$ is expected to follow a $\chi^2$ distribution with one degree of freedom. The distribution of $S_{L2}$ is thus given by the positive half of a standard Gaussian distribution. This is the theoretical justification for calling $\sqrt{2 \ln Q}$ a significance in the sense stated above.

2.3 Remarks on likelihoods and event counting

Applying the likelihood principle laid out above to one bin only, namely the signal region, allows a likelihood estimator to be used also for event counting methods. Assuming an observation of $N_{obs}$ events in the signal region with expected numbers of signal and background events of $N_s$ and $N_b$, respectively, and applying Poisson statistics leads to a likelihood ratio of

$$ Q = \left( 1 + \frac{N_s}{N_b} \right)^{N_{obs}} \exp(-N_s). \tag{6} $$

Setting the expectation value of $N_{obs}$ to $N_s + N_b$, a likelihood estimator based on event counting, $S_{cl}$, might be defined. In the limit of large numbers $N_s$ and $N_b$, the Poisson distribution can be replaced by a Gaussian, and $S_{cl}$ becomes equivalent to $S_{c1}$. The performance of $S_{cl}$ is better than that of $S_{c1}$, but much inferior to that of $S_{c12}$ or $S_{L2}$, and it is therefore not studied further.
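The single-bin likelihood ratio of Eq. (6) is simple enough to evaluate numerically. The sketch below assumes that the counting estimator is formed in analogy to Eq. (5), i.e. as the square root of $2 \ln Q$ evaluated at $N_{obs} = N_s + N_b$; the note does not spell out this formula, so this form and the function names are assumptions made for illustration.

```python
import math


def ln_q_counting(n_obs, n_s, n_b):
    """Logarithm of Eq. (6): N_obs * ln(1 + N_s/N_b) - N_s."""
    return n_obs * math.log1p(n_s / n_b) - n_s


def s_cl(n_s, n_b):
    """Assumed estimator: sqrt(2 ln Q) with N_obs set to its expectation N_s + N_b."""
    return math.sqrt(2.0 * ln_q_counting(n_s + n_b, n_s, n_b))


if __name__ == "__main__":
    # Small statistics: clearly below S_c1 = N_s / sqrt(N_b) = 5.
    print(s_cl(5.0, 1.0))        # about 3.39
    # Large background with a small excess: close to S_c1 = 0.5.
    print(s_cl(50.0, 10000.0))   # about 0.50
```

In this form the estimator approaches $N_s / \sqrt{N_b}$ when the excess is small compared with a large background, which is the Gaussian regime referred to in the text.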
3 Monte Carlo Study of estimators for signal significance

If statistics are sufficiently large, all of the above estimators show a Gaussian distribution. When searching for something new, however, one is very often dealing with a small-statistics problem. When preparing an experiment, it is important to compare different methods and approaches on an equal footing in order to choose the best strategy to ensure the discovery of new phenomena. At the statistical limit, two important questions arise:

What is the probability to actually observe an existing signal with a pre-specified significance with a given amount of data? Or, what is the integrated luminosity needed to reach that pre-specified significance with a certain probability?

What is the probability that data with the observed or higher significance would be obtained if there is only background and no signal?

It is obvious that the performance of the estimators for the significance defined above is quite different in the small-statistics regime. It is important to note that the significance observed in a single experiment is a random number, depending on fluctuations of the signal and the background. To answer the questions raised above, Monte Carlo studies with a large number of toy experiments therefore have to be performed to find the probability density function ("pdf") of the various definitions of $S$ given above. The pdf may turn out to be far from Gaussian, because the common rules of statistics and many familiar theorems are no longer valid in the realm of small numbers.

The expected significance may be defined as the mean or, better, the median of the pdf thus determined. It has become common practice to quote the expected significance together with the actually observed one. If the signal cross sections and the background level are well modelled, the two values should be rather close to one another.
Special care has to be taken when the background fluctuates downwards, in other words when the observed background level is smaller than expected. Although used frequently, method $S_{c1}$ is an obvious candidate to drastically overestimate the significance in such cases. On the other hand, when there is no signal, but background events cluster around one value, it is again method $S_{c1}$ which behaves very badly, leading to a false announcement of a discovery more often than other methods.

A crucial quantity to look at is the so-called confidence level of the background-only hypothesis, $CL_b$, which is defined as the integral over the pdf of $S$ obtained for a pure background sample from minus infinity to a pre-specified critical value $S_{crit}$, beyond which an observation would be considered as incompatible with the background-only hypothesis. $(1 - CL_b)$, also called the p-value in modern literature [5], describes the probability that the background mimics a signal with a value of $S$ larger than $S_{crit}$. Very often the determination of the pdf is not a trivial task, in particular if multi-dimensional distributions and many search channels are involved. Monte Carlo methods for its determination may therefore have to be complemented by other methods [6]. In many cases, the various definitions of $S$ themselves are not truly significances in the sense stated in the introduction, but merely constitute a test statistic, and discovery and fake probabilities must be calculated using the pdf of $S$ obtained from simulated samples of background and signal events.

The toy experiments in this study were motivated by the problems in defining the signal significance for the search for a Higgs boson decaying to four muons. The signal is very clean and sits on top of a small background, but particularly at low masses the event rate is small. The production cross section and the background rates have large uncertainties, and the signal is best searched for by using only the shapes of the signal and background distributions. The number of observed signal events is best determined by using an unbinned likelihood fit. The number of observed signal events needed in order to reach the magical bound of $S = 5$, or the integrated luminosity required for a 50 % chance of a discovery in this channel, was found to depend strongly on the kind of estimator used; this strong dependence on the significance estimator is of course a consequence of the relatively small number of events. According to $S_{c1}$, an observation of five signal events over a background of one event should be enough, while the significance obtained from $S_{L1}$ for this case is almost a factor of two lower.
The toy experiments were tailored for the study of the Higgs boson search in the $H \to 4\mu$ channel, but then generalised to other, less favourable background situations. To cope with the large uncertainties of the theoretical predictions for the signal and background cross sections, $N_s$ and $N_b$ were treated as free parameters. In fits to the signal-plus-background samples, the mass of the signal, $m_0$, was also left free. The likelihood ratio $Q$ was determined from the likelihood values at the best-fit points for the signal-plus-background and the background-only hypotheses. Here, the procedure adopted is different from other searches, e.g. the search for the Standard Model Higgs boson at LEP, with a very precisely known production cross section and well-known background levels. What matters in this case is the difference in shape between signal and background - a peak with a fixed width and position on top of a background described by a straight line.

The signal was taken to be of a Gaussian shape with a radiative tail, with a resolution (in other words the standard deviation of the Gaussian) of 1.3 GeV; the background was taken to be constant between 110 and 150 GeV. The background fraction was varied from 15 % of the signal (the typical case for the $H \to 4\mu$ channel) up to 150 % of the signal. The total number of events in each sample was allowed to fluctuate according to the Poisson distribution, and the Poisson mean was chosen such that the mean of the likelihood estimator reached a significance of $S_{L2} = 5$.

An example of one toy experiment at small statistics is shown in Fig. 1. The symbols with error bars each indicate a mass value drawn randomly according to the mass distribution of the weighted sum of signal and background events. The curve shows the result of an unbinned likelihood fit to these mass values, with the extended likelihood method. In this way, the numbers of signal and background events, $N_s$ and $N_b$, respectively, are simultaneously determined.
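A single toy experiment of this kind can be sketched in a few lines. The code below is a simplified illustration, not the fitting code used for the note: it assumes a pure Gaussian signal without the radiative tail, a flat background, a fixed peak position, and a shape-only fit in which the signal fraction is found by a grid scan instead of a minimiser. The names `generate_toy` and `s_l2`, the peak position of 130 GeV and the event numbers in the usage example are hypothetical choices.

```python
import numpy as np

M_LO, M_HI = 110.0, 150.0   # mass window in GeV (as in the text)
SIGMA = 1.3                 # mass resolution in GeV (as in the text)


def signal_pdf(m, m0):
    """Gaussian signal shape; the radiative tail is neglected in this sketch."""
    return np.exp(-0.5 * ((m - m0) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))


def background_pdf(m):
    """Flat background, normalised over the mass window."""
    return np.full_like(m, 1.0 / (M_HI - M_LO))


def generate_toy(rng, mean_ns, mean_nb, m0=130.0):
    """Draw Poisson-fluctuated numbers of signal and background masses."""
    sig = rng.normal(m0, SIGMA, rng.poisson(mean_ns))
    bkg = rng.uniform(M_LO, M_HI, rng.poisson(mean_nb))
    m = np.concatenate([sig, bkg])
    return m[(m >= M_LO) & (m <= M_HI)]


def s_l2(masses, m0=130.0, n_grid=2001):
    """sqrt(2 ln Q) from an unbinned shape-only fit of the signal fraction (>= 0)."""
    if masses.size == 0:
        return 0.0
    fractions = np.linspace(0.0, 1.0, n_grid)[:, None]   # candidate signal fractions
    density = fractions * signal_pdf(masses, m0) + (1.0 - fractions) * background_pdf(masses)
    ln_l = np.log(density).sum(axis=1)
    two_ln_q = 2.0 * (ln_l.max() - ln_l[0])   # ln_l[0] is the background-only hypothesis
    return float(np.sqrt(max(two_ln_q, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(2003)
    toy = generate_toy(rng, mean_ns=25.0, mean_nb=10.0)
    print(len(toy), s_l2(toy))
```

Repeating `generate_toy` and `s_l2` many times, with and without signal, gives the kind of pdf discussed in this section. Because the signal fraction is restricted to be non-negative, background-only toys with a deficit at the peak position return exactly zero.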
The distribution of various definitions of significance from such toy experiments is shown in Fig. 2. The numbers of signal and background events needed for the estimators of significance derived from event counting methods, $S_{c1}$ and $S_{c12}$, are determined in a region of $\pm 1.5\sigma$ around the centre of the peak. The background level here is about 40 % of the signal, the mean number of signal events is 25, and the background is roughly two events per GeV, or seven events within the signal region defined above. The mean and r.m.s. of the various definitions of significance are shown in Table 1 below.

Table 1: Summary of the mean and r.m.s. of the various definitions of significance for 25 signal events and 40 % background level as discussed in the text.
[Table 1 columns: mean $S$, r.m.s.; rows: $S_{c1}$, $S_{c12}$, $S_{L2}$. The numerical entries are not preserved in this transcription.]

It can be seen that $S_{c1}$ gives much larger values than $S_{c12}$ or $S_{L2}$. Gaussian fits are overlaid in the figure, and the agreement is good for $S_{c12}$ and $S_{L2}$, while $S_{c1}$ is only poorly described by the Gaussian fit. The mean of the $S_{c12}$ or $S_{L2}$ distributions is very close to five, both distributions are rather symmetric, and therefore the median is very close to the mean. Thus the distributions imply that a signal will be found with a value of $S$ greater than five

in 50 % of such experiments, or with 50 % probability in the one real experiment CMS will hopefully be able to perform after the LHC start-up.

Figure 1: Example of an unbinned likelihood fit to a number of data points shown as symbols. The data points demonstrate what the real data could look like after the first 40 fb$^{-1}$ at the LHC. The shaded histogram shows the distribution of signal plus background obtained from a large number of weighted simulated events. The fit to the background assumes a flat background. (Axes: number of events after 40 fb$^{-1}$ versus $m_H$ [GeV].)

Besides the distributions of Fig. 2, the reliability of an estimator of significance has to be checked. In other words: how often would a false discovery be claimed for a certain choice of the minimum required significance $S_{crit}$? Because of the very high values of the required significance, the pdf must be properly modelled for a pure background sample even in the tails of the distribution. The same distributions as above are shown in Fig. 3, but this time for toy experiments with background only. The mass, i.e. the position of the mean of the Gaussian, was fixed in the fits to correspond to the signal position of the previous example. The peak seen in bin zero of the distribution is of a technical origin, because the number of signal events was constrained to be greater than or equal to zero in the fit 1). The overlaid fit, which neglects the first bin, is of Gaussian shape. As can be seen, up to the highest observed significance, $S_{c12}$ and $S_{L2}$ are amazingly well described by a Gaussian distribution. It should, however, be noted that even with one million experiments, the distribution is only tested up to 4.5 standard deviations; the expected number of entries beyond 5 in one million experiments is only about 0.3 for a perfect Gaussian shape. The good agreement with the Gaussian shape suggests that it will probably hold to even larger values of $S$ than could be tested with one million samples.
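The tail check described here can be expressed as a comparison between an empirical tail fraction and the Gaussian expectation. The following sketch uses only the standard library; the function names are hypothetical, and the list passed to `one_minus_cl_b` stands for significance values obtained from background-only toy experiments.

```python
import math


def expected_beyond(n_toys, s_crit):
    """Expected number of toys with S > s_crit if S follows a standard Gaussian."""
    return n_toys * 0.5 * math.erfc(s_crit / math.sqrt(2.0))


def one_minus_cl_b(s_background, s_crit):
    """Empirical 1 - CL_b: fraction of background-only toys with S above s_crit."""
    values = list(s_background)
    return sum(1 for s in values if s > s_crit) / len(values)


if __name__ == "__main__":
    print(expected_beyond(1_000_000, 5.0))   # about 0.3 entries in one million toys
    print(expected_beyond(1_000_000, 4.5))   # a few entries
    print(one_minus_cl_b([0.0, 0.0, 1.2, 6.1], 5.0))   # 0.25 for this made-up list
```

With one million toys the empirical estimate runs out of entries near 4.5, so a statement about the region beyond five rests on extrapolating the fitted Gaussian shape.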
$S_{c1}$, on the contrary, is not compatible with a standard Gaussian with mean around zero and standard deviation of one, and delivers significances beyond a value of five far too often. This means that $S_{c1}$ cannot be quoted directly as a significance in the sense defined in the introduction for the particular background-to-signal ratio of this example. The Gaussian nature of the pdf for the estimators of significance $S_{c12}$ and $S_{L2}$, as obtained from the background-only sample, makes the calculation of $(1 - CL_b)$ straightforward.

More examples with lower or higher background levels are shown in Fig. 4 and Fig. 5. Fig. 4 shows a small-statistics case, where $S_{c1}$ is very non-Gaussian, and too many background samples result in a value of $S_{c1}$ beyond five. This situation is typical of the searches in the $H \to 4\mu$ channel. With more background events, and hence more signal events needed to reach an average signal significance of five, $S_{c1}$ becomes more similar to $S_{L2}$. The agreement between $S_{c12}$ and $S_{L2}$ is always very good 2), and the Gaussian nature of these distributions obtained from background only is confirmed in all cases.

As a slight modification of the procedure followed above, the position of the searched signal may be left as a free parameter, in other words a peak of the fixed shape is allowed to show up anywhere in the mass range considered. An example is shown in Fig. 6, where the peak is searched for in the whole mass window (from 115 GeV to 145 GeV). The centre of the background distribution is now shifted, because it is expected, on average, to find at least one bump somewhere in the mass window. According to theory, the large-statistics limit for the pdf of $S_{L2}^2$ is a $\chi^2$ distribution with two degrees of freedom 3). Such a procedure might open up the possibility of obtaining more realistic significance levels than is possible with the five-sigma postulate for background fluctuations at the observed peak position only.

1) It was verified that relaxing this condition, i.e. allowing negative numbers of signal events for large downward fluctuations of the background, results in the expected unmodified positive half of a standard Gaussian.

Figure 2: Histograms of $S_{c1}$, $S_{c12}$ and $S_{L2}$ determined from successful fits to toy experiments, with $N_b$, $N_s$ and $m_0$, the particle mass, as free parameters. The curves show the results of a Gaussian fit to the distributions. In addition to the Mean and RMS of the histograms, the parameters of the fitted Gaussian, Constant, Mean and Sigma, are shown in the plots. The number of signal events is 25, and the background level amounts to 40 %.
[Figure 2 panels: s/sqrt(b) ($S_{c1}$); 2(sqrt(s+b)-sqrt(b)) ($S_{c12}$), fitted Gaussian mean 4.8; sqrt(2lnq) ($S_{L2}$), histogram r.m.s. 1.12, fitted Gaussian mean 5.03; 9791 entries in each panel.]

Figure 3: Probability density functions determined from toy experiments for background only, in the absence of a real signal, with $N_s$ and $N_b$ as free parameters and $m_0$ fixed. Again, the estimators of significance $S_{c1}$, $S_{c12}$ and $S_{L2}$ are shown. The fit parameters shown in each plot refer to the parameters of a Gaussian distribution. The distribution of $S_{c1}$ cannot be described well by a Gaussian centred around zero, the standard deviation is far above one, and there are too many events observed at a significance beyond five, which should only occur with a probability of about $3 \times 10^{-7}$, in other words 0.3 expected occurrences for a total of about one million samples. The estimators $S_{c12}$ and $S_{L2}$, on the contrary, are well described by a Gaussian with mean around zero and a standard deviation close to one.
[Figure 3 panels: s/sqrt(b) ($S_{c1}$); 2(sqrt(s+b)-sqrt(b)) ($S_{c12}$); sqrt(2lnq) ($S_{L2}$).]

Figure 4: Probability density functions of the estimators of significance $S_{c1}$ and $S_{L2}$ for small statistics (11 signal events within $\pm 1.5\sigma$ over a background of 1.5 events). Filled: pure background sample; open histogram: background plus signal. The background distributions are based on toy experiments, the distributions for signal plus background on toy experiments. The Gaussian fit to the distribution of $S_{L2}$ for the background-only sample has a mean of [...] and a $\sigma$ of 1.0.
[Figure 4 panels: s/sqrt(b) ($S_{c1}$); sqrt(2lnq) ($S_{L2}$).]

4 Practical method to determine the expected significance

Having shown above that the estimator of significance based on the likelihood ratio of the signal-plus-background and the background-only hypotheses, $S_{L2} = \sqrt{2 \ln Q}$, is very well described by a Gaussian shape and that the standard deviation of this Gaussian is one for a pure background sample in the absence of a signal, only one task remains to be done: to define a practical and simple method for estimating the expectation value of $S_{L2}$ for a given amount of data.
It is clear that the full procedure outlined above, in other words the generation of a large number of simulated background events, is not practical for each of the many phenomena searched for in many scenarios and over the wide energy range accessible at the LHC.

It is worth noting that the situation may not always be as simple as in the examples considered here. The background may be perfectly known from studies performed on real data, or rather precise theoretical constraints on the expected signal rate may exist, or systematic errors on the theoretical signal and background rates may need to be included, or the signal may exist in a multi-dimensional parameter space, and possibly many other scenarios. Such complications can lead to a non-Gaussian shape of $S_{L2}$, and high-statistics Monte Carlo experiments with detailed simulations of the event properties and very refined numerical tools are needed to determine the probability $CL_b$ with which background can mimic a signal [6]. We think, however, that it must be possible to restrict such complete studies to only a few cases for each class of problems, and then use a simpler method to obtain the expected significance in different search channels and at different signal positions. This is demonstrated below for the example with large uncertainties on the signal and background rates and the detection of a signal depending on a comparison of the signal and background shapes.

The pdf of the significance definition based on the likelihood ratio, $S_{L2}$, was found to be of a Gaussian shape, both for pure background and for signal-plus-background samples, over a wide range of background-to-signal ratios. On a pure background sample, $S_{L2}$ directly provides the correct significance levels. Given these facts, what needs to be done is to estimate the median of the pdf of $S_{L2}$, which is equal to the expectation value for this case of a symmetric distribution. Therefore, in this note a new approach is presented which evaluates the whole down-weighted Monte Carlo statistics. Although there is no strict mathematical proof yet for this approach, evidence for its correctness from Monte Carlo is given below.

2) $S_{c12}$ is not shown in the figures, but is included in the comparison shown in Table 2 below.

3) There are deviations from this shape predominantly at small values of $S_{L2}$, which possibly are a consequence of the adopted fitting procedure: the number of signal events is constrained to be $\geq 0$, and the fit does not search for the largest peak in the whole region.

Figure 5: Probability density functions of the estimators of significance $S_{c1}$ and $S_{L2}$ for relatively large statistics (27 signal events within $\pm 1.5\sigma$ over a background of 21 events). Filled: pure background sample; open histogram: background plus signal. The background distributions are based on toy experiments, the distributions for signal plus background on toy experiments. The Gaussian fit to the distribution of $S_{L2}$ for the background-only sample has a mean of [...] and a $\sigma$ of [...].
[Figure 5 panels: s/sqrt(b) ($S_{c1}$); sqrt(2lnq) ($S_{L2}$).]

Figure 6: Probability density functions determined from toy experiments for background only, in the absence of a real signal and with the peak position as a free parameter. The mean of the distributions is now shifted, because some statistical accumulation is expected to be found somewhere. Again, fits of a Gaussian distribution are overlaid for illustration.
[Figure 6 panels: s/sqrt(b) ($S_{c1}$); sqrt(2lnq) ($S_{L2}$).]

Figure 7: Histogram of the reconstructed invariant mass of four muons, for an integrated luminosity of 20 fb$^{-1}$. The histogram is the weighted sum of signal events from $H \to 4\mu$ (shown in red) and of various background samples (blue). The straight line represents a fit for the background-only hypothesis; in contrast to Fig. 1, a slope of the background is parametrised here. The curve is obtained under the assumption of the signal-plus-background hypothesis. The number of signal events is $5.2 \pm 2.8$. The difference in negative log-likelihood between the curve and the line is 5.1, corresponding to a signal significance of $S_{L2} = 3.2$. (Axes: number of events after 20 fb$^{-1}$ versus $m_H$ [GeV].)
In the previous section, the mean value of $S$ was determined by performing many toy experiments, each using the available simulated events in quantities corresponding to one real experiment, and then taking the average of the individual fit results. The mean value can also be obtained by performing only one set of likelihood fits to the weighted sums of the signal and background events from Monte Carlo simulation. Instead of averaging fit results, we propose here to apply the corresponding fits to the average distributions. This is illustrated in Fig. 7, which shows the invariant mass distribution for the $H \to 4\mu$ channel over a rather small background. The two lines show the results of two binned log-likelihood fits: the curve is a fit with a function describing the signal-plus-background shape, and the straight line is a fit which assumes that there is only background. The bin size of the sample of weighted simulated events is small compared to the bin size that would be usable for the small number of expected signal events, and therefore the binned fit performed here is a good approximation to the unbinned fits on the small samples of expected real events. In the limit of zero bin size, binned and unbinned likelihood fits become equivalent.

The distribution of $S_{L2}$ obtained from a series of toy experiments using independent fractions of the available Monte Carlo sample, such that each of them simulates a possible outcome of the real experiment, is shown in Fig. 8. The mean value of this distribution, $\langle S_{L2} \rangle = 3.4$ with an r.m.s. of 1.0, agrees well with the value derived from the difference in the negative logarithms of the likelihoods (the logarithm of the likelihood ratio) found in the two fits to the histogram in Fig. 7, which is $S_{L2} = 3.2$. Here, one should note that the expectation values of other quantities may also be determined from these histograms, as is common practice. In particular, $N_s$ and $N_b$ and their uncertainties can be obtained from fits to the histograms, allowing other estimators of significance based on event counting to be calculated.

Figure 8: Distribution of $\sqrt{2 \ln Q}$ from unbinned log-likelihood fits to 1000 simulated experiments (Gedanken-experiments), drawn from the simulated events which entered the histogram of Fig. 7. The mean value of this distribution is 3.4 and agrees well with the value found above from the histogram fit.

What was just illustrated is the proposed method to estimate the expectation value of the significance $S_{L2}$. The distributions of signal and background events obtained from the full samples of simulated events are weighted to represent a given integrated luminosity and then added. A binned log-likelihood fit assuming Poisson statistics in each bin is a good approximation of the unbinned likelihood fit to be performed later in the (low-statistics) real experiment. The values of the logarithms of the likelihoods observed in two fits, one assuming that there is background only and the other also taking into account the signal shape, are then used to calculate$^{4)}$

$$\sqrt{2 \ln Q} = \sqrt{2\,(\ln L_{S+B} - \ln L_B)}. \qquad (7)$$

This is the desired expectation value of the significance for the future experiment. The agreement between the mean of the pdfs from the toy experiments and the value obtained from the histogram fits was verified for the various background situations already investigated in the previous section, and is quantified in Table 2 at the end of this section. Given that some kind of fit to determine the number of signal events above the background is usually performed anyway, the method suggested here to determine the significance according to $S_{L2}$ does not imply a large extra effort. If the distributions of signal and background events are also available in tabulated form, such a procedure is easy to apply to the results of any given analysis.
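The histogram method and Eq. (7) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not taken from the note: the signal is a Gaussian of fixed position and width, the background shape is flat and fixed, and only the normalisations are free. Under these assumptions both fits have closed-form solutions, so no minimiser is needed. The function names and the event numbers are hypothetical.

```python
import math

def bin_fractions_gauss(edges, mean, sigma):
    """Fraction of a Gaussian falling into each bin, renormalised to the fit range."""
    cdf = [0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0)))) for x in edges]
    frac = [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]
    total = sum(frac)
    return [f / total for f in frac]

def log_likelihood(contents, expected):
    """Binned Poisson log-likelihood; lgamma(n + 1) stands in for ln(n!)."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1.0)
               for n, mu in zip(contents, expected))

def expected_significance(n_sig, n_bkg, lo=110.0, hi=150.0, nbins=400,
                          mass=130.0, resolution=1.3):
    """sqrt(2 ln Q) from two fits to the weighted sum of signal and background."""
    edges = [lo + (hi - lo) * i / nbins for i in range(nbins + 1)]
    sig = bin_fractions_gauss(edges, mass, resolution)
    bkg = [1.0 / nbins] * nbins  # flat background shape (assumption)
    # Weighted sum of simulated signal and background: the "average" histogram.
    hist = [n_sig * s + n_bkg * b for s, b in zip(sig, bkg)]
    # S+B fit: with fixed, correct shapes the best fit reproduces the histogram.
    ln_l_sb = log_likelihood(hist, hist)
    # B-only fit: the best-fit normalisation equals the histogram integral.
    n_tot = sum(hist)
    ln_l_b = log_likelihood(hist, [n_tot * b for b in bkg])
    two_ln_q = 2.0 * (ln_l_sb - ln_l_b)
    return math.sqrt(max(two_ln_q, 0.0))

# Hypothetical event numbers, chosen only for illustration.
s_l2 = expected_significance(n_sig=5.2, n_bkg=3.0)
print(f"expected S_L2 = {s_l2:.2f}")
```

In a realistic analysis the background slope and possibly the signal parameters would be fitted as well, which requires a numerical minimiser in place of the closed-form solutions used here.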
For a better understanding of the proposed method, it is instructive to investigate the behaviour of the likelihood between the two extremes of the signal-plus-background and the background-only hypotheses. A smooth change from one to the other can be obtained by varying the number of signal events and, for each value of $N_s$, determining the minimum of the negative log-likelihood with respect to the parameters describing the background, as is shown in Fig. 9. Such a global log-likelihood curve is very useful to determine the $n$-$\sigma$ error intervals of a fit parameter$^{5)}$. Near the minimum, the error on the number of signal events can be read off the curve at the point where the negative log-likelihood has increased by 1/2; more generally, the value where $n$ standard deviations are reached corresponds to $\Delta(-\ln L) = n^2/2$. The number of signal events normalised to its error provides the log-likelihood estimator $S_{L1}$ defined in the introduction; with this definition of significance one only takes into account the likelihood around the minimum up to $\Delta(-\ln L) = 1/2$ and measures the distance to zero signal events in units of this one-sigma error. The significance $S_{L2}$, on the other hand, takes the difference of the negative log-likelihood at zero signal events and at the minimum, $\ln Q = (-\ln L(N_s = 0)) - (-\ln L_{\min})$. In the example shown here, the value at $N_s = 0$ is 5.1 above the minimum, so that $n^2/2 = 5.1$ and hence $n = 3.2$, i.e. $S_{L2} = 3.2$, as already stated above. Note the asymmetry of the curve with respect to the minimum; the rise towards small values is much steeper. This asymmetry is the reason why an estimate based on the likelihood values at the minimum and at zero, which goes into $S_{L2}$, is different from the estimator $S_{L1}$, which only uses values of the likelihood curve close to the minimum. This demonstrates that the three-sigma interval is not necessarily equal to three times the one-sigma interval!

Figure 9: Dependence of $-\ln L$ on $N_s$ for the histogram fit shown in Fig. 7. At each point on the curve, $N_s$ is fixed, and a new fit is performed to find the minimum with respect to all other parameters. From this curve, the $n$-$\sigma$ error bands for $N_s$ can be determined from the increase relative to the minimum value, as shown for the (in this case asymmetric) $\pm 1\sigma$ error interval. The value of the likelihood at the best-fit point corresponds to $\ln L_{S+B}$, the likelihood value at zero signal events corresponds to $\ln L_B$; the difference thus is $\ln Q$.

4) Note that, e.g. in the ROOT package, the factor of two under the square root is already included in the definition of the likelihood function!
5) The same procedure is used by MINOS in the fitting package MINUIT, and in the two-dimensional case for the calculation of contour lines.

There is a slight technical complication in the proposed procedure.
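Before turning to that complication, the reading of error intervals off a likelihood curve can be made concrete with a small sketch. It does not reproduce the profile fit of Fig. 9; as a stand-in it uses the likelihood of a single Poisson count (an assumption made here for brevity), which shows the same qualitative asymmetry. The conversion $n = \sqrt{2\,\Delta(-\ln L)}$ is the relation quoted in the text; all function names are hypothetical.

```python
import math

def n_sigma(delta_nll):
    """Number of standard deviations for a rise delta_nll of -ln L above its minimum."""
    return math.sqrt(2.0 * delta_nll)

def nll(mu, n_obs):
    """-ln L of a single Poisson count, dropping the constant ln(n!)."""
    return mu - n_obs * math.log(mu)

def crossing(n_obs, rise, lo, hi, steps=200):
    """Bisect for the mu in [lo, hi] where -ln L lies `rise` above its minimum."""
    target = nll(n_obs, n_obs) + rise  # the minimum of -ln L is at mu = n_obs
    f_lo = nll(lo, n_obs) - target
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = nll(mid, n_obs) - target
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n_obs = 5.0  # hypothetical observed count
lower = crossing(n_obs, 0.5, 1e-9, n_obs)
upper = crossing(n_obs, 0.5, n_obs, 10.0 * n_obs)
print(f"n = {n_sigma(5.1):.2f} standard deviations for a rise of 5.1")
print(f"1-sigma interval: -{n_obs - lower:.2f} / +{upper - n_obs:.2f}")
```

Because the curve rises more steeply towards small values, the lower error comes out smaller than the upper one, which is why a multiple of the one-sigma error is not a reliable substitute for the rise of the curve at zero signal.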
The most frequently used fitting tools, PAW and ROOT [7], assume integer bin contents in binned likelihood fits, as would be the case for distributions obtained in real experiments, where the number of events observed is always an integer. Here, however, we are dealing with the number of events expected in a future experiment, obtained from sums of weighted simulated events, and therefore the bin contents take non-integer values. These values are even small if the bin size is small, because the bin size is adjusted to match the available number of simulated events. Therefore, a conversion to integers poses a problem. Here, we propose a generalisation of the usual Poisson distribution in which the factorial $n!$ is interpolated by the $\Gamma$ function, $n! = \Gamma(n + 1)$. This modified distribution interpolates the Poisson distribution quite nicely, but needs further checking for very small numbers. Now $n$ may be a non-integer bin entry, and the Poisson likelihood is defined for each bin. Such a modified fit function has been implemented as an addition to the ROOT framework, making its use technically quite easy. As a test, the number of events expected in each bin can be multiplied by a large number $N$, for example $N = 100$, allowing an unmodified Poisson likelihood to be used. The scaling of all errors from such fits was found to follow the $\sqrt{N}$ law, while all central values of the fit parameters remained the same.
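A minimal sketch of such a generalised Poisson likelihood is given below. It is not the ROOT addition mentioned in the text; it only illustrates the replacement $n! \to \Gamma(n+1)$ using `math.lgamma` from the Python standard library. The bin contents and model expectations are hypothetical numbers chosen for illustration.

```python
import math

def log_poisson(n, mu):
    """ln of the Poisson probability, with n! generalised to Gamma(n + 1)."""
    return n * math.log(mu) - mu - math.lgamma(n + 1.0)

def binned_log_likelihood(contents, expected):
    """Sum of the per-bin generalised Poisson log-probabilities."""
    return sum(log_poisson(n, mu) for n, mu in zip(contents, expected))

# For integer contents the ordinary Poisson probability is recovered.
exact = math.log(2.5 ** 3 * math.exp(-2.5) / math.factorial(3))
print(abs(log_poisson(3, 2.5) - exact) < 1e-12)

# Non-integer (weighted) contents are handled by the same expression.
weighted = [0.12, 0.40, 0.95, 0.37, 0.08]  # hypothetical bin contents
model_sb = [0.10, 0.42, 0.90, 0.40, 0.10]  # hypothetical S+B expectation
model_b = [0.38, 0.38, 0.38, 0.38, 0.38]   # hypothetical flat background
d = (binned_log_likelihood(weighted, model_sb)
     - binned_log_likelihood(weighted, model_b))

# Scaling test: multiplying contents and expectations by N multiplies the
# log-likelihood difference by N, so a significance grows like sqrt(N).
N = 100
d_scaled = (binned_log_likelihood([N * w for w in weighted], [N * m for m in model_sb])
            - binned_log_likelihood([N * w for w in weighted], [N * m for m in model_b]))
print(d_scaled / d)
```

The $\Gamma$ terms depend only on the bin contents and cancel in any difference of log-likelihoods evaluated on the same histogram, so they do not influence $\ln Q$; they matter only if absolute likelihood values are compared.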

Estimators of significance other than $S_{L2}$ may also be determined from the histogram fit. A comparison of the histogram method with the results from the toy experiments for $S_{c1}$, $S_{c12}$ and $S_{L2}$ is shown in Table 2 for different event statistics and background levels. The event numbers were chosen such that the likelihood estimator of significance, $S_{L2}$, approaches a value of five. There is impressive agreement between the mean values from the toy experiments and the expected significance from the histogram fit. While this can be considered a Monte Carlo proof of the procedure, a formal analytical proof is still outstanding. The values of $S_{c12}$ and $S_{L2}$ agree very well with each other, although $S_{c12}$ is always slightly smaller than $S_{L2}$; this is understandable, because the counting methods use only the part of the available statistics that lies within the signal bin, while the likelihood uses the full information available. The significance $S_{c1}$ only becomes comparable to the other two for large statistics (40 signal events over a background of 55 events in the signal region, corresponding to the last line of the table); at small statistics (in other words, for a very clean signal near the discovery threshold) this simple method largely overestimates the significance.

Table 2: Comparison of methods to determine estimators of significance. Given are the total number of events, $N_{tot}$, in the mass range from 110 to 150 GeV (in other words, a range about 30 times larger than the assumed mass resolution), the number of signal events, $N_s$, the mean and r.m.s. of the distributions of significance from the toy experiments, and the expected significance from the histogram fit, for $S_{c1}$, $S_{c12}$ and $S_{L2}$, respectively. The signal region for the event counting methods was taken to be $\pm 1.5$ times the resolution around the mean peak position.
Columns of Table 2: event sample ($N_{tot}$, $N_s$); $S_{c1}$ (mean, rms, hist); $S_{c12}$ (mean, rms, hist); $S_{L2}$ (mean, rms, hist).

Conclusion

Several estimators for the significance, $S$, of a discovery at a future experiment have been studied in this note. The significance $S$ is understood here in the sense of a Gaussian significance, which expresses the probability of a background fluctuation having produced the claimed signal in terms of the number of standard deviations of a Gaussian distribution. A large number of Monte Carlo experiments were performed to study the reliability and performance of such estimators. Methods based on counting signal and background events in a certain signal region were contrasted with likelihood-based methods exploiting the full shape of the invariant mass distribution. For sufficiently large statistics, all of the estimators are applicable, but in the case of low backgrounds the simplest one, $S_{c1} = N_s / \sqrt{N_b}$, leads to a large overestimation of the significance and, worse, often yields high values of significance in cases where there is background only. A method based on the likelihood ratio of the signal-plus-background and the background-only hypotheses, named $S_{L2}$, gives the best performance. In addition, the distribution of significance for a pure background sample shows Gaussian behaviour; in other words, the significance from this method can be directly translated into a probability that the background may have mimicked the claimed effect. Among the event counting methods, $S_{c12} = 2\,(\sqrt{N_s + N_b} - \sqrt{N_b})$ reaches an almost equally good performance. Unbinned likelihood fits, however, offer the invaluable advantage of being independent of the choice of bin widths and bin positions, and independent of the definition of a signal region.
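The two counting estimators quoted above are simple enough to evaluate directly. The sketch below does so for the high-statistics case mentioned in the text (40 signal events over 55 background events in the signal region) and for a low-background case with numbers that are hypothetical and chosen only to show the trend. The function names are illustrative.

```python
import math

def s_c1(n_s, n_b):
    """Simple counting estimator N_s / sqrt(N_b)."""
    return n_s / math.sqrt(n_b)

def s_c12(n_s, n_b):
    """Counting estimator 2 * (sqrt(N_s + N_b) - sqrt(N_b))."""
    return 2.0 * (math.sqrt(n_s + n_b) - math.sqrt(n_b))

# (40, 55) is the case quoted in the text; (5, 1) is a hypothetical clean signal.
for n_s, n_b in [(40.0, 55.0), (5.0, 1.0)]:
    print(f"N_s = {n_s:4.0f}, N_b = {n_b:4.0f}: "
          f"S_c1 = {s_c1(n_s, n_b):.2f}, S_c12 = {s_c12(n_s, n_b):.2f}")
```

The gap between the two estimators widens as the background decreases, in line with the statement that $S_{c1}$ overestimates the significance for low backgrounds.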
For signal and background contributions determined from a likelihood fit based on shapes only, it has been shown with a large number of toy experiments that the probability density distributions of the values of significance obtained from pure background and from signal-plus-background samples follow Gaussian distributions. A practical method was suggested to determine the expected significance $S_{L2}$, in this case corresponding to the median of the distributions from the toy experiments. The suggested method to determine the expected significance of a future experiment is based on only two binned likelihood fits to the finely binned invariant mass distribution obtained from weighted simulated events. An overview of the performance at the discovery limit, $S \approx 5$, of the method $S_{c1}$ and of the other two methods found favourable in this study is shown for various levels of background in Table 2. Figures 3, 4 and 5 show the significance distributions obtained on pure background samples; $S_{L2}$ and $S_{c12}$ are well compatible with being interpreted as Gaussian significance, whereas $S_{c1}$ largely underestimates the probability of background fluctuations.

Acknowledgements

We wish to thank Bob Cousins and Louis Lyons for fruitful discussions and valuable input to this note.

References

[1] S.I. Bityukov, N.V. Krasnikov, On observability of signal over background, NIM A452 (2000).
[2] The LEP Collaborations ALEPH, DELPHI, L3 and OPAL, Search for the Standard Model Higgs Boson at LEP, Phys. Lett. B565 (2003) 61.
[3] S. Brandt, Data Analysis, Springer Heidelberg US.
[4] S.S. Wilks, Ann. Math. Stat. 9 (1938) 60.
[5] K. Hagiwara et al., Phys. Rev. D 66 (2002).
[6] K. Cranmer, UWStat Tools: Documentation & User Manual, University of Wisconsin, Madison.
[7] ROOT, An Object Oriented Data Analysis Framework, User's Guide, available on the internet.

Detection of Z Gauge Bosons in the Di-muon Decay Mode in CMS

Detection of Z Gauge Bosons in the Di-muon Decay Mode in CMS Detection of Z Gauge Bosons in the Di-muon Decay Mode in CM Robert Cousins, Jason Mumford and Viatcheslav Valuev University of California, Los Angeles for the CM collaboration Physics at LHC Vienna, -7

More information

Discovery significance with statistical uncertainty in the background estimate

Discovery significance with statistical uncertainty in the background estimate Glen Cowan, Eilam Gross ATLAS Statistics Forum 8 May, 2008 Discovery significance with statistical uncertainty in the background estimate Introduction In a search for a new type of event, data samples

More information

Statistics for the LHC Lecture 2: Discovery

Statistics for the LHC Lecture 2: Discovery Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University of

More information

Confidence Limits and Intervals 3: Various other topics. Roger Barlow SLUO Lectures on Statistics August 2006

Confidence Limits and Intervals 3: Various other topics. Roger Barlow SLUO Lectures on Statistics August 2006 Confidence Limits and Intervals 3: Various other topics Roger Barlow SLUO Lectures on Statistics August 2006 Contents 1.Likelihood and lnl 2.Multidimensional confidence regions 3.Systematic errors: various

More information

Statistics Challenges in High Energy Physics Search Experiments

Statistics Challenges in High Energy Physics Search Experiments Statistics Challenges in High Energy Physics Search Experiments The Weizmann Institute of Science, Rehovot, Israel E-mail: eilam.gross@weizmann.ac.il Ofer Vitells The Weizmann Institute of Science, Rehovot,

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters

Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters Statistical Methods for Particle Physics Lecture 3: Systematics, nuisance parameters http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Centro de ciencias Pedro Pascual Benasque, Spain 3-15

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

More information

Use of the likelihood principle in physics. Statistics II

Use of the likelihood principle in physics. Statistics II Use of the likelihood principle in physics Statistics II 1 2 3 + Bayesians vs Frequentists 4 Why ML does work? hypothesis observation 5 6 7 8 9 10 11 ) 12 13 14 15 16 Fit of Histograms corresponds This

More information

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009 Systematic uncertainties in statistical data analysis for particle physics DESY Seminar Hamburg, 31 March, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis - WS 07/08 Modern Methods of Data Analysis Lecture VII (26.11.07) Contents: Maximum Likelihood (II) Exercise: Quality of Estimators Assume hight of students is Gaussian distributed. You measure the size of N students.

More information

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery Statistical Methods in Particle Physics Lecture 2: Limits and Discovery SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

Introduction to the Terascale: DESY 2015

Introduction to the Terascale: DESY 2015 Introduction to the Terascale: DESY 2015 Analysis walk-through exercises EXERCISES Ivo van Vulpen Exercise 0: Root and Fitting basics We start with a simple exercise for those who are new to Root. If you

More information

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits www.pp.rhul.ac.uk/~cowan/stat_aachen.html Graduierten-Kolleg RWTH Aachen 10-14 February 2014 Glen Cowan Physics Department

More information

RooStatsCms: a tool for analyses modelling, combination and statistical studies

RooStatsCms: a tool for analyses modelling, combination and statistical studies RooStatsCms: a tool for analyses modelling, combination and statistical studies D. Piparo, G. Schott, G. Quast Institut für f Experimentelle Kernphysik Universität Karlsruhe Outline The need for a tool

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

A Calculator for Confidence Intervals

A Calculator for Confidence Intervals A Calculator for Confidence Intervals Roger Barlow Department of Physics Manchester University England Abstract A calculator program has been written to give confidence intervals on branching ratios for

More information

Higgs boson searches at LEP II

Higgs boson searches at LEP II PROCEEDINGS Higgs boson searches at LEP II Department of Physics and Astronomy University of Glasgow Kelvin Building University Avenue Glasgow G12 8QQ UK E-mail: D.H.Smith@cern.ch Abstract: A brief overview

More information

Statistics and Data Analysis

Statistics and Data Analysis Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data

More information

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics

More information

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics

Statistical Tools in Collider Experiments. Multivariate analysis in high energy physics Statistical Tools in Collider Experiments Multivariate analysis in high energy physics Lecture 5 Pauli Lectures - 10/02/2012 Nicolas Chanon - ETH Zürich 1 Outline 1.Introduction 2.Multivariate methods

More information

Journeys of an Accidental Statistician

Journeys of an Accidental Statistician Journeys of an Accidental Statistician A partially anecdotal account of A Unified Approach to the Classical Statistical Analysis of Small Signals, GJF and Robert D. Cousins, Phys. Rev. D 57, 3873 (1998)

More information

Discovery Potential for the Standard Model Higgs at ATLAS

Discovery Potential for the Standard Model Higgs at ATLAS IL NUOVO CIMENTO Vol.?, N.?? Discovery Potential for the Standard Model Higgs at Glen Cowan (on behalf of the Collaboration) Physics Department, Royal Holloway, University of London, Egham, Surrey TW EX,

More information

Hypothesis testing. Chapter Formulating a hypothesis. 7.2 Testing if the hypothesis agrees with data

Hypothesis testing. Chapter Formulating a hypothesis. 7.2 Testing if the hypothesis agrees with data Chapter 7 Hypothesis testing 7.1 Formulating a hypothesis Up until now we have discussed how to define a measurement in terms of a central value, uncertainties, and units, as well as how to extend these

More information

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009 Statistics for Particle Physics Kyle Cranmer New York University 91 Remaining Lectures Lecture 3:! Compound hypotheses, nuisance parameters, & similar tests! The Neyman-Construction (illustrated)! Inverted

More information

Search for the Standard Model Higgs boson at LEP

Search for the Standard Model Higgs boson at LEP Physics Letters B 565 (2003) 61 75 www.elsevier.com/locate/npe Search for the Standard Model Higgs boson at LEP ALEPH Collaboration 1 DELPHI Collaboration 2 L3 Collaboration 3 OPAL Collaboration 4 The

More information

MODIFIED FREQUENTIST ANALYSIS OF SEARCH RESULTS (THE CL s METHOD)

MODIFIED FREQUENTIST ANALYSIS OF SEARCH RESULTS (THE CL s METHOD) MODIFIED FREQUENTIST ANALYSIS OF SEARCH RESULTS (THE CL s METHOD) A. L. Read University of Oslo, Department of Physics, P.O. Box 148, Blindern, 316 Oslo 3, Norway Abstract The statistical analysis of direct

More information

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg Statistics for Data Analysis PSI Practical Course 2014 Niklaus Berger Physics Institute, University of Heidelberg Overview You are going to perform a data analysis: Compare measured distributions to theoretical

More information

Discovery potential of the SM Higgs with ATLAS

Discovery potential of the SM Higgs with ATLAS Discovery potential of the SM Higgs with P. Fleischmann On behalf of the Collaboration st October Abstract The discovery potential of the Standard Model Higgs boson with the experiment at the Large Hadron

More information

Recent developments in statistical methods for particle physics

Recent developments in statistical methods for particle physics Recent developments in statistical methods for particle physics Particle Physics Seminar Warwick, 17 February 2011 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

More information

The profile likelihood ratio and the look elsewhere effect in high energy physics

The profile likelihood ratio and the look elsewhere effect in high energy physics The profile likelihood ratio and the look elsewhere effect in high energy physics Gioacchino Ranucci Istituto Nazionale di Fisica Nucleare Via Celoria 6-33 Milano Italy Phone: +39--53736 Fax: +39--53767

More information

Some Statistical Tools for Particle Physics

Some Statistical Tools for Particle Physics Some Statistical Tools for Particle Physics Particle Physics Colloquium MPI für Physik u. Astrophysik Munich, 10 May, 2016 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk

More information

How to find a Higgs boson. Jonathan Hays QMUL 12 th October 2012

How to find a Higgs boson. Jonathan Hays QMUL 12 th October 2012 How to find a Higgs boson Jonathan Hays QMUL 12 th October 2012 Outline Introducing the scalar boson Experimental overview Where and how to search Higgs properties Prospects and summary 12/10/2012 2 The

More information

Detection of Z Gauge Bosons in the Di-muon Decay Mode

Detection of Z Gauge Bosons in the Di-muon Decay Mode Detection of Z Gauge Bosons in the Di-muon Decay Mode Robert Cousins, Jason Mumford and Slava Valuev University of California, Los Angeles CMS Physics Meeting June, CMS Physics Meeting June, Introduction

More information

Final status of L3 Standard Model Higgs searches and latest results from LEP wide combinations. Andre Holzner ETHZ / L3

Final status of L3 Standard Model Higgs searches and latest results from LEP wide combinations. Andre Holzner ETHZ / L3 Final status of L3 Standard Model Higgs searches and latest results from LEP wide combinations Andre Holzner ETHZ / L3 Outline Higgs mechanism Signal processes L3 Detector Experimental signatures Data

More information

Combined Higgs Results

Combined Higgs Results Chapter 2 Combined Higgs Results This chapter presents the combined ATLAS search for the Standard Model Higgs boson. The analysis has been performed using 4.7 fb of s = 7 TeV data collected in 2, and 5.8

More information

Asymptotic formulae for likelihood-based tests of new physics

Asymptotic formulae for likelihood-based tests of new physics Eur. Phys. J. C (2011) 71: 1554 DOI 10.1140/epjc/s10052-011-1554-0 Special Article - Tools for Experiment and Theory Asymptotic formulae for likelihood-based tests of new physics Glen Cowan 1, Kyle Cranmer

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Observation of a New Particle with a Mass of 125 GeV

Observation of a New Particle with a Mass of 125 GeV Observation of a New Particle with a Mass of 125 GeV CMS Experiment, CERN 4 July 2012 Summary In a joint seminar today at CERN and the ICHEP 2012 conference[1] in Melbourne, researchers of the Compact

More information

Higgs and Z τ + τ in CMS

Higgs and Z τ + τ in CMS Higgs and Z τ + τ in CMS Christian Veelken for the CMS Collaboration Moriond EWK Conference, March 14 th 2011 Z τ + τ - Production @ 7 TeV τ + Z τ - CMS Measurement of Z/γ* l + l -, l = e/µ: σ BR(Z/γ*

More information

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits www.pp.rhul.ac.uk/~cowan/stat_freiburg.html Vorlesungen des GK Physik an Hadron-Beschleunigern, Freiburg, 27-29 June,

More information

Why I never believed the Tevatron Higgs sensitivity claims for Run 2ab Michael Dittmar

Why I never believed the Tevatron Higgs sensitivity claims for Run 2ab Michael Dittmar Why I never believed the Tevatron Higgs sensitivity claims for Run 2ab Michael Dittmar 18.03.09 Why I never believed the Tevatron Higgs sensitivity claims for Run 2ab Michael Dittmar 18.03.09 How to judge

More information

Investigation of Possible Biases in Tau Neutrino Mass Limits

Investigation of Possible Biases in Tau Neutrino Mass Limits Investigation of Possible Biases in Tau Neutrino Mass Limits Kyle Armour Departments of Physics and Mathematics, University of California, San Diego, La Jolla, CA 92093 (Dated: August 8, 2003) We study

More information

Likelihood Statistics

Likelihood Statistics Likelihood Statistics Anja Vest IEKP, Uni Karlsruhe October 25 Event weights Likelihood plots Confidence levels TLimit root class Likelihood Ratio Experimental result = configuration of events that agrees

More information

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit Statistics Lent Term 2015 Prof. Mark Thomson Lecture 2 : The Gaussian Limit Prof. M.A. Thomson Lent Term 2015 29 Lecture Lecture Lecture Lecture 1: Back to basics Introduction, Probability distribution

More information

Search for Higgs Bosons at LEP. Haijun Yang University of Michigan, Ann Arbor

Search for Higgs Bosons at LEP. Haijun Yang University of Michigan, Ann Arbor Search for Higgs Bosons at LEP Haijun Yang University of Michigan, Ann Arbor L3 On behalf of the L3 Collaboration American Physical Society Meeting(APS03), Philadelphia April 5-8, 2003 OUTLINE Introduction

More information

Modeling the SM Higgs Boson Mass Signal

Modeling the SM Higgs Boson Mass Signal Modeling the SM Higgs Boson Mass Signal Ryan Killick 2012 CERN Summer Student Report Abstract A physically motivated model has been developed in order to characterize and extract a mass measurement for

More information

Introductory Statistics Course Part II

Introductory Statistics Course Part II Introductory Statistics Course Part II https://indico.cern.ch/event/735431/ PHYSTAT ν CERN 22-25 January 2019 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

W mass measurement in the ATLAS experiment

W mass measurement in the ATLAS experiment W mass measurement in the ATLAS experiment DSM/Irfu/SPP, CEA/Saclay 99 Gif sur Yvette Cedex France E-mail: Nathalie.Besson@cern.ch A precise measurement of the mass of the W boson will be essential to

More information

E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

More information

Hypothesis testing (cont d)

Hypothesis testing (cont d) Hypothesis testing (cont d) Ulrich Heintz Brown University 4/12/2016 Ulrich Heintz - PHYS 1560 Lecture 11 1 Hypothesis testing Is our hypothesis about the fundamental physics correct? We will not be able

More information

Events with High P T Leptons and Missing P T and Anomalous Top at HERA

Events with High P T Leptons and Missing P T and Anomalous Top at HERA with High Leptons and Missing and Anomalous Top at HERA David South (DESY) On Behalf of the H Collaboration IIth International Workshop on Deep Inelastic Scattering (DIS 24) Štrbské Pleso, High Tatras,

More information

Introduction. The Standard Model
