
Momentum Scale Estimation Using Maximum Likelihood Template Fitting

by Yu Zeng

Department of Statistics, Duke University

Approved: David L. Banks (Chair), David B. Dunson, Ashutosh V. Kotwal

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the Department of Statistics in the Graduate School of Duke University, 2010


Abstract

A maximum likelihood template fitting procedure is performed using Υ → µ⁺µ⁻ events to extract the momentum scale, a scale factor applied to the measured momentum, of the CDF detector at Fermilab. The invariant mass spectrum constructed from data events is compared with invariant mass spectra from Monte Carlo simulated events, with the momentum scale varied as a free parameter in the simulation. The simulated spectrum that best matches the data spectrum gives the maximum likelihood estimate of the momentum scale. We find the momentum scale correction to be Δp/p = (−1.330 ± 0.028 (stat) ± 0.099 (syst)) × 10⁻³.

Contents

Abstract
List of Tables
List of Figures
List of Abbreviations and Symbols
1 Introduction
    1.1 Motivation of W boson mass measurement
    1.2 Relationship between momentum scale measurement and W mass measurement
    1.3 The computing
    1.4 Thesis structure
2 Overview of Momentum Scale Measurement
    2.1 General procedure
    2.2 Some general issues
        2.2.1 Statistical error and systematic error
        2.2.2 Binning issue
        2.2.3 Choice of step size in Δp/p
        2.2.4 χ² and goodness of fit
        2.2.5 Parameter tuning
3 The Data and Monte Carlo Samples
    3.1 The Data sample
        3.1.1 Invariant mass spectrum from data
    3.2 The Monte Carlo samples
        3.2.1 Invariant mass spectrum from simulation
        3.2.2 Effect on χ² due to finite MC sample size and background description
4 Template Fitting with Maximum Likelihood
    4.1 Template construction
        4.1.1 Construction of the inclusive template
        4.1.2 Construction of non-inclusive templates in parameter space R
    4.2 Maximum Likelihood method
        4.2.1 Variance of Maximum Likelihood Estimator
    4.3 Bias in the fitter
    4.4 Necessary steps before the final momentum scale measurement
        4.4.1 Background parameterization
        4.4.2 Resolution tuning in detector simulation
        4.4.3 Data bias removal
        4.4.4 Material simulation check
5 Results
    5.1 Momentum scale and the statistical error
    5.2 Momentum scale systematic errors
        5.2.1 PYTHIA event generation
        5.2.2 Recycling technique
        5.2.3 Binning
        5.2.4 Step size in Δp/p
        5.2.5 Fit range
        5.2.6 Background parameterization
        5.2.7 Resolution model
        5.2.8 Applied data correction
        5.2.9 Material simulation
        5.2.10 Selection cuts
    5.3 Combine momentum scale statistical and systematic errors
        5.3.1 Error propagation
        5.3.2 Summary of systematic and statistical errors
6 Summary
A Invariant Mass
B Selection Cuts
C The Poisson Distribution
    C.1 The Poisson postulates
D Other Systematic Errors
    D.1 Quantum electrodynamics and energy loss model
    D.2 Magnetic field nonuniformities
    D.3 Trigger efficiency (p_T cut)
Bibliography

List of Tables

1.1 Comparison of required precisions in M_t and M_W to constrain M_H precision to 100 GeV.
5.1 Summary of systematic errors and statistical error on Δp/p and on M_W.
B.1 Requirements on muons from Υ → µµ.

List of Figures

1.1 m_W and m_t contours [4].
1.2 The CDF detector and the coordinate system.
1.3 Transverse mass m_T spectrum from simulation. Blue histogram is for input M_W = 80 GeV, and red histogram is for input M_W = 81 GeV. The Jacobian edge of the m_T distribution is quite sensitive to the input M_W value.
2.1 Flow chart of momentum scale (or Δp/p) measurement.
3.1 The final invariant mass spectrum from data.
3.2 Distribution of p_T^Υ before and after tuning. Blue points represent data and the red histogram represents events from PYTHIA.
3.3 Comparison of invariant mass spectrum m_µµ at different simulation stages from PYTHIA and from PYTHIA plus detector simulation.
3.4 Comparison of not recycling and recycling PYTHIA-level generated events 10 times. Plots (a) and (b) show the Data-MC fit without and with the recycling, respectively. Plots (c) and (d) show the negative log-likelihood curve −(ln L − ln L_max) as a function of parameter Δp/p without and with the recycling, respectively (see discussion of likelihood in Sec. 4.2).
3.5 Standard error ellipse for the estimators θ̂_i and θ̂_j when correlation coefficient ρ_ij < 0 [6].
4.1 m_µµ spectrum from MC for Δp/p = −0.004 and Δp/p = 0.
4.2 Inclusive Monte Carlo templates constructed with Υ → µ⁺µ⁻ simulated events. The Δp/p axis is binned into 800 bins from −0.004 to 0 with equal width 0.005 × 10⁻³. The m_µµ axis is binned into 320 bins from 8.8 GeV to 11.2 GeV with equal bin width 0.0075 GeV (7.5 MeV).
4.3 Validation plots show that the fitter is unbiased.
4.4 Inclusive Υ → µµ fit for resolution scale factor h_0 = 0.968, 0.978 and 0.988.
4.5 The χ² as a function of hit resolution scale factor h_0. The red line is for χ² = χ²_min + 1. Blue lines indicate the locations of the statistical error and the central value.
4.6 Δp/p vs. Δcot θ before correction, where Δcot θ = cot θ_µ⁺ − cot θ_µ⁻. The black curve is a parabolic fit to the blue points with function p_0 + p_1 (Δcot θ) + p_2 (Δcot θ)².
4.7 Δp/p vs. Δcot θ after correction, where Δcot θ = cot θ_µ⁺ − cot θ_µ⁻. Correction parameters: α = 1.00160, β = 0.50 × 10⁻⁷. The black curve is a parabolic fit to the blue points with function p_0 + p_1 (Δcot θ) + p_2 (Δcot θ)².
4.8 Vary best tuned α = 1.00160 by ±0.001 with β = 0.50 × 10⁻⁷ fixed.
4.9 Vary β = 0.50 × 10⁻⁷ by ±1.0 × 10⁻⁷ with α = 1.00160 fixed.
4.10 Δp/p vs. ⟨1/p_T(µ)⟩.
5.1 Final central value of Δp/p with its statistical error.
5.2 Log-likelihood curve and signed χ.
5.3 Distribution of p_T(Υ) when p_1 varies by ±0.001 from p̂_1 = 0.325.
5.4 χ² as a function of p_1.
5.5 Υ → µµ fit when p̂_1 varies by 1σ.
5.6 Binning effect of m_µµ on Δp/p.
5.7 Vary fit range by 20%.
5.8 Fit range (9.43 − 0.36, 9.43 + 0.36) GeV for B_level ±3σ band tuning.
5.9 Back to normal fit range (9.43 − 0.185, 9.43 + 0.185) GeV.
5.10 Fit range (9.43 − 0.36, 9.43 + 0.36) for B_slope ±3σ band tuning.
5.11 Fit range (9.43 − 0.36, 9.43 + 0.36) for B_slope ±3σ band tuning.
5.12 Vary h_0 = 0.978 by ±3σ.
5.13 Vary α by ±3σ.
5.14 Vary β by ±3σ.
5.15 Vary M_scale by ±3σ.
5.16 Vary the cut on N_hit by ±5 from 25.
5.17 Vary the cut on d_0 by ±0.1 from 0.3 cm.
5.18 Vary the cut on z_0 by ±1.0 from 3.0 cm.
D.1 χ² as a function of Qscale factor q_0.
D.2 Inclusive Υ → µµ fit with Qscale factor q_0 = 1.46.
D.3 Inclusive Υ → µµ fit without magnetic field correction.
D.4 M_W with and without applying the magnetic field correction.
D.5 Inclusive Υ → µµ when p_T cut is raised by 0.2 GeV.

List of Abbreviations and Symbols

Abbreviations

SM      The Standard Model.
CDF     The Collider Detector at Fermilab. It is a 100-ton complex detector which measures particles coming out of the proton-antiproton collision.
COT     Central Outer Tracker, a component of the CDF detector. It is a 3-meter long cylinder for measuring tracks.
MC      Monte Carlo simulation. A technique to make inferences by performing a large number of trial runs and analyzing their collective results.
keV     Kiloelectron volt. Unit of energy, momentum and mass when c = 1, ħ = 1.
MeV     Megaelectron volt. Unit of energy, momentum and mass when c = 1, ħ = 1.
GeV     Gigaelectron volt. Unit of energy, momentum and mass when c = 1, ħ = 1 (1 GeV = 10³ MeV = 10⁶ keV = 10⁹ eV).

Symbols

W             W particle. It has a positive or negative charge.
M_W           The mass of the W particle.
ν             Neutrino. It has no charge.
Υ             Υ particle. It can decay to a µ⁺µ⁻ pair.
M_Υ           The mass of the Υ particle. It is known to be 9.46 GeV with extremely high precision.
µ             Muon particle. It has a positive or negative charge, and its mass is precisely known to be 105.6 MeV. A µ leaves a track in the COT.
µ⁺            Positively charged muon particle.
µ⁻            Negatively charged muon particle.
m_µµ          Invariant mass of µ⁺µ⁻ pairs. See Appendix A.
m_T           Transverse mass of µν pairs which are decay products of W. Transverse mass is constructed in a similar way to invariant mass. The only difference is that it only uses information in the plane perpendicular to the beam axis z at z = 0.
W → µν        A W particle decays to a charged µ and a neutral ν.
Υ → µµ        A Υ particle decays to a µ⁺ and a µ⁻.
p_T           Amount of momentum perpendicular to the beam axis z. p_T = √(p_x² + p_y²).
p             Total amount of momentum. p = √(p_x² + p_y² + p_z²).
p_T(Υ)        p_T carried by Υ.
p_T(µ±)       p_T carried by µ⁺ or µ⁻.
⟨1/p_T(µ)⟩    Average of 1/p_T(µ⁺) and 1/p_T(µ⁻).
Δp/p          Correction of momentum normalized to total momentum.
1 + Δp/p      Momentum scale.
θ             The angle of the µ track relative to the proton beam direction. See Fig. 1.2.
Δcot θ        The cot θ difference between µ⁺ and µ⁻, i.e., cot θ_µ⁺ − cot θ_µ⁻.
cot θ         Track parameter. Cotangent of the angle of the µ track w.r.t. the beam axis z.
z_0           Track parameter. Initial position of the µ track along the beam axis z.
c             Track parameter. Curvature of the µ track.
d_0           Track parameter. Impact parameter of the µ track.
φ_0           Track parameter. Angle of the µ track in the x-y plane.

1 Introduction

1.1 Motivation of W boson mass measurement

In high energy physics, also known as particle physics, the Standard Model [1] is a quantum field theory describing the physical properties of all fundamental particles and their mutual interactions. The Standard Model is quite successful at explaining experimentally observed phenomena and, more importantly, it predicted several then-undiscovered particles which were later confirmed to exist. Although the Standard Model predictions agree well with almost all experimental observations so far, one fundamental particle, the Higgs boson, is still missing from the observed particle spectrum. The existence of the Higgs boson is nevertheless required to make the Standard Model self-consistent.

If the Standard Model is a self-consistent theory, then the mass of the undiscovered Higgs boson is related to the W boson mass (M_W) and the top quark mass (M_t). By measuring M_W and M_t precisely, we can infer the possible Higgs boson mass (M_H) within the Standard Model framework. Their mass relationship (in units of GeV, see the List of Abbreviations and Symbols) is best illustrated by Eqn. (1.1) [3]:

\[ M_W = 80.364 + 0.525\left[\left(\frac{M_t}{172}\right)^2 - 1\right] - 0.0579\,\ln\left(\frac{M_H}{100}\right) - 0.008\,\ln^2\left(\frac{M_H}{100}\right) \tag{1.1} \]

where other terms with small contributions are neglected. From Eqn. (1.1) we can see that, for a fixed M_t, higher values of M_W suggest smaller values of the Higgs mass, and vice versa. Assume the central values M_W = 80.364 GeV, M_t = 172.0 GeV, and M_H = 100 GeV. Table 1.1 shows the required precision of M_W compared with the required precision of M_t for the same constraining power on the precision of M_H. The relationship among M_W, M_t and M_H is shown in Fig. 1.1, where the diagonal lines are for different values of M_H and the blue contour is the 68% confidence level contour of the M_W and M_t measurements. We can see that the constraint on M_H from the current measurement precision of M_W is not as powerful as that from the measurement precision of M_t. It is thus important to measure the W boson mass more precisely.

Table 1.1: Comparison of required precisions in M_t and M_W to constrain M_H precision to 100 GeV.

Parameter shift (GeV)    ΔM_t (GeV)    ΔM_W (GeV)
ΔM_H = +100              −7.362        −0.044

Figure 1.1: m_W and m_t contours [4].

1.2 Relationship between momentum scale measurement and W mass measurement

As one of the most important, yet difficult, analyses in experimental high energy physics, the W boson mass is measured at hadron colliders by constructing the transverse mass m_T, the shape of which is a function of M_W as well as other factors:

\[ m_T = \sqrt{2\, p_T^{\mu}\, p_T^{\nu}\, (1 - \cos\Delta\phi)} \tag{1.2} \]

where µ (muon) and ν (neutrino) are the decay products of the W boson via W → µν; p_T^µ (p_T^ν) is the momentum of µ (ν) in the transverse plane, the plane perpendicular to the beam axis z; and Δφ is the angle between µ and ν in the transverse plane. The coordinate system and the CDF detector are shown in Fig. 1.2. The m_T spectrum constructed using Eqn. (1.2) is shown in Fig. 1.3. The transverse mass is used because it is a kinematic quantity which is invariant under longitudinal Lorentz boosts (boosts along the beam axis z).

Figure 1.2: The CDF detector and the coordinate system.

Figure 1.3: Transverse mass m_T spectrum from simulation. Blue histogram is for input M_W = 80 GeV, and red histogram is for input M_W = 81 GeV. The Jacobian edge of the m_T distribution is quite sensitive to the input M_W value.

The m_T distribution does not follow any of the standard statistical distributions. Its shape arises from a combined effect of the true M_W value, the dynamics of W boson production (p_T(W)), the W boson decay angular distribution, and the detector effects which the W boson decay products experience. The position of the m_T Jacobian edge near 80 GeV is most sensitive to M_W. Once the detector response and the W boson production and decay mechanisms are well understood and simulated, we can fit the m_T distribution from Monte Carlo simulation against the m_T distribution obtained from real data to extract M_W.

By inspecting Eqn. (1.2), we can see that if the measured p_T^µ is bigger than the true p_T^µ, then the measured m_T distribution will shift to the right compared with the true m_T distribution. This will lead to a measured M_W bigger than the true M_W. If the measured p_T^µ is smaller than the true p_T^µ, then we will get a measured M_W smaller than the true M_W. It is thus important to measure p_T^µ (or p^µ) precisely.

A proposed way to get the most precise p^µ is to use a particle whose mass is precisely known as a reference to calibrate the momentum p^µ measured in the CDF detector. The process of determining how much the measured momentum should be corrected to match the true momentum, and thus to reproduce the known mass of the particle, is called the momentum scale (1 + Δp/p) measurement. Once this scale factor (1 + Δp/p) is found, it is applied to the measured momentum to get the best estimate of the true momentum:

\[ p^{\mathrm{true}} = (1 + \Delta p/p)\; p^{\mathrm{measure}}. \tag{1.3} \]

One way to measure (1 + Δp/p) (or equivalently Δp/p) is to use Υ(1S) → µ⁺µ⁻ events. The Υ(1S) particle is a flavorless b b̄ meson which was first discovered at Fermilab in 1977 [5], with its mass precisely measured to be 9.46 GeV by other experiments (see Particle Data Group [6]). Its well-known mass makes it an ideal candidate to precisely calibrate CDF-measured momenta. (1S) is a label to distinguish it from other similar resonances with higher masses, which are known as Υ(2S) and Υ(3S). For simplicity we will just use Υ to denote Υ(1S) throughout this thesis when no confusion could arise.
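As a rough numerical illustration of Eqns. (1.2) and (1.3), the following Python sketch computes the transverse mass and shows how a momentum scale correction on the muon propagates to m_T. The function names and the example kinematics (a 40 GeV muon and a 40 GeV neutrino, back to back in the transverse plane) are assumptions made for illustration only; they are not taken from the analysis code, and the neutrino momentum is simply held fixed.

```python
import math

def transverse_mass(pt_mu, pt_nu, dphi):
    """Transverse mass of Eqn. (1.2); momenta in GeV, dphi in radians."""
    return math.sqrt(2.0 * pt_mu * pt_nu * (1.0 - math.cos(dphi)))

def apply_momentum_scale(p_measured, dp_over_p):
    """Eqn. (1.3): p_true = (1 + dp/p) * p_measured."""
    return (1.0 + dp_over_p) * p_measured

# Hypothetical back-to-back decay: m_T = 2 * 40 GeV = 80 GeV.
mt_measured = transverse_mass(40.0, 40.0, math.pi)

# Correct only the muon momentum with an illustrative dp/p = -1.33e-3.
pt_mu_true = apply_momentum_scale(40.0, -1.33e-3)
mt_corrected = transverse_mass(pt_mu_true, 40.0, math.pi)

print(mt_measured, mt_corrected)
```

In this simplified setting m_T depends on the square root of p_T^µ, so a small relative shift in the muon momentum moves m_T by about half that relative amount.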
Using Υ → µ⁺µ⁻ data events, we can construct an invariant mass spectrum m_µµ^Data (see Appendix A for the definition of invariant mass). Similarly, using Υ → µ⁺µ⁻ simulated events, we can construct an m_µµ^MC spectrum. If we repeat the construction of the m_µµ^MC spectrum with a different scale factor (1 + Δp/p) applied to the measured momentum, we obtain a series of m_µµ^MC spectra, with spectrum i generated at a pre-defined value (1 + Δp/p)_i. The m_µµ^Data spectrum is then compared with each of the m_µµ^MC spectra to find which MC spectrum matches the data best. The m_µµ^MC spectrum k that best matches the m_µµ^Data spectrum tells us that the momentum scale should be (1 + Δp/p)_k. The best match is found by applying a maximum likelihood method to each m_µµ^MC template. See Chapter 4 for details.

1.3 The computing

All analysis codes are written in C++ except the PYTHIA [7] event generation, which is written in FORTRAN. Some shell scripts are used for automatic job submission and automatic background tuning. Monte Carlo simulation jobs are submitted to the CDF Analysis Farm [8], a network of computer farms, so that hundreds of jobs run at the same time instead of one by one. All outputs are directed to the same location upon completion and are then recombined. The typical time to run 100 parallel jobs of the Υ → µ⁺µ⁻ simulation is 15-20 hours.

1.4 Thesis structure

This thesis consists of 6 chapters. Chapter 1 gives the general introduction; Chapters 2 to 5 discuss the momentum scale estimation using maximum likelihood template fitting, as well as the statistical and systematic errors; Chapter 6 summarizes the obtained results. Additional discussions relevant to this thesis are presented in the Appendices. Terminology used in this thesis is collected and explained in the List of Abbreviations and Symbols.

2 Overview of Momentum Scale Measurement

2.1 General procedure

As illustrated in Fig. 2.1, the momentum scale measurement contains 3 major steps:

STEP 1: On the data side, we use Υ → µ⁺µ⁻ data events collected by the CDF detector to construct a binned invariant mass spectrum m_µµ^Data. The location and shape of the m_µµ^Data spectrum result from the combined effect of Υ → µ⁺µ⁻ production and the interaction of the decay µ± with the CDF detector (see the definition of invariant mass in Appendix A).

STEP 2: On the simulation side, we take three sub-steps to construct a binned invariant mass spectrum m_µµ^MC.

STEP 2.1: The first step is to use the Monte Carlo event generator PYTHIA [7] to simulate the production of Υ → µ⁺µ⁻ events. At this stage, no detector effects are simulated. The invariant mass m_µµ^Pythia constructed at this stage is sharply peaked at m_Υ and approximates a δ-function compared with the spread of m_µµ^Data.

STEP 2.2: Next, these simulated events are propagated through a fast detector simulation to simulate how µ± interact with materials inside the detector. After this step, the mass spectrum m_µµ^MC becomes much wider than m_µµ^Pythia due to detector resolution effects, and it will look similar to m_µµ^Data if the detector simulation is correct.

STEP 2.3: The background observed in the m_µµ^Data spectrum is parameterized and added to the m_µµ^MC spectrum to form an updated m_µµ^MC spectrum (see Sec. 4.4.1 for the discussion of background tuning).

STEP 3: We then fit the m_µµ^MC spectrum against the m_µµ^Data spectrum and obtain a negative log-likelihood value −ln L, using the Poisson distribution as the probability function (see Sec. 4.2 for a detailed discussion of the likelihood function).

Figure 2.1: Flow chart of momentum scale (or Δp/p) measurement.

If we scale the momentum in simulation by a scale factor (1 + Δp/p) and repeat STEP 2.2, we get a new m_µµ^MC spectrum in STEP 2.3 and thus a new −ln L in STEP 3. Repeating this process while varying Δp/p gives a series of (Δp/p, −ln L) pairs. If we plot them with the trial value of Δp/p on the x axis and the corresponding −ln L value on the y axis, we get a −ln L curve as a function of Δp/p. The x-coordinate of the minimum of −ln L (equivalently the maximum of ln L) is the Maximum Likelihood Estimator (MLE) of Δp/p. The momentum measured in our data should then be corrected as

\[ p^{\mathrm{true}} = (1 + \Delta p/p)\; p^{\mathrm{measure}}. \tag{2.1} \]

Besides extracting the MLE of Δp/p, the above three steps can be used to explore possible bias in data. They can also be used to validate the parameterization in our detector simulation. In fact, before obtaining a reliable Δp/p, we first have to remove data bias and make sure our detector simulation agrees with data.
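The scan over Δp/p described in STEP 3 can be sketched as follows. This is a minimal toy under stated assumptions, not the thesis fitter: `toy_template` is a hypothetical stand-in for the full PYTHIA plus detector simulation (a Gaussian peak whose position scales with 1 + Δp/p on a flat background, with invented width and yields), and the "data" are noise-free expected counts generated at a chosen Δp/p, so the scan should simply return that value.

```python
import math

def toy_template(dp_over_p, n_bins=50, lo=9.245, hi=9.615):
    """Toy expected bin contents: Gaussian peak at 9.43 GeV * (1 + dp/p)
    on a flat background. Hypothetical stand-in for a simulated template."""
    peak, sigma = 9.43 * (1.0 + dp_over_p), 0.06   # GeV, assumed values
    n_signal, n_background = 1000.0, 500.0         # assumed yields
    width = (hi - lo) / n_bins
    contents = []
    for i in range(n_bins):
        x = lo + (i + 0.5) * width                 # bin center
        density = math.exp(-0.5 * ((x - peak) / sigma) ** 2)
        density /= sigma * math.sqrt(2.0 * math.pi)
        contents.append(n_signal * density * width + n_background / n_bins)
    return contents

def poisson_nll(data, template):
    """-ln L for independent Poisson bins, template normalized to the data."""
    norm = sum(data) / sum(template)
    nll = 0.0
    for n, t in zip(data, template):
        mu = norm * t
        nll += mu - n * math.log(mu) + math.lgamma(n + 1.0)
    return nll

def scan(data, trial_values):
    """Return the list of (dp/p, -ln L) pairs for all trial values."""
    return [(v, poisson_nll(data, toy_template(v))) for v in trial_values]

# Trial grid from -0.004 to 0 in steps of 5e-6, as in Sec. 2.2.3.
trial_values = [-0.004 + 5.0e-6 * k for k in range(801)]
pseudo_data = toy_template(-0.00133)               # noise-free toy "data"
curve = scan(pseudo_data, trial_values)
best_dp_over_p, best_nll = min(curve, key=lambda pair: pair[1])
print(best_dp_over_p)
```

With real data the bin contents fluctuate, so the minimum of the curve is an estimate rather than an exact recovery, and the shape of the curve near the minimum carries the statistical error discussed in Sec. 4.2.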

2.2 Some general issues

2.2.1 Statistical error and systematic error

Statistical error is a random deviation between an observed value and the true value caused by sampling or measurement noise. Statistical error can be reduced by repeated measurements. Systematic error is the bias in a measurement caused by incomplete knowledge of nuisance parameters [9]. It cannot be reduced by repeating measurements. In our analysis, the m_µµ spectrum is a function of Δp/p as well as of the other nuisance parameters listed in Table 5.1. We vary those parameters in the simulation to quantify the corresponding systematic errors.

2.2.2 Binning issue

Our whole analysis is based on histograms (binned spectra) because the binning method is advantageous for fast calculations (see Fan and Marron [10] and Wand [11] for detailed discussion). Generally speaking, we need to consider the following facts in choosing the number of bins N_bin: 1) N_bin should not be too small, otherwise we lose the resolution of the original measurements; 2) for presentation reasons and to make the shape of the spectrum clear, N_bin should not be too large [12]. We choose N_bin to satisfy these two conditions and confirm that the bias caused by our choice of bins is negligible compared with the precision we want to reach.

In our analysis, we deal with binning in m_µµ and in R, with R being Δcot θ or ⟨1/p_T^µ⟩. To get a high precision measurement, we need fine binning in m_µµ. As to the binning in R, we just need N_bin to be sufficient to capture the overall bias in data [13] [14] [15] (we use 16 Δcot θ bins in the range (−1.6, 1.6)), or to check the material description in simulation [13] [14] [15] (we use 12 ⟨1/p_T^µ⟩ bins in the range (0, 0.3)). Binning in R space is discussed in the construction of the non-inclusive template in Sec. 4.1.2. As to the binning in m_µµ, we force the binning of the m_µµ^MC spectrum and the m_µµ^Data spectrum to be the same, with N_bin of the m_µµ^Data spectrum chosen such that we can see a clear Υ peak with clear tails. The bias on the fitted Δp/p due to the binning in m_µµ, which turns out to be very small and can be neglected, is studied in detail in Chapter 5.

2.2.3 Choice of step size in Δp/p

We vary trial values of Δp/p from −0.004 to 0 with a fixed step, which directly relates to the precision with which we want to measure Δp/p. The range (−0.004, 0) is chosen because another analysis [13] has shown that Δp/p is negative and close to 0. We choose the step size in Δp/p such that the resulting bias on Δp/p, when converted into M_W, is negligible compared with the goal of measuring M_W within 25 MeV. In principle, if the step size goes to 0 (and thus the number of Δp/p trial values goes to infinity), this bias is removed. In our analysis, we divide the Δp/p range (−0.004, 0) into 800 regions with step size ε_Δp/p = 5 × 10⁻⁶, which introduces a maximum bias of 0.4 MeV on M_W. Since this bias is so small compared with 25 MeV, its effect can be neglected, and further reducing the step size in Δp/p would not improve the precision of the M_W measurement.

2.2.4 χ² and goodness of fit

One commonly used goodness-of-fit test is based on Pearson's χ² statistic [16], which quantifies the level of agreement between an observed data histogram and an MC simulated histogram. Under the null hypothesis, the χ² statistic approximately follows a χ² distribution when the following assumptions are satisfied [17]: 1) random sampling of data; 2) a sufficiently large sample size; 3) an adequate number of entries in a given category; 4) independent observations. For m_µµ^Data and m_µµ^MC with the same binning, and assuming the MC histogram bins have no associated statistical error, Pearson's χ² is given by:

\[ \chi^2 = \sum_{i=1}^{n} \frac{\left(N_i^{\mathrm{Data}} - N_i^{\mathrm{MC}}\right)^2}{N_i^{\mathrm{MC}}} \tag{2.2} \]

with

\[ \sum_{i=1}^{n} N_i^{\mathrm{Data}} = \sum_{i=1}^{n} N_i^{\mathrm{MC}} \tag{2.3} \]

where n is the total number of bins within a pre-selected fit range along the m_µµ axis; N_i^Data is the observed number of data events in bin i; and N_i^MC is the number of simulated events in bin i with the background and the MC-to-Data normalization taken into account. Details of how to choose the fit range, how to parameterize the background and how to calculate N_i^MC are discussed in Sec. 3.1.1, Sec. 4.4.1 and Sec. 4.2, respectively. The change in the χ² calculated using Eqn. (2.2) due to our large but finite MC sample and due to uncertainties in the background model is discussed in Sec. 3.2.2.

Since the χ² statistic is the sum of squares of the differences between data and MC in units of the standard deviation, a large χ² usually means a large discrepancy between data and MC. It can also be shown that the χ² statistic asymptotically approaches a χ² distribution with degrees of freedom df = n − p, where n is the number of bins and p is the reduction in degrees of freedom [18]. When extracting Δp/p via the Maximum Likelihood method (see Sec. 4.2), the calculated χ² statistic will approximate a χ²_{n−2} distribution if the data sample is sufficiently large. The number of degrees of freedom equals n − 2 rather than n because of the normalization and the need to estimate Δp/p. Since the expected value of a χ²_{n−2} distributed variable is n − 2, the expectation value of χ²/(n − 2) is 1, and thus χ²/(n − 2) is often quoted as a measure of agreement between data and MC. If χ²/(n − 2) is much greater than 1, the theoretical model used to generate the MC samples may be deficient; if χ²/(n − 2) is much less than 1, we may need to check whether the variance used in calculating the χ² statistic (N_i^MC in our case) is overestimated.

A P-value [18] [19] can also be calculated by comparing the value of the χ² statistic with a χ²_{n−2} distribution. Since the P-value is defined as the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one that was observed, we can calculate it by integrating the χ²_{n−2} distribution from the observed χ² to infinity:

\[ P = \int_{\chi^2}^{\infty} f(x;\, n-2)\, dx \tag{2.4} \]

where f(x; n − 2) is the probability density function of the χ²_{n−2} distribution. If one does not extract any parameter but just normalizes an MC histogram to the data histogram, as in Fig. 3.2 and Fig. 5.3, the χ² statistic will approximate a χ²_{n−1} distribution, with n the total number of bins. The P-value is then

\[ P = \int_{\chi^2}^{\infty} f(x;\, n-1)\, dx \tag{2.5} \]

where f(x; n − 1) is the probability density function of the χ²_{n−1} distribution. The R [20] package supplies the command 1-pchisq(χ², df) to calculate this P-value. For presentation reasons, the calculated χ² is rounded to the nearest integer and added to each plot in this thesis. When we tune parameters and study systematic errors, the original χ² values are used.

2.2.5 Parameter tuning

Throughout this thesis, our simulation parameters are tuned to their best values by minimizing the calculated χ² in a pre-selected fit range. This is based on the least squares method, which is equivalent to the maximum likelihood method when the Poisson distributed N_i^Data is well approximated by a Gaussian [18]. In addition, it can be shown that moving one standard deviation away from the best tuned parameter value increases the χ² from χ²_min to χ²_min + 1 [21] [22]. We can thus vary a given parameter until χ² increases from χ²_min to χ²_min + 1 to find the one standard deviation error of that parameter.
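The quantities of Sec. 2.2.4 and Sec. 2.2.5 can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the P-value routine is a pure-Python equivalent of R's 1-pchisq for integer degrees of freedom, and the three scan points passed to the parabola fit are invented numbers, not results from this analysis. The only value taken from the thesis is χ²/DoF = 28.96/29 from Fig. 3.2 (b).

```python
import math

def pearson_chi2(data, mc):
    """Eqn. (2.2), after normalizing the MC histogram to data as in Eqn. (2.3)."""
    norm = sum(data) / sum(mc)
    return sum((d - norm * m) ** 2 / (norm * m) for d, m in zip(data, mc))

def chi2_p_value(chi2, dof):
    """Upper-tail probability of a chi-square distribution with integer dof."""
    if chi2 <= 0.0:
        return 1.0
    z = 0.5 * chi2
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # closed form for dof = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # closed form for dof = 1
    # Raise dof by 2 per step: Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1).
    while a < 0.5 * dof - 0.25:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def one_sigma_from_parabola(x, chi2):
    """Fit chi2 = c0 + c1*x + c2*x**2 through three scan points and return
    (best value, one-sigma error), with chi2 = chi2_min + 1 at best +/- sigma."""
    (x0, x1, x2), (y0, y1, y2) = x, chi2
    c2 = ((y2 - y0) / (x2 - x0) - (y1 - y0) / (x1 - x0)) / (x2 - x1)
    c1 = (y1 - y0) / (x1 - x0) - c2 * (x0 + x1)
    return -c1 / (2.0 * c2), 1.0 / math.sqrt(c2)

# P-value for the chi2/DoF = 28.96/29 quoted in Fig. 3.2 (b).
p_fig = chi2_p_value(28.96, 29)

# Hypothetical three-point scan of a tuning parameter (invented chi2 values).
best, sigma = one_sigma_from_parabola((0.968, 0.978, 0.988), (54.0, 50.0, 54.0))
print(p_fig, best, sigma)
```

The parabola is only a local approximation: it is reasonable when the scan points lie close to the minimum, which is the regime in which the χ²_min + 1 rule is normally applied.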

3 The Data and Monte Carlo Samples

3.1 The Data sample

Data in High Energy Physics (HEP) take several forms [23] depending on the stage of data processing. In order of processing, these are raw data, reconstructed data, and skimmed (condensed) data in ROOT-based n-tuple formats, where ROOT is an object oriented framework for large scale data analysis in HEP [24]. We use the ROOT-based n-tuple in data analysis. A ROOT n-tuple is a collection of events categorized according to event properties; it is a derived data format used for the final analysis in high energy physics. For our analysis, we use n-tuples as the input data format, apply standard selection cuts (see Appendix B), and then reconstruct m_µµ^Data. This approach is equivalent to first writing out the µ± track parameters for each event into a text file, then using those track parameters to construct the m_µµ^Data spectrum. Without loss of generality, we can treat the data as stored in a text file with each line representing a Υ → µ⁺µ⁻ event. For each such event, there are 10 associated quantities, with the first 5 and the last 5 being the track parameters of µ⁺ and µ⁻, respectively. Those track parameters of µ± already contain the CDF detector effects.

3.1.1 Invariant mass spectrum from data

The Υ → µ⁺µ⁻ data used in this thesis were recorded by the CDF detector from February 4, 2002 to August 4, 2007 [25]. We have a total of 464,802 Υ → µµ candidate events within the invariant mass range (9.43 − 0.185, 9.43 + 0.185) GeV after our standard selection (see Appendix B). The range is chosen such that it covers the observed spread of the Υ peak, and we use the same range to fit MC histograms against the data histogram. The number 464,802 includes the background contribution. If we subtract the best tuned linear background, as discussed in Sec. 5.1, we get 238,999 events within the same fit range, which can be viewed as the total number of pure Υ → µµ events in the data.

The final m_µµ spectrum constructed from data is shown in Fig. 3.1, where the peak around 9.43 GeV is the Υ resonance and the peak around 10 GeV is the Υ(2S), which is not of interest to us. Since the Υ(2S) can also decay to µ⁺µ⁻, it appears in the reconstructed m_µ⁺µ⁻ spectrum. For our analysis, we only consider the Υ peak. We choose the bin width such that there are 50 bins within the fitting window (9.43 − 0.185, 9.43 + 0.185) GeV. The possible systematic bias caused by this binning is studied by doubling and halving the bin width. A detailed discussion is given in Chapter 5.

3.2 The Monte Carlo samples

After the simulated production of Υ → µ⁺µ⁻ and the simulated interaction of µ± with the CDF detector, we write out the track parameters for each generated µ. Similar to the format of the data events, each simulated event is stored as one line of a 10-column text file, with the first 5 columns being the track parameters associated with µ⁺ and the last 5 columns being the track parameters associated with µ⁻.
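To make the step from muon kinematics to a binned m_µµ spectrum concrete, here is a minimal Python sketch. It assumes the muon momentum 3-vectors (p_x, p_y, p_z) in GeV have already been derived from the five track parameters; that conversion is not shown because it depends on detector details outside this chapter. The function names and the example event are hypothetical; the muon mass and the 50-bin fitting window are the values quoted in the text.

```python
import math

MUON_MASS = 0.1056  # GeV, as quoted in the List of Abbreviations and Symbols

def invariant_mass(p_plus, p_minus, mass=MUON_MASS):
    """Invariant mass (GeV) of a muon pair from their momentum 3-vectors (GeV)."""
    e_plus = math.sqrt(sum(c * c for c in p_plus) + mass * mass)
    e_minus = math.sqrt(sum(c * c for c in p_minus) + mass * mass)
    p_sum = [a + b for a, b in zip(p_plus, p_minus)]
    m_squared = (e_plus + e_minus) ** 2 - sum(c * c for c in p_sum)
    return math.sqrt(max(m_squared, 0.0))

def fill_histogram(masses, lo=9.43 - 0.185, hi=9.43 + 0.185, n_bins=50):
    """Count masses in n_bins equal-width bins over [lo, hi); others are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for m in masses:
        if lo <= m < hi:
            index = min(int((m - lo) / width), n_bins - 1)  # guard against rounding
            counts[index] += 1
    return counts

# Hypothetical event: back-to-back muons, each with energy 4.73 GeV.
p = math.sqrt(4.73 ** 2 - MUON_MASS ** 2)
m_pair = invariant_mass((p, 0.0, 0.0), (-p, 0.0, 0.0))
counts = fill_histogram([m_pair])
print(m_pair, counts.index(1))
```

The same two functions apply unchanged to data and to simulated events, which is what allows the two spectra to share identical binning.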

Figure 3.1: The final invariant mass spectrum from data (events versus $m_{\mu\mu}$ in GeV).

3.2.1 Invariant mass spectrum from simulation

The simulation of $\Upsilon \to \mu^+\mu^-$ events within the CDF detector has two stages. The first stage is event generation, which simulates only the production of $\Upsilon \to \mu\mu$ events, without any detector effects. We generate 50 million $\Upsilon \to \mu^+\mu^-$ events using the MC event generator PYTHIA [7]. The parameters that control the transverse momentum distribution of the generated $\Upsilon$, $p_T^{\mathrm{MC}}(\Upsilon)$, are tuned to match $p_T^{\mathrm{Data}}(\Upsilon)$ because PYTHIA does not describe it well. See Fig. 3.2 (a) and (b) for a comparison before and after tuning. The systematic uncertainty due to the tuning parameters is studied in Chapter 5. At this stage, if we construct the invariant mass $m_{\mu\mu}^{\mathrm{Pythia}}$ from the PYTHIA output, we get a sharp spectrum with a very narrow width; see Fig. 3.3 (a).

The second stage simulates the CDF detector effects on the $\mu^\pm$ tracks produced by PYTHIA. At this stage, the same standard selection cuts (Appendix B) that are applied to the data are applied to the simulated events. After this second stage, the track parameters of $\mu^\pm$ are updated from their original PYTHIA-generated values. Using the updated track parameters, we can construct a new invariant mass spectrum $m_{\mu\mu}^{\mathrm{Pythia+detector}}$. As shown in Fig. 3.3 (b), the $m_{\mu\mu}^{\mathrm{Pythia+detector}}$

spectrum is about 1200 times wider than the $m_{\mu\mu}^{\mathrm{Pythia}}$ spectrum, indicating that most of the width comes from the detector simulation (the number 1200 is the ratio of the RMS on plot (b) to the RMS on plot (a)). This indicates that even if we feed the same PYTHIA event into the detector simulation more than once, we still get quite different behavior and quite different invariant mass $m_{\mu\mu}^{\mathrm{Pythia+detector}}$, owing to the numerous random processes in the detector simulation [26]. The expected number of times we can use the same event in Fig. 3.3 (a) to simulate events in Fig. 3.3 (b) before the simulated events become identical is at least of order $10^3$. Thus, if we pass the same PYTHIA event through the detector simulation just a few times ($\ll 10^3$), we get events with quite different behavior, and they can be viewed, to a large extent, as uncorrelated.

Figure 3.2: Distribution of $p_T^{\Upsilon}$ (GeV) (a) before tuning and (b) after tuning ($\chi^2/\mathrm{DoF} = 28.96/29$). Blue points represent data and the red histogram represents events from PYTHIA.

Considering the time and computing resources needed to generate 10 times more PYTHIA events (generating 500 M events is time-consuming, and the CDF Analysis Farm (CAF) limits both the size of the input files that store the PYTHIA events and the number of jobs that can be submitted in parallel), and using the fact that almost all of the event-to-event variation comes from the detector simulation, we feed each of the 50 M PYTHIA-generated events 10 times into the detector simulation to mimic the process of generating 500 M PYTHIA events and

then feeding them into the detector simulation. The possible bias caused by this technique is shown in Fig. 3.4. We see a change in $\Delta p/p$ of $0.01 \times 10^{-3}$, which is a 0.8 MeV effect on $M_W$. This change is the combined effect of the bias caused by recycling and the bias caused by the statistical fluctuation due to the size of the MC samples, which can be seen in the minimum region in Fig. 3.4 (c). This small change in $\Delta p/p$ is negligible compared with the 25 MeV precision goal of the $M_W$ measurement. We thus use the recycled MC samples as the default MC samples and quote 0.8 MeV as a systematic error along with the other systematic errors in Chapter 5.

Figure 3.3: Comparison of the invariant mass spectrum $m_{\mu\mu}$ at different simulation stages: (a) after PYTHIA event generation (Entries $1.50253 \times 10^{7}$, Mean 9.46, RMS 0.0001054); (b) after PYTHIA event generation and detector simulation (Entries $1.50253 \times 10^{7}$, Mean 9.423, RMS 0.1282).

3.2.2 Effect on $\chi^2$ due to finite MC sample size and background description

For the MC histogram that best matches the data (i.e., the 534th histogram), there are $1.34157 \times 10^{8}$ events within the $(9.43 - 0.185,\ 9.43 + 0.185)$ GeV interval before normalizing to the data entries and before adding backgrounds to the MC. By comparing, in the fit range $(9.43 - 0.185,\ 9.43 + 0.185)$ GeV, the total number of data events after background subtraction with the total number of pure MC signal events, we can

get the MC scaling factor

$$ S = \frac{238999}{1.34157 \times 10^{8}} \approx 0.00178. \tag{3.1} $$

Figure 3.4: Comparison of not recycling and recycling PYTHIA-level generated events 10 times. Plots (a) and (b) show the Data-MC fit without and with the recycling, respectively; the fits give $\Delta p/p = (-1.32 \pm 0.0275_{\mathrm{stat}}) \times 10^{-3}$ and $\Delta p/p = (-1.33 \pm 0.0275_{\mathrm{stat}}) \times 10^{-3}$, each with $\chi^2/\mathrm{dof} = 61/48$. Plots (c) and (d) show the negative log-likelihood curve ($\ln L_{\max} - \ln L$) as a function of the parameter $\Delta p/p$ without and with the recycling, respectively (see the discussion of likelihood in Sec. 4.2).

The effect of the simulation statistical error on the calculated $\chi^2$ (denoted $\chi'^2$ to distinguish it from the $\chi^2$ in Sec. 2.2.4) can be illustrated by the following derivation:

$$ \chi'^2 = \sum_{i=1}^{n} \frac{\left( N_i^{\mathrm{Data}} - N_i^{\mathrm{MC}} \right)^2}{N_i^{\mathrm{MC}} + \sigma^2_{N_i^{\mathrm{MC}}}} \tag{3.2} $$

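with

$$ N_i^{\mathrm{MC}} = S\, N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}} \tag{3.3} $$

$$ \sigma^2_{N_i^{\mathrm{MC}}} = \sigma^2_{S N_i^{\mathrm{MC}}(0)} + \sigma^2_{N_i^{\mathrm{bkg}}} = S^2 N_i^{\mathrm{MC}}(0) + \sigma^2_{N_i^{\mathrm{bkg}}} \tag{3.4} $$

where $n$ is the total number of bins within a pre-selected fit range, $N_i^{\mathrm{MC}}(0)$ is the original number of pure MC events in bin $i$, $N_i^{\mathrm{bkg}}$ is the number of background events in bin $i$, and $\sigma^2_{N_i^{\mathrm{bkg}}}$ is the variance of the background in bin $i$. If we plug Eqn. (3.3) and (3.4) into Eqn. (3.2), we get

$$
\begin{aligned}
\chi'^2 &= \sum_{i=1}^{n} \frac{\left[ N_i^{\mathrm{Data}} - \left( S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}} \right) \right]^2}{\left( S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}} \right) + S^2 N_i^{\mathrm{MC}}(0) + \sigma^2_{N_i^{\mathrm{bkg}}}} \\
&= \sum_{i=1}^{n} \frac{\left[ N_i^{\mathrm{Data}} - \left( S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}} \right) \right]^2}{S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}}} \cdot \frac{1}{1 + \dfrac{S^2 N_i^{\mathrm{MC}}(0) + \sigma^2_{N_i^{\mathrm{bkg}}}}{S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}}}} \\
&\approx \chi^2 \cdot \frac{1}{1 + \dfrac{S^2 N_i^{\mathrm{MC}}(0) + \sigma^2_{N_i^{\mathrm{bkg}}}}{S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}}}}
\end{aligned}
\tag{3.5}
$$

where, in the last step, the bin-dependent correction factor is evaluated at a representative bin and taken out of the sum, and $\chi^2$ is the same as in Eqn. (2.2), which assumes an infinite MC sample and no error on the background parametrization:

$$ \chi^2 = \sum_{i=1}^{n} \frac{\left[ N_i^{\mathrm{Data}} - \left( S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}} \right) \right]^2}{S N_i^{\mathrm{MC}}(0) + N_i^{\mathrm{bkg}}} = \sum_{i=1}^{n} \frac{\left( N_i^{\mathrm{Data}} - N_i^{\mathrm{MC}} \right)^2}{N_i^{\mathrm{MC}}}. \tag{3.6} $$

As discussed in Sec. 4.4.1, the background is parametrized as

$$ N_i^{\mathrm{bkg}} = B_{\mathrm{level}} + B_{\mathrm{slope}} \, (m_{\mu\mu} - 9.43) \tag{3.7} $$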

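with $\hat{B}_{\mathrm{level}} = 4517$ ($\sigma_{B_{\mathrm{level}}} \approx 7$) and $\hat{B}_{\mathrm{slope}} = -250$ ($\sigma_{B_{\mathrm{slope}}} \approx 39$), which are discussed in detail in Sec. 5.2.6. We thus have

$$ \sigma^2_{N_i^{\mathrm{bkg}}} = \sigma^2_{B_{\mathrm{level}}} + \sigma^2_{B_{\mathrm{slope}}} \, (m_{\mu\mu} - 9.43)^2 + 2\, (m_{\mu\mu} - 9.43)\, \mathrm{Cov}(B_{\mathrm{level}}, B_{\mathrm{slope}}). \tag{3.8} $$

If there is no constraint, $B_{\mathrm{level}}$ and $B_{\mathrm{slope}}$ should be independent. When $\chi^2$ minimization is used as the criterion to tune $B_{\mathrm{level}}$ and $B_{\mathrm{slope}}$, they become weakly correlated. If we vary $B_{\mathrm{level}}$ by $\pm 1\sigma_{B_{\mathrm{level}}}$ from its central value 4517, $B_{\mathrm{slope}}$ needs to be changed by $\mp 2$ to keep the $\chi^2$ in the pre-selected fit range minimized. As we can see from Fig. 3.5, the correlation coefficient between $B_{\mathrm{level}}$ and $B_{\mathrm{slope}}$ should satisfy $\rho\, \sigma_{B_{\mathrm{slope}}} = -2$. Thus we have $\rho \approx -0.05$ and

$$ \mathrm{Cov}(B_{\mathrm{level}}, B_{\mathrm{slope}}) = \rho\, \sigma_{B_{\mathrm{level}}}\, \sigma_{B_{\mathrm{slope}}} \approx -13.65. \tag{3.9} $$

Figure 3.5: Standard error ellipse for the estimators $\hat{\theta}_i$ and $\hat{\theta}_j$ when the correlation coefficient $\rho_{ij} < 0$ [6].

Thus Eqn. (3.8) becomes

$$ \sigma^2_{N_i^{\mathrm{bkg}}} \approx 49 + 1521\, (m_{\mu\mu} - 9.43)^2 - 27.3\, (m_{\mu\mu} - 9.43). \tag{3.10} $$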
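Eqn. (3.7), (3.8) and (3.10) can be evaluated numerically as a cross-check. The sketch below is only illustrative: the function and constant names are hypothetical, and the negative signs of the slope and of the covariance are assumptions, chosen to agree with $\rho < 0$ in Fig. 3.5 and with the background counts and variances quoted for the edges of the fit range.

```python
B_LEVEL = 4517.0          # central value of B_level quoted in the text
B_SLOPE = -250.0          # slope; negative sign is an assumption (see lead-in)
SIGMA_LEVEL = 7.0         # uncertainty on B_level quoted in the text
SIGMA_SLOPE = 39.0        # uncertainty on B_slope quoted in the text
COV_LEVEL_SLOPE = -13.65  # covariance; negative sign is an assumption (rho < 0)


def bkg_count(dm):
    """Linear background of Eqn. (3.7); dm = m_mumu - 9.43 in GeV."""
    return B_LEVEL + B_SLOPE * dm


def bkg_variance(dm):
    """Background variance of Eqn. (3.8) by standard error propagation."""
    return (SIGMA_LEVEL ** 2
            + SIGMA_SLOPE ** 2 * dm ** 2
            + 2.0 * dm * COV_LEVEL_SLOPE)


# Evaluate at the left edge, the peak, and the right edge of the fit range.
for dm in (-0.185, 0.0, 0.185):
    print(f"dm = {dm:+.3f} GeV: N_bkg = {bkg_count(dm):.0f}, "
          f"variance = {bkg_variance(dm):.1f}")
```

With these inputs the variance is smallest at the high-mass edge and largest at the low-mass edge, because the cross term changes sign across the peak.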

Since we fit in the region $-0.185 < (m_{\mu\mu} - 9.43) < 0.185$, we have:

$$ \sigma^2_{N_i^{\mathrm{bkg}}} \gtrsim 49 - 27.3\, (m_{\mu\mu} - 9.43) \geq 49 - 27.3 \times 0.185 \approx 44. \tag{3.11} $$

Using the value of $S$ obtained in Eqn. (3.1), and with the $m_{\mu\mu}$ peak height estimated from Fig. 3.3 (b) to be $600 \times 10^{3}$, we have $S^2 N_i^{\mathrm{MC}}(0) \approx 0.00178^2 \times 600 \times 10^{3} = 1.9$, which is much smaller than $\sigma^2_{N_i^{\mathrm{bkg}}}$. This means the variance from the background dominates the correction term in the $\chi'^2$ calculation.

When $m_{\mu\mu} = 9.43$ (the bin at the peak position), Eqn. (3.10) gives $\sigma^2_{N_i^{\mathrm{bkg}}} \approx 49$. Thus Eqn. (3.5) can be approximated as

$$ \chi'^2 \approx \chi^2 \cdot \frac{1}{1 + \frac{49}{15000}} \approx 0.997\, \chi^2 \tag{3.12} $$

where 15000 is read from the peak of Fig. 3.1.

When $m_{\mu\mu} = 9.43 - 0.185$, which corresponds to the bin farthest from the peak position in the left tail within the fit range $(9.43 - 0.185,\ 9.43 + 0.185)$, we get $\sigma^2_{N_i^{\mathrm{bkg}}} \approx 106$. Eqn. (3.5) then becomes

$$ \chi'^2 \approx \chi^2 \cdot \frac{1}{1 + \frac{106}{4563}} \approx 0.977\, \chi^2 \tag{3.13} $$

where 4563 is obtained by using Eqn. (3.7).

When $m_{\mu\mu} = 9.43 + 0.185$, which corresponds to the bin farthest from the peak position in the right tail within the fit range $(9.43 - 0.185,\ 9.43 + 0.185)$, we get $\sigma^2_{N_i^{\mathrm{bkg}}} \approx 96$.

Eqn. (3.5) then becomes

$$ \chi'^2 \approx \chi^2 \cdot \frac{1}{1 + \frac{96}{4471}} \approx 0.979\, \chi^2 \tag{3.14} $$

where 4471 is obtained by using Eqn. (3.7).

The real $\chi'^2$ should thus lie somewhere between the above extreme cases:

$$ 0.977\, \chi^2 \leq \chi'^2 \leq 0.997\, \chi^2. \tag{3.15} $$

This means that the finite MC sample and the background parameterization used in our analysis reduce the $\chi^2$, which is calculated using Eqn. (2.2) under the assumption of an infinite MC sample and an exact background description, by at most 2.3%. Since the reduction is only about 2% at most, it has a small effect (at most 1%) on the $1\sigma$ band of the simulation parameters, which is obtained by increasing the $\chi^2$ by 1 from $\chi^2_{\min}$. To cover this effect, we can scale those systematic errors (see Table 5.1) by a factor of 1.01. The increase in those systematic errors turns out to be negligible compared with the 25 MeV precision goal of the $M_W$ measurement. In summary, we do not need to worry about the $\chi^2$ calculation using Eqn. (2.2), since it is an approximation good enough for our purpose. A more general discussion of fitting with finite Monte Carlo samples can be found in [27].
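The arithmetic behind Eqn. (3.1) and Eqn. (3.12) to (3.15) can be reproduced with a few lines of Python. This is a sketch for checking the quoted numbers only; the helper name `chi2_correction` is hypothetical, and all inputs are the values quoted in the text.

```python
def chi2_correction(extra_variance, expected_count):
    """Factor 1 / (1 + extra_variance / expected_count) multiplying chi^2 in Eqn. (3.5)."""
    return 1.0 / (1.0 + extra_variance / expected_count)


# MC scaling factor of Eqn. (3.1) and the MC statistical term S^2 * N_MC(0) at the peak.
S = 238999 / 1.34157e8
mc_term_at_peak = S ** 2 * 600e3

# (background variance, expected events per bin) at three positions in the fit range.
cases = {
    "peak": (49.0, 15000.0),
    "left edge": (106.0, 4563.0),
    "right edge": (96.0, 4471.0),
}
factors = {name: chi2_correction(var, count) for name, (var, count) in cases.items()}

print("S =", S)
print("S^2 * N_MC(0) at the peak =", mc_term_at_peak)
for name, factor in factors.items():
    print(name, "->", factor)
```

The MC statistical term is more than an order of magnitude smaller than the background variance, so leaving it out of the three factors, as done in Eqn. (3.12) to (3.14), changes them only marginally.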

4 Template Fitting with Maximum Likelihood

4.1 Template construction

In our analysis, a template is a binned multi-dimensional histogram generated using MC simulated $\Upsilon \to \mu^+\mu^-$ events [28]. It has discrete, equally spaced trial values of $\Delta p/p$ on the x axis and the binned invariant mass spectrum $m_{\mu\mu}$ on the y axis; the number of events is on the z axis.

Two kinds of templates are used in this analysis. One is the inclusive template, which is generated using all simulated events; the other is a non-inclusive template in bins of a physical quantity $R$, which gives valuable information on how the extracted $\Delta p/p$ varies as a function of $R$. Summing the non-inclusive templates over all bins in $R$ gives the inclusive template, and splitting the inclusive template according to ranges of $R$ gives the non-inclusive templates.

We use non-inclusive templates in $R = \Delta\cot\theta$ bins, where $\Delta\cot\theta = \cot\theta_{\mu^+} - \cot\theta_{\mu^-}$ (see the definitions of $\theta$ and $\cot\theta$ on pg. xi), to study the possible bias in our data that has not been corrected by off-line calibration [29] and alignment [30]. Details of the data correction in $\Delta\cot\theta$ bins are given in Sec. 4.4.3. After the data bias is removed, we

then use non-inclusive templates in $R = 1/p_T^{\mu}$ bins to check the material description in our detector simulation, since a linear dependence of $\Delta p/p$ on $1/p_T^{\mu}$ indicates that the amount of material parameterized in our fast detector simulation does not match the real detector material (see reference [13] for the reasoning). The validation of the material in our simulation is shown in Sec. 4.4. Once these two steps are successfully completed, we can claim that we understand the data and that our simulation describes the data well. As a final step, we use the inclusive template to fit against $m_{\mu\mu}^{\mathrm{Data}}$ to extract the final momentum scale correction factor $\Delta p/p$.

4.1.1 Construction of the inclusive template

For a given simulated event, we can calculate the momentum $\vec{p}_{\mu}$ and energy $E_{\mu}$ using the track parameters associated with each $\mu$. Once we have $\vec{p}_{\mu}$ and $E_{\mu}$, we can use the definition of invariant mass (see Appendix A) to calculate the invariant mass of $\mu^+$ and $\mu^-$:

$$ m_{\mu^+\mu^-} = \sqrt{ \left( E_{\mu^+} + E_{\mu^-} \right)^2 - \left( \vec{p}_{\mu^+} + \vec{p}_{\mu^-} \right)^2 } \tag{4.1} $$

with

$$ E_{\mu^\pm} = \sqrt{ m_{\mu}^2 + \left( {p^{x}_{\mu^\pm}}^2 + {p^{y}_{\mu^\pm}}^2 + {p^{z}_{\mu^\pm}}^2 \right) }. \tag{4.2} $$

If we repeat the calculation for each simulated event using Eqn. (4.1) and (4.2), we get a distribution of $m_{\mu^+\mu^-}^{\mathrm{MC}}$, which is called the invariant mass spectrum of $\mu^+\mu^-$. We bin this $m_{\mu\mu}^{\mathrm{MC}}$ spectrum with the same binning as the $m_{\mu\mu}^{\mathrm{Data}}$ spectrum.

Using all simulated events, an $m_{\mu\mu}^{\mathrm{MC}}$ inclusive template is formed by a series of binned $m_{\mu\mu}^{\mathrm{MC}}$ histograms, which are generated by multiplying $p^{i}_{\mu^\pm}$ ($i = x, y, z$) in Eqn. (4.1) and (4.2) by the momentum scale factor $(1 + \Delta p/p)$. As $\Delta p/p$ varies from $-0.004$ to $0$ with a step of $5 \times 10^{-6}$, or equivalently as $(1 + \Delta p/p)$ varies from $0.996$ to $1$ with a step of $5 \times 10^{-6}$, the 800 constructed $m_{\mu^+\mu^-}^{\mathrm{MC}}$

spectra will be slightly different from each other; see Fig. 4.1 for a comparison of the $m_{\mu\mu}^{\mathrm{MC}}$ spectrum generated at $\Delta p/p = -0.004$ with the $m_{\mu\mu}^{\mathrm{MC}}$ spectrum generated at $\Delta p/p = 0$. All other $m_{\mu\mu}^{\mathrm{MC}}$ spectra lie between them.

Figure 4.1: $m_{\mu\mu}$ spectrum (GeV) from MC for $\Delta p/p = -0.004$ (Entries $1.429369 \times 10^{8}$, Mean 9.387, RMS 0.1265) and $\Delta p/p = 0$ (Entries $1.431928 \times 10^{8}$, Mean 9.423, RMS 0.1282).

To access all $\Delta p/p$ trial values efficiently, we re-arranged the inclusive template by turning the 1-to-1 correspondence between $m_{\mu\mu}^{\mathrm{MC}}(i)$ and $(\Delta p/p)_i$ into a binned 3-d histogram, with $\Delta p/p$ and $m_{\mu\mu}^{\mathrm{MC}}$ as its x and y axes, respectively, and $N_{\mathrm{event}}$ as the z axis. This is illustrated in Fig. 4.2, where a slice of the 3-d histogram at a fixed position on the $\Delta p/p$ axis is the invariant mass spectrum $m_{\mu\mu}^{\mathrm{MC}}(i)$ for the corresponding value $(\Delta p/p)_i$. Since there is a 1-to-1 correspondence between the $\Delta p/p$ input value and the generated $m_{\mu\mu}^{\mathrm{MC}}$ spectrum, we do not need to worry about the case where two or more spectra are generated with the same $\Delta p/p$ input.

Using this template, the fitter goes through all 800 $m_{\mu\mu}^{\mathrm{MC}}$ histograms and fits each against the $m_{\mu\mu}^{\mathrm{Data}}$ spectrum by constructing the likelihood function (see Sec. 4.2). Thus, we have 800 likelihood values. The $m_{\mu\mu}^{\mathrm{MC}}$ spectrum that best matches the $m_{\mu\mu}^{\mathrm{Data}}$ spectrum gives the maximum likelihood. Once we find the best $m_{\mu\mu}^{\mathrm{MC}}$ spectrum $i$, we know the associated $\Delta p/p$ value

$\Delta p/p = -0.004 + 5 \times 10^{-6}\, (i - 1)$. This $\Delta p/p$ is thus the maximum likelihood estimate of the parameter $\Delta p/p$. The issues of step size choice in $\Delta p/p$ and binning in $m_{\mu\mu}$ are discussed in Chapter 2. The effects of our step size choice in $\Delta p/p$ and our binning choice in $m_{\mu\mu}$ turn out to be negligible compared with the precision we want.

Figure 4.2: Inclusive Monte Carlo templates constructed with $\Upsilon \to \mu^+\mu^-$ simulated events. The $\Delta p/p$ axis is binned into 800 bins from $-0.004$ to $0$ with equal width $0.005 \times 10^{-3}$. The $m_{\mu\mu}$ axis is binned into 320 bins from 8.8 GeV to 11.2 GeV with equal bin width 0.0075 GeV (7.5 MeV).

4.1.2 Construction of non-inclusive templates in parameter space R

The same idea can be extended to the construction of non-inclusive templates in $R$ ($R = \Delta\cot\theta$ or $R = 1/p_T^{\mu}$), which serve different purposes. As mentioned before, templates in $\Delta\cot\theta$ bins help to identify the bias in the data by showing how the extracted $\Delta p/p$ varies as a function of $\Delta\cot\theta$. Templates in $1/p_T(\mu)$ bins are used to determine whether the material description in our simulation matches the data.
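The inclusive-template construction of Sec. 4.1.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function names are hypothetical, the muon momenta are taken as already computed from the track parameters, the muon mass value is the standard 0.105658 GeV, and the two toy events are invented for the example.

```python
import math

MUON_MASS = 0.105658  # GeV, muon rest mass


def delta_p_over_p(i, start=-0.004, step=5e-6):
    """Trial value of Delta p / p for template slice i (1-based)."""
    return start + step * (i - 1)


def invariant_mass(p_plus, p_minus, scale=1.0):
    """Eqn. (4.1) and (4.2), with every momentum component multiplied by `scale`."""
    px1, py1, pz1 = (scale * c for c in p_plus)
    px2, py2, pz2 = (scale * c for c in p_minus)
    e1 = math.sqrt(MUON_MASS ** 2 + px1 ** 2 + py1 ** 2 + pz1 ** 2)
    e2 = math.sqrt(MUON_MASS ** 2 + px2 ** 2 + py2 ** 2 + pz2 ** 2)
    p_sum_sq = (px1 + px2) ** 2 + (py1 + py2) ** 2 + (pz1 + pz2) ** 2
    return math.sqrt(max((e1 + e2) ** 2 - p_sum_sq, 0.0))


def build_template(events, n_steps=800, m_lo=8.8, m_hi=11.2, n_mass_bins=320):
    """Return template[i - 1][j]: events in mass bin j for trial value number i."""
    width = (m_hi - m_lo) / n_mass_bins
    template = [[0] * n_mass_bins for _ in range(n_steps)]
    for i in range(1, n_steps + 1):
        scale = 1.0 + delta_p_over_p(i)
        for p_plus, p_minus in events:
            j = int((invariant_mass(p_plus, p_minus, scale) - m_lo) // width)
            if 0 <= j < n_mass_bins:  # masses outside the axis range are dropped
                template[i - 1][j] += 1
    return template


# Two toy events (hypothetical momenta in GeV), each a (mu+, mu-) pair of (px, py, pz).
toy_events = [
    ((4.73, 0.0, 0.0), (-4.73, 0.0, 0.0)),
    ((0.0, 3.5, 3.2), (0.0, -3.5, -3.1)),
]
template = build_template(toy_events)
print("Delta p/p for slice 534:", delta_p_over_p(534))
print("events in first slice:", sum(template[0]))
```

Each row of `template` plays the role of one slice of the 3-d histogram in Fig. 4.2; a fitter would then compare every row with the data histogram and keep the row index that maximizes the likelihood.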

We construct 16 non-inclusive templates in $R = \Delta\cot\theta$ bins by selecting events that pass the requirement $-1.6 + 0.2\,(i - 1) < \Delta\cot\theta < -1.6 + 0.2\,i$ with $i = 1, \dots, 16$, and 12 non-inclusive templates in $R = 1/p_T(\mu)$ bins by selecting events that pass the requirement $0.025\,(j - 1) < 1/p_T(\mu) < 0.025\,j$ with $j = 1, \dots, 12$. Each of the non-inclusive templates looks like Fig. 4.2, but with fewer events, since it is built from a sub-sample of the events in the inclusive template.

Take $R = \Delta\cot\theta$ as an example: we bin the $\Delta\cot\theta$ space from $-1.6$ to $1.6$ into 16 bins of equal width 0.2. The range $(-1.6, 1.6)$ is chosen based on a previous analysis [13]. For a given $\Delta\cot\theta$ bin $i$ ($i = 1, \dots, 16$), we first select simulated events passing the requirement $-1.6 + 0.2\,(i - 1) < \Delta\cot\theta < -1.6 + 0.2\,i$, then use those selected events to construct the $m_{\mu\mu}^{\mathrm{MC}}$ spectrum as $\Delta p/p$ varies in the same way as discussed in Sec. 4.1.1. For each $\Delta\cot\theta$ bin, we generate a corresponding template in the form of a 3-d binned histogram and obtain an MLE of $\Delta p/p$ associated with that bin. By repeating the same process over all 16 $\Delta\cot\theta$ bins, we obtain 16 MLEs of $\Delta p/p$. Using those 16 MLEs and the 16 $\Delta\cot\theta$ ranges, we can then study the dependence of the MLEs on $\Delta\cot\theta$.

Since different values of $\Delta\cot\theta$ represent different decay topologies [13], a dependence of the MLEs on $\Delta\cot\theta$ means that the MLEs depend on how the $\Upsilon$ decays to $\mu^+\mu^-$, which contradicts the physics principle that the invariant mass is the same in any reference frame. We therefore conclude that there must be a bias in the data in the measurement of mass, causing the measured mass to depend on $\cot\theta_{\mu^\pm}$. This bias needs to be corrected using a physics-based hypothesis such that the dependence of the MLE on $\Delta\cot\theta$ is removed. See Sec. 4.4.3 for a detailed discussion of removing the data bias.

4.2 Maximum Likelihood method

The method of Maximum Likelihood (ML) is a statistical technique to estimate the parameters given a finite sample of data [18] [31] [32]. It is the core technique for