Data Fitting - Lecture 6


1 Central limit theorem

Recall that for a sequence of independent random variables, X_i, one can define a mean, µ_i, and a variance, σ_i². The sum of the means then forms the mean of the sum, and the sum of the variances forms the variance of the sum. This is true for any distribution, provided the X_i are independent and the means and variances are mathematically reasonable. The central limit theorem states how the sum is distributed as N → ∞;

lim_{N→∞} [S − Σ_{i=1}^{N} µ_i] / √(Σ_{i=1}^{N} σ_i²) → P_N(0, 1)

In the above, S = Σ_{i=1}^{N} X_i, and P_N(0, 1) is the normal distribution with mean 0 and variance 1. This means that no matter what the original distribution of the variables, the mean of the means is normally distributed for a sufficiently large sample. The theorem holds in the general case, but here assume that µ_i = µ and σ_i = σ. Suppose a set of random variates, X_i, with expectation values µ and variances σ². Define;

Y_N = (Σ_i X_i)/N        Z_N = (Y_N − µ)/(σ/√N)

The expectation value and variance of Z_N are;

E(Z_N) = 0        V(Z_N) = 1

Then we find that;

lim_{N→∞} P(Z_N < a) = (1/√(2π)) ∫_{−∞}^{a} dx e^{−x²/2}

For example, it is generally accepted that a χ² distribution with N ≥ 30 is reasonably normal. To consider normality of a distribution, moments higher than 2 may be important.

2 Example

As previously mentioned, any Monte Carlo simulation requires the generation of a series of random numbers. These are usually distributed between 0 and 1, but a number distributed according to a probability density function is also useful. Using the Central Limit Theorem one can produce a random number generator distributed by the Normal probability distribution. Choose the i-th number from a uniform generator to represent µ_i. Then for a sequence of N random numbers between 0 and 1, form;

g = [Σ_{i=1}^{N} µ_i − N/2] / √(N/12)

In the above we have used that the average of a flat distribution is µ_i = 1/2 with a variance σ² = 1/12. (A uniform distribution for a ≤ X ≤ b has E(X) = (a + b)/2 and V(X) = E(X²) − µ² = (b − a)²/12.) It is clear that the normal distribution, P_N(0, 1), is generated as N → ∞. In practice, g is close to normal when N ≥ 10, but the tails of the distribution are cut with respect to the Normal distribution.
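The following is a minimal numerical sketch of this uniform-sum generator, assuming NumPy is available; the function name clt_normal and the sample counts are illustrative choices, not part of the lecture.

```python
import numpy as np

def clt_normal(n_terms=12, size=10000, rng=None):
    """Approximate standard-normal deviates from sums of n_terms uniform deviates."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random((size, n_terms))                     # uniform deviates on (0, 1)
    # g = (sum of uniforms - N/2) / sqrt(N/12), as in the formula above
    return (u.sum(axis=1) - n_terms / 2.0) / np.sqrt(n_terms / 12.0)

g = clt_normal()
print(g.mean(), g.var())    # close to 0 and 1
print(g.min(), g.max())     # the tails are cut: |g| can never exceed sqrt(3 * n_terms)
```

With n_terms = 12 the denominator is exactly 1, which is why a sum of twelve uniforms minus 6 is a popular quick approximation to a normal deviate.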

3 Covariance

Suppose a normal distribution defined by the two random variables x and y. Bayes theorem can be used to write;

P_x(x/y) = P(x ∩ y)/P_y(y)        P_y(y/x) = P(x ∩ y)/P_x(x)

where we have previously shown that if the probabilities are independent;

P_x(x/y) = P_x(x);        P_y(y/x) = P_y(y)

The meaning of the intersection of the probability sets A and B is shown in Figure 1.

Figure 1: The intersection of probability distributions

The expectation value (mean) is the 1st moment;

E(x) = ∫ dx dy x P(x ∩ y)

and the expectation of a general function f(x, y) is;

E(f) = ∫ dx dy f(x, y) P(x ∩ y)

The variance is defined as (writing x_1 = x and x_2 = y);

σ²(f) = E[(f − E(f))²] = ∫ dx dy [f − E(f)]² P(x ∩ y)

If the variables are independent, P(x ∩ y) = P_x(x)P_y(y), and the double integral separates into the product of the integral over x and the integral over y.

E(x) = ∫ dx dy x f(x, y)        E(y) = ∫ dx dy y f(x, y)

σ_x² = E([x − E(x)]²)        σ_y² = E([y − E(y)]²)

Thus, for example, the probability P(x ∩ y) is the product of the two normal probability distributions in x and y, which is also a normal distribution.

4 Covariance

A joint probability distribution of two random variables has a covariance defined by;

cov(x, y) = E([x − E(x)][y − E(y)]) = E(xy) − E(x)E(y)

If the random variables are mutually independent then their joint probability density factors, f(x, y) = f_1(x) f_2(y). This results in the expectation value;

E(xy) = ∫ dx x f_1(x) ∫ dy y f_2(y) = E(x)E(y)

Thus the covariance vanishes for independent variables. However, the converse is not necessarily true. That is, if the covariance vanishes the variables are not necessarily independent. The correlation coefficient is defined by;

corr(x, y) = ρ(x, y) = cov(x, y)/(σ_x σ_y)

Covariance can be defined between any two variables of a many-variable system. A correlated error would develop if, for example, the density were measured as a function of temperature and pressure. For a given density, temperature and pressure are correlated by the equation of state. Thus for the density, ρ;

δρ = (∂ρ/∂T) dT + (∂ρ/∂P) dP

(δρ)² = (∂ρ/∂T)² dT² + (∂ρ/∂P)² dP² + 2 (∂ρ/∂T)(∂ρ/∂P) dT dP

Thus correlations are present, and if the equation connecting the variables is not linear then we must approximate the errors by the first order differential terms, ie we assume the errors are small.

As an example, the table shown in Figure 2 gives the correlation between two selections of random numbers. These are given in the 1st two columns. The next columns give the deviations from the average values, and the last column gives the covariance. The average variance is divided by N − 1 instead of N since one degree of freedom has been used to find the mean. The correlation coefficient is obtained from the covariance by dividing by the standard deviations as illustrated above. The results are;

ρ(x, y) = [Σ (x − µ_x)(y − µ_y)/9] / (σ_x σ_y) = 0.00

As ρ → 0 the data do not seem correlated. The correlation coefficient, ρ, is bounded by ±1. Note that the covariance of a random variable with itself equals the variance of that variable.
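The correlation computation just described can be sketched as below (NumPy assumed). The random numbers in the original table are not reproduced here, so the sketch draws its own two sets of 10 uniform deviates; only the procedure, not the numerical result, matches the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(10)                         # two independent sets of 10 random numbers
y = rng.random(10)

dx = x - x.mean()                          # deviations from the average values
dy = y - y.mean()
cov = np.sum(dx * dy) / (len(x) - 1)       # divide by N - 1: one degree of freedom used for the mean
sx = np.sqrt(np.sum(dx**2) / (len(x) - 1))
sy = np.sqrt(np.sum(dy**2) / (len(y) - 1))
rho = cov / (sx * sy)                      # correlation coefficient, bounded by +-1
print(cov, rho)
```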

Figure 2: The correlation between two sets of 10 random numbers

5 Error for a multi-variant system

Suppose two independent, normal distributions of the variables x and y. The probabilities about the means are;

P(x) = [1/(√(2π) σ_x)] e^{−x²/(2σ_x²)}        P(y) = [1/(√(2π) σ_y)] e^{−y²/(2σ_y²)}

Because the variables are independent;

P(x ∩ y) = [1/(2π σ_x σ_y)] exp[ −(1/2)(x/σ_x)² − (1/2)(y/σ_y)² ]

Then as a matter of notation write;

M = ( 1/σ_x²   0
      0        1/σ_y² )

For the 1σ (e^{−1/2}) error contour in the (x, y) plane;

( x  y ) M ( x
             y ) = 1

Multiplication gives the equation;

(x/σ_x)² + (y/σ_y)² = 1

This generates an ellipse with axes along x and y in the (x, y) plane. The inverse of the matrix is defined by M M⁻¹ = I;

M⁻¹ = ( σ_x²   0
        0      σ_y² )

Here M⁻¹ is the error matrix. It is diagonal because x and y are independent. Off-diagonal terms indicate error correlations. In general an element of the error matrix due to variables x_i is connected to the previous notation by identifying x_1 = x, x_2 = y;

(M⁻¹)_ij = E([x_i − E(x_i)][x_j − E(x_j)])

The error matrix is symmetric and the off-diagonal terms are given by the covariance;

cov(x_i, x_j) = E(x_i x_j) − E(x_i)E(x_j)        E(x_i x_j) = ∫ dx_i dx_j P(x_i ∩ x_j) x_i x_j

Although the above development assumed a normal distribution, the error matrix formulation is now defined for any probability distribution. However, the development is based on the normal distribution, and assumes that the measured function is linear or the errors are small.

5.1 Example

First suppose (x, y) are independent with;

σ_x² = 1/2        σ_y² = 1/8

The error matrix is diagonal, as shown in Figure 3. Suppose we rotate the axes by an angle θ. The rotation matrix is;

( cos(θ)   −sin(θ)
  sin(θ)    cos(θ) )

This produces the coordinates x′ and y′;

( x′       ( x cos(θ) − y sin(θ)
  y′ )  =    x sin(θ) + y cos(θ) )

Figure 3: An example of uncorrelated and correlated errors in the variables

To continue the above example, let cos(θ) = 1/2. Substitute into the error matrix equation for the new coordinates, and invert the error matrix. This results in;

x′² [cos²(θ)/σ_x² + sin²(θ)/σ_y²] + y′² [sin²(θ)/σ_x² + cos²(θ)/σ_y²] + 2 x′ y′ cos(θ) sin(θ)[1/σ_x² − 1/σ_y²] = 1

(1/2)[13 x′² − 6√3 x′ y′ + 7 y′²] = 1

The probability distribution has not changed, but the error axes are rotated and there is a correlation. The matrix equation is;

( x′  y′ ) ( 13/2      −3√3/2
             −3√3/2     7/2   ) ( x′
                                  y′ ) = 1

The error matrix is the inverse of M;

M⁻¹ = (2/64) ( 7      3√3
               3√3    13  )

σ_x′² = 14/64        σ_y′² = 26/64        cov(x′, y′) = 6√3/64

This is shown in Figure 3. If the covariance were negative it would mean that increasing x′ implies decreasing y′.
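A short numerical cross-check of this example, assuming the input values σ_x² = 1/2 and σ_y² = 1/8 used above (NumPy assumed): the error matrix of the rotated coordinates is obtained by transforming the diagonal error matrix with the rotation matrix.

```python
import numpy as np

V = np.diag([1.0 / 2.0, 1.0 / 8.0])       # diagonal error matrix: sigma_x^2 = 1/2, sigma_y^2 = 1/8

theta = np.arccos(0.5)                    # cos(theta) = 1/2
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s],
              [s,  c]])                   # rotation to the primed coordinates

V_prime = R @ V @ R.T                     # error matrix in the rotated frame
M_prime = np.linalg.inv(V_prime)          # matrix appearing in the quadratic form

print(V_prime * 64)   # approximately [[14, 6*sqrt(3)], [6*sqrt(3), 26]]
print(M_prime)        # approximately [[13/2, -3*sqrt(3)/2], [-3*sqrt(3)/2, 7/2]]
```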

The general form for a normal distribution in two dimensions is;

P(x, y) = [1/(2π σ_x σ_y √(1 − ρ²))] exp[ −(1/(2(1 − ρ²))) ( x²/σ_x² + y²/σ_y² − 2ρxy/(σ_x σ_y) ) ]

The correlation coefficient is ρ. Now extend this to k variables. The probability is;

P = [1/((2π)^{k/2} |M⁻¹|^{1/2})] exp[ −(1/2) x̃ M x ]

In the above M⁻¹ is the error matrix, x is the variable vector (x̃ its transpose), and |M⁻¹| is the determinant of the error matrix. In the 2-D case;

M⁻¹ = ( σ_x²        cov(x, y)
        cov(x, y)   σ_y²      )

M = [1/(1 − ρ²)] ( 1/σ_x²          −ρ/(σ_x σ_y)
                   −ρ/(σ_x σ_y)     1/σ_y²      )

Note that M M⁻¹ = 1 and that the factor 1/√(1 − ρ²) appears in the normalization. Correlations in multi-dimensions are difficult to handle and should be avoided if possible by working with independent variables.

6 Changing variables in the error matrix

The error matrix can be obtained by observing the measurement of a set of variables a number of times while holding all other variables fixed. If the variables are independent, the off-diagonal terms vanish. On the other hand the error matrix can be manipulated into diagonal form, or in fact the error for any variable set can be found. Thus suppose we measure a function of two variables, y = y(x_1, x_2). The error in y is then;

δy = (∂y/∂x_1) δx_1 + (∂y/∂x_2) δx_2

δy² = (∂y/∂x_1)² δx_1² + (∂y/∂x_2)² δx_2² + 2 (∂y/∂x_1)(∂y/∂x_2) δx_1 δx_2

The expectation value E[(y − E(y))²] = ⟨δy²⟩. In matrix form;

δy² = ( ∂y/∂x_1  ∂y/∂x_2 ) ( δx_1²       δx_1 δx_2
                             δx_1 δx_2   δx_2²     ) ( ∂y/∂x_1
                                                       ∂y/∂x_2 )

The matrix is the error matrix, M⁻¹, and the vector, D, holds the derivatives of y with respect to the coordinates;

σ_y² = D̃ M⁻¹ D

Note that we deal here with finite errors approximated by differentials; if y(x_1, x_2) is not linear in the variables, the errors should be small or the development is inaccurate. Suppose we change the variables from (x_1, x_2) to (y, z). In matrix notation this is written;

( δy²      δy δz
  δy δz    δz²   ) = ( ∂y/∂x_1   ∂y/∂x_2
                       ∂z/∂x_1   ∂z/∂x_2 ) ( δx_1²       δx_1 δx_2
                                              δx_1 δx_2   δx_2²     ) ( ∂y/∂x_1   ∂z/∂x_1
                                                                         ∂y/∂x_2   ∂z/∂x_2 )

This takes the following form, where T is the transformation (Jacobian) matrix and T̃ its transpose;

M′⁻¹ = T M⁻¹ T̃

7 Example

Suppose we measure the coordinates (x, y) of a point, and transform this to polar coordinates;

r² = x² + y²        tan(θ) = y/x

Assume that (x, y) are independent. To find the (r, θ) error matrix, evaluate the transformed matrix;

( σ_r²         cov(r, θ)
  cov(r, θ)    σ_θ²      ) = ( x/r      y/r
                               −y/r²    x/r² ) ( σ_x²   0
                                                 0      σ_y² ) ( x/r    −y/r²
                                                                  y/r     x/r² )

The transformation matrix is not symmetric, so the elements of the right-hand factor are reordered (it is the transpose). Also the transformation is not linear, so the errors must be small.

cov(r, θ) = (xy/r³)(σ_y² − σ_x²)

( σ_r²         cov(r, θ)
  cov(r, θ)    σ_θ²      ) = (1/r²) ( x² σ_x² + y² σ_y²          (xy/r)(σ_y² − σ_x²)
                                      (xy/r)(σ_y² − σ_x²)   (1/r²)(y² σ_x² + x² σ_y²) )

Although the normal distribution was used to develop the approach to error estimation, the normal distribution is not always valid. For example, the tails of the normal distribution fall rapidly, usually so rapidly that small effects can influence a fit to the data. Also in counting, particularly with few events, the binomial (or Poisson) distribution may give a better representation. However, one can introduce techniques to deal with distributions that have larger tails, or with points that lie well outside normally expected errors (outliers). The subject of robust estimators (covered later) deals with these cases.
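A minimal sketch of this propagation in code (NumPy assumed; the function name polar_error_matrix and the numerical inputs are illustrative only):

```python
import numpy as np

def polar_error_matrix(x, y, sig_x, sig_y):
    """Propagate independent (x, y) errors to (r, theta) using the Jacobian T."""
    r = np.hypot(x, y)
    T = np.array([[ x / r,     y / r   ],    # dr/dx,     dr/dy
                  [-y / r**2,  x / r**2]])   # dtheta/dx, dtheta/dy
    V_xy = np.diag([sig_x**2, sig_y**2])
    return T @ V_xy @ T.T                    # [[sigma_r^2, cov], [cov, sigma_theta^2]]

V = polar_error_matrix(3.0, 4.0, 0.1, 0.2)
print(V)
# cross-check the analytic off-diagonal term: cov(r, theta) = x y (sig_y^2 - sig_x^2) / r^3
print(3.0 * 4.0 * (0.2**2 - 0.1**2) / 5.0**3)
```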

8 Modeling of data - General comments

Given a set of data, one usually wishes to summarize them by fitting to some model. This means adjusting a set of parameters until the model agrees with the data according to some criterion. Modeling can be as simple as selecting a set of polynomials with appropriate coefficients that best fit the data, or as involved as selecting parameters in a complete theory. Thus in general;

1. One must obtain a set of optimum parameters that represent a set of functions.
2. One must obtain an error estimate for these parameters.
3. One must obtain an overall measure of the statistical goodness of fit of the model to the total data set.

To implement the fitting procedure one needs a figure-of-merit by which comparison between different parameter sets can be made. This function measures agreement between the data set and the model for a particular choice of parameters. We proceed by finding the minimum in the figure-of-merit function with respect to the parameters. Of course the data are not exact, and a model, even if correct, will not exactly fit the data points. Thus the goodness-of-fit is compared to a statistical standard. The most used test is based on χ², defined by;

χ² = Σ_data (Observed value − expected value)² / (error in observed value)²

The calculation of χ² results in a χ² probability distribution, which allows a goodness-of-fit to be determined from the probability that a larger value would be found on repetition of the experiment.

9 Least squares as a Likelihood Estimator

Suppose we fit N data points, X_i, to a model with M parameters, a_j. Using the model the prediction is y(x) = y(x_i; a_j) for all i and j. The figure of merit function is χ².

χ² = Σ_{i=1}^{N} [y_i − y(x_i; a_j)]²/σ_i²

It is not meaningful to ask "what is the probability that a set of parameters is correct?", because we do not draw the parameters from a statistically infinite number of models. There is only one model, and the data set is fitted to this model. We can determine, however, the probability that a data set occurs given the parameters, and then we must apply Bayes theorem. Suppose we have a signal, F(ω, t), and a source of background noise, G(σ). We obtain a set of measurements, D_i, for a set of independent parameters, t_i;

D_i = F(ω, t_i) + G(σ(t_i))

The object is to obtain ω. Assume the noise function is normally distributed;

P(D/ω) = (1/√(2π))^N (∏_i 1/σ_i) exp[ −(1/2) Σ_i Q_i² ]        Q_i = (D_i − F(ω, t_i))/σ_i

To reduce complexity, rescale each point by σ/σ_i;

(σ/σ_i) D_i → D_i        (σ/σ_i) F(ω, t_i) → F(ω, t_i)

P(D/ω, σ) = (1/√(2π))^N (1/σ)^N exp[ −(1/(2σ²)) Σ_i Q_i² ]        Q_i = D_i − F(ω, t_i)

In this case σ and ω are to be determined by the fit. The parameters ω may enter the probability in various forms. This probability is the likelihood, and it is the product of the separate probabilities of each data point, as they are independent;

L ∝ ∏_{i=1}^{N} exp[ −(D_i − F_i(ω))²/(2σ²) ]

Maximizing the likelihood is the same as minimizing the negative of its logarithm, and this is the same as minimizing χ²;

χ² = Σ_{i=1}^{N} (D_i − F_i(ω))²/σ²

The least squares estimator is a likelihood estimator if the measurement errors are independent and are normally distributed.

Recall that the normal distribution results from the sum of a large number of small deviations, as shown by the central limit theorem.

9.1 Example

Fit a straight line to data points as an example of the general technique. Each point, x_i, is associated with a data measurement, y_i, which has some error, σ_i. The model to represent the data is assumed to have the functional form y = ax + b, where a and b are to be varied to find the best representation of the model to the data. At this point we begin to introduce the experimental uncertainty by applying Bayes theorem;

P(M/D, σ) = P(D/M, σ) P(M/σ) / P(D/σ)

The denominator is a normalization that is obtained by integration over all parameters and is not so important. P(M/σ) is the prior probability of the model, M, and P(D/M, σ) is the likelihood of the data, D, given the model. We assume the likelihood is normally distributed and given by;

P(D/M, σ) ∝ (1/σ)^N exp[ −(1/2) Q/σ² ]        Q = Σ_{i=1}^{N} [y_i − a x_i − b]²

where the errors, σ_i = σ, are constant. There are N data points, and the model parameters are a and b. Bayes theorem then finds the probability of the model given the data and the error. Note we can calculate the data if we have the model, P(D/M, σ), so Bayes theorem inverts the probabilities, giving the result we seek. Now we assume a uniform prior probability, P(M/σ) = constant, ie all models are equally probable. This means that P(M/D, σ) is proportional to the likelihood, so maximizing the likelihood maximizes the probability;

∂P(D/M, σ)/∂(parameter_i) ∝ exp[ −(1/2) Q/σ² ] ∂Q/∂(parameter_i) = 0

This leads to the following simultaneous equations;

Σ_i (y_i − a x_i − b) x_i = 0

Σ_i (y_i − a x_i − b) = 0

Put in matrix form;

( Σ x_i²   Σ x_i
  Σ x_i    N     ) ( a
                     b ) = ( Σ y_i x_i
                             Σ y_i     )

As we intend to deal with a linear equation set, or will linearize the equations in the limit of small errors, the matrix formulation of the equations is most efficient. The above equations are solved to obtain the best fit values for the parameters a and b. In this case the equation is;

C A = Q        A = C⁻¹ Q

9.2 Generalization

Now suppose each data point has its own standard deviation, σ_i, and the model has the general form y = y(x_i, a_1, a_2, ..., a_k), where the a_j are model parameters. This is equivalent to a least squares ("chi-square") fit. To the extent that the data are normally distributed, the χ² function is a sum of N normally distributed functions that are properly normalized. Once fit, these functions are not statistically independent because of the constrained equation set, so the number of independent equations is ν = N − L, where L is the number of fitted parameters; ν is called the number of degrees of freedom. The χ² of the fit is then distributed as;

P(χ²/a_i) ∝ (χ²)^{ν/2 − 1} exp[ −(1/2) χ² ]

9.3 Non-linear functions

We could attempt to solve non-linear equations by any mathematical technique. However, they are usually too complicated to solve directly, so one can attempt to find a perturbation solution about a minimum in χ² space. Thus one looks for a linear expansion of the probability in terms of the parameters, and iterates this to convergence. The error in the fit is obtained from a calculation of the area remaining in the tail of the χ² distribution as a function of χ² for a specific number of degrees of freedom, see Figure 4.
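A sketch of the straight-line fit of section 9.1 using the normal equations above (NumPy assumed; the true slope, intercept, error and number of points are made-up values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, sigma = 2.0, 1.0, 0.5          # illustrative values only
x = np.linspace(0.0, 10.0, 20)
y = a_true * x + b_true + rng.normal(0.0, sigma, x.size)

# normal equations of section 9.1:  C A = Q  with  A = (a, b)
C = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     x.size   ]])
Q = np.array([np.sum(y * x), np.sum(y)])
a_fit, b_fit = np.linalg.solve(C, Q)           # A = C^-1 Q

chi2 = np.sum((y - a_fit * x - b_fit) ** 2) / sigma**2
print(a_fit, b_fit, chi2)                      # chi2 should be around N - 2
```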

Figure 4: Percentage area in the tail of χ² distributions

As an example, suppose a set of parameters which form the coordinates of the vector a. Then expand χ²(a) about its minimum value;

χ²(a) ≈ γ − d·a + (1/2) ã D a

In the above, d is the vector with components ∂χ²/∂a_i evaluated at a = a_0, and D_ij = ∂²χ²/∂a_i ∂a_j. Thus;

∇χ² = D a − d

The above equation is designed so that ∇χ² = 0 at the minimum, where a = a_0. The perturbation equation is then;

a_p = a + D⁻¹ [ −∇χ²(a) ]

Then place a_p → a in the above and iterate to convergence. Near the minimum the function χ² is parabolic, and a is changed so that the iteration follows the path of steepest descent along the parabola to the minimum.
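The iteration can be sketched as below, with the gradient and the curvature matrix D estimated by finite differences (NumPy assumed; the step size h, the iteration count and the toy χ² surface are illustrative assumptions, not from the lecture):

```python
import numpy as np

def newton_step_minimize(chi2, a0, h=1e-5, n_iter=20):
    """Iterate a_p = a + D^-1 [-grad chi2(a)] with finite-difference derivatives."""
    a = np.asarray(a0, dtype=float)
    n = a.size
    for _ in range(n_iter):
        grad = np.zeros(n)
        D = np.zeros((n, n))
        for i in range(n):
            ei = np.zeros(n)
            ei[i] = h
            grad[i] = (chi2(a + ei) - chi2(a - ei)) / (2.0 * h)
            for j in range(n):
                ej = np.zeros(n)
                ej[j] = h
                D[i, j] = (chi2(a + ei + ej) - chi2(a + ei - ej)
                           - chi2(a - ei + ej) + chi2(a - ei - ej)) / (4.0 * h * h)
        a = a + np.linalg.solve(D, -grad)       # the perturbation step described above
    return a

# toy chi-square surface with its minimum at (1, 2)
toy = lambda a: (a[0] - 1.0) ** 2 + 3.0 * (a[1] - 2.0) ** 2 + (a[0] - 1.0) * (a[1] - 2.0)
print(newton_step_minimize(toy, [5.0, -3.0]))   # converges to approximately [1, 2]
```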

10 Maximum likelihood method

For some problems, the maximum likelihood method is easier to apply than the least-squares technique, but for a normal distribution the results are statistically equivalent. However, the likelihood method can be used with a general probability density function, and Bayes theorem can be applied as a learning filter. It has the drawback of needing a normalized probability density that must be updated as the parameters change. We find that;

1. The maximum likelihood method uses events as they occur (see the Kalman filter described later).
2. The likelihood can be used with low statistics, where it is most efficient.
3. The likelihood and χ² methods are equivalent for a normal distribution.
4. The likelihood requires substantial computation, especially due to the need for renormalization of the probability density.
5. It is difficult to determine the errors when using the likelihood method.

11 Importance of Normalization of the Likelihood

Suppose a set of data consistent with the angular distribution;

y = dN/d(cos(θ)) = N[1 + (b/a) cos²(θ)]

In the above N is the normalization factor which makes;

∫_{−1}^{1} d(cos(θ)) y = 1        N = 1/(2[1 + b/(3a)])

Note that if y does not remain appropriately normalized, the maximum likelihood method does not work. For the i-th event we calculate;

y_i = N[1 + (b/a) cos²(θ_i)]

This is the probability density for the observation of that event. Obviously it depends on (b/a). Then apply Bayes theorem;

P(ω/D, M) = P(D/ω, M) P(ω/M) / P(D/M)

We identify the above as;

P(posterior) = P(likelihood) P(prior) / P(normalization)

Thus we must maximize the likelihood based on the probability estimator (ie the probability density function, pdf). Note that the pdf does not need to be normally distributed. For the example here the likelihood, L, is a product of the y_i for each event of the sample. This represents the probability of independent events;

L = ∏_{i=1}^{N} y_i

Note that the probability should not depend on the order of events in the product, so there should be a factor of N! multiplying the likelihood; however, constant factors are not important. It is only the factors of (b/a) that need to be considered. To obtain the result, we maximize L (actually ln[L]) by varying the parameters. Therefore write;

l = ln[L] = Σ_{i=1}^{N} ln(y_i)

For a large number of observations (N → ∞), L tends to be normally distributed, at least near its maximum. Thus by expansion about a parameter P;

l = l_max + (dl/dP) δP + (1/2)(d²l/dP²) δP² + ···

At the maximum the second term, dl/dP, vanishes. If we assume;

L = A exp[ −(P − P_0)²/(2c²) ]

then we identify;

l = ln(A) − δP²/(2c²)        d²l/dP² = −1/c²

Then if L is normally distributed, the standard deviation can be used for the error. On the other hand we could use [−∂²l/∂P²]^{−1/2}, or an average of this function over the measured range. For an element of the inverse error matrix we use;

M_ij = −∂²l/(∂P_i ∂P_j)

evaluated at the maximum value of P. Then we see that;

1. The maximum likelihood method can use events as they occur, so one does not need data tables, and it can be used for a few events.

2. It is most efficient for low numbers of events.
3. The likelihood and least squares methods are equivalent when the probability and errors are normally distributed.
4. When constraints are to be imposed on the parameters, these are inserted by restricting the parameters to remain in the allowed regions. Lagrange multipliers can be used where necessary. However, problems may arise if the maximum probability is on or near a boundary.
5. It is always best to fit the data with the background included, without attempting to subtract out the background events.
6. The likelihood method requires substantial computation, especially due to the normalization requirements.
7. It is difficult to estimate the error using the likelihood method, and to determine how well the data are represented by the model.

12 Examples

We wish to determine the lifetime of a decaying particle. N events are observed, characterized by the flight path l_i from production to decay point. The experiment has an upper and lower bound on the distance it can measure. The time as a function of distance is;

t_i = l_i/(β_i γ_i c)

In the above, βγc is the Lorentz boost for the particle. The limits are imposed as;

t_i(mx) = l_i(mx)/(β_i γ_i c)        t_i(mn) = l_i(mn)/(β_i γ_i c)

The observations are independent, so the likelihood is the product of the probability of each observation;

L = ∏_{i=1}^{N} (1/τ) e^{−t_i/τ} / [ e^{−t_i(mn)/τ} − e^{−t_i(mx)/τ} ]

The value of τ is the lifetime to be determined, and the denominator of the above equation normalizes the probability. Then take the log of the likelihood;

ln[L] = Σ_{i=1}^{N} [ −t_i/τ − ln( e^{−t_i(mn)/τ} − e^{−t_i(mx)/τ} ) ] − N ln(τ)

The maximum likelihood estimate we seek, τ, is obtained from;

∂ln[L]/∂τ = (1/τ²) Σ_{i=1}^{N} [t_i − g_i(τ)] − N/τ = 0

g_i(τ) = [ t_i(mn) e^{−t_i(mn)/τ} − t_i(mx) e^{−t_i(mx)/τ} ] / [ e^{−t_i(mn)/τ} − e^{−t_i(mx)/τ} ]

This equation may be solved by Newton's method, which we applied in the discussion of non-linear equations. This is an iterative approach. To determine the variance, take the inverse of the 2nd derivative of the log-likelihood. If we define F = Σ_{i=1}^{N} [t_i − g_i(τ)] − Nτ, then;

∂²ln(L)/∂τ² = −2F/τ³ + F′/τ²

evaluated at τ = τ_0, where τ_0 is the extracted lifetime from the likelihood estimator. Thus the standard deviation, using σ² = −[∂²l/∂τ²]^{−1}, is;

σ² ≈ τ_0² / ( N[ −1 + ḡ′_0 + (2/τ_0)(t̄ − ḡ_0) ] )        ḡ = (1/N) Σ_{i=1}^{N} g_i

where the bars denote averages over the events, evaluated at τ_0.

As another example, we wish to fit a measurement of a particle mass, M_0, to a Breit-Wigner distribution;

F(M_i) = (Γ/2)/[ (M_i − M_0)² + (Γ/2)² ] + a[ 1 + b(M_i − M_0) + c(M_i − M_0)² ]

The second term in the above is the assumed background, with parameters a, b, c. The resonance expression has parameters Γ and M_0. We allow a fit for M_mn < M_0 < M_mx, so we need to keep the likelihood normalized over this interval. There are then 5 independent parameters. The log-likelihood is defined as;

l = Σ ln(y_i)        y_i = F(M_i) / ∫_{M_mn}^{M_mx} dM F(M)

Note that the normalization integral runs over the fit interval in M. Remember that at each step in an iteration one must keep the likelihood normalized. In the above case the normalization integral can be analytically evaluated. It will depend on the parameters to be fitted, so when the minimization of −l is performed, the normalization contributes to the derivative. This procedure, while straightforward, requires significant computation.
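A numerical sketch of the lifetime estimate (NumPy and SciPy assumed). For simplicity the per-event limits t_i(mn) and t_i(mx) are taken to be the same for every event, the true lifetime, limits and sample size are made-up values, and the log-likelihood written above is maximized numerically rather than with Newton's method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

tau_true, t_mn, t_mx = 2.0, 0.5, 8.0            # illustrative values only
rng = np.random.default_rng(3)
u = rng.random(500)
a, b = np.exp(-t_mn / tau_true), np.exp(-t_mx / tau_true)
t = -tau_true * np.log(a - u * (a - b))         # decay times drawn on [t_mn, t_mx]

def neg_log_L(tau):
    # ln L = sum_i [ -t_i/tau - ln(exp(-t_mn/tau) - exp(-t_mx/tau)) ] - N ln(tau)
    norm = np.exp(-t_mn / tau) - np.exp(-t_mx / tau)
    return -(np.sum(-t / tau) - t.size * np.log(norm) - t.size * np.log(tau))

res = minimize_scalar(neg_log_L, bounds=(0.1, 20.0), method='bounded')
print(res.x)                                    # maximum likelihood estimate of tau
```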

As a final example, we wish to fit a circle to the motion of a charged particle moving in a magnetic field. The equations of non-relativistic motion are (the result is also relativistically correct);

m dV_x/dt = qB_z V_y
m dV_y/dt = −qB_z V_x
m dV_z/dt = 0

Ignore motion along the field direction. The coupled equations for the velocity and position are solved to obtain;

X = X_0 + R sin(ωt + φ)
Y = Y_0 + R cos(ωt + φ)
ω = qB_z/m

There are 3 unknown parameters, the center of the circular motion, (X_0, Y_0), and the radius of the circle, R. Now we set up a least-squares estimator of the form;

F = Σ_{i=1}^{n} [D_i² − R²]²        D_i² = (X_i − X_0)² + (Y_i − Y_0)²

Then minimize the estimator, F. Setting the derivatives with respect to X_0, Y_0 and R to zero gives;

Σ_i (X_i − X_0)[D_i² − R²] = 0
Σ_i (Y_i − Y_0)[D_i² − R²] = 0
Σ_i [D_i² − R²] = 0

Solving these equations;

R² = Σ_i D_i²/N

X_0 = (FC − GB)/(AC − B²)

Y_0 = (GA − FB)/(AC − B²)

A = (Σ_i X_i)² − N Σ_i X_i²
B = (Σ_i Y_i)(Σ_i X_i) − N Σ_i X_i Y_i
C = (Σ_i Y_i)² − N Σ_i Y_i²
F = (1/2)[Σ_i X_i² + Σ_i Y_i²] Σ_i X_i − (N/2)[Σ_i X_i³ + Σ_i Y_i² X_i]
G = (1/2)[Σ_i X_i² + Σ_i Y_i²] Σ_i Y_i − (N/2)[Σ_i Y_i³ + Σ_i X_i² Y_i]

Note that the value of R is determined by the mean value of R², which weights larger values of R more significantly. The error matrix has off-diagonal terms, so the parameters are correlated; for example,

∂²F/∂X_0 ∂R = 8R Σ_i (X_i − X_0)

13 Hypothesis testing

In developing a χ² (or another) test, or in the generation of simulated data to provide design and physical insight into a problem, a random selection of events from a probability distribution is required. This selection is initially begun by obtaining a set of random numbers from a computer program. Obviously, truly random numbers cannot be calculated, but a pseudo-random string of numbers which passes certain checks on randomness is available in computer routines. The simplest of these are linear congruential generators; they have a finite length before they repeat, depending on their sophistication and the number of computer bits. There can also be correlations between random numbers in the string. Therefore, one should always use a well designed random number generator.

The selection of a random deviate from any probability distribution P(x) can be obtained using the rejection method. Two random numbers, a_1 and a_2, are selected. The first chooses the variable, x = a_1 x_max, and the second selects a probability height, a_2 f_max, where f_max ≥ P(x) over the full range. If a_2 f_max ≤ P(x) the event is accepted, otherwise the process is repeated. This technique is illustrated in Figure 5.

The procedure outlined in the last section describes how well a model fits the data. The χ² statistic is expected to be approximately normally distributed about a mean equal to the number of degrees of freedom, with a variance of 2ν. Using the error function we can then determine the probability, F, that a repetition of the experiment would exceed the observed value of χ². If the experiment were repeated N times, the obtained value of χ² would be exceeded about F N times. After completion of the analysis, there are the following possibilities;

1. We could reject a true hypothesis due to a statistical fluctuation.
2. We could accept a false hypothesis due to a good χ².
3. When testing a hypothesis one should test the entire distribution.
4. One could reject data points, using a χ² cut, that lie outside some value. However, this is dangerous as it may affect the statistical analysis.

Note that we have no way of testing all hypotheses, so we cannot directly compare them and test the probability of a hypothesis being correct. We can test only how well a particular model represents the data.
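A minimal sketch of the rejection method just described, assuming NumPy; the density used as the example and the function name rejection_sample are illustrative.

```python
import numpy as np

def rejection_sample(pdf, x_max, f_max, size, rng=None):
    """Select deviates from pdf(x) on [0, x_max] by the rejection method."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    while len(out) < size:
        x = rng.random() * x_max          # first random number chooses the variable
        y = rng.random() * f_max          # second random number chooses a height below f_max
        if y <= pdf(x):                   # accept the event if the point lies under the curve
            out.append(x)
    return np.array(out)

# example: the density p(x) = 2x on [0, 1], bounded by f_max = 2
samples = rejection_sample(lambda x: 2.0 * x, x_max=1.0, f_max=2.0, size=5000)
print(samples.mean())                     # the mean of p(x) = 2x is 2/3
```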

Figure 5: A graphical illustration of the rejection method to select an event from a general probability distribution

14 Combining measurements

Suppose we make a measurement and obtain a set of variables, a_1(i), with errors, σ_1(i), representing the data at a minimum in χ². Now suppose we take a 2nd measurement, obtaining the values a_2(i) and σ_2(i).

Bayes theorem can be used to show that these may be combined to obtain a new set of parameters and errors. For statistically independent measurements of the same variable;

a(i) = [ a_1(i)/σ_1²(i) + a_2(i)/σ_2²(i) ] / [ 1/σ_1²(i) + 1/σ_2²(i) ]

1/σ² = 1/σ_1²(i) + 1/σ_2²(i)

If the measurements are not statistically independent, try to find an independent set of variables, combine the errors there, and transform the matrix of errors back to the original set of variables. If this cannot be done, then use the covariance to obtain the off-diagonal terms and develop the variance as described in an earlier section.

Near the edge of a physical region one can apply Bayes theorem. Suppose one is measuring a mass near zero. The mass cannot be negative, so the result must have a lower bound. Using Bayes theorem one uses a prior which excludes the possibility that the mass < 0. Using a flat prior, for example;

P(Model/D) = 1 for 0 ≤ M ≤ M_max, and 0 otherwise

Then the normalization must be over 0 to M_max as the insertion of posterior and prior assumptions is iterated. The likelihood simply contains the probability of the instrumental efficiency for the detection of the mass.
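A one-function sketch of this combination rule (NumPy assumed; the two measurements below are made-up numbers):

```python
import numpy as np

def combine(a1, s1, a2, s2):
    """Inverse-variance weighted combination of two independent measurements."""
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    a = (a1 * w1 + a2 * w2) / (w1 + w2)
    sigma = np.sqrt(1.0 / (w1 + w2))
    return a, sigma

print(combine(10.2, 0.5, 9.6, 0.3))   # the combined value is pulled toward the more precise result
```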

14.1 Uncertainty due to systematic error

Suppose a measurement with a normally distributed result. There are also calibration errors and other instrumental errors which are not associated with event counting. Generally, inclusion of systematic error is difficult to handle unless this error can be modeled. If the systematic errors are normally distributed they could be included by addition in quadrature, assuming statistical independence;

σ² = σ²(counting) + Σ_i σ²(i)

Most often the systematic error is not normal, and for other distributions the error model should be entered into a probability density function. A problem arises due to the fact that the measurements produce correlated errors, so that the likelihood function is not a product of the separate probabilities. However, one may assume that the sum and difference of the errors is normally distributed and then compute an average value of the error, as explained in a section above.

As an example, suppose a measurement with a normally distributed signal. We also assume a normally distributed systematic offset, z. Assume that these are independent variables. The prior probability is then assumed to be;

P(M/x, z) = P(M_1/x) P(M_1/z) = A [1/(√(2π) σ_z)] e^{−z²/(2σ_z²)}

The likelihood is;

P(x/µ_x, z) = [1/(√(2π) σ)] e^{−(x − µ_x − z)²/(2σ²)}

P(µ_x/x) = [1/(2π σ σ_z)] ∫ dz e^{−(x − µ_x − z)²/(2σ²)} e^{−z²/(2σ_z²)} / Norm

Norm = [1/(2π σ σ_z)] ∫ dz ∫ dµ_x e^{−(x − µ_x − z)²/(2σ²)} e^{−z²/(2σ_z²)}

This results in the expected;

P(µ_x/x) = [1/(√(2π) σ_T)] e^{−(x − µ_x)²/(2σ_T²)}        σ_T² = σ² + σ_z²

In this case the systematic error is included as would be any independent variable. If the error has a mean different from zero, this would be handled by inserting z → (z − z_0). In the situation where variables are measured with the same apparatus, they will be correlated through the systematic error. The inclusion of systematic error can be handled if it can be modeled.

15 Example

Suppose an instrument measuring energy. Its absolute energy calibration is assumed to lie between 1-10%. The statistical error of a test measurement is 18%. The measured energy is 30 units. Proceed as follows.

1. The energy is corrected using the best guess for the calibration, the centre of the quoted range, ≈ 5%. This has a statistical uncertainty of 18%/√E;

30(1.05)(1 ± 0.18/√30) = 31.5 ± 1.0

2. Include the uncertainty in the calibration constant. This uncertainty is flat between (1-10)%, so the standard deviation for a flat distribution is (0.1/√12)(31.5) ≈ 0.9. Thus;

σ² = (1.0)² + (0.9)² = 1.8        σ ≈ 1.3
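The arithmetic of this example as a short script (NumPy assumed), using the 5% best-guess correction and the flat calibration range of width 0.1 quoted above:

```python
import numpy as np

E_meas = 30.0                        # measured energy, arbitrary units
calib = 0.05                         # best-guess calibration correction (about 5%)
E_corr = E_meas * (1.0 + calib)      # 31.5

stat = E_corr * 0.18 / np.sqrt(30.0)       # statistical part, about 1.0
syst = E_corr * 0.1 / np.sqrt(12.0)        # flat range of width 0.1 -> sigma = width / sqrt(12), about 0.9
sigma = np.hypot(stat, syst)               # add in quadrature, about 1.3
print(E_corr, stat, syst, sigma)
```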

16 Robust estimators

In any experimental data there are always some points that lie well outside reasonable error. Such points certainly occur even for a normal distribution, and these points, particularly when using a χ² test, can influence the result. Note that points far from the mean contribute the most to χ². Robust estimators are less sensitive to these points, and can be used in some cases to trim the data. BUT be careful: the use of these estimators is not based on statistics. One such estimator uses the function;

ρ = |Y_i − y(x_i)| / σ_i

This weights outlying points equally with those close to the mean. Another choice, which results in a Lorentzian probability, is;

ρ = ln[ 1 + (1/2)((Y_i − y(x_i))/σ_i)² ]

With this latter distribution the weighting factor first increases and then decreases as (Y_i − y(x_i))/σ_i increases. An illustration of the application of a robust estimator is shown in Figure 6.

It is also possible to remove a number, n/2, of the events furthest above the mean and re-determine the estimators, then remove n/2 of the events furthest below the mean and re-determine the estimators, and finally take the average of the two results. This assumes, of course, that the distribution has a mean. As indicated previously, the Cauchy distribution does not have a mean. Table 1 gives the best mean estimator for various distributions.

Table 1: The best estimator of the mean for various probability distributions

Distribution    Minimum Variance Estimator
Normal          Simple Mean
Uniform         Midrange
Cauchy          Maximum Likelihood Estimate
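A sketch comparing an ordinary χ² line fit with the Lorentzian estimator above on data containing one gross outlier (NumPy and SciPy assumed; the line parameters, errors and outlier are synthetic, for illustration only).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 15)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, x.size)
y[10] += 15.0                                   # one gross outlier
sig = 0.3

def chi2(p):                                    # standard least squares
    return np.sum(((y - p[0] * x - p[1]) / sig) ** 2)

def lorentz(p):                                 # rho = ln(1 + r^2 / 2), summed over points
    r = (y - p[0] * x - p[1]) / sig
    return np.sum(np.log1p(0.5 * r ** 2))

p_ls = minimize(chi2, [1.0, 0.0]).x             # pulled away from (2, 1) by the outlier
p_rob = minimize(lorentz, p_ls, method='Nelder-Mead').x
print(p_ls, p_rob)                              # the robust fit should sit much closer to (2, 1)
```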

Figure 6: An illustration of the application of a robust estimator to better fit a straight line to data with outliers


More information

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review You can t see this text! Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Matrix Algebra Review 1 / 54 Outline 1

More information

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are

More information

STA 2201/442 Assignment 2

STA 2201/442 Assignment 2 STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Introduction to Error Analysis

Introduction to Error Analysis Introduction to Error Analysis Part 1: the Basics Andrei Gritsan based on lectures by Petar Maksimović February 1, 2010 Overview Definitions Reporting results and rounding Accuracy vs precision systematic

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 10 December 17, 01 Silvia Masciocchi, GSI Darmstadt Winter Semester 01 / 13 Method of least squares The method of least squares is a standard approach to

More information

Robots Autónomos. Depto. CCIA. 2. Bayesian Estimation and sensor models. Domingo Gallardo

Robots Autónomos. Depto. CCIA. 2. Bayesian Estimation and sensor models.  Domingo Gallardo Robots Autónomos 2. Bayesian Estimation and sensor models Domingo Gallardo Depto. CCIA http://www.rvg.ua.es/master/robots References Recursive State Estimation: Thrun, chapter 2 Sensor models and robot

More information

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 You can t see this text! Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Probability Review - Part 2 1 /

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

Physics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10

Physics 509: Error Propagation, and the Meaning of Error Bars. Scott Oser Lecture #10 Physics 509: Error Propagation, and the Meaning of Error Bars Scott Oser Lecture #10 1 What is an error bar? Someone hands you a plot like this. What do the error bars indicate? Answer: you can never be

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg

Statistics for Data Analysis. Niklaus Berger. PSI Practical Course Physics Institute, University of Heidelberg Statistics for Data Analysis PSI Practical Course 2014 Niklaus Berger Physics Institute, University of Heidelberg Overview You are going to perform a data analysis: Compare measured distributions to theoretical

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Lecture 11. Probability Theory: an Overveiw

Lecture 11. Probability Theory: an Overveiw Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Strong Lens Modeling (II): Statistical Methods

Strong Lens Modeling (II): Statistical Methods Strong Lens Modeling (II): Statistical Methods Chuck Keeton Rutgers, the State University of New Jersey Probability theory multiple random variables, a and b joint distribution p(a, b) conditional distribution

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information