
Physics 403: Parameter Estimation, Correlations, and Error Bars
Segev BenZvi
Department of Physics and Astronomy, University of Rochester

Table of Contents

1. Review of Last Class: Best Estimates and Reliability; Properties of a Good Estimator
2. Parameter Estimation in Multiple Dimensions: Return of the Quadratic Approximation; The Hessian Matrix and its Geometrical Interpretation; Maximum of the Quadratic Form; Covariance
3. Multidimensional Estimators: Gaussian Mean and Width; Student-t Distribution; χ² Distribution

Best Estimates and Reliability

We can identify the best estimator $\hat{x}$ of a PDF by maximizing $p(x|D,I)$:

$$\left.\frac{dp}{dx}\right|_{\hat{x}} = 0, \qquad \left.\frac{d^2 p}{dx^2}\right|_{\hat{x}} < 0$$

We assessed the reliability of the estimator by Taylor expanding $L = \ln p$ about the best value:

$$\hat{\sigma}^2 = \left( -\left.\frac{d^2 L}{dx^2}\right|_{\hat{x}} \right)^{-1}$$

This only works when the quadratic approximation is reasonable. For an asymmetric PDF, it's better to report a confidence interval. For a multimodal PDF there may be no single best estimate, and calculating the reliability becomes complicated: don't summarize the PDF, just report the whole thing.
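
As a concrete illustration of this recipe, here is a minimal numerical sketch (the log-posterior below is invented, not taken from the lecture): locate the maximum on a grid and read off the error bar from the curvature.

```python
import numpy as np

# Minimal sketch: find x_hat by maximizing a hypothetical log-posterior L(x)
# on a grid, then estimate sigma_hat^2 = (-d^2L/dx^2)^(-1) at the maximum.
def log_posterior(x):
    return -0.5 * (x - 1.0)**2 - 0.1 * (x - 1.0)**3   # assumed example, slightly skewed

x = np.linspace(-2.0, 3.0, 5001)
L = log_posterior(x)
i = np.argmax(L)                                      # index of the maximum
h = x[1] - x[0]
d2L = (L[i + 1] - 2.0 * L[i] + L[i - 1]) / h**2       # finite-difference curvature
x_hat, sigma_hat = x[i], np.sqrt(-1.0 / d2L)
print(f"x_hat = {x_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```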

Example Estimators from Last Class

Best estimator of the binomial probability $p$ ($n$ successes in $N$ trials):

$$\hat{p} = \frac{n}{N}, \qquad \hat{\sigma}^2 = \frac{n(N-n)}{N^3}$$

Arithmetic mean: best estimator for a Gaussian with known variance $\sigma^2$:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2 = \frac{\sigma^2}{N}$$

Weighted mean: best estimator for a Gaussian when the points have different error bars:

$$\hat{\mu} = \sum_{i=1}^{N} w_i x_i \Big/ \sum_{i=1}^{N} w_i, \qquad \hat{\sigma}^2 = \frac{1}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{\sigma_i^2}$$

This reduces to the arithmetic result when $\sigma_i = \sigma$.
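
A minimal sketch of the weighted-mean estimator (the measurement values below are invented for illustration):

```python
import numpy as np

def weighted_mean(x, sigma):
    """Weighted mean and its uncertainty for measurements x with error bars sigma."""
    w = 1.0 / np.asarray(sigma)**2             # w_i = 1 / sigma_i^2
    mu_hat = np.sum(w * np.asarray(x)) / np.sum(w)
    sigma_hat = np.sqrt(1.0 / np.sum(w))       # error bar on the weighted mean
    return mu_hat, sigma_hat

# Example: three measurements of the same quantity with different error bars
x = [4.9, 5.2, 5.6]
s = [0.1, 0.3, 0.5]
print(weighted_mean(x, s))   # reduces to (mean(x), sigma/sqrt(N)) if all errors are equal
```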

Bayesian vs. Frequentist Interpretations

Bayesian: given a measurement, we have some confidence that our best estimate of a parameter lies within some range of the data.

Frequentist: given the true value of the parameters, we have some confidence that our measurement lies within some range of the true value.

The difference: $p(\theta|D,I)$ versus $p(D|\theta,I)$.

(Frequentist) Properties of a Good Estimator

A good estimator should be:

1. Consistent. The estimate tends toward the true value as the amount of data increases:
$$\lim_{N\to\infty} \hat{\theta} = \theta$$

2. Unbiased. The expectation value is equal to the true value:
$$b = \langle\hat{\theta}\rangle - \theta = \int dx\, p(x|\theta)\,\hat{\theta}(x) - \theta = 0$$

3. Efficient. The variance of the estimator is as small as possible (minimum variance bound, to be discussed):
$$\mathrm{var}(\hat{\theta}) = \int dx\, p(x|\theta)\,\left(\hat{\theta}(x) - \langle\hat{\theta}\rangle\right)^2$$

The mean squared error combines bias and variance:
$$\mathrm{MSE} = \langle(\hat{\theta} - \theta)^2\rangle = \mathrm{var}(\hat{\theta}) + b^2$$

It is not always possible to satisfy all three requirements.
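
The bias/variance distinction is easy to check by simulation. A small sketch (toy numbers, assuming a standard normal population) showing that the variance estimator that divides by $N$ is biased low, while the $N-1$ version is unbiased:

```python
import numpy as np

rng = np.random.default_rng(403)
N, trials = 5, 100_000
x = rng.normal(loc=0.0, scale=1.0, size=(trials, N))   # true variance = 1

var_N   = x.var(axis=1, ddof=0)    # divide by N
var_Nm1 = x.var(axis=1, ddof=1)    # divide by N - 1

print(var_N.mean())     # ~ (N-1)/N = 0.8: biased low
print(var_Nm1.mean())   # ~ 1.0: unbiased
```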

Parameter Estimation in Higher Dimensions

Moving to more dimensions: $x \to \mathbf{x}$, $p(x|D,I) \to p(\mathbf{x}|D,I)$.

As in the 1D case, the posterior PDF still encodes all the information we need to get the best estimator. The maximum of the PDF gives the best estimate of the quantities $\mathbf{x} = \{x_j\}$. We solve the set of simultaneous equations

$$\left.\frac{\partial p}{\partial x_i}\right|_{\{\hat{x}_j\}} = 0$$

Question: how do we make sure that we're at a maximum and not a minimum or a saddle point?

The Quadratic Approximation Revisited

It's easier to deal with $L = \ln p(\{x_j\}|D,I)$, so let's do that. Let's also simplify to 2D, without loss of generality, so that $\mathbf{x} = (x, y)$.

The maximum of the posterior satisfies

$$\left.\frac{\partial L}{\partial x}\right|_{\hat{x},\hat{y}} = 0 \quad\text{and}\quad \left.\frac{\partial L}{\partial y}\right|_{\hat{x},\hat{y}} = 0$$

Look at the behavior of $L$ about the maximum using its Taylor expansion:

$$L = L(\hat{x},\hat{y}) + \frac{1}{2}\left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x},\hat{y}}(x-\hat{x})^2 + \frac{1}{2}\left.\frac{\partial^2 L}{\partial y^2}\right|_{\hat{x},\hat{y}}(y-\hat{y})^2 + \left.\frac{\partial^2 L}{\partial x\,\partial y}\right|_{\hat{x},\hat{y}}(x-\hat{x})(y-\hat{y}) + \ldots$$

where the linear terms are zero because we're at the maximum.

The Hessian Matrix

As in the 1D case, the quadratic terms in the expansion dominate the behavior near the maximum. Insight: rewrite the quadratic terms in matrix notation,

$$Q = \frac{1}{2}\begin{pmatrix} x-\hat{x} & y-\hat{y} \end{pmatrix}\begin{pmatrix} A & C \\ C & B \end{pmatrix}\begin{pmatrix} x-\hat{x} \\ y-\hat{y} \end{pmatrix} = \frac{1}{2}(\mathbf{x}-\hat{\mathbf{x}})^\top H(\hat{\mathbf{x}})\,(\mathbf{x}-\hat{\mathbf{x}})$$

where $H(\hat{\mathbf{x}})$ is a $2\times2$ symmetric matrix with components

$$A = \left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x},\hat{y}}, \qquad B = \left.\frac{\partial^2 L}{\partial y^2}\right|_{\hat{x},\hat{y}}, \qquad C = \left.\frac{\partial^2 L}{\partial x\,\partial y}\right|_{\hat{x},\hat{y}}$$

Note: $H(\hat{\mathbf{x}})$, the matrix of second derivatives, is called the Hessian matrix of $L$.

Geometrical Interpretation

A contour of $Q$ in the $xy$ plane is an ellipse centered at $(\hat{x},\hat{y})$. Its orientation and eccentricity are determined by the values of $A$, $B$, and $C$.

The principal axes correspond to the eigenvectors of $H$, i.e., if we solve

$$H\mathbf{x} = \lambda\mathbf{x}, \qquad \begin{pmatrix} A & C \\ C & B \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \lambda\begin{pmatrix} x \\ y \end{pmatrix},$$

we get two eigenvalues $\lambda_1$ and $\lambda_2$ which are inversely related to the squares of the semi-major and semi-minor axes of the ellipse.
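
A small numerical sketch of this geometry (the Hessian entries below are invented for illustration): the eigenvectors of $H$ give the principal axes of the $Q = \text{const}$ contour, and the axis lengths scale as $1/\sqrt{|\lambda|}$.

```python
import numpy as np

A, B, C = -4.0, -1.0, 1.2          # assumed second derivatives of L at the maximum
H = np.array([[A, C],
              [C, B]])

eigvals, eigvecs = np.linalg.eigh(H)   # symmetric matrix: real eigenvalues, orthogonal eigenvectors
print("eigenvalues:", eigvals)         # both negative at a true maximum
print("axis lengths scale as:", 1.0 / np.sqrt(-eigvals))
print("principal axes (columns):\n", eigvecs)
```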

Condition for a Maximum

$L(\hat{\mathbf{x}})$ is a maximum if the quadratic form $Q(\mathbf{x}-\hat{\mathbf{x}}) = Q(\Delta\mathbf{x}) < 0$ for all $\Delta\mathbf{x}$.

If $H$ is symmetric, there exists an orthogonal matrix $O = \begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 \end{pmatrix}$ such that

$$O^\top H O = D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

where $\mathbf{e}_1$ and $\mathbf{e}_2$ are the eigenvectors of $H$. Therefore $H = O D O^\top$, and we can express $Q$ as

$$Q \propto \Delta\mathbf{x}^\top H\,\Delta\mathbf{x} = \Delta\mathbf{x}^\top (O D O^\top)\,\Delta\mathbf{x} = (O^\top\Delta\mathbf{x})^\top D\,(O^\top\Delta\mathbf{x}) = \lambda_1\,\Delta x'^2 + \lambda_2\,\Delta y'^2,$$

where $(\Delta x', \Delta y') = O^\top\Delta\mathbf{x}$ are the deviations along the principal axes. Hence $Q < 0$ for all $\Delta\mathbf{x}$ if and only if $\lambda_1$ and $\lambda_2$ are both negative.

Condition for a Maximum

The eigenvalues of $H$ are given by

$$\lambda_{1(2)} = \frac{1}{2}\,\mathrm{Tr}\,H \pm \sqrt{(\mathrm{Tr}\,H)^2/4 - \det H}$$

where

$$\mathrm{Tr}\,H = A + B, \qquad \det H = AB - C^2$$

Intuition: what happens if the cross term $C = 0$? Then the principal axes of the ellipse defined by $Q$ are aligned with the $x$ and $y$ axes, and the eigenvalues reduce to $\lambda_1 = A$, $\lambda_2 = B$.

Analogous to the 1D case, we can associate the error bars on $\hat{x}$ and $\hat{y}$ with the inverse roots of the diagonal terms of the Hessian:

$$\hat{\sigma}_x^2 = -\lambda_1^{-1} = \left(-\left.\frac{\partial^2 L}{\partial x^2}\right|_{\hat{x},\hat{y}}\right)^{-1}, \qquad \hat{\sigma}_y^2 = -\lambda_2^{-1} = \left(-\left.\frac{\partial^2 L}{\partial y^2}\right|_{\hat{x},\hat{y}}\right)^{-1}$$

General Case: C ≠ 0

What happens when the off-diagonal term of $H$ is nonzero? Let's work in 2D. If we were only interested in the reliability of $\hat{x}$, then we would evaluate the behavior of the marginal distribution about the maximum:

$$p(x|D,I) = \int p(x,y|D,I)\,dy$$

Using our quadratic approximation, $p(x,y|D,I) = \exp L \propto \exp Q$:

$$p(x|D,I) \propto \int \exp\left(\frac{1}{2}\,\Delta\mathbf{x}^\top H\,\Delta\mathbf{x}\right) dy = \int \exp\left[\frac{1}{2}\left(Ax^2 + By^2 + 2Cxy\right)\right] dy,$$

where (without loss of generality) we set $\hat{x} = \hat{y} = 0$.

General Case (C ≠ 0): Solving the Gaussian Integral

Factor out the terms in $x$ and complete the square in $y$ (keeping in mind that $Q < 0$, so $A, B < 0$):

$$p(x|D,I) \propto \int e^{\frac{1}{2}(Ax^2+By^2+2Cxy)}\,dy = e^{\frac{1}{2}Ax^2}\int e^{\frac{1}{2}(By^2+2Cxy)}\,dy = e^{\frac{1}{2}\left(A-\frac{C^2}{B}\right)x^2}\int e^{\frac{1}{2}B\left(y+\frac{Cx}{B}\right)^2}\,dy$$

where we completed the square, $By^2 + 2Cxy = B\left(y + Cx/B\right)^2 - C^2x^2/B$, to remove the $xy$ cross term from the integral. The remaining integral is a Gaussian integral of the form

$$\int \exp\left(-\frac{u^2}{2\sigma^2}\right) du = \sigma\sqrt{2\pi}, \qquad \text{here with } \sigma^2 = -1/B > 0.$$

General Case (C ≠ 0): Expressions for σ_x and σ_y

Therefore, the marginal distribution becomes

$$p(x|D,I) \propto \sqrt{\frac{2\pi}{-B}}\,\exp\left[\frac{1}{2}\left(\frac{AB-C^2}{B}\right)x^2\right] = \sqrt{\frac{2\pi}{-B}}\,\exp\left(-\frac{x^2}{2\sigma_x^2}\right), \qquad \sigma_x^2 = -\frac{B}{AB-C^2} = -\frac{H_{yy}}{\det H}$$

Similarly, if we solve instead for $p(y|D,I)$, we find that

$$\sigma_y^2 = -\frac{A}{AB-C^2} = -\frac{H_{xx}}{\det H}$$

Note: the minus signs make the variances positive, since the diagonal elements $A$ and $B$ of the Hessian are negative at a maximum while $\det H = AB - C^2 > 0$.

Connection to Variance and Covariance

Recall the definition of variance for a 1D PDF:

$$\mathrm{var}(x) = \langle(x-\mu)^2\rangle = \int dx\,(x-\mu)^2\,p(x|D,I)$$

This can be extended using the 2D PDF:

$$\sigma_x^2 = \langle(x-\hat{x})^2\rangle = \int dx\,dy\,(x-\hat{x})^2\,p(x,y|D,I)$$

If we use the quadratic approximation for $p(x,y|D,I)$, we find

$$\sigma_x^2 = -\frac{H_{yy}}{\det H}$$

and similarly

$$\sigma_y^2 = \langle(y-\hat{y})^2\rangle = -\frac{H_{xx}}{\det H},$$

the same expressions we just derived (convince yourself).

Connection to Variance and Covariance

Also recall the definition of covariance:

$$\sigma_{xy} = \langle(x-\hat{x})(y-\hat{y})\rangle = \int dx\,dy\,(x-\hat{x})(y-\hat{y})\,p(x,y|D,I) = \frac{C}{AB-C^2} = \frac{H_{xy}}{\det H}$$

if we use the quadratic expansion of $p(x,y|D,I)$.

Putting it all together: the covariance matrix, defined a couple of weeks ago, is the negative inverse of the Hessian matrix:

$$\begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix} = \frac{1}{AB-C^2}\begin{pmatrix} -B & C \\ C & -A \end{pmatrix} = -\begin{pmatrix} A & C \\ C & B \end{pmatrix}^{-1} = -H^{-1}(\hat{\mathbf{x}})$$
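
A quick numerical check of this relation (using the same invented Hessian entries as in the earlier sketch):

```python
import numpy as np

A, B, C = -4.0, -1.0, 1.2          # assumed Hessian of L at the maximum (A, B < 0, AB - C^2 > 0)
H = np.array([[A, C],
              [C, B]])

cov = -np.linalg.inv(H)            # covariance matrix = -H^{-1}
sigma_x, sigma_y = np.sqrt(np.diag(cov))
rho = cov[0, 1] / (sigma_x * sigma_y)   # correlation coefficient
print(cov)
print(sigma_x, sigma_y, rho)
```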

Covariance Matrix: Geometric Interpretation

$C = 0$ implies $x$ and $y$ are completely uncorrelated; the contours of the posterior PDF are aligned with the coordinate axes.

As $|C|$ increases, the PDF becomes more and more elongated.

For $C = \pm\sqrt{AB}$, the contours become infinitely wide in one direction (though the prior on $x$ or $y$ could cut them off somewhere).

Also, while $C = \pm\sqrt{AB}$ implies that $\hat{x}$ and $\hat{y}$ individually are totally unreliable, the linear correlation $y = \pm m x$ (with $m = \sqrt{A/B}$) can still be inferred.

Caution: Using the Correct Error Bar

Be careful about calculating the uncertainty on a parameter of a multidimensional PDF.

Right: $\sigma_{ii}^2 = \left[-H^{-1}\right]_{ii}$, obtained by marginalizing $p(\mathbf{x}|D,I)$ over the other parameters.

Wrong: getting $\sigma_{ii}^2$ by holding the other parameters $x_{j\neq i}$ fixed at their optimal values (this underestimates the uncertainty!).

[The figure on the original slide compares the error bars obtained from the two procedures.]

Reason: when using the Hessian, don't confuse the inverse of the diagonal elements of $H$ with the diagonal elements of $H^{-1}$.
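
A short sketch of the difference (same invented Hessian as above): the marginalized error bars come from the diagonal of $-H^{-1}$, while fixing the other parameter corresponds to $1/\sqrt{-H_{ii}}$ and is too small whenever $C \neq 0$.

```python
import numpy as np

A, B, C = -4.0, -1.0, 1.2
H = np.array([[A, C],
              [C, B]])

sigma_marginal    = np.sqrt(np.diag(-np.linalg.inv(H)))   # correct: marginalize over the other parameter
sigma_conditional = 1.0 / np.sqrt(-np.diag(H))            # wrong: hold the other parameter fixed
print(sigma_marginal)      # ~ [0.62, 1.25]
print(sigma_conditional)   # ~ [0.50, 1.00] -- underestimates
```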

Gaussian PDF: Both µ and σ² Unknown

Last time we derived best estimators for a Gaussian distribution using $p(\mu|\sigma,D,I)$, i.e., with $\sigma$ given. Now we have the tools to calculate

$$p(\mu|D,I) = \int_0^\infty p(\mu,\sigma|D,I)\,d\sigma,$$

i.e., the best estimator of $\mu$ when $\sigma^2$ is not known a priori.

First we have to express the joint posterior PDF in terms of a likelihood and prior using Bayes' Theorem:

$$p(\mu,\sigma|D,I) \propto p(D|\mu,\sigma,I)\,p(\mu,\sigma|I)$$

If the data are independent, then by the product rule

$$p(D|\mu,\sigma,I) = (2\pi\sigma^2)^{-N/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2\right]$$

Gaussian PDF: Priors on µ, σ

Now we need to define the prior $p(\mu,\sigma|I)$. Let's assume the priors for $\mu$ and $\sigma$ are independent:

$$p(\mu,\sigma|I) = p(\mu|I)\,p(\sigma|I)$$

Since $\mu$ is a location parameter, it makes sense to choose a uniform prior:

$$p(\mu|I) = \frac{1}{\mu_{\max}-\mu_{\min}}$$

Since $\sigma$ is a scale parameter, we'll use a Jeffreys prior:

$$p(\sigma|I) = \frac{1}{\sigma\,\ln(\sigma_{\max}/\sigma_{\min})}$$

Let's also assume the prior ranges on $\mu$ and $\sigma$ are large and don't cut off the integration in a weird way.

Aside: Parameterization of σ

Note that we parameterized our width prior in terms of $\sigma$, not the variance $\sigma^2$. Does the parameterization make a difference? For the Jeffreys prior in $\sigma$,

$$p(\sigma|I)\,d\sigma = k\,\frac{d\sigma}{\sigma}$$

where $k$ depends on the limits of $\sigma$. Now convert to the variance $\nu$. Since $\sigma = \sqrt{\nu}$,

$$d\sigma = \frac{d\nu}{2\sqrt{\nu}}$$

Therefore,

$$p(\sigma|I)\,d\sigma = p(\nu|I)\,d\nu = k\,\frac{d\nu}{2\nu} \propto \frac{d\nu}{\nu}$$

So the Jeffreys prior has the same form whether we work in terms of $\sigma$ or $\sigma^2$. Question: would this also be the case for a uniform prior?

Posterior PDF of µ

Substitute the likelihood and prior into our expression for $p(\mu|D,I)$:

$$p(\mu|D,I) = \int_0^\infty p(D|\mu,\sigma,I)\,p(\mu|I)\,p(\sigma|I)\,d\sigma \propto \frac{(2\pi)^{-N/2}}{\Delta\mu\,\ln(\sigma_{\max}/\sigma_{\min})}\int_{\sigma_{\min}}^{\sigma_{\max}}\sigma^{-(N+1)}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^N (x_i-\mu)^2}\,d\sigma$$

where $\Delta\mu = \mu_{\max}-\mu_{\min}$. Let $\sigma = 1/t$ so that $d\sigma = -dt/t^2$ (the sign just flips the integration limits):

$$p(\mu|D,I) \propto \int_{t_{\min}}^{t_{\max}} t^{N-1}\,e^{-\frac{t^2}{2}\sum_{i=1}^N (x_i-\mu)^2}\,dt$$

Change variables again so that $\tau = t\,\sqrt{\sum_i (x_i-\mu)^2}$:

$$p(\mu|D,I) \propto \left[\sum_{i=1}^{N}(x_i-\mu)^2\right]^{-N/2}$$
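
A sketch of a numerical cross-check (toy data, with a wide integration grid standing in for the prior range): marginalizing the Gaussian likelihood over $\sigma$ with the $1/\sigma$ Jeffreys prior should reproduce the closed form $\left[\sum_i(x_i-\mu)^2\right]^{-N/2}$ up to normalization.

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=8)          # toy data
N = len(x)
mu_grid = np.linspace(2.0, 8.0, 200)
sig = np.logspace(-2, 2, 4000)            # sigma grid standing in for (sigma_min, sigma_max)

def marginal(mu):
    # sigma^{-(N+1)} = (Jeffreys prior 1/sigma) x (likelihood prefactor sigma^{-N})
    integrand = sig ** -(N + 1) * np.exp(-0.5 * np.sum((x - mu) ** 2) / sig ** 2)
    return integrate.trapezoid(integrand, sig)

numeric = np.array([marginal(m) for m in mu_grid])
closed = np.array([np.sum((x - m) ** 2) ** (-N / 2) for m in mu_grid])

# The two curves should agree up to an overall normalization constant
print(np.allclose(numeric / numeric.max(), closed / closed.max(), rtol=1e-3))
```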

Best Estimator and Reliability

As in past calculations, we maximize $L = \ln p$:

$$L = -\frac{N}{2}\ln\left[\sum_{i=1}^{N}(x_i-\mu)^2\right]$$

$$\left.\frac{dL}{d\mu}\right|_{\hat{\mu}} = \frac{N\sum_{i=1}^{N}(x_i-\hat{\mu})}{\sum_{i=1}^{N}(x_i-\hat{\mu})^2} = 0$$

This can only be satisfied if the numerator is zero, so

$$\hat{\mu} = \bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i$$

In other words, the best estimate of $\mu$ is still just the arithmetic mean of the measurements $x_i$.

Best Estimator and Reliability

The second derivative gives the estimate of the width:

$$\left.\frac{d^2L}{d\mu^2}\right|_{\hat{\mu}} = -\frac{N^2}{\sum_{i=1}^{N}(x_i-\hat{\mu})^2}$$

Therefore, setting $\hat{\sigma}_\mu^2 = \left(-\left.\dfrac{d^2L}{d\mu^2}\right|_{\hat{\mu}}\right)^{-1}$, we find that

$$\mu = \hat{\mu} \pm \frac{S}{\sqrt{N}}, \qquad\text{where}\qquad S^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\hat{\mu})^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$$

This is almost the usual definition of the sample variance, but it is slightly narrower because we divide by $N$ instead of $N-1$.

Aside: Uniform Distribution in σ

Suppose at the beginning of this problem we hadn't chosen a Jeffreys prior for $\sigma$, but a uniform prior such that

$$p(\sigma|I) = \begin{cases} \text{constant} & \sigma > 0 \\ 0 & \text{otherwise} \end{cases}$$

In this case, the posterior PDF would have been

$$p(\mu|D,I) \propto \left[\sum_{i=1}^{N}(x_i-\mu)^2\right]^{-(N-1)/2}$$

and the width estimator would have been the usual sample variance

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\hat{\mu})^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$$

In other words, the Jeffreys prior gives us a slightly narrower constraint on $\hat{\mu}$!

Student-t Distribution

Let's look back at our PDF, but this time without making the quadratic approximation. First, write

$$\sum_{i=1}^{N}(x_i-\mu)^2 = N(\bar{x}-\mu)^2 + V, \qquad\text{where}\qquad V = \sum_{i=1}^{N}(x_i-\bar{x})^2$$

Substituting into the PDF gives

$$p(\mu|D,I) \propto \left[N(\bar{x}-\mu)^2 + V\right]^{-N/2}$$

This is the heavy-tailed Student-t distribution, used for estimating $\mu$ when $\sigma$ is unknown and $N$ is small.
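
For reference (a sketch with toy data, and an identification supplied here rather than stated on the slide): this marginal posterior is a Student-t distribution with $\nu = N-1$ degrees of freedom, location $\bar{x}$, and scale $s/\sqrt{N}$, where $s^2 = V/(N-1)$. This can be checked numerically:

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, size=6)       # toy data, small N
N, xbar = len(x), x.mean()
V = np.sum((x - xbar) ** 2)
s = np.sqrt(V / (N - 1))

mu = np.linspace(xbar - 15.0, xbar + 15.0, 4001)
post = (N * (xbar - mu) ** 2 + V) ** (-N / 2)
post /= integrate.trapezoid(post, mu)   # normalize numerically on the grid

t_pdf = stats.t.pdf(mu, df=N - 1, loc=xbar, scale=s / np.sqrt(N))
print(np.max(np.abs(post - t_pdf)))     # small, limited only by the finite grid
```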

Student-t Distribution

Published pseudonymously by William S. Gosset of the Guinness Brewery in 1908 [1]. t-distributions describe small samples drawn from a normally distributed population. They are used to estimate the error on a mean when only a few samples $N$ are available and $\sigma$ is unknown, and they are the basis of the frequentist t-test for comparing two data sets. As $N$ becomes large, the tails of the distribution are killed off and it approaches a Gaussian (Central Limit Theorem).

Best Estimate of σ

Now that we've calculated the best estimate of the mean, what's the best estimate of $\sigma$ given a set of measurements? Start with the posterior PDF $p(\sigma|D,I)$:

$$p(\sigma|D,I) = \int p(\mu,\sigma|D,I)\,d\mu = \int p(D|\mu,\sigma,I)\,p(\mu|I)\,p(\sigma|I)\,d\mu$$

Plugging in our likelihood and priors gives

$$p(\sigma|D,I) \propto \frac{(2\pi)^{-N/2}}{\Delta\mu\,\ln(\sigma_{\max}/\sigma_{\min})}\,\sigma^{-(N+1)}\int_{\mu_{\min}}^{\mu_{\max}} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2}\,d\mu \propto \sigma^{-(N+1)}\,e^{-\frac{V}{2\sigma^2}}\int_{\mu_{\min}}^{\mu_{\max}} e^{-\frac{N(\bar{x}-\mu)^2}{2\sigma^2}}\,d\mu$$

χ² Distribution

The remaining integral over $\mu$ is a Gaussian integral contributing a factor $\propto \sigma$ (assuming wide prior limits); ignoring all constant terms then leaves

$$p(\sigma|D,I) \propto \sigma^{-N}\exp\left(-\frac{V}{2\sigma^2}\right)$$

Note that if we had used a uniform prior for $\sigma$ we would have

$$p(\sigma|D,I) \propto \sigma^{-(N-1)}\exp\left(-\frac{V}{2\sigma^2}\right)$$

Let's maximize the latter expression:

$$L = \ln p = -(N-1)\ln\sigma - \frac{V}{2\sigma^2}$$

$$\left.\frac{dL}{d\sigma}\right|_{\hat{\sigma}} = -\frac{N-1}{\hat{\sigma}} + \frac{V}{\hat{\sigma}^3} = 0 \quad\Rightarrow\quad \hat{\sigma}^2 = \frac{V}{N-1} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2 = s^2$$

χ² Distribution

Taking the second derivative of $L$ gives

$$\left.\frac{d^2L}{d\sigma^2}\right|_{\hat{\sigma}} = \frac{N-1}{\hat{\sigma}^2} - \frac{3V}{\hat{\sigma}^4} = \frac{(N-1)\hat{\sigma}^2 - 3(N-1)\hat{\sigma}^2}{\hat{\sigma}^4} = -\frac{2(N-1)}{\hat{\sigma}^2}$$

Therefore, the optimal value of the width is

$$\sigma = \hat{\sigma} \pm \frac{\hat{\sigma}}{\sqrt{2(N-1)}}$$

Note: with the change of variables $X = V/\sigma^2$, the Jeffreys-prior posterior $p(\sigma|D,I) \propto \sigma^{-N}\exp(-X/2)$ transforms to $p(X|D,I) \propto X^{(N-1)/2 - 1}\,e^{-X/2}$; that is, $V/\sigma^2$ follows a $\chi^2_\nu$ distribution with $\nu = N-1$.
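
A short sketch (toy data) of the resulting point estimate and error bar for the width, using the quadratic approximation above:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=20)                  # toy data with true sigma = 2
N = len(x)
V = np.sum((x - x.mean()) ** 2)

sigma_hat = np.sqrt(V / (N - 1))                   # usual sample standard deviation
sigma_err = sigma_hat / np.sqrt(2 * (N - 1))       # width of the posterior in sigma
print(f"sigma = {sigma_hat:.2f} +/- {sigma_err:.2f}")
```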

Summary

We related the width of a multidimensional distribution, via the Hessian matrix $H$, to the covariance matrix:

$$[\sigma^2]_{ij} = -\left[H^{-1}\right]_{ij}$$

Caution: the right way to get the uncertainty on a parameter from a multidimensional distribution is to marginalize $p(x,y,\ldots|D,I)$. The wrong way is to fix the other parameters $y, z, \ldots$ at their optimal values and then find the uncertainty on $x$.

When marginalizing over $\sigma$ in a Gaussian distribution, we obtain the Student-t distribution for $\mu$.

When marginalizing over $\mu$, we obtain a $\chi^2$ distribution for $V/\sigma^2$ (with $N-1$ degrees of freedom for the Jeffreys prior).

References

[1] Student (W. S. Gosset), "The Probable Error of a Mean," Biometrika 6 (1908), pp. 1-25.