Probability distribution functions

Size: px

Start display at page:

Download "Probability distribution functions"

Preston Holt
5 years ago
Views:

1 Probability distribution functions Everything there is to know about random variables Imperial Data Analysis Workshop (2018) Elena Sellentin Sterrewacht Universiteit Leiden, NL Ln(a) Sellentin Universiteit Leiden 1 / 29

2 Cross validation Given the following data points, how do we prove which curve describes the data best? Ln(a) Sellentin Universiteit Leiden 2 / 29

3 Cross validation We all know this is wrong, but how do we prove it? E tot = 1 N = 0 f i f (x) 2 2 i Ln(a) Sellentin Universiteit Leiden 3 / 29

4 Cross validation E tot = 1 N > 0 f i f (x) 2 2 i Ln(a) Sellentin Universiteit Leiden 4 / 29

5 Cross validation E JtheD tot E smooth tot = 1 f i f JtheD (x) 2 2 N i = 1 f i f smooth (x) 2 2 N i Ln(a) Sellentin Universiteit Leiden 5 / 29

6 An improved distance χ 2 = (x µ) C 1 (x µ) Ln(a) Sellentin Universiteit Leiden 6 / 29

7 Cross validation We saw that join-the-dots perfectly explains one data set, but then failed catastrophically on the repeated measurement. The fitted curve explained the first dataset somewhat worse, but then correctly predicted the outcome of the second measurement. The best model minimizes a distance, e.g... E tot = 1 f i f (x) 2 2 (1) N... for current and future measurements. It minimizes the distance to taken data, and not taken but statistically iid data. i Ln(a) Sellentin Universiteit Leiden 7 / 29

8 Probability distributions Left: Probability density functions P(x) Right: Their cumulative distribution functions C(x) = x P(x )dx. Ln(a) Sellentin Universiteit Leiden 8 / 29

9 Why moments lose information What are......mean, variance, skewness, curtosis, moments, cumulants? The m-th moment is defined as x m = x m pdf(x) dx. (2) Moments are scalars (or tensors). Pdfs are full functions ( more information). From a pdf, moments can be computed. The inverse is not automatically true. Ln(a) Sellentin Universiteit Leiden 9 / 29

10 Advantages of pdfs For both: No mean, no variance, no skewness, no exc. curtosis, no moment-generating function. But the full pdfs exist! For some distribs, not even their zeroth moment (the normalization) exists improper priors. Ln(a) Sellentin Universiteit Leiden 10 / 29

11 Say you know P(x). But theory can only explain y = f (x), some function of x. P(x) contains all information about x. But how do you find P(y)? Example: x = E, the energy of particles (fermions and bosons) in a star. Observable is y = L, the luminosity of the star. Aim: To learn about the inner structure and composition of the star. The function f (x) is then nuclear physics + radiative transfer. Ln(a) Sellentin Universiteit Leiden 11 / 29

12 Ways to find P(y) Case 1: y depends only on one random variable x. Variable transformation. Case 2: y depends on more random variables, e.g. u and v. 1 Product distribs: u, v independent, find P(y) for y = uv? 2 Ratio distributions: u, v independent, find P(y) for y = u/v? 3 Distributions of sums: P(y), y = i u i? convolutions. Ln(a) Sellentin Universiteit Leiden 12 / 29

13 Ways to find P(y): one r.v. Analytically: 1 Via transformation of variables P(x)dx = P(y)dy. 2 Via the cumulative distribution function. 3 Via moment-generating functions ( literature.) Numerically: 1 Via sampling ( Daniel Mortlock, Andrew Jaffe) Ln(a) Sellentin Universiteit Leiden 13 / 29

14 Ways to find P(y): many r.vs. y = f (u, v, x,...), all random. Finding P(y) always means From joint distribution of u, v, x... to distrib of y. 1 What is P(u, v, x,...)? 2 If all are independent, then it is P(u)P(v)P(x)... 3 If they are not independent (Bayesian) Hierarchical Models. 4 Marginalize over everything but y. Ln(a) Sellentin Universiteit Leiden 14 / 29

15 Maximum Likelihood estimation Common knowledge: Best fitting parameters minimum χ 2 Gaussian likelihood: L(x θ) exp( χ 2 /2) Minimum-χ 2 is the maximum of the Gaussian likelihood If you don t have a Gaussian noise process? x D and D G? Maximum likelihood estimation is more general! x, θ, D(x) Then the likelihood is L(x θ) = D(x θ) The best-fitting params are then where θ L(x θ) = 0 Ln(a) Sellentin Universiteit Leiden 15 / 29

16 Power spectra Maximum Likelihood: Find θ for which θ L(x θ) = 0 x = C (l), 1 P C (l) = 2`+1 m alm alm From alm G(0, C` ) C ` Γ(ν, C` ) Ln(a) Sellentin Universiteit Leiden 16 / 29

17 Other common distributions Poisson distribution (galaxy clusters) Chi-squared and T 2 -distributions; χ 2 /degf 1-test. Hildebrandt et al. (KiDS team) Sellentin & Heavens, MNRAS (2016) Ln(a) Sellentin Universiteit Leiden 17 / 29

18 Other common distributions Uniform distribution (Fourier phases) Coles & Chiang (2000) and Sefusatti & Scoccimarro (2005) Ln(a) Sellentin Universiteit Leiden 18 / 29

19 From likelihood to posterior MNRAS, A&A, JCAP, Phys. Rev. astro in general: P(θ x) = L(x θ)π(θ) π(x) π(θ 1 ) = δ D (θ 1 ) (Conditional distributions.) π(x) = L(x θ)π(θ)d n θ (Evidence) P(θ x)π(x) = L(x θ)π(θ). π(x): have you manipulated your data taking process? Is there a selection effect hiding somewhere? π(θ) does not necessarily need to be normalizable: Improper priors (still need a proper posterior). Ln(a) Sellentin Universiteit Leiden 19 / 29

20 Multivariate distributions Reducing the dimensionality: (1) taking conditionals and (2) taking marginals Ln(a) Sellentin Universiteit Leiden 20 / 29

21 Conditional distributions of stars Inferred variables: stellar mass, (absolute) luminosity, age, stellar population-type (A or B, say) Ln(a) Sellentin Universiteit Leiden 21 / 29

22 Multivariate projections Ln(a) Sellentin Universiteit Leiden 22 / 29

23 Multivariate projections Ln(a) Sellentin Universiteit Leiden 23 / 29

24 Multivariate penguin Ln(a) Sellentin Universiteit Leiden 24 / 29

25 Cosmological parameters from Planck Ln(a) Sellentin Universiteit Leiden 25 / 29

26 An easy (non-astronomical) example Ln(a) Sellentin Universiteit Leiden 26 / 29

27 Alex Data: Alex is British; The average number of children a British woman gives birth to is 1.8. Model: C Poisson(C; λ = 1.8) (Inspired by kids being integers.) Abbreviate: Number of children as C, Alex as A. What does P(C A) express? What does C P(C A)dC express? What is C P(C A)dC in numbers? Hint: The Poisson distribution is Poisson(x λ) = λx e λ x!. Careful: It looks smooth but has only integer support, since x is the integer-valued random variate. Ln(a) Sellentin Universiteit Leiden 27 / 29

28 Alex Does Alex abbreviate Alexander or Alexandra? New variable: G = {XX, XY}. What does π(xx A) express? What could it numerically be? What does π(a XX) express? What could it numerically be? What does π(a XX) = 0 imply? Ln(a) Sellentin Universiteit Leiden 28 / 29

29 Alex Does Alex abbreviate Alexander or Alexandra? For π(a XX) = 0, what is the meaning and the numerical value of C P(C A)π(A XX)dC? Ln(a) Sellentin Universiteit Leiden 29 / 29

Statistics for scientists and engineers

Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3