Strong Lens Modeling (II): Statistical Methods


1 Strong Lens Modeling (II): Statistical Methods. Chuck Keeton, Rutgers, the State University of New Jersey

2 Probability theory. Multiple random variables, a and b: joint distribution $p(a, b)$; conditional distribution $p(a \mid b)$; marginal distribution $p(a) = \int p(a, b)\, db$. Note: $p(a, b) = p(a \mid b)\, p(b)$.
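
A small discrete example may make these definitions concrete: on a grid, sums play the role of the integrals. The table of probabilities below is invented for illustration.

```python
import numpy as np

# Hypothetical joint distribution p(a, b) on a 2 x 2 grid:
# rows index the values of a, columns index the values of b.
p_ab = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

# Marginals: sum (the discrete analogue of integrating) over the other variable.
p_a = p_ab.sum(axis=1)   # p(a) = sum over b of p(a, b)
p_b = p_ab.sum(axis=0)   # p(b) = sum over a of p(a, b)

# Conditional p(a | b) = p(a, b) / p(b); broadcasting divides each column by p(b).
p_a_given_b = p_ab / p_b

# Product rule check: p(a, b) = p(a | b) p(b).
assert np.allclose(p_a_given_b * p_b, p_ab)
```

Each column of `p_a_given_b` sums to 1, as a conditional distribution over a should.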

3 Even if the model is correct, the measured data may not exactly match the model predictions, because of noise. Example: suppose the model predicts $d_{\rm mod}$, and the measured value follows a Gaussian distribution with uncertainty $\sigma$:
$$L(d_{\rm obs} \mid d_{\rm mod}) \propto \exp\left[-\frac{(d_{\rm obs} - d_{\rm mod})^2}{2\sigma^2}\right] \equiv e^{-\chi^2/2}$$
Call this the likelihood of the data given the model. [Figure: L versus d_obs]
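
As a minimal sketch, assuming several independent Gaussian measurements (so their $\chi^2$ contributions simply add), the likelihood can be computed from $\chi^2$ as follows; the function names are hypothetical.

```python
import numpy as np

def chi2(d_obs, d_mod, sigma):
    """Chi-squared for independent Gaussian errors: sum of ((d_obs - d_mod) / sigma)^2."""
    d_obs, d_mod, sigma = map(np.asarray, (d_obs, d_mod, sigma))
    return float(np.sum(((d_obs - d_mod) / sigma) ** 2))

def likelihood(d_obs, d_mod, sigma):
    """Unnormalized Gaussian likelihood, L proportional to exp(-chi^2 / 2)."""
    return np.exp(-0.5 * chi2(d_obs, d_mod, sigma))
```

A perfect match gives $L = 1$ (the maximum of this unnormalized form); a one-sigma offset gives $L = e^{-1/2}$.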

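4 If the model predictions depend on a set of parameters $q$, write this as
$$L(d_{\rm obs} \mid q) \propto \exp\left[-\frac{(d_{\rm obs} - d_{\rm mod}(q))^2}{2\sigma^2}\right]$$
How to use it? When the model is wrong, $d_{\rm obs}$ is far from $d_{\rm mod}$, so $\chi^2$ is high and $L$ is low; adjust the model to reduce $\chi^2$ and increase $L$. This is the maximum likelihood method. [Figure: L versus d]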
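
A toy maximum-likelihood fit, assuming the simplest possible model $d_{\rm mod}(q) = q$ and three invented measurements. A brute-force grid scan stands in for a real optimizer; for this model the optimum should agree with the inverse-variance weighted mean.

```python
import numpy as np

# Hypothetical data: three measurements of one quantity, each with its own uncertainty.
d_obs = np.array([4.8, 5.3, 5.1])
sigma = np.array([0.2, 0.4, 0.3])

def chi2_const(q):
    """chi^2 for the illustrative one-parameter model d_mod(q) = q."""
    return np.sum(((d_obs - q) / sigma) ** 2)

# Maximum likelihood = minimum chi^2: scan a fine grid of trial parameter values.
q_grid = np.linspace(4.0, 6.0, 20001)
chi2_grid = np.array([chi2_const(q) for q in q_grid])
q_best = q_grid[np.argmin(chi2_grid)]

# Analytic optimum for this model: the inverse-variance weighted mean.
w = 1.0 / sigma**2
q_exact = np.sum(w * d_obs) / np.sum(w)
```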

5 Bayesian inference. Goal: see what we can infer about the parameters from the data; so shift from $p(d \mid q)$ to $p(q \mid d)$. Note: $p(d, q) = p(d \mid q)\, p(q) = p(q \mid d)\, p(d)$, so Bayes's theorem gives
$$p(q \mid d) = \frac{p(d \mid q)\, p(q)}{p(d)}$$
where $p(d \mid q) = L(d \mid q)$ is the likelihood, $p(q)$ is the prior probability distribution for $q$, $p(d)$ is the evidence (more later), and $p(q \mid d)$ is the posterior probability distribution for $q$ given $d$. Idea: use the posterior to quantify constraints on the parameters.
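
A one-parameter sketch of Bayes's theorem evaluated on a grid. The measurement, the model $d_{\rm mod}(q) = q$, and the flat prior on [0, 3] are illustrative assumptions. Because the likelihood here omits its Gaussian normalization, the evidence is only correct up to a constant factor; the posterior is unaffected.

```python
import numpy as np

# Hypothetical setup: one measurement d_obs = 1.2 with sigma = 0.5, model d_mod(q) = q.
q = np.linspace(0.0, 3.0, 3001)
dq = q[1] - q[0]
d_obs, sigma = 1.2, 0.5

likelihood = np.exp(-0.5 * ((d_obs - q) / sigma) ** 2)   # p(d | q), unnormalized
prior = np.full_like(q, 1.0 / 3.0)                         # flat p(q), normalized on [0, 3]

# Evidence p(d): integrate likelihood x prior over q (simple Riemann sum).
evidence = np.sum(likelihood * prior) * dq

# Bayes's theorem: posterior = likelihood x prior / evidence.
posterior = likelihood * prior / evidence
```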

6 Quantifying constraints. How do we use $p(q \mid d)$ to quantify parameter constraints? We could use the mean $\mu$ and standard deviation $\sigma$, but those have a specific meaning only for Gaussian distributions. Better to generalize: report the median and the 68% confidence interval.
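
One way to report the median and 68% interval from posterior samples is to take the 16th, 50th, and 84th percentiles. The lognormal test distribution is an arbitrary choice, picked because it is skewed, so the two error bars come out unequal.

```python
import numpy as np

def summarize(samples):
    """Median and central 68% interval (16th to 84th percentiles) of posterior samples."""
    lo, med, hi = np.percentile(samples, [16.0, 50.0, 84.0])
    return med, med - lo, hi - med   # value, minus error, plus error

# Skewed illustration: lognormal samples with median exp(0) = 1.
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)
med, err_minus, err_plus = summarize(samples)
```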

7 Nuisance parameters. Suppose we have some joint distribution $p(a, b)$, but we are mainly interested in $a$; we say $b$ is a nuisance parameter. Probability theory lets us integrate out $b$ to get the marginalized distribution for $a$: $p(a) = \int p(a, b)\, db$. In general, this is not the same as optimizing the nuisance parameter.

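8 Example: with $\sigma_y = 1 + x^2$,
$$p(x, y) \propto \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right] \exp\left[-\frac{y^2}{2\sigma_y^2}\right]$$
Optimize: $p(x) \propto \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]$
Marginalize: $p(x) \propto (1 + x^2) \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right]$
[Figure: p(x) versus x]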
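
A numerical check of this example, assuming $\mu_x = 0.5$ and $\sigma_x = 1$ (the slide leaves them general). Optimizing over $y$ sets $y = 0$; integrating over $y$ contributes a factor $\sqrt{2\pi}\,\sigma_y = \sqrt{2\pi}\,(1 + x^2)$, which shifts the peak of $p(x)$ away from $\mu_x$.

```python
import numpy as np

mu_x, sigma_x = 0.5, 1.0                     # illustrative values
x = np.linspace(-4.0, 4.0, 161)
y = np.linspace(-120.0, 120.0, 6001)         # wide enough for sigma_y = 1 + x^2 <= 17
X, Y = np.meshgrid(x, y, indexing="ij")
sigma_y = 1.0 + X**2

# Unnormalized joint distribution p(x, y) from the slide.
p_xy = np.exp(-(X - mu_x)**2 / (2 * sigma_x**2)) * np.exp(-Y**2 / (2 * sigma_y**2))

dy = y[1] - y[0]
p_opt = p_xy.max(axis=1)          # optimize: best y for each x (here y = 0)
p_marg = p_xy.sum(axis=1) * dy    # marginalize: integrate over y

# Closed forms for comparison.
p_opt_exact = np.exp(-(x - mu_x)**2 / (2 * sigma_x**2))
p_marg_exact = np.sqrt(2 * np.pi) * (1 + x**2) * p_opt_exact
```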

9 The evidence quantifies the overall probability of getting these data from this model:
$$Z \equiv p(d) = \int L(d \mid q)\, p(q)\, dq$$
It can be used to compare different models (even if they have different numbers of parameters). [Figure: L versus q]
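
An illustration of evidence-based model comparison, with invented numbers: model A has no free parameters and predicts $d = 0$; model B predicts $d = q$ with a flat prior on [-10, 10]. The likelihood must be normalized here, since its overall scale enters $Z$. Model B can fit any datum, but it spreads its prior over a wide range, so its evidence is lower.

```python
import numpy as np

def gaussian_like(d_obs, d_mod, sigma):
    """Normalized Gaussian likelihood p(d_obs | d_mod)."""
    return np.exp(-0.5 * ((d_obs - d_mod) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

d_obs, sigma = 0.3, 1.0   # hypothetical single measurement

# Model A: no free parameters, so Z_A is just the likelihood at d_mod = 0.
Z_A = gaussian_like(d_obs, 0.0, sigma)

# Model B: d_mod = q, flat prior on [-10, 10]; integrate L(d | q) p(q) dq on a grid.
q = np.linspace(-10.0, 10.0, 20001)
dq = q[1] - q[0]
prior = np.full_like(q, 1.0 / 20.0)
Z_B = np.sum(gaussian_like(d_obs, q, sigma) * prior) * dq

bayes_factor = Z_A / Z_B   # > 1 favours model A
```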

10 Monte Carlo Markov Chains. Often it is inconvenient or even impossible to analyze the full posterior. Instead, turn to statistical sampling: a set of points $\{q_k\}$ drawn from the posterior (for now, assume flat priors, so $p(q \mid d) \propto L(q)$). Method: pick some starting point $q_1$; postulate some trial distribution, $p_{\rm try}(q)$; draw a trial point, $q_{\rm try}$, from $p_{\rm try}$; the probability to accept it is
$$\min\left[\frac{L(q_{\rm try})}{L(q_1)},\ 1\right]$$
If we accept the trial point, put $q_2 = q_{\rm try}$; otherwise, put $q_2 = q_1$. Iterate!
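
A minimal sketch of this acceptance rule, assuming flat priors and a Gaussian trial distribution centred on the current point; it compares $\ln L$ values to avoid numerical underflow. The function name and the test target are hypothetical.

```python
import numpy as np

def metropolis(log_like, q1, step, n_steps, rng):
    """Metropolis sampler: Gaussian trial steps of width `step`, flat priors assumed."""
    q = np.asarray(q1, dtype=float)
    logL = log_like(q)
    chain = np.empty((n_steps, q.size))
    n_accept = 0
    for k in range(n_steps):
        q_try = q + step * rng.standard_normal(q.size)   # draw q_try from p_try
        logL_try = log_like(q_try)
        # Accept with probability min[L(q_try)/L(q), 1], compared in log space.
        if np.log(rng.random()) < logL_try - logL:
            q, logL = q_try, logL_try
            n_accept += 1
        chain[k] = q   # on rejection the current point is repeated
    return chain, n_accept / n_steps

# Usage: a unit 2D Gaussian "likelihood", starting far from its peak (hypothetical target).
rng = np.random.default_rng(1)
log_like = lambda q: -0.5 * np.sum(q**2)
chain, acc_rate = metropolis(log_like, [3.0, -3.0], step=1.0, n_steps=20000, rng=rng)
```

Repeating the current point on rejection is essential: it is what makes the density of chain points proportional to $L$.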

11 Let the trial distribution be a simple Gaussian: step 2.

12 Let the trial distribution be a simple Gaussian: step 5.

13 Let the trial distribution be a simple Gaussian: step.

14 Let the trial distribution be a simple Gaussian: step 5.

15 Let the trial distribution be a simple Gaussian: step 2.

16 Let the trial distribution be a simple Gaussian: step 25.

17 Let the trial distribution be a simple Gaussian: step 3.

18 When to stop? We want to sample L well, and to get results that are independent of the starting point. Solution: run multiple chains; keep going until the statistical properties of the chains are equivalent; throw away the first half of each chain to eliminate memory of the starting point.
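
The slides do not specify how to decide that chains are "equivalent". One widely used check, swapped in here as an assumption, is the Gelman-Rubin statistic $\hat R$, which compares within-chain and between-chain variances after the first half of each chain is discarded. Values close to 1 suggest the chains agree; synthetic chains stand in for real sampler output.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor R-hat for one parameter.

    chains: array of shape (m, n), i.e. m chains of n samples each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

def drop_first_half(chains):
    """Discard the first half of each chain to remove memory of the starting point."""
    chains = np.asarray(chains)
    return chains[:, chains.shape[1] // 2:]

# Usage with synthetic chains (hypothetical).
rng = np.random.default_rng(5)
good = rng.standard_normal((4, 2000))          # 4 well-mixed chains
bad = good + 2.0 * np.arange(4)[:, None]       # same chains, stuck at different offsets
R_good = gelman_rubin(drop_first_half(good))
R_bad = gelman_rubin(drop_first_half(bad))
```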

19 Multiple chains , chains, simple Gaussian steps.

20 Q) Can we pick the trial distribution to make the sampling more efficient? A) Use the covariance matrix of the points so far, with adaptive steps.

21 How big should the steps be? Example with tiny steps.

22 How big should the steps be? Example with an adjustable step size.
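
A sketch of adaptive steps, under stated assumptions: the trial distribution is a Gaussian whose covariance is estimated from the points sampled so far, and its overall size is nudged toward an acceptance rate of 0.25 (a common rule of thumb, not from the slides). Adaptation is confined to the first half of the chain, which is discarded anyway, so the retained half uses a fixed trial distribution. The function name and parameters are hypothetical.

```python
import numpy as np

def adaptive_metropolis(log_like, q1, n_steps, rng, n_adapt=200, target_acc=0.25):
    """Metropolis sampler with a Gaussian trial distribution adapted during burn-in.

    Every n_adapt steps in the first half, the trial covariance is reset to the
    covariance of all points sampled so far, and the step scale is multiplied by
    exp(acceptance rate - target_acc), so it grows if too many steps are accepted.
    """
    q = np.asarray(q1, dtype=float)
    dim = q.size
    logL = log_like(q)
    scale = 2.38 / np.sqrt(dim)            # common starting scale (an assumption)
    chol = np.eye(dim)                     # Cholesky factor of the trial covariance
    chain = np.empty((n_steps, dim))
    n_acc = 0
    for k in range(n_steps):
        q_try = q + scale * chol @ rng.standard_normal(dim)
        logL_try = log_like(q_try)
        if np.log(rng.random()) < logL_try - logL:     # Metropolis acceptance rule
            q, logL = q_try, logL_try
            n_acc += 1
        chain[k] = q
        if k < n_steps // 2 and (k + 1) % n_adapt == 0:
            scale *= np.exp(n_acc / n_adapt - target_acc)
            cov = np.atleast_2d(np.cov(chain[: k + 1].T)) + 1e-10 * np.eye(dim)
            chol = np.linalg.cholesky(cov)
            n_acc = 0
    return chain

# Usage: a strongly correlated 2D Gaussian target (hypothetical).
C = np.array([[1.0, 0.9], [0.9, 1.0]])
C_inv = np.linalg.inv(C)
log_like = lambda q: -0.5 * q @ C_inv @ q
rng = np.random.default_rng(2)
chain = adaptive_metropolis(log_like, [1.0, -1.0], 20000, rng)
kept = chain[len(chain) // 2:]             # discard the adaptive first half
```

Using the sample covariance lets the steps follow the long, narrow direction of a correlated posterior instead of being limited by its narrowest width.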

23 Results. Joint posterior, $p(x, y)$: just plot all the sampled points. [Figure: sampled points, y versus x]

24 Results. Marginalized posterior, $p(x)$: just plot a histogram of the x-values of all the sampled points; likewise for $p(y)$.
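
Given samples (here a correlated Gaussian stands in for real MCMC output, an assumption), the joint posterior is the point cloud and the marginal is a normalized histogram of one coordinate. No explicit integration over $y$ is needed, because ignoring $y$ in the samples is the marginalization.

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in for MCMC output: samples of (x, y) from a correlated Gaussian (hypothetical).
samples = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.6], [0.6, 2.0]], size=400_000)
x, y = samples[:, 0], samples[:, 1]

# Joint posterior p(x, y): plot all points, e.g. with matplotlib's plt.scatter(x, y, s=1).
# Marginalized posterior p(x): a normalized histogram of the x-values alone.
p_x, edges = np.histogram(x, bins=60, range=(-4.0, 4.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

# For this Gaussian the exact marginal of x is a unit normal, useful as a check.
p_x_exact = np.exp(-0.5 * centres**2) / np.sqrt(2 * np.pi)
```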

25 Nested sampling. Introduced by Skilling (24, 26). Peel away layers of constant likelihood one by one; estimate the volume of each layer statistically; combine the $(L_i, V_i)$ values to estimate the Bayesian evidence; get a sample of points as a by-product. (courtesy R. Fadely) Variants: Shaw et al. (27), Feroz & Hobson (28), Brewer et al. (29), Betancourt (2); statistical uncertainties: CRK (2).

26 Given the likelihood $L(q)$ and the prior $\pi(q)$, write the evidence as $Z = \int L(q)\, \pi(q)\, dq$. Define the fractional volume with likelihood higher than $L$:
$$X(L) = \int_{L(q) > L} \pi(q)\, dq$$
In principle, we can invert this to find $L(X)$, then write $Z = \int_0^1 L(X)\, dX$. Discretize: if we can find a set of points $(L_i, X_i)$, then we can write
$$Z = \sum_{i=1}^{N_{\rm nest}} L_i\, (X_{i-1} - X_i)$$
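
To check the discretized sum, one can use a hypothetical problem where $L(X)$ is known exactly: a uniform prior on [0, 1] and $L(q) = \exp[-(q - 0.5)^2/(2s^2)]$, so the region above any likelihood threshold is an interval of width $X$ centred on 0.5, and $L(X) = \exp[-X^2/(8s^2)]$. Setting $X_i = e^{-i/M}$ mimics the average shrinkage of nested sampling but is deterministic, an assumption made for simplicity.

```python
import numpy as np

s = 0.1

def L_of_X(X):
    """L as a function of enclosed prior volume X for the hypothetical 1D problem."""
    return np.exp(-(X / 2) ** 2 / (2 * s**2))

M, N_nest = 1000, 20000
i = np.arange(0, N_nest + 1)
X = np.exp(-i / M)                               # X_0 = 1, shrinking geometrically
Z = np.sum(L_of_X(X[1:]) * (X[:-1] - X[1:]))     # Z = sum_i L_i (X_{i-1} - X_i)

Z_exact = s * np.sqrt(2 * np.pi)                 # integral of L(q) over [0, 1]; tails negligible
```

With spacing $\Delta X \approx X/M$ the sum overestimates $Z$ by a relative amount of roughly $1/(2M)$, here about 0.05%.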

27 How to get the points? $L_i$ is easy: draw uniformly (from the prior) in the region with $L > L^*$, where $L^*$ is the current likelihood threshold. $X_i$ is harder: in principle, it requires an integration. Proceed statistically...

28 Consider M points drawn uniformly from the region with $L > L^*$, whose volume is $V^*$. Draw likelihood contours through them, and let the enclosed volumes be $V_1 > V_2 > \dots > V_M$; these are random variables. Write $V_1 = V^* t$, where $t \in (0, 1)$; then $t$ is the largest of M random variables drawn uniformly between 0 and 1, characterized by the probability distribution
$$p(t) = M t^{M-1}, \qquad \langle t \rangle = \frac{M}{M+1}$$
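
A quick Monte Carlo check of $p(t)$: take the largest of M uniform numbers directly, or invert the cumulative distribution $P(t) = t^M$. Both should give $\langle t \rangle \approx M/(M+1)$ and $\langle \ln t \rangle \approx -1/M$; the values of M and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
M, n_trials = 10, 200_000

# Direct construction: t is the largest of M uniform random numbers on (0, 1).
t_direct = rng.random((n_trials, M)).max(axis=1)

# Equivalent shortcut: the CDF of p(t) = M t^(M-1) is t^M, so t = u^(1/M) with u uniform.
t_inverse = rng.random(n_trials) ** (1.0 / M)

mean_expected = M / (M + 1)          # <t> = M / (M + 1)
```

The result $\langle \ln t \rangle = -1/M$ is why the enclosed volume shrinks on average by a factor $e^{-1/M}$ per nested-sampling step.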

30 Begin with M live points drawn uniformly from the full prior; let their likelihoods be $L_\mu$ ($\mu = 1, \dots, M$). At step k: extract the live point with the lowest likelihood and call it the k-th sampled point, $L_k = \min_\mu (L_\mu)$; estimate the associated volume as $X_k = X_{k-1}\, t_k$ (starting from $X_0 = 1$), where $t_k$ is a random number drawn from $p(t) = M t^{M-1}$; replace the extracted live point with a new point drawn from the prior but restricted to the region $L(q) > L_k$. Iterate for $N_{\rm nest}$ steps.
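
A compact sketch of these steps for a toy problem (uniform prior on the unit square, Gaussian likelihood of width $s = 0.05$, both invented). New live points are found by brute-force rejection sampling from the full prior, which is simple but becomes slow as $X$ shrinks; practical codes use smarter constrained samplers. At the end the remaining live points add a contribution $X_{N_{\rm nest}} \langle L \rangle_{\rm live}$, an assumption about how to close the sum. The function and argument names are hypothetical.

```python
import numpy as np

def nested_sampling(log_like, sample_prior, M, N_nest, rng, batch=4096):
    """Toy nested sampler: returns (total Z, sampled-point part, live-point part).

    log_like(q) maps points of shape (n, dim) to log-likelihoods of shape (n,);
    sample_prior(n, rng) returns n prior draws of shape (n, dim).
    """
    live = sample_prior(M, rng)                  # M live points from the full prior
    live_logL = log_like(live)
    X_prev, Z_sampled = 1.0, 0.0                 # X_0 = 1: the whole prior volume
    for _ in range(N_nest):
        worst = np.argmin(live_logL)             # live point with the lowest L
        logL_k = live_logL[worst]
        t_k = rng.random() ** (1.0 / M)          # t_k drawn from p(t) = M t^(M-1)
        X_k = X_prev * t_k
        Z_sampled += np.exp(logL_k) * (X_prev - X_k)
        # Replace it with a prior draw restricted to L(q) > L_k (rejection sampling).
        while True:
            cand = sample_prior(batch, rng)
            cand_logL = log_like(cand)
            ok = np.flatnonzero(cand_logL > logL_k)
            if ok.size > 0:
                live[worst] = cand[ok[0]]
                live_logL[worst] = cand_logL[ok[0]]
                break
        X_prev = X_k
    Z_live = X_prev * np.mean(np.exp(live_logL))  # leftover contribution of the live points
    return Z_sampled + Z_live, Z_sampled, Z_live

# Usage on the toy problem (all values hypothetical).
s = 0.05
log_like = lambda q: -np.sum((q - 0.5) ** 2, axis=1) / (2 * s**2)
sample_prior = lambda n, rng: rng.random((n, 2))   # uniform prior on [0, 1]^2
rng = np.random.default_rng(0)
Z_total, Z_sampled, Z_live = nested_sampling(log_like, sample_prior, M=400, N_nest=4000, rng=rng)
Z_exact = 2 * np.pi * s**2   # Gaussian integral; mass outside the square is negligible
```

Because the $t_k$ are random, repeated runs give a spread in $\ln Z$ whose width shrinks as M grows.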

31 Nested sampling setup.

32 Nested sampling step.

33 Nested sampling step 2.

34 Nested sampling step 3.

35 Nested sampling step 4.

36 Nested sampling step 5.

37 Nested sampling step.

38 Nested sampling step 3.

39 Nested sampling step 5.

40 Nested sampling step 7.

41 Nested sampling step 9.

42 Development of the evidence: contributions from the live points, the sampled points, and the total. (See CRK (2) for statistical uncertainties.)
