Which model to use? How can we deal with these decisions automatically?


1 Which model to use? Microlens modellers face many dilemmas:
- Blending or no blending?
- Keep original error bars, or rescale them?
- Simple scale factor, or a more elaborate noise model?
- Point source or extended source?
- Free or fixed limb darkening?
- Include parallax? Include xallarap?
- Single or binary lens? Single or binary source?
- Companion at separation $d$ or $1/d$?
- Low-$q$ or high-$q$ companion?

How can we deal with these decisions automatically?

2 A similar but simpler example: fit $N = 30$ points with polynomials having $M = 1, 2, \ldots$ coefficients. Badness-of-fit statistic:

$$\frac{\chi^2}{N-M} \approx 1 \pm \sqrt{\frac{2}{N-M}}$$

$\chi^2$ rejects $M = 1, 2$ and accepts $M = 3, 4, \ldots$ Note the flailing in data gaps and beyond the ends of the data for high $M$: a higher $M$ gives a more flexible model and a lower $\chi^2$, but a less satisfactory fit. $\chi^2$ is not the whole story.
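A minimal numerical sketch of this experiment (the underlying signal, noise level, and random seed are illustrative assumptions, not from the slides): fit polynomials of increasing order with numpy and compare the reduced $\chi^2$ against its expected scatter.

```python
# Fit polynomials with M = 1..8 coefficients to N = 30 noisy points and
# watch chi^2/(N-M) drop toward (and below) 1 +/- sqrt(2/(N-M)).
import numpy as np

rng = np.random.default_rng(42)
N, sigma = 30, 0.1
x = np.sort(rng.uniform(-1.0, 1.0, N))
y = np.sin(2.0 * x) + sigma * rng.normal(size=N)   # assumed toy signal

for M in range(1, 9):                      # M = number of polynomial coefficients
    coeffs = np.polyfit(x, y, deg=M - 1)   # least-squares fit of degree M-1
    chi2 = np.sum(((y - np.polyval(coeffs, x)) / sigma) ** 2)
    dof = N - M
    print(f"M={M}: chi2/dof = {chi2/dof:.2f}  (expect 1 +/- {np.sqrt(2/dof):.2f})")
```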

3 Bayes: parameter estimation. Notation: $D$ = data, $M$ = model, $\alpha$ = parameters.

Likelihood: $\mathcal{L}(\alpha, M) \equiv P(D \mid \alpha, M) = \dfrac{\exp\left[-\chi^2(D, \alpha, M)/2\right]}{(2\pi)^{N/2} \prod_i \sigma_i}$

Data volume: $\int \exp\left[-\chi^2/2\right] dD = (2\pi)^{N/2} \prod_i \sigma_i$, so the likelihood is normalised over data space.

Prior: $P(\alpha \mid M)$

Posterior: $P(\alpha \mid D, M) = \dfrac{P(D \mid \alpha, M)\, P(\alpha \mid M)}{\int P(D \mid \alpha, M)\, P(\alpha \mid M)\, d\alpha} = \dfrac{\mathcal{L}(\alpha, M)\, P(\alpha \mid M)}{Z_M(D)}$

Support: $Z_M(D) = \int \mathcal{L}(\alpha, M)\, P(\alpha \mid M)\, d\alpha$

[Figure: data points $\mu_i \pm \sigma_i$ with the model curve; posterior in the parameter $\alpha$, peaked at $\hat\alpha$.]

4 Bayes: parameter estimation (one model, so drop $M$):

Posterior: $P(\alpha \mid D) = \dfrac{P(D \mid \alpha)\, P(\alpha)}{\int P(D \mid \alpha)\, P(\alpha)\, d\alpha} = \dfrac{\mathcal{L}(\alpha)\, P(\alpha)}{Z(D)}$

Likelihood: $\mathcal{L}(\alpha) = \dfrac{e^{-\chi^2/2}}{(2\pi)^{N/2} \prod_i \sigma_i}$; Prior: $P(\alpha)$, with $\int P(\alpha)\, d\alpha = 1$; Support: $Z(D) = \int \mathcal{L}(\alpha)\, P(\alpha)\, d\alpha$

1. $\mathcal{L}(\alpha)$ large ⇒ the model fits the data well.
2. $\prod_i \sigma_i$ small ⇒ the model has small error bars.
3. $P(\alpha)$ large ⇒ the model is not too crazy.
4. $Z(D)$: support for the model (explained later).

[Figure: data $\mu_i \pm \sigma_i$; posterior vs. model parameter $\alpha$, peaked at $\hat\alpha$.]
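For a one-parameter model these quantities can be evaluated on a grid. A sketch under assumed toy data (a constant model with true $\alpha = 1.3$; none of the numbers come from the slides):

```python
# Evaluate posterior P(alpha|D) ∝ L(alpha) P(alpha) on a grid and normalise
# to obtain the support Z(D).
import numpy as np

rng = np.random.default_rng(1)
sigma_i = 0.2
data = 1.3 + sigma_i * rng.normal(size=25)      # toy data, true alpha = 1.3

alpha = np.linspace(0.0, 3.0, 2001)             # parameter grid
chi2 = np.array([np.sum(((data - a) / sigma_i) ** 2) for a in alpha])
log_like = -0.5 * chi2 - len(data) * np.log(np.sqrt(2 * np.pi) * sigma_i)
prior = np.full_like(alpha, 1.0 / 3.0)          # flat prior on [0, 3]

scaled = np.exp(log_like - log_like.max())      # rescale for numerical safety
Z_scaled = np.trapz(scaled * prior, alpha)      # Z(D) = ∫ L(a) P(a) da
posterior = scaled * prior / Z_scaled           # normalised posterior density
print(f"alpha_hat = {alpha[np.argmax(posterior)]:.3f}, "
      f"ln Z(D) = {np.log(Z_scaled) + log_like.max():.2f}")
```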

5 Bayes: parameter estimation. Parameter uncertainties come from the posterior distribution:

$P(\alpha \mid D) = \dfrac{\mathcal{L}(\alpha)\, P(\alpha)}{Z(D)}$

Ratio of posterior probability densities:

$\dfrac{P(\alpha_1 \mid D)}{P(\alpha_2 \mid D)} = \dfrac{\mathcal{L}(\alpha_1)}{\mathcal{L}(\alpha_2)}\, \dfrac{P(\alpha_1)}{P(\alpha_2)} = \exp\left(-\dfrac{\Delta\chi^2}{2}\right) \dfrac{P(\alpha_1)}{P(\alpha_2)}$, with $\Delta\chi^2 = \chi^2(\alpha_1) - \chi^2(\alpha_2)$.

1. Fit the data.
2. Small error bars.
3. Be not too crazy.
4. $Z(D)$ is the same for all $\alpha$, so it cancels in the ratio.

[Figure: data $\mu_i \pm \sigma_i$; posterior vs. model parameter $\alpha$, peaked at $\hat\alpha$.]
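With a flat prior the posterior ratio reduces to $\exp(-\Delta\chi^2/2)$, so for one parameter the 1σ interval is the region with $\Delta\chi^2 \le 1$. A short illustration (the parabolic $\chi^2$ below is an assumption, standing in for a real fit):

```python
# Read a 1-sigma interval off Delta chi^2 <= 1 around the minimum.
import numpy as np

alpha = np.linspace(0.0, 3.0, 2001)
chi2 = ((alpha - 1.3) / 0.04) ** 2          # assumed parabolic chi^2(alpha)
dchi2 = chi2 - chi2.min()
best = alpha[np.argmin(chi2)]
inside = alpha[dchi2 <= 1.0]                # Delta chi^2 = 1 <=> 68% for 1 param
print(f"alpha = {best:.3f} +{inside.max() - best:.3f} -{best - inside.min():.3f}")
```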

6 Penalty for large error bars. Rescale the error bars by a factor $f$:

$\chi^2 = \sum_{i=1}^{N} \dfrac{(D_i - \mu_i)^2}{(f \sigma_i)^2}$

$\chi^2$ minimisation fails: $\chi^2 \to 0$ as $f \to \infty$.

Maximum likelihood: $\mathcal{L}(\text{model}) = P(\text{data} \mid \text{model}) = \dfrac{\exp(-\chi^2/2)}{(2\pi)^{N/2} \prod_i f \sigma_i}$

$-2 \ln \mathcal{L} = \chi^2 + \sum_i \ln\left(f^2 \sigma_i^2\right) + N \ln(2\pi)$

[Figure: $-2\ln\mathcal{L}$ vs. error-bar scale: the rising $\sum \ln \sigma^2$ penalty balances the falling $\chi^2$, giving a minimum at a finite scale.]
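Maximising the likelihood over the rescaling factor can be done in closed form: setting $d(-2\ln\mathcal{L})/df = 0$ gives $f^2 = \chi^2(f{=}1)/N$. A sketch checking this numerically (the residuals are simulated under assumed Gaussian noise):

```python
# The ln(sigma) penalty stops chi^2 minimisation from sending f -> infinity:
# -2 ln L(f) = chi2_0/f^2 + sum_i ln(f^2 sigma_i^2) + N ln(2 pi)
# has its minimum at f = sqrt(chi2_0 / N).
import numpy as np

rng = np.random.default_rng(7)
N = 50
sigma_i = np.full(N, 0.1)                   # quoted error bars
resid = 0.25 * rng.normal(size=N)           # true scatter 2.5x larger than quoted
chi2_0 = np.sum((resid / sigma_i) ** 2)     # chi^2 with f = 1

def neg2lnL(f):
    return chi2_0 / f**2 + np.sum(np.log((f * sigma_i) ** 2)) + N * np.log(2 * np.pi)

f_grid = np.linspace(0.5, 5.0, 2000)
f_num = f_grid[np.argmin([neg2lnL(f) for f in f_grid])]
print(f"numerical f = {f_num:.3f}, analytic f = {np.sqrt(chi2_0 / N):.3f}")
```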

7 P(D α,m = (α,m (α,m = Comparing Models e χ 2 / 2 ( 2π N / 2 Z M (D Posterior : joint distribution over parameters α and models M : P(α, M D = P(D M,α P(α M P(M P(D M,α P(α M P(M dα dm (α,m P M (α, M = (α, M (α, M Posterior distribution on models (integrate over parameters α : P(M D = P(α, M D dα = P(M Z(D dα P M P(M Z(D (α, M (α,m P M dα = Z (D M Z(D P(M β model B ˆ β data model A ˆ α α

8 Comparing models. Posterior distribution: $P(M \mid D) = \dfrac{Z_M(D)}{Z(D)}\, P(M)$

$\dfrac{P(A \mid D)}{P(B \mid D)} = \dfrac{Z_A(D)}{Z_B(D)}\, \dfrac{P(A)}{P(B)} = B_{AB}\, \dfrac{P(A)}{P(B)}$, where $B_{AB} = Z_A(D)/Z_B(D)$ is the Bayes factor.

Support for model $M$: $Z_M(D) = \int \mathcal{L}(\alpha, M)\, P(\alpha \mid M)\, d\alpha$

[Figure: data fitted by model A (parameter $\alpha$, best fit $\hat\alpha$) and model B (parameter $\beta$, best fit $\hat\beta$).]

9 Comparing models. Support for model $M$, evaluated with a Gaussian (Laplace) approximation to the posterior, peaked at $\hat\alpha$ with widths $\sigma(\hat\alpha_k)$:

$Z_M(D) = \int \mathcal{L}(\alpha)\, P(\alpha \mid M)\, d\alpha \approx \mathcal{L}(\hat\alpha)\, P(\hat\alpha \mid M) \prod_{k=1}^{M} \left(2\pi\, \sigma^2(\hat\alpha_k)\right)^{1/2} = \dfrac{e^{-\chi^2_{\min}/2} \prod_{k=1}^{M} \sigma(\hat\alpha_k)}{(2\pi)^{(N-M)/2} \prod_{i=1}^{N} \sigma_i}\, P(\hat\alpha \mid M)$

1. Penalty for bad fit: $e^{-\chi^2_{\min}/2}$.
2. Penalty for large error bars: $\dfrac{1}{(2\pi)^{N/2} \prod_{i=1}^{N} \sigma_i}$.
3. Penalty for fine tuning: $P(\hat\alpha \mid M)\, (2\pi)^{M/2} \prod_{k=1}^{M} \sigma(\hat\alpha_k) \approx$ posterior parameter volume / prior parameter volume.
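A sketch applying this approximation to the polynomial example of slide 2 (the prior width and the data are my own toy assumptions): each extra coefficient must buy enough $\chi^2$ improvement to offset its fine-tuning penalty.

```python
# Laplace-approximate the support Z_M for polynomial models and compare.
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 30, 0.1
x = np.linspace(-1.0, 1.0, N)
y = 0.5 + 1.0 * x + sigma * rng.normal(size=N)   # data generated by a line

def log_evidence(M, prior_width=10.0):
    """ln Z_M for a polynomial with M coefficients, flat prior of given width."""
    A = np.vander(x, M)                          # design matrix
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    chi2_min = np.sum(((y - A @ coeffs) / sigma) ** 2)
    cov = sigma**2 * np.linalg.inv(A.T @ A)      # parameter covariance at alpha_hat
    log_L_hat = -0.5 * chi2_min - N * np.log(np.sqrt(2 * np.pi) * sigma)
    # Occam factor: P(alpha_hat) (2 pi)^{M/2} |cov|^{1/2}, with P = width^-M
    log_occam = 0.5 * np.linalg.slogdet(2 * np.pi * cov)[1] - M * np.log(prior_width)
    return log_L_hat + log_occam

for M in (1, 2, 3, 4):
    print(f"M={M}: ln Z = {log_evidence(M):.1f}")
# Here ln Z should peak at M = 2: the Bayes factor Z_2/Z_4 exceeds 1 because
# the extra coefficients barely lower chi^2 but pay the fine-tuning penalty.
```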

10 A possible approach:
1. Use a Galaxy model, adjusting its parameters to fit the observed OGLE/MOA event parameter distributions, to establish priors.
2. Run competing models in parallel.
3. Use MCMC to track the evolving parameters of each model (a minimal sketch is given below).
4. Use MCMC to evaluate the relative probability of the different models.
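A minimal Metropolis sampler for step 3, under placeholder assumptions (a hypothetical one-parameter model, flat prior, hand-picked step size); a real microlensing model would supply its own log-posterior:

```python
# Track the posterior of one model parameter with a random-walk Metropolis chain.
import numpy as np

def log_posterior(alpha, data, sigma=0.1):
    """Log of (flat prior) x (Gaussian likelihood) for a toy constant model."""
    if not 0.0 < alpha < 10.0:                  # flat prior support
        return -np.inf
    return -0.5 * np.sum(((data - alpha) / sigma) ** 2)

def metropolis(data, n_steps=20000, step=0.05):
    rng = np.random.default_rng(0)
    chain = np.empty(n_steps)
    alpha, logp = 1.0, log_posterior(1.0, data)
    for i in range(n_steps):
        prop = alpha + step * rng.normal()      # symmetric Gaussian proposal
        logp_prop = log_posterior(prop, data)
        if np.log(rng.uniform()) < logp_prop - logp:
            alpha, logp = prop, logp_prop       # accept; otherwise keep current
        chain[i] = alpha
    return chain

data = 2.0 + 0.1 * np.random.default_rng(5).normal(size=40)
burn = metropolis(data)[5000:]                  # discard burn-in
print(f"alpha = {burn.mean():.3f} +/- {burn.std():.3f}")
```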
