Strong Lens Modeling (II): Statistical Methods


Chuck Keeton, Rutgers, the State University of New Jersey

Probability theory. multiple random variables a and b: joint distribution p(a, b); conditional distribution p(a | b); marginal distribution p(a) = ∫ p(a, b) db. note: p(a, b) = p(a | b) p(b).

even if the model is correct, the measured data may not exactly match the model predictions because of noise. 1-d example: suppose the model predicts d_mod, and the measured value d_obs follows a Gaussian distribution with uncertainty σ:

L(d_obs | d_mod) ∝ exp[ -(d_obs - d_mod)² / (2σ²) ] = e^(-χ²/2)

call this the likelihood of the data given the model.

if the model predictions depend on a set of parameters q, write this as

L(d_obs | q) ∝ exp[ -(d_obs - d_mod(q))² / (2σ²) ]

how to use? when the model is wrong, d_obs is far from d_mod(q), so χ² is high and L is low; adjust the model to reduce χ² and increase L. this is the maximum likelihood method.
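To make the χ² picture concrete, here is a minimal Python sketch (my own illustration, not from the lecture; the constant-value model and toy data are invented) that evaluates χ² and the log-likelihood for a one-parameter model:

```python
import math

def chi2(q, d_obs, sigma, model):
    """Chi-squared: sum of squared residuals weighted by the noise."""
    return sum((d - model(q)) ** 2 / s ** 2 for d, s in zip(d_obs, sigma))

def log_likelihood(q, d_obs, sigma, model):
    """ln L = -chi^2 / 2, up to an additive constant."""
    return -0.5 * chi2(q, d_obs, sigma, model)

# Toy problem (invented): the model predicts one constant value q for every datum.
d_obs = [1.2, 0.8, 1.1]
sigma = [0.1, 0.1, 0.1]
model = lambda q: q

# With equal uncertainties, the maximum-likelihood q is just the sample mean.
q_ml = sum(d_obs) / len(d_obs)
```

For equal uncertainties the χ² minimum sits at the sample mean, which is why maximum likelihood reduces to simple averaging in this toy case.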

Bayesian inference. goal: see what we can infer about the parameters from the data, so shift from p(d | q) to p(q | d). Bayes's theorem: p(d, q) = p(d | q) p(q) = p(q | d) p(d), so

p(q | d) = p(d | q) p(q) / p(d)

where p(d | q) is the likelihood L(d | q), p(q) is the prior probability distribution for q, p(d) is the evidence (more later), and p(q | d) is the posterior probability distribution for q given d. idea: use the posterior to quantify constraints on the parameters.
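As a concrete illustration of Bayes's theorem (a toy example I added, not part of the slides), the posterior can be tabulated on a parameter grid: multiply the likelihood by the prior and normalize by the evidence:

```python
import math

# Parameter grid with a flat prior (toy setup).
qs = [i * 0.01 for i in range(-300, 301)]
prior = [1.0 / len(qs)] * len(qs)

# One Gaussian measurement d_obs with uncertainty sigma.
d_obs, sigma = 0.5, 0.2
like = [math.exp(-0.5 * ((d_obs - q) / sigma) ** 2) for q in qs]

# evidence p(d) = sum_q L(d|q) p(q);  posterior p(q|d) = L(d|q) p(q) / p(d)
evidence = sum(L * p for L, p in zip(like, prior))
post = [L * p / evidence for L, p in zip(like, prior)]
```

The posterior sums to one by construction and, for a flat prior, peaks at the measured value.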

Quantifying constraints. how do we use p(q | d) to quantify parameter constraints? we could use μ and σ, but those have a specific meaning only for Gaussian distributions. better to generalize: use the median and a 68% confidence interval.

Nuisance parameters. suppose we have some joint p(a, b), but we are mainly interested in a; we say b is a nuisance parameter. probability theory lets us integrate out b to get the marginalized distribution for a: p(a) = ∫ p(a, b) db. in general, this is not the same as optimizing the nuisance parameter.

example: with σ_y = 1 + x²,

p(x, y) ∝ exp[ -(x - μ_x)² / (2σ_x²) ] exp[ -y² / (2σ_y²) ]

optimize over y: p(x) ∝ exp[ -(x - μ_x)² / (2σ_x²) ]
marginalize over y: p(x) ∝ (1 + x²) exp[ -(x - μ_x)² / (2σ_x²) ]
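A quick numerical check of this example (my own sketch, assuming μ_x = 0 and σ_x = 1; the brute-force quadrature grid is arbitrary) shows that marginalizing over y really does pick up the (1 + x²) factor that optimization misses:

```python
import math

mu_x, sig_x = 0.0, 1.0   # assumed values for the toy example

def p_joint(x, y):
    sig_y = 1.0 + x * x  # the y-width depends on x
    return (math.exp(-0.5 * ((x - mu_x) / sig_x) ** 2)
            * math.exp(-0.5 * (y / sig_y) ** 2))

def p_marg(x, dy=0.01, ymax=50.0):
    """Integrate out the nuisance parameter y by brute-force quadrature."""
    n = int(2 * ymax / dy)
    return sum(p_joint(x, -ymax + i * dy) for i in range(n)) * dy

# Marginalizing picks up a factor proportional to sig_y = 1 + x^2:
ratio = p_marg(2.0) / p_marg(0.0)
expected = (1 + 2.0 ** 2) * math.exp(-0.5 * 2.0 ** 2)  # (1+x^2) exp(-x^2/2) at x = 2
```

The ratio of the marginal at x = 2 to its value at x = 0 matches (1 + x²) e^(-x²/2), confirming the extra factor.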

evidence. quantifies the overall probability of getting these data from this model:

Z = p(d) = ∫ L(d | q) p(q) dq

can be used to compare different models (even if they have different numbers of parameters).

Markov Chain Monte Carlo. often it is inconvenient or even impossible to analyze the full posterior; instead, turn to statistical sampling: a set of points {q_k} drawn from the posterior (for now: assume flat priors, so p(q | d) ∝ L(q)). method: pick some starting point q_1. postulate some trial distribution, p_try(q). draw a trial point, q_try, from p_try; the probability to accept it is min[ L(q_try)/L(q_1), 1 ]. if we accept the trial point, put q_2 = q_try; otherwise, put q_2 = q_1. iterate!
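The method above can be sketched as a minimal Metropolis sampler (my own illustration; the standard-normal target and the step size are arbitrary choices):

```python
import math, random

random.seed(1)

def log_like(q):
    """Toy target: a standard normal in one parameter."""
    return -0.5 * q * q

def metropolis(n_steps, step=1.0, q0=0.0):
    """Minimal Metropolis sampler with a Gaussian trial distribution."""
    chain = [q0]
    for _ in range(n_steps):
        q_cur = chain[-1]
        q_try = q_cur + random.gauss(0.0, step)
        # accept with probability min[L(q_try)/L(q_cur), 1]
        if math.log(random.random()) < log_like(q_try) - log_like(q_cur):
            chain.append(q_try)
        else:
            chain.append(q_cur)
    return chain

chain = metropolis(20000)
burn = chain[len(chain) // 2:]   # discard the first half as burn-in
mean = sum(burn) / len(burn)
```

Note the acceptance test is done in log space, which avoids overflow when likelihood ratios are extreme.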

[Figure sequence: sampling with a simple Gaussian trial distribution, shown after an increasing number of steps.]

When to stop? we want to sample L well, and to get results that are independent of the starting point. solution: run multiple chains; keep going until the statistical properties of the chains are equivalent; throw away the first half of each chain to eliminate memory of the starting point.
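One common way to test whether the statistical properties of multiple chains agree is a Gelman–Rubin-style statistic. The following is a crude sketch (my own addition, not from the lecture) that also discards the first half of each chain, as described above:

```python
import random

random.seed(0)

def gelman_rubin(chains):
    """Crude R-hat: pooled variance estimate over mean within-chain variance."""
    halves = [c[len(c) // 2:] for c in chains]   # drop the first half of each chain
    n = len(halves[0])
    means = [sum(c) / n for c in halves]
    grand = sum(means) / len(means)
    # between-chain variance B and mean within-chain variance W
    B = n * sum((m - grand) ** 2 for m in means) / (len(means) - 1)
    W = sum(sum((x - m) ** 2 for x in c) / (n - 1)
            for c, m in zip(halves, means)) / len(halves)
    return ((n - 1) / n * W + B / n) / W

# Four independent chains sampling the same N(0, 1) target: R-hat should be ~1.
chains = [[random.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
r_hat = gelman_rubin(chains)
```

Values of R-hat much larger than 1 mean the chains have not yet forgotten their starting points and the run should continue.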

[Figure: multiple chains with a simple Gaussian trial distribution.]

Q) Can we pick the trial distribution to make the sampling more efficient? A) Use the covariance matrix of the points sampled so far. [Figure: sampling with adaptive steps.]

How big to make the steps? [Figures: sampling with tiny steps, and with an adjustable step size.]

results. joint posterior p(x, y): just plot all the sampled points. [Figure: scatter plot of the samples in the (x, y) plane.]

results. marginalized posterior p(x): just plot a histogram of the x-values of all the sampled points; likewise for p(y). [Figure: histograms of the sampled x and y values.]

Nested sampling. introduced by Skilling (2004, 2006). peel away layers of constant likelihood one by one; estimate the volume of each layer statistically; combine the (L_i, V_i) values to estimate the Bayesian evidence; get a sample of points as a by-product. (courtesy R. Fadely) variants: Shaw et al. (2007), Feroz & Hobson (2008), Brewer et al. (2009), Betancourt (2011); statistical uncertainties: Keeton (2011).

given likelihood L(q) and prior π(q), write the evidence as

Z = ∫ L(q) π(q) dq

define the fractional volume with likelihood higher than L:

X(L) = ∫_{L(q) > L} π(q) dq

in principle, we can invert this to find L(X) and then write Z = ∫₀¹ L(X) dX. discretize: if we can find a set of points (L_i, X_i), then we can write

Z ≈ Σ_{i=1}^{N_nest} L_i (X_{i-1} − X_i)
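The discretized sum can be sanity-checked on a toy problem where L(X) is known analytically (my own sketch; the exponential form of L(X) and the deterministic shrinkage factor M/(M+1) per step are illustrative assumptions):

```python
import math

def L_of_X(X):
    """Toy likelihood-vs-volume curve: rises steeply toward small volumes."""
    return math.exp(-100.0 * X)

M = 100                  # live points: volume shrinks by ~M/(M+1) per step
t = M / (M + 1.0)

Z, X_prev = 0.0, 1.0
for i in range(1, 3000):
    X = t ** i
    Z += L_of_X(X) * (X_prev - X)   # L_i (X_{i-1} - X_i)
    X_prev = X

Z_exact = (1 - math.exp(-100.0)) / 100.0   # exact integral of L(X) over [0, 1]
```

With 100 live points the weighted sum reproduces the exact integral to well under a percent, because the geometric spacing in X concentrates terms where L(X) changes fastest.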

how to get the points? L_i is easy: draw uniformly (from the prior) in the region with L > L_{i-1}. X_i is harder: in principle it requires integration, so proceed statistically...

consider M points drawn uniformly from the region with L > L_0. draw likelihood contours through them, and let the enclosed volumes be V_1 > V_2 > ... > V_M; these are random variables. write V_1 = V_0 t, where t ∈ (0, 1); then t is the largest of M random variables drawn uniformly between 0 and 1, characterized by the probability distribution

p(t) = M t^(M−1), with mean ⟨t⟩ = M/(M+1)
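The claim ⟨t⟩ = M/(M+1) is easy to verify by Monte Carlo (a quick check I added, not from the slides):

```python
import random

random.seed(0)
M, trials = 10, 200000

# t is the largest of M uniform draws on (0, 1);
# theory: p(t) = M t^(M-1), so <t> = M/(M+1).
samples = [max(random.random() for _ in range(M)) for _ in range(trials)]
mean_t = sum(samples) / trials
expected = M / (M + 1.0)
```

For M = 10 the sample mean of the maximum lands on 10/11 ≈ 0.909 to within the Monte Carlo noise.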

begin with M live points drawn uniformly from the full prior; let their likelihoods be L_μ (μ = 1, ..., M). at step k: extract the live point with the lowest L and call it the k-th sampled point, L_k = min_μ(L_μ); estimate the associated volume as X_k = X_{k−1} t_k, where t_k is a random number drawn from p(t) = M t^(M−1); replace the extracted live point with a new point drawn from the prior but restricted to the region L(q) ≥ L_k. iterate for N_nest steps.
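Putting the steps above together, here is a minimal nested-sampling sketch for a toy 1-d problem (my own illustration: flat prior on [−5, 5], Gaussian likelihood, and the mean shrinkage ⟨t⟩ = M/(M+1) used in place of a random t_k at each step):

```python
import math, random

random.seed(2)

# Toy 1-d problem: flat prior on [-5, 5], Gaussian likelihood.
# Exact evidence: Z = (1/10) * integral of exp(-q^2/2) ~ sqrt(2*pi)/10.
def like(q):
    return math.exp(-0.5 * q * q)

M, n_steps = 300, 4000
live = [random.uniform(-5.0, 5.0) for _ in range(M)]

Z, X_prev = 0.0, 1.0
for k in range(n_steps):
    # extract the live point with the lowest likelihood
    q_min = min(live, key=like)
    L_k = like(q_min)
    X_k = X_prev * (M / (M + 1.0))     # mean shrinkage <t> = M/(M+1)
    Z += L_k * (X_prev - X_k)          # L_k (X_{k-1} - X_k)
    X_prev = X_k
    # replace it with a prior draw restricted to L(q) > L_k; for this
    # unimodal likelihood that region is just the interval |q| < |q_min|
    live[live.index(q_min)] = random.uniform(-abs(q_min), abs(q_min))

# add the contribution of the remaining live points
Z += X_prev * sum(like(q) for q in live) / M

Z_exact = math.sqrt(2.0 * math.pi) / 10.0
```

The estimate carries the statistical scatter inherent in the volume assignments, so it agrees with the exact evidence only to within the expected sqrt(H/M)-level uncertainty in ln Z.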

[Figure sequence: nested sampling setup, then the live and sampled points after an increasing number of steps.]

development of the evidence: contributions from the live points, the sampled points, and the total. [Figure: running evidence estimate vs. step number.] (See Keeton 2011 for statistical uncertainties.)