Hierarchical Bayesian Modeling


Hierarchical Bayesian Modeling: Making scientific inferences about a population based on many individuals. Angie Wolfgang, NSF Postdoctoral Fellow, Penn State

Astronomical Populations: Once we discover an object, we look for more (e.g. the Kepler planets; Lissauer, Dawson, & Tremaine 2014; Schawinski et al. 2014) to characterize their properties and understand their origin.

Astronomical Populations: Or we use many (often noisy) observations of a single object to gain insight into its physics.

Hierarchical Modeling is a statistically rigorous way to make scientific inferences about a population (or specific object) based on many individuals (or observations). Frequentist multi-level modeling techniques exist, but we will discuss the Bayesian approach today. Frequentist: variability of the sample (If θ is the true value, what fraction of many hypothetical datasets would be as or more discrepant from θ as the observed one?) Bayesian: uncertainty of the inference (What's the probability that θ is the true value given the current data?)

Understanding Bayes. (We just learned how to evaluate p(θ|x) numerically to infer θ from x.) Bayes' Theorem (straight out of conditional probability): p(θ|x) ∝ p(x|θ) p(θ), where x = the data, θ = the parameters of a model that can produce the data, p( ) denotes a probability density distribution, and | means "conditional on", or "given".
p(θ) = prior probability (How probable are the possible values of θ in nature?)
p(x|θ) = likelihood, or sampling distribution (ties your model to the data probabilistically: how likely is the data you observed given specific θ values?)
p(θ|x) = posterior probability (a new prior distribution, updated with the information contained in the data: what is the probability of different θ values given the data and your model?)
(But let's get a better intuition for the statistical model itself.)
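
To make the "∝" concrete, here is a minimal sketch (not from the talk; the data values, error bar, and prior are made up for illustration) that evaluates an unnormalized posterior on a grid for a toy problem with a Gaussian likelihood:

```python
import numpy as np

# Made-up data: three noisy measurements of a single quantity theta
x = np.array([4.8, 5.3, 5.1])
sigma = 0.4                                   # assumed known measurement error

# Grid of candidate theta values
theta = np.linspace(0.0, 10.0, 1001)

# Prior p(theta): a wide Gaussian centered at 5 with width 3 (an arbitrary choice)
log_prior = -0.5 * ((theta - 5.0) / 3.0) ** 2

# Likelihood p(x | theta): independent Gaussian noise around theta
log_like = np.sum(-0.5 * ((x[:, None] - theta[None, :]) / sigma) ** 2, axis=0)

# Posterior p(theta | x) ∝ p(x | theta) p(theta); normalize numerically on the grid
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, theta)

print("posterior mean:", np.trapz(theta * post, theta))
```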

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ). Example (1-D): fitting an SED to photometry. x = 17 measurements of Lν. Model: Stellar Population Synthesis (Nelson et al. 2014). θ = age of the stellar population, star formation timescale τ, dust content AV, metallicity, redshift, choice of IMF, choice of dust reddening law. The model can be summarized as f(x|θ): it maps θ → x. But this is NOT p(x|θ), because f(x|θ) is not a probability distribution!

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ). Example (1-D): fitting an SED to photometry, continued. If you use χ² for fitting, then you are implicitly assuming that p(xi|θ) = (1/√(2πσ²)) exp[−(xi − μ)²/(2σ²)], where μ = f(xi|θ) and σ = the statistical measurement error, i.e. you are assuming Gaussian noise (if you could redo a specific xi the same way many times, you'd find a Gaussian of width σ centered on μ).
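
As a small check of this equivalence (a sketch with made-up photometry, not the SED model from the slide), the Gaussian log-likelihood differs from −χ²/2 only by a constant that does not depend on θ, so minimizing χ² is the same as maximizing the Gaussian likelihood:

```python
import numpy as np

# Made-up fluxes and errors, plus a toy one-parameter stand-in for f(x_i | theta)
y = np.array([1.2, 0.9, 1.5, 1.1])
yerr = np.array([0.1, 0.1, 0.2, 0.15])

def model(theta):
    return np.full_like(y, theta)             # toy model: a constant flux level

def chi2(theta):
    return np.sum(((y - model(theta)) / yerr) ** 2)

def log_likelihood(theta):
    # Sum of log N(y_i; f_i(theta), yerr_i^2)
    return np.sum(-0.5 * ((y - model(theta)) / yerr) ** 2
                  - 0.5 * np.log(2 * np.pi * yerr ** 2))

# log L(theta) = -chi2(theta)/2 + constant
const = -0.5 * np.sum(np.log(2 * np.pi * yerr ** 2))
assert np.isclose(log_likelihood(1.05), -0.5 * chi2(1.05) + const)
```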

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ). Example (2-D): fitting a PSF to an image. Here both the likelihood and the model are Gaussian! x = the matrix of pixel brightnesses; θ = μ, σ of the Gaussian (location and FWHM of the PSF); f(x|θ) = a 2-D Gaussian. The likelihood is p(x|θ) = N(μ, Σ), where μ = f(x|θ) and Σ = the noise covariance (possibly spatially correlated).
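
A minimal sketch of this 2-D case (made-up image and noise level, and independent rather than spatially correlated noise, so Σ is diagonal):

```python
import numpy as np

# Made-up 32x32 image of a point source with independent Gaussian pixel noise
ny, nx = 32, 32
yy, xx = np.mgrid[0:ny, 0:nx]
noise_sd = 0.05
rng = np.random.default_rng(0)

def psf_model(x0, y0, fwhm, amp):
    """Circular 2-D Gaussian PSF; width = FWHM / 2.355."""
    s = fwhm / 2.355
    return amp * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * s ** 2))

image = psf_model(15.3, 16.8, 4.0, 1.0) + noise_sd * rng.normal(size=(ny, nx))

def log_likelihood(theta):
    """Gaussian pixel likelihood p(x | theta) with known, independent noise."""
    x0, y0, fwhm, amp = theta
    resid = image - psf_model(x0, y0, fwhm, amp)
    return np.sum(-0.5 * (resid / noise_sd) ** 2)

print(log_likelihood((15.3, 16.8, 4.0, 1.0)))   # near the truth: high likelihood
print(log_likelihood((10.0, 10.0, 4.0, 1.0)))   # mis-centered: much lower
```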

Applying Bayes: p(θ|x) ∝ p(x|θ) p(θ). Example (1-D): fitting an SED to photometry, continued. Ok, now we know of one way to write p(x|θ). What about p(θ)?
1) If we have a previous measurement/inference of that object's metallicity, redshift, etc., use it with its error bars as p(θ). (Usually measured via χ², so p(θ) is Gaussian with μ = measurement and σ = error. BUT the full posterior from the previous analysis is better.)
2) Choose wide, uninformative distributions for all the parameters we don't know well.
3) Use distributions in nature from previous observations of similar objects.

Going Hierarchical. Option #3 for p(θ): use distributions in nature from previous observations of similar objects. Histograms of population properties, when normalized, can be interpreted as probability distributions for individual parameters: p(θ) = n(θ|α) / ∫ n(θ|α) dθ = p(θ|α), where n(θ|α) is the function with parameters α that was fit to the histogram (or even the histogram itself, if you want to deal with a piecewise function!). For example, redshift was part of θ for the SED fitting. One could use the fitted parametric redshift distribution n(z|α) of Ilbert et al. 2009, with α = {a, b, c}, as p(z) = p(z|α) = n(z|α) / ∫ n(z|α) dz. But BE CAREFUL of detection bias, selection effects, upper limits, etc.!
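
The normalization step itself is simple; here is a minimal sketch (a made-up parametric form, not the Ilbert et al. 2009 function) of turning a fitted n(θ|α) into a prior density:

```python
import numpy as np

# Made-up population function n(theta | alpha), standing in for whatever was fit to the histogram
alpha = {"a": 2.0, "b": 1.5}

def n(theta, a, b):
    return theta ** a * np.exp(-theta / b)

theta = np.linspace(0.0, 20.0, 2001)
density = n(theta, **alpha)

# p(theta | alpha) = n(theta | alpha) / integral of n(theta | alpha) d theta
p_theta = density / np.trapz(density, theta)
print("integrates to:", np.trapz(p_theta, theta))   # ~1.0
```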

Going Hierarchical, continued. Abstracting again: the individual-object inference p(θ|x) ∝ p(x|θ) p(θ) becomes p(θ|x) ∝ p(x|θ) p(θ|α). (Almost there!) The population helps make inferences about the individual.

Going Hierarchical... but what if we want to use the individuals to infer things (the α's) about the population? That is, p(θ|α) contains some interesting physics, and getting values for α given the data can help us understand it. Then p(θ|x) ∝ p(x|θ) p(θ|α) becomes p(α, θ|x) ∝ p(x|θ, α) p(θ|α) p(α). If you truly don't care about the parameters for the individual objects, then you can marginalize over them: p(α|x) ∝ [∫ p(x|θ, α) p(θ|α) dθ] p(α) = p(x|α) p(α).
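
To show how these pieces combine, here is a minimal sketch (a toy normal-normal hierarchy with made-up measurements, not the exoplanet model discussed next) of the joint log-posterior log p(α, θ|x) = Σi [log p(xi|θi) + log p(θi|α)] + log p(α):

```python
import numpy as np
from scipy.stats import norm

# Made-up data: one noisy measurement x_i per object, with known errors
x = np.array([1.1, 2.3, 0.7, 1.9, 1.4])
xerr = np.array([0.3, 0.5, 0.4, 0.3, 0.6])

def log_posterior(alpha, theta):
    """alpha = (mu, sigma) of the population; theta = true value for each object."""
    mu, sigma = alpha
    if sigma <= 0:
        return -np.inf
    log_like = norm.logpdf(x, loc=theta, scale=xerr).sum()       # sum_i log p(x_i | theta_i)
    log_pop = norm.logpdf(theta, loc=mu, scale=sigma).sum()      # sum_i log p(theta_i | alpha)
    log_prior = norm.logpdf(mu, 0, 10) + norm.logpdf(np.log(sigma), 0, 2)  # weak p(alpha)
    return log_like + log_pop + log_prior

print(log_posterior((1.5, 0.5), x.copy()))
```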

Graphically: Regular Bayes, p(θ|x) ∝ p(x|θ) p(θ), is a two-level graph: parameters → observables. Hierarchical Bayes, p(α, θ|x) ∝ p(x|θ, α) p(θ|α) p(α), is a three-level graph: population parameters → individual parameters → observables.

Graphically, continued: Even for an individual object, the connection between parameters and observables can involve several layers (example: measuring the mass of a planet, where physics connects the parameter Mpl to the latent variables, the RVs, and then to the observables, the spectra). Conditional independence between individuals: the joint distribution factors as a product over objects, p(x, θ|α) = ∏i p(xi|θi) p(θi|α).

HBM in Action: Model. Exoplanet compositions (Wolfgang & Lopez 2015). The hierarchical model ties together population-wide distributions, internal structure models, and the likelihood. We wanted to understand BOTH the compositions of individual super-Earths (the fraction of mass in a gaseous envelope, fenv) and the distribution of this composition parameter over the Kepler population (μ, σ).

HBM in Action: Results. Exoplanet compositions (Wolfgang & Lopez 2015). [Figure: posterior on the population parameters, and the marginal composition distribution.] The width of the distribution had not been previously characterized.

HBM in Action: Results. Exoplanet compositions (Wolfgang & Lopez 2015). [Figure: posteriors on the composition parameter fenv for individual planets.]

A Note About Shrinkage. Hierarchical models pool the information in the individual data, which shrinks individual estimates toward the population mean and lowers the overall RMS error. (A key feature of any multi-level modeling!) [Figure: the uncertainty in x1 when analyzed in the hierarchical model is narrower than when x1 is analyzed by itself, and is pulled toward the mean of the distribution of x's.]
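
A minimal sketch of shrinkage in the toy normal-normal setting (made-up numbers; not the model of the following slide): if the population distribution N(μ, σ²) were known, each object's posterior mean would be a precision-weighted average of its own measurement and the population mean, and its posterior width would be smaller than its raw error bar:

```python
import numpy as np

# Made-up measurements x_i with heterogeneous error bars
x = np.array([1.1, 2.3, 0.7, 1.9, 1.4])
xerr = np.array([0.3, 0.9, 0.4, 0.3, 0.8])

# Population distribution theta_i ~ N(mu, sigma^2), taken as known for illustration
mu, sigma = 1.5, 0.4

# Conjugate normal-normal posterior for each theta_i
w = (1 / sigma**2) / (1 / sigma**2 + 1 / xerr**2)     # weight on the population mean
theta_mean = w * mu + (1 - w) * x                     # shrunk toward mu
theta_sd = np.sqrt(1 / (1 / sigma**2 + 1 / xerr**2))  # narrower than xerr

for xi, ei, ti, si in zip(x, xerr, theta_mean, theta_sd):
    print(f"x = {xi:4.1f} +/- {ei:3.1f}  ->  posterior {ti:5.2f} +/- {si:4.2f}")
```

Noisier measurements (larger xerr) are shrunk more strongly toward the population mean, which is exactly the pooling behavior described above.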

A Note About Shrinkage. Shrinkage in action (Wolfgang, Rogers, & Ford 2016). [Figure: gray = data, red = posteriors; the uncertainty in x1 when analyzed in the hierarchical model is smaller than when analyzed by itself, with estimates pulled toward the mean of the distribution of x's.]

Practical Considerations
1) Pay attention to the structure of your model! Did you capture the important dependencies and correlations? Did you balance realism with a small number of population-level parameters?
2) Evaluating your model with the data (performing hierarchical MCMC): JAGS (http://mcmc-jags.sourceforge.net; can use the stand-alone binary or interface with R), Stan (http://mc-stan.org/documentation/; interfaces with R, Python, Julia, MATLAB), or write your own hierarchical MCMC code (a toy sketch of this option follows this list).
3) Spend some time testing the robustness of your model: if you generate hypothetical datasets using your HBM and then run the MCMC on those datasets, how close do the inferences lie to the truth?
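
As a sketch of the "write your own" option (a toy random-walk Metropolis sampler for the normal-normal hierarchy sketched earlier, with made-up data; real problems are usually better served by JAGS or Stan plus convergence diagnostics):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: one noisy measurement per object
x = np.array([1.1, 2.3, 0.7, 1.9, 1.4])
xerr = np.array([0.3, 0.5, 0.4, 0.3, 0.6])
n = len(x)

def log_post(params):
    """params = [mu, log_sigma, theta_1 ... theta_n] for a normal-normal hierarchy."""
    mu, log_sigma, theta = params[0], params[1], params[2:]
    sigma = np.exp(log_sigma)
    ll = -0.5 * np.sum(((x - theta) / xerr) ** 2)                          # p(x_i | theta_i)
    lp_theta = -0.5 * np.sum(((theta - mu) / sigma) ** 2) - n * log_sigma  # p(theta_i | alpha)
    lp_alpha = -0.5 * (mu / 10.0) ** 2 - 0.5 * (log_sigma / 2.0) ** 2      # weak priors on alpha
    return ll + lp_theta + lp_alpha

# Random-walk Metropolis over all parameters at once
params = np.concatenate([[x.mean(), 0.0], x])
lp = log_post(params)
chain = []
for it in range(20000):
    prop = params + 0.1 * rng.normal(size=params.size)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        params, lp = prop, lp_prop
    chain.append(params.copy())

chain = np.array(chain[5000:])   # discard burn-in
print("posterior mu    ~", chain[:, 0].mean(), "+/-", chain[:, 0].std())
print("posterior sigma ~", np.exp(chain[:, 1]).mean())
```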

In Sum, Why HBM?
- Obtain simultaneous posteriors on individual and population parameters: self-consistent constraints on the physics.
- Readily quantify the uncertainty in those parameters.
- Naturally deals with large measurement uncertainties and upper limits (censoring).
- Similarly, can account for selection effects *within* the model, simultaneously with the inference.
- Enables direct, probabilistic relationships between theory and observations.
- Provides a framework for model comparison.

Further Reading
Introductory/General:
- DeGroot & Schervish, Probability and Statistics (solid fundamentals)
- Gelman, Carlin, Stern, & Rubin, Bayesian Data Analysis (in-depth; advanced topics)
- Loredo 2013, arXiv:1208.3036 (few-page intro/overview of multi-level modeling in astronomy)
- B.C. Kelly 2007 (HBM for linear regression, also applied to quasars)
Some applications:
- Loredo & Wasserman 1998 (multi-level model for the luminosity distribution of gamma-ray bursts)
- Mandel et al. 2009 (HBM for supernovae)
- Hogg et al. 2010 (HBM with importance sampling for exoplanet eccentricities)
- Andreon & Hurn 2010 (HBM for galaxy clusters)
- Martinez 2015 (HBM for Milky Way satellites)
- Wolfgang, Rogers, & Ford 2016 (HBM for the exoplanet mass-radius relationship)