Active Learning for Fitting Simulation Models to Observational Data

Similar documents
Actively Learning Level-Sets of Composite Functions

CosmoSIS Webinar Part 1

Introduction to CosmoMC

Large Scale Bayesian Inference

Statistics and Prediction. Tom

Aspects of large-scale structure. John Peacock UniverseNet Mytilene Sept 2007

Monte Carlo Markov Chains: A Brief Introduction and Implementation. Jennifer Helsby Astro 321

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Multimodal Nested Sampling

Introduction to Bayes

THE ROAD TO DARK ENERGY

Where we left off last time...

Algorithms for Higher Order Spatial Statistics

The History of the Universe in One Hour. Max Tegmark, MIT

Reminder of some Markov Chain properties:

Cosmology The Road Map

Galaxy population simulations

Process Watch: Having Confidence in Your Confidence Level

Cosmology with the ESA Euclid Mission

The Cosmic Microwave Background

An examination of the CMB large-angle suppression

Reducing The Computational Cost of Bayesian Indoor Positioning Systems

The microwave sky as seen by Planck

The Millennium Simulation: cosmic evolution in a supercomputer. Simon White Max Planck Institute for Astrophysics

Understanding the Properties of Dark Energy in the Universe p.1/37

The Early Universe John Peacock ESA Cosmic Vision Paris, Sept 2004

Cosmology with Peculiar Velocity Surveys

How Did the Universe Begin?

Chapter 21 Evidence of the Big Bang. Expansion of the Universe. Big Bang Theory. Age of the Universe. Hubble s Law. Hubble s Law

NEVER IGNORE A COINCIDENCE

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

The Inflating Universe. Max Tegmark, MIT

Actively Learning Level-Sets of Composite Functions. December 2007 CMU-ML

An introduction to Markov Chain Monte Carlo techniques

Rick Ebert & Joseph Mazzarella For the NED Team. Big Data Task Force NASA, Ames Research Center 2016 September 28-30

Cosmological Tests of Gravity

Data Analysis Methods

Probabilistic Graphical Models

Physics 661. Particle Physics Phenomenology. October 2, Physics 661, lecture 2

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

Data Analysis I. Dr Martin Hendry, Dept of Physics and Astronomy University of Glasgow, UK. 10 lectures, beginning October 2006

Markov Chain Monte Carlo

Astro 101 Slide Set: Multiple Views of an Extremely Distant Galaxy

Galaxy Formation Seminar 2: Cosmological Structure Formation as Initial Conditions for Galaxy Formation. Prof. Eric Gawiser

Fast Likelihood-Free Inference via Bayesian Optimization

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

Outline. 1. Basics of gravitational wave transient signal searches. 2. Reconstruction of signal properties

Monte Carlo in Bayesian Statistics

Lecture 37 Cosmology [not on exam] January 16b, 2014

Learning the hyper-parameters. Luca Martino

Cosmic Rays and Bayesian Computations

Bayesian Phylogenetics:

MIT Exploring Black Holes

Planck constraints on neutrinos. Massimiliano Lattanzi Università di Ferrara on behalf of the Planck Collaboration

Median Statistics Analysis of Non- Gaussian Astrophysical and Cosmological Data Compilations

(Astro)Physics 343 Lecture # 13: cosmic microwave background (and cosmic reionization!)

A Look Back: Galaxies at Cosmic Dawn Revealed in the First Year of the Hubble Frontier Fields Initiative

Gravitational waves from the early Universe

CSC 446 Notes: Lecture 13

3. It is expanding: the galaxies are moving apart, accelerating slightly The mystery of Dark Energy

Astrophysics with GLAST: dark matter, black holes and other astronomical exotica

Efficiently Computing Minimax Expected-Size Confidence Regions

arxiv: v1 [astro-ph.co] 3 Apr 2019

Absolute Neutrino Mass from Cosmology. Manoj Kaplinghat UC Davis

What is the Dark Matter? William Shepherd Niels Bohr International Academy Johannes Gutenberg University Mainz

Defining Cosmological Parameters. Cosmological Parameters. Many Universes (Fig on pp.367)

AST1100 Lecture Notes

IAU Symposium #254, Copenhagen June 2008 Simulations of disk galaxy formation in their cosmological context

Nonparametric Inference and the Dark Energy Equation of State

Introduction. How did the universe evolve to what it is today?

Possible sources of very energetic neutrinos. Active Galactic Nuclei

Spacetime Diagrams. Relativity and Astrophysics Lecture 21 Terry Herter

CMB anisotropy & Large Scale Structure : Dark energy perspective

STAT 425: Introduction to Bayesian Analysis

N-body Simulations and Dark energy

A Galaxy Full of Black Holes. Harvard-Smithsonian Center for Astrophysics Origins Education Forum - STScI 1 Navigator Public Engagement Program - JPL

LSST, Euclid, and WFIRST

Modern Physics notes Spring 2005 Paul Fendley Lecture 38

Scientific Data Flood. Large Science Project. Pipeline

Observational cosmology: the RENOIR team. Master 2 session

Modelling the galaxy population

Cosmologists dedicate a great deal of effort to determine the density of matter in the universe. Type Ia supernovae observations are consistent with

Lecture 10: Cosmology and the Future of Astronomical Research

Growth of structure in an expanding universe The Jeans length Dark matter Large scale structure simulations. Large scale structure

Dark Matter in Particle Physics

A FIGURE OF MERIT ANALYSIS OF CURRENT CONSTRAINTS ON TESTING GENERAL RELATIVITY USING THE LATEST COSMOLOGICAL DATA SETS.

Anisotropy in the CMB

1 Using standard errors when comparing estimated values

Rupert Croft. QuickTime and a decompressor are needed to see this picture.

LSST Cosmology and LSSTxCMB-S4 Synergies. Elisabeth Krause, Stanford

Cosmology. Jörn Wilms Department of Physics University of Warwick.

Optimal Supernova Search Strategies

Exploring galactic environments with AMR simulations on Blue Waters

AST 301 Introduction to Astronomy

Introduction to MCMC. DB Breakfast 09/30/2011 Guozhang Wang

New techniques to measure the velocity field in Universe.

Cosmology. Introduction Geometry and expansion history (Cosmic Background Radiation) Growth Secondary anisotropies Large Scale Structure

Kavli IPMU-Berkeley Symposium "Statistics, Physics and Astronomy" January , 2018 Lecture Hall, Kavli IPMU

PHYSICS 107. Lecture 27 What s Next?

Transcription:

Active Learning for Fitting Simulation Models to Observational Data Jeff Schneider Example: Active Learning for Cosmology Model Fitting NASA / ESA 8 parameter space!" true parameters" real universe" 7 6 5 4 3 2 NASA 1 2 4 6 8 noisy observations" hypothesis test" hypothetical parameters, #"!% dark matter!% dark energy!hubble constant!several more mathematical model (simulation)" simulated observations" Goal: Find all parameter values that can not be rejected 8 7 6 5 4 3 2 1 2 4 6 8

Example: Active Learning for Cosmology Model Fitting NASA / ESA 8 parameter space!" true parameters" real universe" 7 6 5 4 3 2 NASA 1 2 4 6 8 noisy observations" hypothesis test" hypothetical parameters, #" mathematical model (simulation)"!% dark matter!% dark energy!hubble constant!several more simulated observations" Goal: Find all parameter values that can not be rejected 8 7 6 5 4 3 2 1 2 4 6 8 How should we search this high dimensional space? parameter 2 Use a Grid? parameter 1! A popular method! Guarantees good coverage! Doesn't scale well with number of parameters (exponential)! Does not focus effort on most important areas

MCMC for Credible Region Construction Metropolis-Hastings Algorithm! choose #! generate proposal from Q(#' # i )! accept proposal with probability P(#')Q(# i #')/P(# i )Q(#' # i )! if accepted # i+1 = #'! else # i+1 = # i! repeat to step 2! choosing P(#) = f(# x)f(#) yields samples from the posterior distribution! Q is often chosen to be a normal distribution suppose this is the true posterior: MCMC for Credible Region Construction Metropolis-Hastings Algorithm! choose #! generate proposal from Q(#' # i )! accept proposal with probability P(#')Q(# i #')/P(# i )Q(#' # i )! if accepted # i+1 = #' else # i+1 = # i 5! repeat to step 2! choosing P(#) = f(# x)f(#) yields samples from the posterior distribution! Q is often chosen to be a normal distribution MCMC wastes samples in places of no interest and thus converges to the true region slowly:

Active Learning for Confidence Region Construction? We hope for sampling more like this: Is it possible? A Second Problem for MCMC Especially in high dimensions it becomes nearly impossible to reach the second peak, and if you try, you fail to sample the first one well MCMC is sampling, but a search algorithm is needed

A Goodness of Fit Surface! goodness of fit is the hypothesis test statistic! goal: identify the regions not rejected by the hypothesis test! use active learning to choose the parameters to test Computing Function Level-Sets Approximate mapping from parameters to goodness of fit with Gaussian Process Sample Points to refine Gaussian Process parameter space,!" generate candidates Would like information gain metric But that s expensive, use approximations dataset Gaussian process choose experiment compute model Perform hypothesis test

Performance Comparison ground truth grid best MCMC $ 2 test with active learning Cosmological Results After over 1 million experiments A second plausible region of parameter space was identified 2! B 1 5 1 "[Bryan et al, Astrophysical Journal, 27]

But other tests rule it out Cosmic Microwave Background NASA / WMAP Science Team Supernovae NASA / CXC / Rutgers/ Warren & Hughes et al Large Scale Structure Virgo Consortium Fisher's Method for Combining p-values CMB Model WMAP Data (Astier et al 26) common parameter space!" hypothetical #" Supernova Model Supernova Data (Davis et al 27) Statistical Test p-value Confidence Balls Statistical Test p-value Summing log p-values yields a chisquare distribution combined p-value $2 Tests LSS Model LSS Data (Tegmark et al 26) Statistical Test p-value A new algorithm chooses both the parameters and which model to test with [Bryan and Schneider, ICML, 28]

Where Next?! Simulation efforts have moved to large scale structure and galaxies! Often require many more cycles and thus parallel execution! The quality and computational expense are traded off by choosing the number of particles! Current trend: "my simulation is bigger than yours"! An alternative: run lots of smaller simulations to find better matches to observational data Active Learning for Massive Scale Simulations! An active learning algorithm must -! choose parameters to test -! choose how much to invest in the test -! choose batches of tests -! tradeoff throughput and latency (ie how many cores to put on each)! How do we extend methods of "computation allocation" to other astrophysical problems? -! computation limited data mining -! telescope control for transients

Why do we build simulations?! To understand the dynamics of systems where we can not "watch" their evolution and thus learn the dynamics directly from data Is it possible to learn dynamics from data that is not in trajectories? see [Huang and Schneider, ICML 29]