

Statistics and Prediction Tom Kitching tdk@roe.ac.uk @tom_kitching

Outline: What do we want to measure? How do we measure it? Statistics of measurement. Cosmological parameter extraction. Understanding.

Recap: the lensing equation is a local mapping. General Relativity relates this to the gravitational potential. The distortion matrix implies that the distortion is elliptical: shear and convergence. A simple formalism relates the shear and convergence (the observables) to the underlying gravitational potential.

Recap: observed galaxies have intrinsic ellipticity and shear. Reviewed shape measurement methods: moments (KSB) and model fitting (lensfit). Still an unsolved problem for the largest, most ambitious surveys. Simulations: STEP 1 and 2, GREAT08, GREAT10.

Recap: mass mapping, correlation functions, power spectra.

What will we learn? Predictive parameter and model estimation: Fisher matrix, evidence. Real-life parameter and model selection: MCMC.

Tomography: generate the 2D shear correlation in redshift bins. Can auto-correlate within a bin, or cross-correlate between bin pairs; i and j refer to the redshift bin pair.

[Figure: tomographic shear correlations for w0 = -0.45 versus w0 = -0.95]

[Figure: survey sky area (150 to 15,000 square degrees) against time (2011-2025) for CFHTLenS, RCS2, KiDS, DES, HSC, Pan-STARRS 1 and 2, and Euclid (space-based)]

What do we want? How accurately can we estimate a model parameter from a given data set? Given a set of N data points x_1, ..., x_N, we want an estimator that is unbiased and gives as small error bars as possible: the Best Unbiased Estimator. A key quantity in this is the Fisher (Information) Matrix.

Fisher 1935 Tegmark, Taylor, Heavens 1997

[Figure: a likelihood surface over parameter 1 and parameter 2]

What is the (Fisher) Matrix? Let's expand the likelihood surface about the maximum-likelihood point. We can write this as a Gaussian, L ≈ L_max exp[-(1/2) Δθ^T H Δθ], where the Hessian (the inverse of the parameter covariance) is H_ab = -∂² ln L / ∂θ_a ∂θ_b, evaluated at the peak.

What is the Fisher Matrix? The Hessian matrix has some nice properties: the conditional error on parameter α is 1/sqrt(H_αα), while the marginal error on α is sqrt[(H^-1)_αα], with the matrix inversion performed over the full matrix.

[Figure: a 2D likelihood over parameters 1 and 2, illustrating conditioning versus marginalising]

What is the Fisher Matrix? The Fisher matrix is defined as the expectation of the Hessian matrix, F_αβ = <H_αβ>. This allows us to make predictions about the performance of an experiment! The expected conditional error on α is σ_α = sqrt(1/F_αα); the expected marginal error on α is σ_α = sqrt[(F^-1)_αα], with the matrix inversion performed over the full matrix.
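The conditional and marginal errors above can be sketched in a few lines of numpy; the 2x2 matrix here is a hypothetical example, not from any real experiment:

```python
import numpy as np

# Hypothetical 2-parameter Fisher matrix (arbitrary illustrative values).
F = np.array([[10.0, 6.0],
              [6.0, 5.0]])

# Conditional error on the first parameter: all others held fixed.
sigma_cond = 1.0 / np.sqrt(F[0, 0])

# Marginal error: other parameters integrated out, so invert the full
# matrix first and then take the diagonal element.
sigma_marg = np.sqrt(np.linalg.inv(F)[0, 0])

print(sigma_cond, sigma_marg)
# The marginal error is always >= the conditional error.
```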

The Gaussian Case How do we calculate Fisher Matrices in practice? Assume that the likelihood is Gaussian

The Gaussian case: using a matrix identity and the derivatives of the mean and covariance, the result is F_αβ = (1/2) Tr[C^-1 C_,α C^-1 C_,β] + (∂μ/∂θ_α)^T C^-1 (∂μ/∂θ_β), where C is the data covariance and μ the mean.
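A sketch of this Gaussian-likelihood Fisher matrix, with the derivatives taken by central finite differences; the straight-line model and noise level are illustrative assumptions (for this linear model the mean term gives F_mm = Σ x²/σ² exactly):

```python
import numpy as np

def gaussian_fisher(mean_fn, cov_fn, theta, eps=1e-6):
    """F_ab = 1/2 Tr[C^-1 C_,a C^-1 C_,b] + mu_,a^T C^-1 mu_,b,
    with derivatives by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    Cinv = np.linalg.inv(cov_fn(theta))

    def deriv(fn, a):
        dp = np.zeros(n); dp[a] = eps
        return (fn(theta + dp) - fn(theta - dp)) / (2 * eps)

    dmu = [deriv(mean_fn, a) for a in range(n)]
    dC = [deriv(cov_fn, a) for a in range(n)]

    F = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            F[a, b] = (0.5 * np.trace(Cinv @ dC[a] @ Cinv @ dC[b])
                       + dmu[a] @ Cinv @ dmu[b])
    return F

# Toy model y = m*x + c with fixed diagonal covariance.
x = np.array([1.0, 2.0, 3.0])
sigma_y = 0.5
mean_fn = lambda th: th[0] * x + th[1]
cov_fn = lambda th: sigma_y**2 * np.eye(len(x))  # independent of theta

F = gaussian_fisher(mean_fn, cov_fn, [1.0, 0.0])
print(F)  # F_mm = sum(x^2)/sigma^2 = 14/0.25 = 56
```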

How to Calculate a Fisher Matrix We know the (expected) covariance and mean from theory Requires NO DATA! Worked example y=mx+c

Theory: y = mx. A new experiment can measure y with error σ(y). What value of x is best for measuring m? [Figure: a data point with error bar σ(y) at position x]

The theory: y = mx. The experiment can measure y with error σ(y). Question: what x value is best? The covariance does not depend on m, so the first term is zero. dy/dm = x, so F_mm = x²/σ²(y) and σ(m) = sqrt[(F^-1)_mm] = sqrt[σ²(y) x^-2] = σ(y)/x: a better measure of m at large x.
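This worked example is small enough to check numerically; the noise level and x positions below are arbitrary illustrative choices:

```python
import numpy as np

# One measurement of y = m*x with error sigma_y gives
# F_mm = x^2 / sigma_y^2, hence sigma(m) = sigma_y / x.
sigma_y = 0.1
xs = np.array([1.0, 2.0, 10.0])
F_mm = xs**2 / sigma_y**2
sigma_m = np.sqrt(1.0 / F_mm)
for x, s in zip(xs, sigma_m):
    print(f"x = {x:5.1f}  sigma(m) = {s:.4f}")
# A larger lever arm x gives a smaller error on the slope m.
```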

[Figure: measurements at larger x give more lever arm on the slope m]

Some nomenclature: σ(m) = sqrt[(F^-1)_mm] = sqrt[σ²(y) x^-2] = σ(y)/x. We need a fiducial parameter value to make quantitative predictions. If the derivative is analytic then this is fairly straightforward (not always the case!).

Some nice properties of Fisher matrices. Writing F in block form, F = [[A, B], [C, D]], matrix manipulations include: inversion, F^-1; addition, F = G + H; rotation, G = R F R^T; and the Schur complement, F_A = A - B D^-1 C.

Adding Extra Parameters To add parameters to a Fisher Matrix Simply extend the matrix

Combining experiments: if two experiments are independent then the combined Fisher matrix is simply F_comb = F_1 + F_2, and the same holds for n experiments. If they are not independent, you need a single Fisher matrix with a joint covariance.
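A sketch of combining two independent experiments; both Fisher matrices are hypothetical values chosen only to show that the combined marginal error tightens:

```python
import numpy as np

# Two hypothetical independent experiments constraining the same parameters.
F1 = np.array([[4.0, 1.0], [1.0, 2.0]])
F2 = np.array([[3.0, -1.0], [-1.0, 5.0]])

# Independent experiments: simply add the Fisher matrices.
F_comb = F1 + F2

sigma1 = np.sqrt(np.linalg.inv(F1)[0, 0])
sigma_comb = np.sqrt(np.linalg.inv(F_comb)[0, 0])
print(sigma1, sigma_comb)  # combined error is smaller
```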

Re-Parameterising

[Figure: error ellipses rotated from parameters (1a, 2a) to re-parameterised variables (1b, 2b)]

Schur complement: F_A = A - B D^-1 C. This is equivalent to the following operation: invert the entire matrix, F^-1; select the A-part of the inverse, (F^-1)_A; and re-invert. What you have is a new sub-Fisher matrix. What does this correspond to?

The Schur complement is equivalent to marginalisation over parameters: F_A = A - B D^-1 C marginalises over the parameters not in A.
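The claimed equivalence (Schur complement = invert, select, re-invert) can be verified directly; the random positive-definite matrix stands in for a real Fisher matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
F = M @ M.T + 4 * np.eye(4)   # random positive-definite "Fisher matrix"

k = 2                          # keep the first k parameters, marginalise the rest
A, B = F[:k, :k], F[:k, k:]
C, D = F[k:, :k], F[k:, k:]

# Route 1: Schur complement.
schur = A - B @ np.linalg.inv(D) @ C

# Route 2: invert the whole matrix, select the A-block, re-invert.
via_inverse = np.linalg.inv(np.linalg.inv(F)[:k, :k])

print(np.allclose(schur, via_inverse))  # the two routes agree
```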


A warning about derivatives: numerically, dL/dparameter can be approximated one-sided as (L_B - L_A)/dp, or by fitting a parabola through nearby points. You must test whether this is numerically stable.
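The stability test suggested above can be sketched with a toy quadratic log-likelihood (the function and step sizes are illustrative assumptions): vary the step dp and compare one-sided and central (parabola-through-three-points) estimates.

```python
import numpy as np

lnL = lambda p: -(p - 1.0)**2      # toy log-likelihood
p0, exact = 0.5, 1.0               # d lnL/dp = -2(p - 1) = 1 at p = 0.5

results = {}
for dp in [1e-1, 1e-3, 1e-6]:
    one_sided = (lnL(p0 + dp) - lnL(p0)) / dp
    central = (lnL(p0 + dp) - lnL(p0 - dp)) / (2 * dp)
    results[dp] = (one_sided, central)
    print(f"dp={dp:.0e}  one-sided={one_sided:+.6f}  central={central:+.6f}")
# The one-sided estimate carries an O(dp) bias; the central estimate is
# exact here because lnL is quadratic.
```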

A warning about the two terms: can we use both terms (the covariance-derivative and mean-derivative terms) and get twice the information? No, not usually (almost never in cosmology).

Fisher matrices must be positive definite. A positive definite matrix has only positive eigenvalues, because by definition the likelihood surface is assumed to be single-peaked. Matrices can become non-positive definite due to numerical inaccuracies, which corresponds to a negatively curved surface. You should always check your matrices.
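The recommended check is a one-liner on the eigenvalues; the two matrices below are hypothetical examples of a valid and an invalid Fisher matrix:

```python
import numpy as np

def is_valid_fisher(F, tol=0.0):
    """True if F is symmetric with all eigenvalues > tol (positive definite)."""
    if not np.allclose(F, F.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(F) > tol))

good = np.array([[2.0, 0.5], [0.5, 1.0]])
bad = np.array([[2.0, 3.0], [3.0, 1.0]])   # has a negative eigenvalue

print(is_valid_fisher(good), is_valid_fisher(bad))
```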

Fisher future forecasting: we now have a tool with which we can predict the accuracy of future experiments! We can easily calculate expected parameter errors, combine experiments, change variables, and add extra parameters.

For shear the mean is zero; the information is in the covariance, so only the covariance (trace) term of the Gaussian Fisher matrix survives (Hu 1999). This is what is used to make predictions for cosmic shear and dark energy experiments.

Dark Energy Expect constraints of 1% from Euclid

The theory: y = mx. The experiment can measure y with error σ(y). Question: what x value is best? dy/dm = x, so F_mm = x²/σ²(y) and σ(m) = sqrt[(F^-1)_mm] = sqrt[σ²(y) x^-2] = σ(y)/x: a better measure of m at large x.

The theory: w(a) = w_0 + w_a (1 - a). The experiment can measure C_l with error σ(C_l). Question: what redshift and area are best?

Known from theory/simulations; the mean shear is zero.

Hu (1999): the factor (l + 1/2) sums over m-modes, and f_sky scales the covariance with the survey area. Note you also need to include noise in the covariance, C = C(l) + N, with shot noise N = σ²(e)/n_galaxy.

The question we have addressed is: given an experiment, how accurately can I measure parameter values? An alternative (or additional) question is: how accurately can I determine a model (a set of parameters)?

Bayes' Theorem: posterior = likelihood x prior / evidence. We measure the likelihood of the data given the parameters, assume a prior on the parameters, and obtain the evidence for a model (a set of parameters).

How do we compute the expected evidence? The evidence is the likelihood integrated over the prior, and the Bayes factor is the ratio of the evidences of two models.

What does the Bayes factor mean? Odds: how much would you gamble? The Jeffreys scale (take with a pinch of salt):
ln B < 1: inconclusive
1 < ln B < 2.5: significant, odds ~1:12
2.5 < ln B < 5.0: strong, odds ~1:150
ln B > 5.0: decisive, odds better than 1:150
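A sketch of an evidence ratio for a toy nested model (all numbers below are hypothetical): model B fixes an extra parameter nu at 0, model A gives it a flat prior, and the data prefer a small non-zero value.

```python
import numpy as np

# Hypothetical measurement: nu = 0.15 +/- 0.05; flat prior on [0, 1] for model A.
nu_hat, sigma, prior_width = 0.15, 0.05, 1.0
like = lambda nu: np.exp(-0.5 * ((nu - nu_hat) / sigma)**2)

# Model B: nu fixed at 0, so the evidence is the likelihood there.
E_B = like(0.0)

# Model A: marginalise nu over its flat prior (simple Riemann sum).
nu_grid = np.linspace(0.0, prior_width, 10001)
dnu = nu_grid[1] - nu_grid[0]
E_A = np.sum(like(nu_grid)) * dnu / prior_width

lnB = np.log(E_A / E_B)
print(f"ln B = {lnB:.2f}")   # positive: the data prefer the extra parameter
```

On the Jeffreys scale above, this toy case lands in the "significant" band.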

We can assume Gaussian likelihoods and perform the integration analytically (Heavens, Kitching & Verde 2008), computing from the Fisher matrix the expected evidence for nested models. Other similar approaches appear in Trotta (2008). The Occam factor can be seen explicitly.

Occam factor. Occam's Razor: simpler models are preferred, which stops you over-fitting your data with more parameters than it warrants.

Example: neutrino mass from weak lensing. Neutrinos have mass = model A; neutrinos do not have mass = model B.

Example: neutrino mass from weak lensing. [Figure: expected ln B regions (inconclusive, significant, strong, decisive) for an Euclid-like survey; Kitching et al. 2008]

What will we learn? Fisher matrices Likelihood sampling

Likelihood sampling: we have at least 10 cosmological parameters, and others may be non-zero functions of scale and/or redshift, e.g. w(z) + 1 or b(z, k) - 1.

Grid: evaluate the likelihood function at a grid of points in parameter space. What is wrong with this approach? The number of evaluations scales as N^D for N grid points in each of D dimensions.

MCMC methods: Markov Chain Monte Carlo randomly samples the likelihood space with a chain of likelihood (or other) evaluations that is memoryless.

Metropolis-Hastings: sample the likelihood space with a random walk.
1) Pick a point in parameter space, x_i
2) Evaluate the likelihood L(x_i)
3) Pick a new random point x_{i+1} from a proposal distribution
4) Evaluate the likelihood L(x_{i+1})
5) If a = L(x_{i+1})/L(x_i) > 1, ACCEPT (and go to 3)
6) Otherwise accept with probability a: draw a uniform random number b, and if b < a ACCEPT, else reject
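The steps above can be sketched in a few lines; the 2D unit-Gaussian target and step size are toy assumptions, not a cosmological likelihood:

```python
import numpy as np

rng = np.random.default_rng(42)

def log_like(x):
    return -0.5 * np.sum(x**2)          # toy target: unit 2D Gaussian

def metropolis_hastings(n_steps, step=0.8):
    x = np.zeros(2)                      # 1) pick a starting point
    logL = log_like(x)                   # 2) evaluate its likelihood
    chain = []
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal(2)   # 3) propose
        logL_new = log_like(x_new)                  # 4) evaluate
        # 5)+6) accept if a = L_new/L_old > 1, else with probability a
        # (done in log space: log b < log a)
        if np.log(rng.uniform()) < logL_new - logL:
            x, logL = x_new, logL_new
        chain.append(x)
    return np.array(chain)

chain = metropolis_hastings(20000)
print(chain.mean(axis=0), chain.std(axis=0))  # approx [0, 0] and [1, 1]
```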

What choice of proposal? It is common to choose a multivariate Gaussian (inefficient for degenerate parameters); one could also build the proposal from the Fisher matrix.

[Figure: the Metropolis-Hastings steps illustrated as a random walk over parameters 1 and 2, with accepted moves (e.g. L3 > L2 > L1) and proposals accepted or rejected against a uniform random number b]

The density of points in the chain is proportional to the likelihood (after an initial run-in phase).

When to stop? (Never!) But the chain will converge. Gelman-Rubin convergence test: the variance between chains should be consistent with the variance within a chain. Run m chains of length n; W = mean variance within the chains, B = variance between the chains. These combine into an unbiased estimate V of the target variance, and R = sqrt(V/W) -> 1 indicates convergence.
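A sketch of the Gelman-Rubin statistic for one parameter (the synthetic "chains" here are stand-ins: independent Gaussian draws for the converged case, offset copies for the unconverged one):

```python
import numpy as np

def gelman_rubin(chains):
    """R = sqrt(V/W) for an (m, n) array of m chains of length n."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    V = (n - 1) / n * W + B / n                  # estimate of target variance
    return np.sqrt(V / W)

rng = np.random.default_rng(1)
converged = rng.standard_normal((4, 5000))       # 4 chains, same target
stuck = converged + np.arange(4)[:, None]        # chains offset from each other

print(gelman_rubin(converged), gelman_rubin(stuck))  # ~1 versus >> 1
```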

The most commonly used package is cosmomc, MCMC for cosmology: http://cosmologist.info/cosmomc/ . It is coupled with CAMB and was used for the WMAP analysis (and many others). You can download the WMAP MCMC chains to play around with: http://lambda.gsfc.nasa.gov/toolbox/

Other MCMC methods. What are the problems with MCMC? Multiple peaks, and computing the evidence. There are different types/flavours: Gibbs sampling, simulated annealing. What alternatives? Nested sampling (Feroz & Hobson 2008, refined by Feroz, Hobson & Bridges 2008) and Population Monte Carlo (Kilbinger et al. 2009).

Recap: predictive parameter and model estimation (Fisher matrix, evidence); real-life parameter and model selection (MCMC).

[Figure: survey sky area (150 to 15,000 square degrees) against time (2011-2025) for CFHTLenS, RCS2, KiDS, DES, HSC, Pan-STARRS 1 and 2, and Euclid (space-based)]

Conclusion: lensing is a simple cosmological probe, directly related to General Relativity through simple linear image distortions. Measurement from data is challenging: we need lots of galaxies and very sophisticated experiments. Lensing is a powerful probe of dark energy and dark matter.