Cosmology & CMB. Set5: Data Analysis. Davide Maino

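Gaussian Statistics
For a statistically isotropic field the two-point correlation function is fully specified by the power spectrum:
$$\Theta(\hat n) = \sum_{lm} \Theta_{lm} Y_{lm}(\hat n), \qquad \langle \Theta^*_{lm} \Theta_{l'm'} \rangle = \delta_{ll'}\,\delta_{mm'}\, C_l^{\Theta\Theta}$$
Reality of the field requires $\Theta_{lm} = (-1)^m\, \Theta^*_{l\,-m}$.
For a Gaussian random field the power spectrum defines all higher moments, e.g.
$$\langle \Theta_{l_1 m_1} \Theta_{l_2 m_2} \Theta_{l_3 m_3} \Theta_{l_4 m_4} \rangle = (-1)^{m_1+m_2}\, \delta_{l_1 l_3}\, \delta_{m_1\,-m_3}\, \delta_{l_2 l_4}\, \delta_{m_2\,-m_4}\, C_{l_1}^{\Theta\Theta} C_{l_2}^{\Theta\Theta} + \text{all pairs}$$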

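Idealized Statistical Errors
Take a noisy estimator of the multipoles in the map,
$$\hat\Theta_{lm} = \Theta_{lm} + N_{lm}$$
and take the noise to be statistically isotropic:
$$\langle N^*_{lm} N_{l'm'} \rangle = \delta_{ll'}\,\delta_{mm'}\, C_l^{NN}$$
Construct an unbiased estimator of the power spectrum, $\langle \hat C_l^{\Theta\Theta} \rangle = C_l^{\Theta\Theta}$:
$$\hat C_l^{\Theta\Theta} = \frac{1}{2l+1} \sum_{m=-l}^{l} \hat\Theta^*_{lm} \hat\Theta_{lm} - C_l^{NN}$$
The variance of the estimator is
$$\langle \hat C_l^{\Theta\Theta} \hat C_l^{\Theta\Theta} \rangle - \langle \hat C_l^{\Theta\Theta} \rangle^2 = \frac{2}{2l+1} \left( C_l^{\Theta\Theta} + C_l^{NN} \right)^2$$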

Cosmic and Noise Variance
The rms of the estimator is simply the total power spectrum reduced by $\sqrt{2/N_{\rm modes}}$, where $N_{\rm modes} = 2l+1$ is the number of $m$-mode measurements.
Even a perfect experiment with $C_l^{NN} = 0$ has some statistical variance due to the Gaussian nature of the field.
Cosmic variance is the result of having only one sky to observe.
Noise variance is sometimes approximated by white noise,
$$C_l^{NN} = \frac{4\pi}{N_{\rm pix}}\, \sigma_{\rm wn}^2\, e^{\,l(l+1)\,{\rm FWHM}^2/(8\ln 2)}$$
where the exponential factor removes the effect of beam convolution for an antenna whose beam has the given FWHM.
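As a rough numerical illustration of the two formulas on this slide, the sketch below evaluates the white-noise spectrum and the rms of the power spectrum estimator. The function names and arguments are hypothetical, and the FWHM is assumed to be given in radians.

```python
import numpy as np

def white_noise_cl(ell, n_pix, sigma_wn, fwhm_rad):
    """Beam-deconvolved white-noise spectrum C_l^NN (FWHM in radians, an assumption)."""
    ell = np.asarray(ell, dtype=float)
    beam_exponent = ell * (ell + 1.0) * fwhm_rad**2 / (8.0 * np.log(2.0))
    return 4.0 * np.pi / n_pix * sigma_wn**2 * np.exp(beam_exponent)

def delta_cl(ell, cl_signal, cl_noise):
    """rms of the estimator: sqrt(2 / (2l + 1)) * (C_l^signal + C_l^noise)."""
    ell = np.asarray(ell, dtype=float)
    total = np.asarray(cl_signal, dtype=float) + np.asarray(cl_noise, dtype=float)
    return np.sqrt(2.0 / (2.0 * ell + 1.0)) * total
```

With `cl_noise = 0` the second function returns the pure cosmic-variance error, which cannot be reduced by a better instrument.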

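Idealized Parameter Forecasts
A crude propagation of errors is often useful for estimation purposes.
Suppose $C_{\alpha\beta}$ describes the covariance matrix of the estimators for a given parameter set $\pi_\alpha$.
Define $F = C^{-1}$. Under an infinitesimal transformation to a new set of parameters $p_\mu$,
$$F_{\mu\nu} = \sum_{\alpha\beta} \frac{\partial \pi_\alpha}{\partial p_\mu}\, F_{\alpha\beta}\, \frac{\partial \pi_\beta}{\partial p_\nu}$$
In our case the $\pi_\alpha$ are the $C_l$, the covariance is diagonal and the $p_\mu$ are cosmological parameters:
$$F_{\mu\nu} = \sum_l \frac{2l+1}{2 \left( C_l^{\Theta\Theta} + C_l^{NN} \right)^2}\, \frac{\partial C_l^{\Theta\Theta}}{\partial p_\mu}\, \frac{\partial C_l^{\Theta\Theta}}{\partial p_\nu}$$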

Idealized Parameter Forecasts
Polarization is handled in the same way.
The matrix $F$ is called the Fisher Information Matrix. It represents a local approximation to the transformation of the covariance and hence is only accurate for well-constrained directions in parameter space.
Cramér-Rao inequality: no unbiased estimator can measure the parameters $p_\mu$ with a variance smaller than the diagonal elements of $F^{-1}$, i.e. $\sigma(p_\mu) \geq \sqrt{(F^{-1})_{\mu\mu}}$. This sets a hard limit on the accuracy.
Derivatives are evaluated as finite differences (better, when possible, two-sided differences):
$$\frac{\partial C_l^{\Theta\Theta}}{\partial p_\mu}\bigg|_{p_{\mu,0}} \approx \frac{C_l^{\Theta\Theta}(p_{\mu,0} + h) - C_l^{\Theta\Theta}(p_{\mu,0} - h)}{2h}$$
The Fisher matrix identifies parameter degeneracies, but only their local direction, i.e. errors are ellipses, not bananas.
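The forecast recipe above can be sketched in a few lines. This is a minimal illustration, assuming a made-up two-parameter power-law spectrum (`toy_cl`, an amplitude and a tilt) in place of a real Boltzmann-code output; only the two-sided derivative and the sum over multipoles follow the slides.

```python
import numpy as np

def toy_cl(params, ell):
    """Hypothetical toy spectrum (amplitude, tilt); NOT a real CMB model."""
    amp, tilt = params
    return amp * (ell / 100.0) ** tilt

def fisher_matrix(model, p0, ell, cl_noise, rel_step=1e-3):
    """F_mu_nu = sum_l (2l+1) / (2 (C_l + N_l)^2) dC_l/dp_mu dC_l/dp_nu."""
    p0 = np.asarray(p0, dtype=float)
    cl0 = model(p0, ell)
    derivs = []
    for mu in range(p0.size):
        h = rel_step * abs(p0[mu]) if p0[mu] != 0 else rel_step
        dp = np.zeros_like(p0)
        dp[mu] = h
        # two-sided finite difference around the fiducial point
        derivs.append((model(p0 + dp, ell) - model(p0 - dp, ell)) / (2.0 * h))
    derivs = np.array(derivs)
    weight = (2.0 * ell + 1.0) / (2.0 * (cl0 + cl_noise) ** 2)
    return np.einsum("ml,l,nl->mn", derivs, weight, derivs)

ell = np.arange(2, 1001, dtype=float)
F = fisher_matrix(toy_cl, [1.0, -1.0], ell, cl_noise=0.0)
# Cramer-Rao bound on the marginalized 1-sigma errors
sigma = np.sqrt(np.diag(np.linalg.inv(F)))
```

Off-diagonal elements of `np.linalg.inv(F)` give the forecast parameter correlations, i.e. the orientation of the error ellipse.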

Beyond Idealizations: Time Ordered Data
For real data analysis the starting point is a sequence of time ordered data (TOD) out of the instrument.
Begin with a model of the TOD as
$$d_t = P_{ti}\,\Theta_i + n_t$$
where $i$ labels the $i$-th pixel in the map and $d_t$ is the datum in the TOD taken at time $t$. Typically for Planck the TOD has $\sim 10^{10}$ samples and the map has $10^6$ to $10^7$ pixels.
The noise $n_t$ is drawn from a distribution with a (hopefully) known power spectrum:
$$\langle n_t\, n_{t'} \rangle = N_{d,tt'}$$

Practical Issues
Present ground-based and balloon-borne experiments, and even more so satellites, have TODs of the order of $10^{10}$ time samples with output maps of about $10^6$ to $10^7$ pixels.
The inversion problem scales as $N_{\rm pixels}^3$: it is not a trivial task in terms of CPU time and memory requirements.

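Pointing Matrix
It encodes the way in which the sky is actually observed by the experiment (rows are time samples, columns are pixels).
Simplest case: a total power radiometer or a bolometer observing a given direction at a given time:
$$P = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
More complex case: e.g. WMAP, which observed at the same time two points separated by 141° on the sky. Each row then holds a $+1$ and a $-1$, for example
$$P = \begin{pmatrix} 0 & 0 & 1 & -1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ -1 & 0 & 1 & 0 \end{pmatrix}$$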
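A small sketch of how such matrices could be built from the list of pixels hit at each time sample. The function names are hypothetical, and the sign convention of the differential case (+1 for one horn, -1 for the other) is an assumption; real pipelines store P implicitly as an index array rather than as a dense matrix.

```python
import numpy as np

def pointing_matrix(pixel_of_sample, n_pix):
    """Dense N_t x N_pix matrix with a single 1 per row (total-power case)."""
    n_t = len(pixel_of_sample)
    P = np.zeros((n_t, n_pix))
    P[np.arange(n_t), pixel_of_sample] = 1.0
    return P

def differential_pointing_matrix(pix_a, pix_b, n_pix):
    """Differential case: +1 at the pixel seen by horn A, -1 at horn B (assumed signs)."""
    return pointing_matrix(pix_a, n_pix) - pointing_matrix(pix_b, n_pix)

# Four time samples looking at pixels 2, 0, 1, 2 of a four-pixel map
P = pointing_matrix([2, 0, 1, 2], 4)
D = differential_pointing_matrix([2, 0, 1, 2], [3, 2, 3, 0], 4)
hits = P.sum(axis=0)  # number of observations per pixel
```

Here `P` reproduces the total-power matrix of this slide, and `hits` shows that one pixel is never observed, so its value cannot be recovered by map-making.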

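Pointing Matrix
In general the pointing also includes the effect of beam convolution, with $P(\theta, \phi)$ here denoting the beam pattern:
$$T(\theta_0, \phi_0) = \frac{\int d\Omega\, P(\theta - \theta_0, \phi - \phi_0)\, T(\theta, \phi)}{\int d\Omega\, P(\theta, \phi)}$$
During data analysis beam and pointing are treated separately, and the map $m_i$ we derive is the sky as convolved by the instrument beam and the scanning strategy.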

Map-Making
We would like to extract the best possible map from our TOD. Let us construct a simple chi-squared estimator,
$$\chi^2 = \sum_{tt'ij} (d_t - P_{ti} m_i)\, N^{-1}_{tt'}\, (d_{t'} - P_{t'j} m_j)$$
and minimize it with respect to the signal we are interested in (the map):
$$\frac{\partial \chi^2}{\partial m_i} = -2 \sum_{tt'j} P_{ti}\, N^{-1}_{tt'}\, (d_{t'} - P_{t'j} m_j) = 0$$
which has a solution given by
$$\sum_{tt'j} P_{ti}\, N^{-1}_{tt'}\, P_{t'j}\, m_j = \sum_{tt'} P_{ti}\, N^{-1}_{tt'}\, d_{t'}$$

Map-Making
We search for an estimator $\hat m_i$ that solves that equation:
$$\hat m_i = \sum_j \Big[ \sum_{tt'} P_{ti}\, N^{-1}_{tt'}\, P_{t'j} \Big]^{-1}_{ij} \sum_{tt'} P_{tj}\, N^{-1}_{tt'}\, d_{t'}$$
where the first term states how the noise is distributed on the map according to the noise properties and the observing strategy.
The same equation in matrix notation reads
$$\hat m = (P^T N^{-1} P)^{-1} P^T N^{-1} d$$

Map-Making
Is this an optimal (unbiased and minimum variance) estimator?
Let us rewrite $\hat m$ according to our data model:
$$\hat m = (P^T N^{-1} P)^{-1} P^T N^{-1} (P m + n) = m + (P^T N^{-1} P)^{-1} P^T N^{-1} n$$
As long as $\langle n \rangle = 0$ then $\langle \hat m \rangle = m$: the estimator is unbiased.
Compute the variance of the estimator:
$$C_N = \langle (m - \hat m)(m - \hat m)^T \rangle = (P^T N^{-1} P)^{-1} P^T N^{-1} \langle n n^T \rangle N^{-1} P\, (P^T N^{-1} P)^{-1}$$
Since $\langle n n^T \rangle = N$ we find
$$C_N = (P^T N^{-1} P)^{-1}$$
which can be shown to be the minimum variance.
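The matrix solution can be checked on a toy problem small enough for dense linear algebra. This is only a sketch under stated assumptions: white noise (diagonal N), a made-up four-pixel sky and a scan that visits the pixels cyclically; `make_map` is a hypothetical helper name.

```python
import numpy as np

def make_map(P, N_inv, d):
    """Return m_hat = (P^T N^-1 P)^-1 P^T N^-1 d and its covariance C_N."""
    A = P.T @ N_inv @ P           # inverse of the map noise covariance
    b = P.T @ N_inv @ d           # noise-weighted binned data
    return np.linalg.solve(A, b), np.linalg.inv(A)

rng = np.random.default_rng(0)
n_pix, n_t, sigma = 4, 200, 0.1
pix = np.arange(n_t) % n_pix      # toy scan: every pixel is hit 50 times
P = np.zeros((n_t, n_pix))
P[np.arange(n_t), pix] = 1.0

m_true = np.array([1.0, -2.0, 0.5, 3.0])
d = P @ m_true + sigma * rng.normal(size=n_t)   # d = P m + n
N_inv = np.eye(n_t) / sigma**2                  # white noise (assumption)

m_hat, C_N = make_map(P, N_inv, d)
m_noiseless, _ = make_map(P, N_inv, P @ m_true)
```

For white noise the estimator reduces to averaging the samples that fall in each pixel, and `C_N` is diagonal with entries `sigma**2 / 50`.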

Maximum Likelihood Map-making
What is the best estimator of the underlying map $\Theta_i$?
Construct the likelihood function, i.e. the probability to get the data given a theory: $L \equiv P[{\rm data} \mid {\rm theory}]$. In this case the theory is the set of parameters $\Theta_i$:
$$L_\Theta(d_t) = \frac{1}{(2\pi)^{N_t/2} \sqrt{\det N_d}} \exp\left[ -\frac{1}{2} (d_t - P_{ti}\Theta_i)\, N^{-1}_{d,tt'}\, (d_{t'} - P_{t'j}\Theta_j) \right]$$
Bayes' theorem says that $P[\Theta_i \mid d_t]$, the probability that the temperatures are equal to $\Theta_i$ given the data $d_t$, is proportional to the likelihood function times a prior $P(\Theta_i)$, here taken to be uniform:
$$P[\Theta_i \mid d_t] \propto P[d_t \mid \Theta_i] \equiv L_\Theta(d_t)$$

Maximum Likelihood Map-making
Maximizing the likelihood of $\Theta_i$ is simple since the log-likelihood is quadratic (just a $\chi^2$).
Differentiating the argument of the exponential with respect to $\Theta_i$ and setting it to zero leads to the estimator
$$\hat\Theta_i = C_{N,ij}\, P_{tj}\, N^{-1}_{d,tt'}\, d_{t'}$$
where $C_N \equiv (P^T N_d^{-1} P)^{-1}$ is the covariance of the estimator.
Given the large dimension of the TOD, direct matrix manipulation is infeasible.
A key simplifying assumption is the stationarity of the noise (its properties do not change over the observation time), so that $N_{d,tt'}$ depends only on $(t - t')$.
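One way to see why stationarity helps: if the noise covariance depends only on the lag and the time stream is additionally treated as periodic (an extra assumption made here so that $N_d$ is exactly circulant), then $N_d^{-1} d$ can be applied with FFTs instead of a dense inverse. The helper name below is hypothetical.

```python
import numpy as np

def apply_inverse_noise(d, noise_power):
    """Apply N^-1 to d for circulant N; noise_power = rfft of the first column of N."""
    return np.fft.irfft(np.fft.rfft(d) / noise_power, n=len(d))

# Toy stationary noise: autocorrelation depends only on the lag (t - t')
n = 8
autocorr = np.zeros(n)
autocorr[0] = 2.0
autocorr[1] = autocorr[-1] = 0.5        # symmetric in the lag
noise_power = np.fft.rfft(autocorr).real

d = np.arange(n, dtype=float)
x = apply_inverse_noise(d, noise_power)  # x = N^-1 d without building N

# Dense circulant matrix, built only to check the result on this tiny case
lag = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
N_dense = autocorr[lag]
```

The FFT route costs of order $N_t \log N_t$ operations, compared with $N_t^3$ for a dense inversion.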

Foregrounds
ML map-making can be applied to the TOD of multiple observing frequencies $N_\nu$, yielding a map at each frequency.
This is fundamental in order to separate the genuine cosmological signal from other sources, thanks to their different frequency spectra.
A cleaned CMB map can be obtained by modeling the maps as
$$\hat\Theta_i^\nu = A_i^\nu\, \Theta_i + n_i^\nu + f_i^\nu$$
where $A_i^\nu = 1$ if the maps are all at the same angular resolution (otherwise it encodes the beam information) and $f_i^\nu$ is the noise contributed by foregrounds.
This is another map-making problem, given a covariance matrix for the foreground noise (a prior from other data sets) and maps at the same resolution.
5 foregrounds: synchrotron, free-free and radio point sources at low frequencies ($\nu < 100$ GHz); dust and IR point sources at high frequencies ($\nu > 100$ GHz).

Power Spectrum
The next step in the chain, after getting a clean CMB map, is the extraction of the angular power spectrum, which models the correlation of the signal between pixels:
$$C_{S,ij} \equiv \langle \Theta_i \Theta_j \rangle = \sum_l \Delta^2_{T,l}\, W_{l,ij}$$
Here $W_l$ is the window function, which encodes the effect of beam and pixelization smoothing.
For the simplest case of a Gaussian beam of width $\sigma$ it is proportional to the Legendre polynomial $P_l(\hat n_i \cdot \hat n_j)$ for the pixel separation, multiplied by $b_l^2 \propto e^{-l(l+1)\sigma^2}$.

Band Powers
In principle the power spectrum at every $l$ has to be extracted in a ML sense.
However, with a finite patch of the sky it is not possible to separate multipoles closer than $\Delta l < 2\pi/L$, where $L$ is the dimension of the survey.
So consider instead a theory with parameters $\Delta^2_{T,l}$ constant on each band $\Delta l$, chosen to match the survey, forming a set of band powers $B_a$.
The likelihood of the band powers given the map is
$$L_B(\Theta_i) = \frac{1}{(2\pi)^{N_p/2} \sqrt{\det C_\Theta}} \exp\left( -\frac{1}{2}\, \Theta_i\, C^{-1}_{\Theta,ij}\, \Theta_j \right)$$
where $C_\Theta = C_S + C_N$ and $N_p$ is the number of pixels in the map.

Band Power Estimation
Brute force? $C = C_S + C_N$.
We know that in pixel space, with uncorrelated noise, $C_N$ is diagonal, while in harmonic space $C_S$ is diagonal with the power spectrum on the diagonal.
Computing the determinant and the inverse of $C$ is not trivial and scales as $N_p^3$.
Brute force worked for Boomerang 98 with only $N_p = 57{,}000$.

Band Power Estimation
How to solve the issue? Different approaches:
1. Transform the data to a signal-to-noise basis, throwing away contaminated modes (requires at least one $N_p^3$ operation)
2. Use specific observing symmetries, e.g. azimuthal scanning, but this works only in ideal cases
3. Compute the real-space 2-pt correlation function and then get the power spectrum; for this strategy the likelihood computation scales as $N_p^{3/2}$

Monte Carlo Apodized Spherical Transform EstimatoR - MASTER
Direct spherical harmonic transform of the observed map.
Easy to include properties specific to a given CMB experiment (survey geometry, scanning strategy, instrumental noise, glitches, spikes, etc.).
Calibrate unwanted effects by means of Monte Carlo (MC) simulations of the modeled observations and data analysis.
Mode-mode coupling due to incomplete sky coverage is described as a sky window.

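MASTER
Given a map $m_i$, decompose it in spherical harmonics to get
$$a_{lm} = \int d\hat n\; m(\hat n)\, Y^*_{lm}(\hat n)$$
Without instrumental noise, an unbiased estimator of $C_l^{\rm th}$ is
$$C_l = \frac{1}{2l+1} \sum_{m=-l}^{l} |a_{lm}|^2$$
For incomplete sky coverage we weight pixels with $W(\hat n)$:
$$\tilde a_{lm} = \int d\hat n\; m(\hat n)\, W(\hat n)\, Y^*_{lm}(\hat n) \approx \Omega_p \sum_p m(p)\, W(p)\, Y^*_{lm}(p)$$
where the sum runs over the pixels $p$ and $\Omega_p$ is the pixel area.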
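The full-sky estimator, and the cosmic variance it carries, can be illustrated without any pixelization by drawing Gaussian $a_{lm}$ directly. This is a sketch with hypothetical helper names; only $m \geq 0$ is stored, with the $m < 0$ coefficients implied by the reality condition.

```python
import numpy as np

def draw_alm(l, cl, rng):
    """Gaussian a_lm for m = 0..l; a_l0 is real, m > 0 are complex with variance cl."""
    alm = np.empty(l + 1, dtype=complex)
    alm[0] = rng.normal(scale=np.sqrt(cl))
    re = rng.normal(scale=np.sqrt(cl / 2.0), size=l)
    im = rng.normal(scale=np.sqrt(cl / 2.0), size=l)
    alm[1:] = re + 1j * im
    return alm

def estimate_cl(alm):
    """C_l = sum_m |a_lm|^2 / (2l+1); m > 0 terms count twice to include m < 0."""
    l = len(alm) - 1
    return (np.abs(alm[0]) ** 2 + 2.0 * np.sum(np.abs(alm[1:]) ** 2)) / (2 * l + 1)

rng = np.random.default_rng(3)
l, cl_true = 50, 1.0
cl_hats = np.array([estimate_cl(draw_alm(l, cl_true, rng)) for _ in range(2000)])
mean_cl = cl_hats.mean()                  # close to cl_true (unbiased)
scatter = cl_hats.std()                   # close to sqrt(2/(2l+1)) * cl_true
```

Each entry of `cl_hats` plays the role of one "sky"; the scatter among them is the cosmic variance at that multipole.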

MASTER
The pseudo-power spectrum $\tilde C_l$ is then defined as
$$\tilde C_l = \frac{1}{2l+1} \sum_{m=-l}^{l} |\tilde a_{lm}|^2$$
Computing $\tilde a_{lm}$ with a suitable pixelization (HEALPix) scales as $N_p^{1/2}\, l_{\max}^2$.
$\tilde C_l$ differs from the full-sky power spectrum, but their ensemble averages are related:
$$\langle \tilde C_l \rangle = \sum_{l'} M_{ll'}\, \langle C_{l'} \rangle$$
where $M_{ll'}$ accounts for the mode-mode coupling due to the incomplete sky.

MASTER
Beam, instrumental noise and the data processing of the TOD can be included in the power spectrum computation:
$$\langle \tilde C_l \rangle = \sum_{l'} M_{ll'}\, F_{l'}\, B^2_{l'}\, \langle C_{l'} \rangle + \langle \tilde N_l \rangle$$
where $B_l^2$ is the window function describing both beam and pixel smoothing, $\langle \tilde N_l \rangle$ is the averaged noise power spectrum and $F_l$ is the transfer function that models the data analysis (filtering, map-making, etc.).
The rms of our estimates is given by
$$\Delta C_l \approx \left[ C_l + \frac{N_l}{B_l^2} \right] \sqrt{\frac{2}{(2l+1)\, \Delta l\, f_{\rm sky}}}$$

MASTER
The mode-mode kernel $M_{ll'}$ is purely geometric and needs to be computed only once.
The filter function $F_l$ is estimated via signal-only MC simulations with a known input theoretical model, inverting the main equation.
The number of MC simulations depends on the required accuracy, which scales as $1/\sqrt{N^{(s)}_{\rm MC}}$.
From the real data extract $N_{tt'}$ and its frequency power spectrum, construct noise-only Gaussian TOD and redo the actual analysis with the noise-only simulations to get $\langle \tilde N_l \rangle$.

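MASTER
Reduce the correlation between the $C_l$'s due to the sky cut, and the errors on the single $C_l$, by binning the data together in $l$ space. Create binning, $P_{bl}$, and un-binning, $Q_{lb}$, operators:
$$C_b = P_{bl}\, C_l$$
With $K_{ll'} = M_{ll'} F_{l'} B^2_{l'}$ the main equation becomes
$$\langle \tilde C_l \rangle = K_{ll'}\, \langle C_{l'} \rangle + \langle \tilde N_l \rangle$$
and we want a solution in the form
$$P_{bl}\, K_{ll'}\, \langle C_{l'} \rangle = P_{bl} \left( \langle \tilde C_l \rangle - \langle \tilde N_l \rangle \right)$$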

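MASTER
Working a bit with the algebra we get
$$C_b = K^{-1}_{bb'}\, P_{b'l} \left( \tilde C_l - \tilde N_l \right)$$
where
$$K_{bb'} = P_{bl}\, M_{ll'}\, F_{l'}\, B^2_{l'}\, Q_{l'b'}$$
The best unbiased estimators of the whole-sky signal and noise power spectra are
$$\hat C_b = K^{-1}_{bb'}\, P_{b'l} \left( \tilde C_l - \langle \tilde N_l \rangle_{\rm MC} \right), \qquad \hat N_b = K^{-1}_{bb'}\, P_{b'l}\, \langle \tilde N_l \rangle_{\rm MC}$$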
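The binned solution is a small linear solve once the ingredients are available. The sketch below assumes the simplest possible flat binning (P averages the multipoles of a band, Q spreads a band power back over them), which is an illustrative choice and not necessarily the weighting used in practice; the coupling matrix `M` is taken as an input and all function names are hypothetical.

```python
import numpy as np

def binning_operators(l_max, bin_edges):
    """Flat binning (assumption): P_bl averages over a band, Q_lb un-bins."""
    n_b = len(bin_edges) - 1
    P = np.zeros((n_b, l_max + 1))
    Q = np.zeros((l_max + 1, n_b))
    for b in range(n_b):
        lo, hi = bin_edges[b], bin_edges[b + 1]   # band covers lo <= l < hi
        P[b, lo:hi] = 1.0 / (hi - lo)
        Q[lo:hi, b] = 1.0
    return P, Q

def master_bandpowers(pseudo_cl, noise_cl, M, F, B2, P, Q):
    """C_b = K_bb'^-1 P_b'l (pseudo_cl - noise_cl), K_bb' = P M F B^2 Q."""
    K_ll = M * (F * B2)[None, :]      # K_ll' = M_ll' F_l' B_l'^2
    K_bb = P @ K_ll @ Q
    return np.linalg.solve(K_bb, P @ (pseudo_cl - noise_cl))

# Toy check: a diagonal kernel that only rescales the power by 0.5
l_max = 9
P, Q = binning_operators(l_max, [2, 6, 10])
true_bands = np.array([3.0, 5.0])
M = 0.5 * np.eye(l_max + 1)
ones = np.ones(l_max + 1)
pseudo_cl = M @ (Q @ true_bands)
bands = master_bandpowers(pseudo_cl, np.zeros(l_max + 1), M, ones, ones, P, Q)
```

In this toy case `bands` recovers the input band powers exactly; with a real sky cut `M` has off-diagonal structure and the bin width is what keeps `K_bb` well conditioned.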

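MASTER
Finally, to get the covariance of $\hat C_b$, useful for extracting cosmological parameters, we
1. smoothly interpolate $\hat C_b$ to get a fiducial CMB model
2. perform a set of signal+noise MC simulations to get $\{\hat C_b\}$
3. compute the covariance as
$$C_{bb'} = \left\langle \left( \hat C_b - \langle \hat C_b \rangle_{\rm MC} \right) \left( \hat C_{b'} - \langle \hat C_{b'} \rangle_{\rm MC} \right) \right\rangle_{\rm MC}$$
4. take the error bar as $\Delta \hat C_b = C_{bb}^{1/2}$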
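Steps 3 and 4 amount to a sample covariance over the simulations. A minimal sketch, assuming the simulated band powers are stacked in an array of shape (number of simulations, number of bands); the function name is hypothetical.

```python
import numpy as np

def mc_covariance(sim_bandpowers):
    """Covariance C_bb' and error bars from a set of MC band-power estimates."""
    sims = np.asarray(sim_bandpowers, dtype=float)
    dev = sims - sims.mean(axis=0)          # C_b minus its MC average
    cov = dev.T @ dev / sims.shape[0]       # MC average of the outer product
    return cov, np.sqrt(np.diag(cov))

cov, err = mc_covariance([[1.0, 2.0], [3.0, 2.0]])
```

The average is taken over the simulations exactly as in the formula; dividing by `N_MC - 1` instead would give the usual unbiased sample covariance, and the difference is negligible for a large number of simulations.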

MASTER
Figure: WMAP 7-year temperature angular power spectrum

Cosmological Parameters
The probability distribution of the band powers given the cosmological parameters $p_i$ is not Gaussian, but this is often considered an adequate approximation (except for specific parameters):
$$L_p(\hat B_a) \approx \frac{1}{(2\pi)^{N_c/2} \sqrt{\det C_B}} \exp\left[ -\frac{1}{2} (\hat B_a - B_a)\, C^{-1}_{B,ab}\, (\hat B_b - B_b) \right]$$
Grid-based approaches evaluate the likelihood on a grid in cosmological parameter space and then maximize it. This is too demanding: e.g. 10 parameters sampled at 10 values each require $10^{10}$ power spectrum computations!
Faster approaches explore the likelihood space by Monte Carlo (Monte Carlo Markov Chains).
The number of parameters $p_i$ is $\sim 10$, which is a radical compression from the $N_t \sim 10^{10}$ elements in the TOD!

MCMC
Monte Carlo Markov Chain: a random walk in parameter space in which each step depends only on the previous one.
Start with a set of cosmological parameters $p^m$ and compute the likelihood of the observed spectrum.
Take a random step in parameter space to $p^{m+1}$, with size drawn from a multivariate Gaussian with covariance $C_p$ (a guess of the parameter covariance matrix), and compute the likelihood there.
Draw a uniformly distributed random number in (0, 1): if the likelihood ratio exceeds this value take the step (add the new point to the Markov chain); if not, do not take the step (the current point is repeated in the chain) and try another one.

Markov Chain Monte Carlo
Generates a sequence of random samples from an arbitrary probability density function.
Metropolis algorithm:
assume a Gaussian distribution for the parameter steps
accept or reject the step
simple and generally applicable

Metropolis Algorithm
Select an initial parameter vector $p_0$ and iterate as follows. At iteration number $k$:
1. create a new position $p^* = p_k + \Delta p$, where $\Delta p$ is randomly chosen from a Gaussian distribution
2. evaluate $r = L(p^*)/L(p_k)$
3. accept the position, i.e. set $p_{k+1} = p^*$, if $r \geq 1$; if $r < 1$ accept it with probability $r$ (draw $u$ uniform in (0, 1) and accept if $r > u$); otherwise set $p_{k+1} = p_k$
The algorithm requires only the computation of $L$ and creates a Markov chain, since $p_{k+1}$ depends only on $p_k$.
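The three steps can be sketched as follows. This is a minimal illustration, assuming a toy two-parameter Gaussian likelihood in place of a real CMB likelihood; it works with the log-likelihood to avoid numerical underflow, and the function names and step covariance are hypothetical choices.

```python
import numpy as np

def metropolis(log_like, p0, step_cov, n_steps, rng):
    """Random-walk Metropolis sampler; returns the chain and the acceptance rate."""
    p = np.asarray(p0, dtype=float)
    logl = log_like(p)
    chain = np.empty((n_steps, p.size))
    n_accept = 0
    for k in range(n_steps):
        trial = rng.multivariate_normal(p, step_cov)   # p* = p_k + Gaussian step
        logl_trial = log_like(trial)
        # accept if r = L(p*) / L(p_k) exceeds a uniform number in (0, 1)
        if np.log(rng.uniform()) < logl_trial - logl:
            p, logl = trial, logl_trial
            n_accept += 1
        chain[k] = p                                   # rejected step repeats p_k
    return chain, n_accept / n_steps

def toy_log_like(p):
    """Toy likelihood (assumption): unit Gaussian centred on (1, -2)."""
    return -0.5 * np.sum((p - np.array([1.0, -2.0])) ** 2)

rng = np.random.default_rng(1)
chain, acceptance = metropolis(toy_log_like, [0.0, 0.0], 0.5 * np.eye(2), 20000, rng)
samples = chain[2000:]        # drop the burn-in part of the chain
```

The step covariance controls the trade-off between acceptance rate and how quickly the chain moves through parameter space, which is why a reasonable guess of the parameter covariance is used for it.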

MCMC
With a complete chain of $N_M$ elements compute the mean of the chain and its variance:
$$\bar p_i = \frac{1}{N_M} \sum_{m=1}^{N_M} p_i^m, \qquad \sigma^2(p_i) = \frac{1}{N_M - 1} \sum_{m=1}^{N_M} \left( p_i^m - \bar p_i \right)^2$$
The trick is in assuring burn-in (no sensitivity to the initial point), a suitable step size and convergence.
This usually requires running multiple independent chains with typically $10^4$ elements per chain.
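For a chain stored as an array with one row per element, the two sums are one-liners. The sketch below uses a hypothetical helper name and adds an explicit `burn_in` argument, which is an illustrative way of discarding the initial elements rather than a prescription for how many to drop.

```python
import numpy as np

def chain_stats(chain, burn_in=0):
    """Mean and variance (1 / (N_M - 1) normalization) of each parameter after burn-in."""
    samples = np.asarray(chain, dtype=float)[burn_in:]
    return samples.mean(axis=0), samples.var(axis=0, ddof=1)

# The first element mimics a starting point far from the bulk of the chain
mean, var = chain_stats([[100.0], [1.0], [2.0], [3.0]], burn_in=1)
```

Comparing these statistics between independent chains started from different points is a simple check that the result is not sensitive to the initial position.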

MCMC & Parameter Forecast
MCMC methods can be used to make parameter forecasts that are more detailed and realistic than the simple Fisher matrix approach.
Create a mock set of band powers with instrument-specific noise properties.
Run the usual MCMC machinery to get the cosmological parameters.
This gives the actual accuracy, but also reveals any possible bias in the parameter determination as well as the proper parameter correlations (no more simple ellipses but bananas!).
