Statistical Challenges in 21st Century Cosmology
Proceedings IAU Symposium No. 306, 2014
A.F. Heavens, J.-L. Starck, A. Krone-Martins, eds.
© 2014 International Astronomical Union
DOI: 00.0000/X000000000000000X

Errors on errors - Estimating cosmological parameter covariance

arXiv:1412.4914v1 [astro-ph.IM] 16 Dec 2014

Benjamin Joachimi (1) and Andy Taylor (2)

(1) Department of Physics & Astronomy, University College London, Gower Place, London WC1E 6BT, United Kingdom
email: b.joachimi@ucl.ac.uk
(2) Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, United Kingdom
email: ant@roe.ac.uk

Abstract. Current and forthcoming cosmological data analyses share the challenge of huge datasets alongside increasingly tight requirements on the precision and accuracy of extracted cosmological parameters. The community is becoming increasingly aware that these requirements apply not only to the central values of parameters but, equally importantly, also to the error bars. Due to non-linear effects in the astrophysics, the instrument, and the analysis pipeline, data covariance matrices are usually not well known a priori and need to be estimated from the data itself, or from suites of large simulations. In either case, the finite number of realisations available to determine data covariances introduces significant biases and additional variance in the errors on cosmological parameters in a standard likelihood analysis. Here, we review recent work on quantifying these biases and additional variances and discuss approaches to remedy these effects.

Keywords. Methods: statistical, methods: data analysis, cosmological parameters

1. What noise does to your covariance matrix

The variance-covariance of cosmological data is often complicated and not known analytically, receiving contributions from sample variance coupled with complex survey masks, instrumental effects, as well as measurement and shot noise.
Therefore it is customary to estimate data covariance matrices from the data itself (via resampling techniques) or from suites of realistic mock datasets. The latter are computationally expensive to generate, so there is a strong drive to keep the number of simulated realisations of the data, $N_S$, to a minimum. Conversely, the ever increasing size of surveys and number of mature cosmological probes to be extracted leads to data vectors whose dimension can easily exceed $N_D > 1000$ in the near future. Despite the pressure to keep $N_S$ small while creating data vectors with large $N_D$, cosmologists have until recently almost exclusively employed the standard sample covariance estimator and proceeded to perform a likelihood analysis without further consideration of the statistical uncertainty and potential biases of their covariance estimate. Only recently, beginning with the work of Hartlap et al. (2007), has there been an increased awareness of long-established results on this topic in the statistics literature.

If the elements of a data vector $D$ are Gaussian distributed, the data sample covariance estimate $M = \langle \Delta D \, \Delta D^\tau \rangle$, where $\Delta D = D - \langle D \rangle$, follows a Wishart distribution (the multivariate generalisation of a $\chi^2$ distribution) with $N_S - 1$ degrees of freedom (Wishart 1928). The inverse data covariance, $\Psi \equiv M^{-1}$, which is required in least squares and likelihood expressions, then follows an Inverse-Wishart distribution with $N_S - N_D - 2$ degrees of freedom. The moments of this distribution were first derived by Kaufman (1967), who found for the mean

$$\langle \hat{\Psi} \rangle = \frac{N_S - 1}{N_S - N_D - 2} \, \Psi , \qquad (1.1)$$

demonstrating that $\hat{\Psi}$ is increasingly biased high for decreasing $N_S$ and diverges at $N_S = N_D + 2$. Figure 1 provides an illustration of this non-intuitive behaviour. For decreasing $N_S$ the largest (smallest) eigenvalues of a noisy covariance matrix are biased increasingly high (low), and the condition number increases dramatically. The smallest eigenvalue drops to zero at $N_S = N_D + 2$, rendering the covariance singular. Even after correcting for the bias, the variance in the covariance estimate diverges at a very similar rate (see again Kaufman 1967).

Figure 1. Illustration of the impact of noise on a covariance matrix, for the toy case of a 20-dimensional identity matrix ($N_D = 20$). Top: Realisations of Wishart-distributed random matrices with expectation subtracted, generated for $N_S = 5000, 100, 22$ (from left to right) to illustrate the increasing levels of noise. Bottom: Ordered eigenvalues (eigenvalue versus eigenmode), averaged over 1000 realisations of covariance matrices. The lines show eigenvalues for different values of the number of realisations used for computing the covariance, $N_S = 10, 22, 50, 100, 500, 5000$, as indicated in the legend. The dashed horizontal line indicates the eigenvalues in the noise-free case.

One may wonder about the impact on early cosmological analyses, which generally had only a small number of simulations at their disposal and seem to have ignored the biases in the inverse data covariance. Since the estimated covariances would have been very noisy, one can hypothesise that researchers would have calculated a singular value decomposition and set the smallest eigenvalues to zero before proceeding with a Moore-Penrose pseudo-inverse.
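The bias factor in Eq. (1.1) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Gaussian data with an identity true covariance (the toy case of Figure 1), and the function names, the number of Monte Carlo trials, and the choice to average the diagonal of the inverse are all illustrative assumptions.

```python
import numpy as np


def mean_inverse_sample_covariance(n_s, n_d, n_trials=2000, seed=0):
    """Monte Carlo average of the inverse sample covariance.

    Assumes Gaussian data whose true covariance is the identity, so the
    true inverse covariance Psi is also the identity (illustrative toy case).
    """
    rng = np.random.default_rng(seed)
    acc = np.zeros((n_d, n_d))
    for _ in range(n_trials):
        data = rng.standard_normal((n_s, n_d))  # n_s realisations of the data vector
        cov = np.cov(data, rowvar=False)        # unbiased sample covariance (divides by n_s - 1)
        acc += np.linalg.inv(cov)
    return acc / n_trials


def kaufman_factor(n_s, n_d):
    """Expected bias factor of the inverse sample covariance, Eq. (1.1)."""
    return (n_s - 1) / (n_s - n_d - 2)


if __name__ == "__main__":
    n_d = 20
    for n_s in (30, 50, 100, 500):
        psi_hat = mean_inverse_sample_covariance(n_s, n_d)
        measured = np.trace(psi_hat) / n_d      # average diagonal element; the truth is 1
        print(f"N_S={n_s:4d}  measured={measured:.3f}  Eq.(1.1)={kaufman_factor(n_s, n_d):.3f}")
```

Multiplying the inverse sample covariance by the reciprocal of `kaufman_factor` removes the bias in the mean, but it does not reduce the scatter of the estimate, which grows at a similar rate as $N_S$ approaches $N_D + 2$.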
Figure 2 illustrates two possible scenarios for such a truncation, alongside the bias expected from Eq. (1.1). Depending on the exact values of $N_S$ and $N_D$, and the details of the noise-suppression measures, biases may have been large and both positive and negative. The increase in the largest eigenvalues would have remained untreated, and consequently the inverse covariance would have tended to zero (or the fractional bias to $-1$) for $N_S \to 0$. Precise and accurate cosmological analysis requires a more careful treatment of these noise effects, either quantifying their effect on cosmological parameter errors or mitigating them by employing alternatives to the standard sample covariance estimator.

Figure 2. The effect of noise-stabilising measures (via singular value decomposition) on the bias of the inverse covariance. Shown is the average fractional bias on the diagonal elements of the inverse covariance matrix (for $N_D = 24$; indicated by the vertical line), as a function of the number of realisations used for computing the covariance, $N_S$. The black solid line corresponds to the bias of the inverse of the sample covariance matrix, as calculated by Kaufman (1967). The red (blue) line shows the bias for the case that the smallest eigenvalues of the covariance have been set to zero before calculating a pseudo-inverse, such that the condition number does not exceed 1000 (100).

2. Impact on the errors of cosmological parameters

Taylor & Joachimi (2014, TJ14 hereafter) calculated the impact of biases and variances in the data sample covariance on the cosmological parameter covariance, finding that the latter is generally biased high and takes up additional variance (see Dodelson & Schneider 2013 and Percival et al. 2014 for earlier, approximate results). For the case that the parameter covariance is estimated from the curvature of the likelihood at its peak (similar to estimates from MCMC samples), they derived its full distribution, which is again a Wishart distribution with $N_S - N_D + N_P - 1$ degrees of freedom, where $N_P$ is the number of parameters, i.e. the dimension of the parameter covariance matrix. The mean reads

$$\langle \hat{C}^W_{\mu\nu} \rangle = \frac{N_S - N_D + N_P - 1}{N_S - N_D - 2} \, C_{\mu\nu} , \qquad (2.1)$$

while an exact expression for the variance can also be derived. Additionally, TJ14 found an exact expression for the mean of the parameter covariance estimated from the scatter in likelihood peaks,

$$\langle \hat{C}^P_{\mu\nu} \rangle = \frac{N_S - 2}{N_S - N_D + N_P - 2} \, C_{\mu\nu} , \qquad (2.2)$$

which they determined from simulations based on random Wishart matrices. De-biasing the parameter covariances based on these expressions, TJ14 derived constraints on the minimum number of simulations required to reach a maximum contribution, $\nu$, to the error on cosmological parameters originating from noise in the data covariance (see their Fig. 5). In the case of the parameter covariance derived from the likelihood curvature, $C^W$, the minimum number is given by $N_S \gtrsim N_D + N_P/\nu + 2\nu^{-2}$. For forthcoming cosmological surveys, where easily $N_D \gtrsim 1000$, this implies that, if one tolerates a 10% increase in parameter errors due to noise in the data covariance, $N_S$ has to be only slightly larger than $N_D$, whereas if one restricts this increase to 1%, a challenging number of $N_S > 10^4$ would result.

3. Beyond the sample covariance

There are various alternatives to using the standard sample covariance estimator in a likelihood analysis, which Taylor et al. (2013, TJK13 hereafter) presented in schematic form in their Fig. 8. Below we summarise the most relevant points.

Resampling methods: We have concentrated here on the generation of realisations of the data from simulations, i.e. externally. Resampling methods like bootstrap or jackknife allow for the estimation of covariances internally from the data. To preserve the correlations in the data, one has to define blocks of survey volume, or patches on the sky, which are quasi-independent. This requirement limits the number of blocks for a given survey volume or area. Moreover, long-range correlations will invalidate the assumption of independence to some degree and cause biases. In addition, bootstrap and jackknife estimates are generally biased and, although consistent, can converge slowly. For more details on resampling methods see the contribution by P. Arnalte-Mur.

Data compression: Data compression alleviates the problem of noise effects originating from the data covariance, as biases and variances scale approximately with $N_S - N_D$. Maximal data compression, i.e. $N_D \to N_P$, even eliminates all adverse noise effects, as can be seen from Eqs. (2.1) and (2.2).
Note that in the latter case of maximal compression the factor from the Kaufman bias in Eq. (1.1) remains; it can be removed because the parameter covariance is now a linear transformation of the data covariance, so no inversion is necessary. However, data compression techniques require some information about the data covariance. TJK13 show for maximal Karhunen-Loève compression that using the sample covariance in the compression operation exactly reproduces the original noise effects, so nothing is gained. Employing a noise-free model covariance instead renders the compression suboptimal, thus increasing errors on cosmological parameters to a yet unknown degree.

Shrinkage: We are not completely ignorant about the form of covariance matrices for cosmological data. Their elements vary smoothly across angular and redshift scales, and sometimes it is fair to assume that the diagonal dominates the errors, e.g. if shot noise is prominent. In some regimes, such as on large, nearly Gaussian scales, we may even have good, if not exact, analytic models. This is valuable prior information that can be used to improve covariance estimates. One such approach is shrinkage estimation, which builds a linear combination of the sample covariance and a model covariance (which can contain free parameters). The weighting of the linear combination can be estimated analytically from the data. TJK13 test several shrinkage estimators of covariance for a toy cosmological case (see also Pope & Szapudi 2008). Their worst-performing estimator was derived by Stein et al. (1972), who proved that their estimator nonetheless outperforms the sample estimator with respect to a natural loss function based on the mean square error of the mean vector of the data, which implies that the sample covariance estimator is inadmissible (see also D. van Dyk's contribution).
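To make the idea of shrinkage concrete, the sketch below forms a linear combination of the sample covariance and a simple target. It is only an illustration under stated assumptions, not one of the estimators tested in TJK13: the target is taken to be the diagonal of the sample covariance, and the shrinkage intensity `lam` uses a common plug-in estimate (summed variances of the off-diagonal elements divided by their summed squares). The function name `shrinkage_covariance` is hypothetical.

```python
import numpy as np


def shrinkage_covariance(data):
    """Shrink the sample covariance towards its own diagonal.

    data : array of shape (n_s, n_d), one realisation of the data vector per row.
    Returns (shrunk covariance, shrinkage intensity lam in [0, 1]).
    Assumption: diagonal target and a plug-in estimate of lam from the data.
    """
    n_s, n_d = data.shape
    x = data - data.mean(axis=0)

    # Per-realisation outer products; their scaled mean is the sample covariance.
    w = x[:, :, None] * x[:, None, :]            # shape (n_s, n_d, n_d)
    w_mean = w.mean(axis=0)
    sample = n_s / (n_s - 1) * w_mean            # unbiased sample covariance

    # Estimated variance of each element of the sample covariance.
    var_s = n_s / (n_s - 1) ** 3 * ((w - w_mean) ** 2).sum(axis=0)

    target = np.diag(np.diag(sample))            # model covariance: diagonal only
    off = ~np.eye(n_d, dtype=bool)               # mask selecting off-diagonal elements

    denom = ((sample - target)[off] ** 2).sum()
    lam = var_s[off].sum() / denom if denom > 0 else 1.0
    lam = float(np.clip(lam, 0.0, 1.0))          # keep the combination convex

    return lam * target + (1.0 - lam) * sample, lam


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    truth = np.eye(20)
    data = rng.standard_normal((30, 20))         # N_S = 30 realisations, N_D = 20
    shrunk, lam = shrinkage_covariance(data)
    sample = np.cov(data, rowvar=False)
    print("shrinkage intensity:", lam)
    print("Frobenius error, sample:", np.linalg.norm(sample - truth))
    print("Frobenius error, shrunk:", np.linalg.norm(shrunk - truth))
```

With a diagonal target the diagonal elements are left untouched and only the noisy off-diagonal elements are damped, so the gain is largest when the true covariance is close to diagonal. A target with free parameters, or an analytic model covariance, would replace `target` in the same linear combination.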

References

Dodelson S., & Schneider M.D. 2013, Phys. Rev. D, 88, 063537
Hartlap J., Simon P., & Schneider P. 2007, A&A, 464, 399
Kaufman G.M. 1967, Report No. 6710, Center for Operations Research and Econometrics, Catholic University of Louvain, Heverlee, Belgium
Percival W., Ross A.J., Sánchez A.G., et al. 2014, MNRAS, 439, 2531
Pope A.C., & Szapudi I. 2008, MNRAS, 389, 766
Stein C., Efron B., & Morris C. 1972, Tech. Report No. 37, Depart. Statist., Stanford University
Taylor A.N., & Joachimi B. 2014, MNRAS, accepted (TJ14)
Taylor A.N., Joachimi B., & Kitching T. 2013, MNRAS, 432, 1928 (TJK13)
Wishart J. 1928, Biometrika, 20A, 32