Statistical techniques for data analysis in Cosmology

Similar documents
Statistical Methods in Particle Physics

Statistics and Data Analysis

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Statistical Data Analysis Stat 3: p-values, parameter estimation

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

Stat 5101 Lecture Notes

Subject CS1 Actuarial Statistics 1 Core Principles

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

LECTURE NOTES FYS 4550/FYS EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2013 PART I A. STRANDLIE GJØVIK UNIVERSITY COLLEGE AND UNIVERSITY OF OSLO

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

Multivariate Bayesian Linear Regression MLAI Lecture 11

Non-Imaging Data Analysis

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

Why Try Bayesian Methods? (Lecture 5)

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

STA 2201/442 Assignment 2

Applied Probability and Stochastic Processes

Statistical Methods for Particle Physics Lecture 4: discovery, exclusion limits

A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 2011

Bayes Theorem (Recap) Adrian Bevan.

Introduction to statistics. Laurent Eyer Maria Suveges, Marie Heim-Vögtlin SNSF grant

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Primer on statistics:

01 Probability Theory and Statistics Review

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Bayesian Inference for the Multivariate Normal

Some Concepts of Probability (Review) Volker Tresp Summer 2018

Advanced Statistical Methods. Lecture 6

Lecture 1: Probability Fundamentals

Data modelling Parameter estimation

Institute of Actuaries of India

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Frequentist-Bayesian Model Comparisons: A Simple Example

Lectures on Statistical Data Analysis

Fundamental Probability and Statistics

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Statistical Methods for Astronomy

Bayesian Regression Linear and Logistic Regression

Statistical Methods in Particle Physics Lecture 2: Limits and Discovery

A Bayesian Approach to Phylogenetics

AST 418/518 Instrumentation and Statistics

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Practical Statistics

Stat 516, Homework 1

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

Multimodal Nested Sampling

STATISTICS SYLLABUS UNIT I

Statistical Methods in Particle Physics

Hypothesis testing:power, test statistic CMS:

Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables.

Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )

Statistics and data analyses

Fourier and Stats / Astro Stats and Measurement : Stats Notes

Statistical Methods for Astronomy

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

An Introduction to Bayesian Linear Regression

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Statistics for the LHC Lecture 2: Discovery

Stable Limit Laws for Marginal Probabilities from MCMC Streams: Acceleration of Convergence

Statistical Methods for Discovery and Limits in HEP Experiments Day 3: Exclusion Limits

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses

Principles of Bayesian Inference

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Machine Learning 4771

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

Fundamental Issues in Bayesian Functional Data Analysis. Dennis D. Cox Rice University

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours

Strong Lens Modeling (II): Statistical Methods

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Learning Objectives for Stat 225

Data Analysis I. Dr Martin Hendry, Dept of Physics and Astronomy University of Glasgow, UK. 10 lectures, beginning October 2006

Introduction to Statistics and Error Analysis II

Confidence Intervals. First ICFA Instrumentation School/Workshop. Harrison B. Prosper Florida State University

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

HANDBOOK OF APPLICABLE MATHEMATICS

Introduction to Statistical Methods for High Energy Physics

Introduction to Probability and Statistics (Continued)

Gaussian Random Fields

STAT 4385 Topic 01: Introduction & Review

Detection ASTR ASTR509 Jasper Wall Fall term. William Sealey Gosset

Better Bootstrap Confidence Intervals

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Human-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015

An example of Bayesian reasoning Consider the one-dimensional deconvolution problem with various degrees of prior information.

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Physics 509: Bootstrap and Robust Parameter Estimation

Statistical methods and data analysis

1 Bayesian Linear Regression (BLR)

If we want to analyze experimental or simulated data we might encounter the following tasks:

Probability and Estimation. Alan Moses

Multivariate statistical methods and data mining in particle physics

Physics 509: Non-Parametric Statistics and Correlation Testing

Transcription:

Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde

outline Lecture 1: Introduction Bayes vs Frequentists, priors, the importance of being Gaussian, modeling and statistical inference, some useful tools. Lecture 2: Monte Carlo methods. Different type of errors. Going beyond parameter fitting. Lecture 3: forecasting: Fisher matrix approach. Introduction to model selection. Conclusions.

What s is all about DATA Models, models parameters? Measurement errors Cosmic Variance LCDM? w? etc

Probabilities Frequentists vs Bayesian Bayesians and Frequentists often criticize each other; many physicists take a more pragmatic approach about what method to use.

Probabilities Concept of Random variable x Probability distribution Properties of probability distribution: 4. In general: Ex. Produce examples of this last case

We might want to add: Useful later when talking about marginalization

Bayes theorem prior Likelihood Posterior From Fundamental difference here; statistical INFERENCE Prior: how do you chose P(H)? Back to this later.

Drawbacks: Examples, discussion r log r τ log τ exp(-2 τ)

? Spergel et al 2007 Spergel et al 2003

The importance of the prior Priors are not generally bad!

Characterizing probability distributions averages moments central moments Gaussian vs non-gaussian

Characterizing probability distributions Gaussian or Normal kurtosis x skewness

Moments vs cumulants For non-gaussian distribution, the relation between central moments and cumulants for the first 6 orders is For a Gaussian distribution all moments of order higher than 2 are specified by µ 1 and µ 2

Check that: Generating function

Central limit theorem

There are exceptions: Cauchy distribution

The Poisson distribution

The importance of Gaussian Analytic Simplicity Inflation and the central limit theorem

Random fields, probabilities and Cosmology Average statistical properties Particulary important: Ensamble: all the possible realizations of the true underlying Universe Inference: examples The Cosmological principle: models of the universe are homogeneous on average; in widely separated regions of the Universe the density field has the same statistical properties A crucial assumption: we see a fair sample of the Universe Ergodicity then follows: averaging over many realizations is equivalent to averaging over a large(enough) volume Tools statistics! Correlation functions etc

Big advantage of being Bayesian Urn example (in reality NOT transparent) Cosmic variance

Gaussian random fields Fourier! Useful (back to this later) Multi-variate Gaussian Property n1: a Gaussian random field in Fourier space is still Gaussian

Property n2 Real and imaginary parts of the coefficients are randomly distributed And mutually independent Property n3: the phases of the Fourier modes are random From here the name Gaussian random phases Important property: or completely specifies your Gaussian random field

threshold X Is the density field Gaussian? Today no way In the beginning? Now you can generate a Gaussian random field!

Useful tools: Brief digression Fourier transform of overdensity field

(2-point) Correlation function Power spectrum important

This implies: Fourier transform pairs They contain the same information!

variance Independent of FT conventions! PS on what scale?

Filters Two typical choices Remember: Convolution in real space is multiplication in Fourier space Multiplication in real space in convolution in Fourier space

exercise Consider a multi variate gaussian Where is the covariance. Show that if the are Fourier modes then is diagonal. For Gaussian fields the k-modes are independent. Consequences

The importance of the power spectrum Spectral index generalize Running of the Spectral index Beware of the pivot:

End of digression: Back to Moments vs cumulants For non-gaussian distribution, the relation between central moments and cumulants for the first 6 orders is For a Gaussian distribution all moments of order higher than 2 are specified by µ 1 and µ 2

Wick s theorem Is a method of reducing high-order derivatives to a combinatorics problem used in QFT. Cumulant expansion theorem Example:

Modeling of data and Statistical inference Read numerical recipes chapter 15, read it again, then when you have to apply all this, read it again. example Fit this with a line Need a figure of merit Least squares.

What you want: Best fit parameters Error estimates on the parameters A statistical measure of the goodness of fit (possibly) Bayesian: what is the probability that a particular set of parameters is correct? Figure of merit: given a set of parameters this is the probability of occurrence

Least squares fit. And what if data are correlated? In general: chi-squared

Goodness of fit? If all is Gaussian, the probability of χ 2 at the minimum follows a χ 2 distribution, with ν=n-m degrees of freedom # data points #parameters Incomplete gamma function Goodness of fit if evaluated at the best fit

Too small Q? a) Model is wrong! Try again b) Real errors are larger c) non-gaussian In general Monte-Carlo simulate. Too large Q? a) Errors overestimated b) Neglected covariance? c) Non-Gaussian (almost never..) P.S chi-by-eye?

Confidence regions If m is the number of fitted parameters for which you want to plot the joint confidence region and p is the confidence limit desired, find the Δχ 2 such that the probability of a chi- Square variable with m degrees of freedom being less than Δχ 2 is p. Use the Q function above.

Confidence regions Number of parameters Δχ 2 Joint confidence levels

Remember Bayes Likelihoods set Back to this later In many cases, can invoke the central limit theorem

Confidence levels Bayesians =0.683.. or 0.95 or Integrating over the hypothesis Classical: likelihood ratio

In higher dimensions. visually

Questions for you in what simple case can you make an easy identification of the likelihood ratio with the chi-square? In what case can you make an easy identification between the two approaches?

χ 2 There is a BIG difference between & reduced χ 2 Only for multivariate Gaussian with constant covariance

! 2lnL 2 =! From: Numerical recipes Ch. 15 If likelihood is Gaussian and Covariance is constant Example: for multi-variate Gaussian Errors

Marginalization example

Other data sets If independent, multiply the two likelihoods (can use some of them as priors ) Beware of inconsistent experiments!

Spergel 2007

Useful trick for Gaussian likelihoods e.g. marginalizing over point source amplitude result

Observation of N clusters is Poisson example Cash 1979. expected. Experimental bin (mass, SZ decrement, X-ray lum, z ) Define Unbinned or small bins occupancy 1 or 0 only Between 2 different models is chisquared-distributed!

Key concepts today Random fields and cosmology Probability Bayes theorem Gaussian distributions (and not) Modeling of data and statistical inference Likelihoods and chisquared Confidence levels; confidence regions (intro)

question Have used the product of Poisson distributions so have assumed independent processes Clusters are clustered