Statistical Methods for Astronomy


Statistical Methods for Astronomy "If your experiment needs statistics, you ought to have done a better experiment." - Ernest Rutherford. Outline (Lectures 1 and 2): Why do we need statistics? Definitions. Statistical distributions: the binomial, Poisson, and Gaussian distributions; the central limit theorem. Your statistical toolbox: Bayes' theorem, the F-test, the K-S test, the Monte Carlo method and transforming deviates, least squares, chi-squared, significance.

References: Data Reduction and Error Analysis, Bevington and Robinson; Practical Statistics for Astronomers, Wall and Jenkins; Numerical Recipes, Press et al.; Understanding Data Better with Bayesian and Global Statistical Methods, Press, 1996 (on astro-ph).

Another look at the problem Knowing the distribution allows us to predict what we will observe. We often know what we have observed and want to determine what that tells us about the distribution.

Bayesian Statistics Frequentist approaches are computationally easy, but often solve the inverse of the problem we want. Bayesian approaches use both the data and any prior information to develop a posterior distribution. Allows calculation of parameter uncertainty more directly. More easily incorporates outside information.

An Example I flip a coin 10 times and obtain 7 heads. What is the probability of flipping heads? A frequentist statistician would say 0.7. A Bayesian statistician might instead define a prior probability with mean = 0.5 and sigma = 0.2 (for example). Who would you side with?
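The Bayesian side of this coin example can be sketched with a conjugate Beta prior. Matching the slide's mean = 0.5, sigma = 0.2 to a symmetric Beta(a, a) distribution is an assumption about the prior's functional form; the slide does not specify one.

```python
# Beta-binomial sketch of the coin example (assumed: a symmetric Beta
# prior moment-matched to mean 0.5, sigma 0.2).
heads, tails = 7, 3

# For Beta(a, a): mean = 1/2 and var = 1 / (4 * (2a + 1)),
# so matching sigma = 0.2 gives a = (1 / (4 * sigma**2) - 1) / 2.
sigma = 0.2
a = (1.0 / (4.0 * sigma**2) - 1.0) / 2.0   # a = b = 2.625

# Conjugate update: the posterior is Beta(a + heads, a + tails).
post_mean = (a + heads) / (2 * a + heads + tails)

print(f"frequentist estimate:    {heads / (heads + tails):.2f}")
print(f"Bayesian posterior mean: {post_mean:.3f}")   # pulled toward 0.5
```

The prior drags the estimate from 0.7 back toward 0.5; with more flips, the data would dominate and the two answers would converge.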

Obtaining the Posterior Distribution Bayes' theorem states: P(B|A) = P(A|B) P(B) / P(A). P(A|B) should be read as "the probability of A given B". A is typically the data, and B the statistic we want to know. P(B) is the prior information we may have about the experiment. P(A) = P(data) is just a normalization constant, so P(B|data) ∝ P(data|B) P(B).

Using Bayes' theorem Assume we are looking for faint companions, and expect them around 1% of the stars we observe: P(planet) = 0.01. From putting in fake companions we know that we can detect planets 90% of the time: P(det.|planet) = 0.9. We also see false planets in 3% of observations: P(det.|no planet) = 0.03. What is the probability that an object we see is actually a planet? P(planet|det.) = P(det.|planet) P(planet) / P(det.) = P(det.|planet) P(planet) / [P(det.|planet) P(planet) + P(det.|no planet) P(no planet)] = (0.9 × 0.01) / (0.9 × 0.01 + 0.03 × 0.99) ≈ 0.23
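The arithmetic of the companion-detection example above, written out directly (the probabilities are the slide's numbers):

```python
# Bayes' theorem for the faint-companion example.
p_planet = 0.01          # prior: ~1% of stars have a detectable companion
p_det_planet = 0.90      # completeness, from injected fake companions
p_det_noplanet = 0.03    # false-positive rate

# Law of total probability for the normalization P(det.)
p_det = p_det_planet * p_planet + p_det_noplanet * (1 - p_planet)

p_planet_det = p_det_planet * p_planet / p_det
print(f"P(planet | detection) = {p_planet_det:.2f}")   # ~0.23
```

Even with 90% completeness, most "detections" are false because real companions are so rare; the low prior dominates the answer.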

General Bayesian Guidance Bayesian analysis focuses on probability rather than accept/reject decisions: it lets you calculate the probability that the parameters lie within a range of values in a more straightforward way. A common concern about Bayesian statistics is that the choice of prior is subjective; this is not necessarily a problem. Bayesian techniques are generally more computationally intensive, but this is rarely a drawback for modern computers.

Hypothesis Testing Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct. Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis). A probability is calculated that the value found would be obtained again with another sample. Based on the required level of confidence, the hypothesis is rejected or accepted.

Are two data sets drawn from the same distribution? The t statistic quantifies the likelihood that the means are the same. The F statistic quantifies the likelihood that the variances of the two data sets are the same. Consider two data sets, x and y, with n and m data points: t = (x̄ − ȳ) / (s √(1/n + 1/m)), with pooled variance s² = (n s_x² + m s_y²) / (n + m − 2), and F = [Σ(x_i − x̄)² / (n − 1)] / [Σ(y_j − ȳ)² / (m − 1)]

Student's t test Calculate the t statistic, t = (x̄ − ȳ) / (s √(1/n + 1/m)), with pooled variance s² = (n s_x² + m s_y²) / (n + m − 2). Perfect agreement gives t = 0. Then evaluate the probability that |t| exceeds the value found.
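A plain-Python sketch of the pooled-variance t statistic above. The per-sample variances are computed with divisors n and m (so n·s_x² is the sum of squares), which makes the pooling identical to the standard pooled-variance form; the tail probability would then come from Student's t distribution with n + m − 2 degrees of freedom.

```python
import math

def t_statistic(x, y):
    """Two-sample Student's t statistic with pooled variance."""
    n, m = len(x), len(y)
    xbar = sum(x) / n
    ybar = sum(y) / m
    sx2 = sum((xi - xbar) ** 2 for xi in x) / n   # biased sample variances,
    sy2 = sum((yi - ybar) ** 2 for yi in y) / m   # divisors n and m
    s2 = (n * sx2 + m * sy2) / (n + m - 2)        # pooled variance
    return (xbar - ybar) / math.sqrt(s2 * (1.0 / n + 1.0 / m))
```

Identical samples give t = 0 exactly; a shift between the means pushes |t| up in units of the pooled standard error.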

F test Calculate the F statistic, F = [Σ(x_i − x̄)² / (n − 1)] / [Σ(y_j − ȳ)² / (m − 1)], then calculate the probability that F exceeds the value found.
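The F ratio above is just the ratio of the two unbiased sample variances; a minimal sketch (the associated probability would come from the tail of the F distribution with n − 1 and m − 1 degrees of freedom):

```python
def f_statistic(x, y):
    """Ratio of unbiased sample variances; F near 1 means similar variances."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    var_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - ybar) ** 2 for yi in y) / (m - 1)
    return var_x / var_y
```

For example, f_statistic([1, 2, 3], [2, 4, 6]) compares variances 1 and 4 and returns 0.25.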

The Kolmogorov-Smirnov Test Calculate the cumulative distribution function of your model, C_model(x), and the cumulative distribution function of your data, C_data(x). The test statistic D is the maximum of |C_model(x) − C_data(x)|. The variable x must be continuous to use the K-S test.
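The recipe above can be sketched for the one-sample case. The empirical CDF is a step function, so the supremum is found by checking the model against both sides of each step; cdf_model is any callable returning values in [0, 1].

```python
def ks_statistic(data, cdf_model):
    """Maximum distance D between the empirical CDF of `data`
    and a model CDF given as a callable."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # the empirical CDF jumps from i/n to (i+1)/n at each data point,
        # so compare the model CDF against both sides of the step
        c = cdf_model(x)
        d = max(d, abs(c - i / n), abs(c - (i + 1) / n))
    return d

# e.g. three points tested against a uniform model on [0, 1]:
d = ks_statistic([0.1, 0.4, 0.8], lambda x: x)
```

In practice the probability of D exceeding the observed value comes from the Kolmogorov distribution (e.g. scipy.stats.kstest does both steps).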

[Figure: K-S test example, showing the statistic D as the maximum vertical distance between the model and data cumulative distribution functions.]

Monte Carlo Simulation Often we may find it easiest just to replicate an experiment or observation in the computer. In general these tools are referred to as Monte Carlo methods. General idea is to simulate randomness and reproduce observations for comparison with data. First we need a random number sequence.

Creating Random Numbers A proper random sequence of numbers is a whole topic in itself; Numerical Recipes discusses this in some detail. A simple example of a random number generator is the multiplicative congruential sequence I_{j+1} = (a I_j) mod m, where a and m are large integers and the uniform deviate is I_j / m. The starting value I_0 is the seed: the same seed always gives the exact same sequence of "random" numbers.
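A minimal generator of this form, as a sketch. The constants a = 7⁵ and m = 2³¹ − 1 are the classic Park-Miller "minimal standard" choices discussed in Numerical Recipes; the slide only says a and m are large numbers, so this particular pair is an assumption.

```python
# Multiplicative congruential generator: I_{j+1} = (a * I_j) mod m.
A = 16807          # 7**5, Park-Miller multiplier (assumed choice)
M = 2**31 - 1      # Mersenne prime 2147483647

def lcg(seed, count):
    """Yield `count` uniform deviates in (0, 1), starting from seed I_0.

    The seed must be nonzero; a seed of 0 would lock the sequence at 0.
    """
    i = seed
    for _ in range(count):
        i = (A * i) % M
        yield i / M
```

Because the sequence is fully determined by the seed, rerunning with the same seed reproduces a simulation exactly, which is useful for debugging.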

Random Numbers The example gives uniformly distributed random numbers; that is, p(x) dx = dx for 0 ≤ x ≤ 1 and 0 otherwise. We would like more useful distributions, such as the Poisson, etc. To do so, we need to transform the random numbers.

Transformation Method Starting from the law for transformation of probabilities, p(y) dy = p(x) dx, we can solve for the probability distribution we want: p(y) = p(x) |dx/dy|, which for a uniform p(x) reduces to p(y) = |dx/dy|. 1. Integrate the probability distribution. 2. Solve for the new variable y in terms of the uniform variable x.

Example I want to simulate the time between arrivals of photons at the detector. This is given by an exponential probability distribution: P(t) dt = λ e^(−λt) dt. Use the transformation of probabilities: integrating gives x = ∫ λ e^(−λt') dt' = e^(−λt), so |dx/dt| = λ e^(−λt) = P(t) as required. Solving for t gives t = −(1/λ) ln x. A random number in the range 0 to 1 is thus transformed to one that can lie between ∞ and 0.
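The photon-arrival example above in code. The only wrinkle beyond the slide's formula t = −ln(x)/λ is guarding against x = 0, since Python's random.random() returns values in [0, 1):

```python
import math
import random

def exponential_deviate(lam, rng=random.random):
    """Transform a uniform deviate into an exponential waiting time.

    Uses t = -ln(x) / lam with x in (0, 1]; the shift 1 - rng()
    avoids ever taking log(0).
    """
    x = 1.0 - rng()            # uniform in (0, 1]
    return -math.log(x) / lam
```

A quick sanity check is that the sample mean of many deviates approaches 1/λ, the mean of the exponential distribution.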

Limitations Transformation methods are limited to analytical probability distributions: one needs to be able to integrate the probability distribution and invert the result to solve for the new variable. Often one of these criteria is not satisfied. You can still generate useful random numbers using the rejection method.

Rejection Method Generate two uniform random deviates, x and y. Adjust x to span the range of values expected for the random variable (x' = f(x)), and scale y to span [0, p_max], where p_max is at least the maximum of the probability distribution. Compare y to the value of the probability distribution at x', y' = p(x'). If y ≤ y', use the value x' in your simulation; if y > y', reject this pair and start over.
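The steps above can be sketched as a loop that keeps drawing (x', y) pairs until one lands under the curve. The truncated Gaussian in the usage note is an illustrative choice, not from the slide:

```python
import math
import random

def rejection_sample(p, x_min, x_max, p_max, rng=random.random):
    """Draw one deviate from density p on [x_min, x_max] by rejection.

    p_max must be at least the maximum of p on the interval; a tighter
    bound wastes fewer (x', y) pairs.
    """
    while True:
        x_prime = x_min + (x_max - x_min) * rng()  # x' spans the target range
        y = p_max * rng()                          # uniform height in [0, p_max]
        if y <= p(x_prime):                        # accept points under the curve
            return x_prime

# e.g. a unit Gaussian truncated to [-3, 3] (illustrative density):
gauss = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
draw = rejection_sample(gauss, -3.0, 3.0, 0.4)
```

The acceptance rate is the area under p divided by the bounding box area, so a p_max far above the true maximum makes the loop slow but never biases the result.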