Statistical Methods for Astronomy

Statistical Methods for Astronomy

Lecture 1, Probability: Why do we need statistics? Definitions. Probability distributions. Binomial distribution. Poisson distribution. Gaussian distribution. Central limit theorem.

Lecture 2, Statistics: Useful statistics. Error analysis. Error propagation. Least squares. Chi-squared. Bayes theorem. Significance. Comparison statistics.

Possible Statistics

Average: $\mu = \frac{1}{n}\sum_j x_j$

Most likely statistics. Median: order the data; $\mathrm{med} = x_j$ with $j = N/2 + 1/2$ for odd $N$, or $\mathrm{med} = (x_j + x_{j+1})/2$ with $j = N/2$ for even $N$. Mode: the most frequently occurring value(s).

Spread statistics. Variance: $s^2 = \sum_j \frac{(x_j - \mu)^2}{n-1}$. Root mean square: $s = \sqrt{\sum_j \frac{(x_j - \mu)^2}{n-1}}$. Mean deviation: $\Delta x = \sum_j \frac{|x_j - \mu|}{n-1}$. Sample maximum.
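As a quick check of these definitions, here is a minimal NumPy sketch (the data values are invented for illustration) that computes each statistic with the same conventions as above:

```python
import numpy as np

x = np.array([2.1, 3.4, 2.9, 3.1, 2.7, 3.4, 2.8])  # hypothetical data

n = len(x)
mean = x.sum() / n                             # average, mu
median = np.median(x)                          # middle value of the sorted data
vals, counts = np.unique(x, return_counts=True)
mode = vals[counts == counts.max()]            # most frequently occurring value(s)

variance = ((x - mean) ** 2).sum() / (n - 1)   # sample variance, s^2
rms = np.sqrt(variance)                        # root mean square deviation, s
mean_dev = np.abs(x - mean).sum() / (n - 1)    # mean deviation
print(mean, median, mode, variance, rms, mean_dev, x.max())
```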

Know your statistic (figure from Biostatistical Analysis, fourth edition, Simon & Schuster, 1999)

Good Statistics. Good statistics should be: Unbiased - should converge to the right value with more data points. Robust - should not be affected by a few bad data points. Consistent - should not be affected systematically by the size of your sample. Close - should converge as quickly as possible with increasing data.

Relation to Probability distributions. Statistics are based on data only! However, they are often most useful as estimators of the parameters of a probability distribution. This is a frequentist approach, where the distribution is used to determine how often we might obtain the resulting statistic, so that we can decide whether this is the correct model.

Chebyshev inequality. What if we don't know the underlying distribution? If we know the mean and sample variance, Chebyshev's inequality is very useful:

$P(|x - \mu| > n\sigma) < \frac{1}{n^2}$

Be careful of the typo in Wall and Jenkins's text!

n    P_Chebyshev (<)    P_Gauss (=)
1    1                  0.32
2    0.25               0.05
3    0.11               0.003
4    0.06               0.00006
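The table above can be reproduced directly. A short sketch (assuming SciPy is available) comparing the Chebyshev bound with the exact two-sided Gaussian tail probability:

```python
from scipy.stats import norm

# Chebyshev: P(|x - mu| > n*sigma) < 1/n^2, versus the exact Gaussian tail
for n in range(1, 5):
    chebyshev = 1.0 / n**2
    gauss = 2 * norm.sf(n)          # two-sided Gaussian tail probability
    print(f"n={n}: Chebyshev < {chebyshev:.2g}, Gaussian = {gauss:.2g}")
```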

Error Analysis. If I take N measurements, how precisely could I determine the sample mean, compared to a set with 2N measurements?

$\langle \Delta\mu^2 \rangle = \left\langle \left( \frac{1}{N} \sum_j (x_j - \mu) \right)^2 \right\rangle$

$\langle \Delta\mu^2 \rangle = \frac{\sigma^2}{N} + \frac{1}{N^2} \sum_{i \neq j} \langle (x_i - \mu)(x_j - \mu) \rangle$
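For independent measurements the cross terms average to zero, leaving $\sigma^2/N$: the scatter of the sample mean falls as $\sigma/\sqrt{N}$, so 2N measurements tighten it by $\sqrt{2}$. A small simulation with synthetic Gaussian data illustrates this (purely a sketch, not part of the lecture material):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
for N in (100, 200):
    # scatter of the sample mean over many repeated experiments of size N
    means = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)
    print(N, means.std(), sigma / np.sqrt(N))   # empirical vs. sigma/sqrt(N)
```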

Propagation of Errors. If the quantity of interest is, say, f = x + y, then

$\sigma_f^2 = \langle f^2 \rangle - \langle f \rangle^2 = \langle x^2 \rangle - \langle x \rangle^2 + \langle y^2 \rangle - \langle y \rangle^2 = \sigma_x^2 + \sigma_y^2$

(for independent x and y). In general:

$\sigma^2(f(x_i)) = \sum_i \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma^2(x_i)$
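The general formula can be applied numerically. Below is a small sketch (the helper `propagate` is not a library routine, just an illustration) that estimates the partial derivatives by central differences and adds the contributions in quadrature; for f = x + y it reproduces $\sigma_f = \sqrt{\sigma_x^2 + \sigma_y^2}$:

```python
import numpy as np

def propagate(f, x, sigma, eps=1e-6):
    """Propagate independent 1-sigma errors through f using
    sigma_f^2 = sum_i (df/dx_i)^2 sigma_i^2 with numerical derivatives."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    var = 0.0
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        dfdx = (f(x + step) - f(x - step)) / (2 * eps)   # central difference
        var += dfdx**2 * sigma[i]**2
    return np.sqrt(var)

# f = x + y: expect sqrt(0.1^2 + 0.2^2) ~ 0.224
print(propagate(lambda v: v[0] + v[1], [2.0, 3.0], [0.1, 0.2]))
```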

Comparing a data set to a distribution. Suppose we have N data points and a model with M parameters that we think describes them. Our model: $y(x) = y(x; a_1 \ldots a_M)$. An intuitive metric is the distance of each data point from the model. Let's use the square of the difference between the data and the model:

$LS = \sum_{i=1}^{N} \left( y_i - y(x_i; a_1 \ldots a_M) \right)^2$

Why is this a reasonable metric for determining the best fit to the data?

Justification for Least-Squares. What is the probability that a certain data point is drawn from a given model?

$P \propto \exp\left[ -\frac{(y_i - y(x_i))^2}{2\sigma^2} \right]$

For N points the overall probability for a given model is

$P \propto \prod_{i=1}^{N} \exp\left[ -\frac{(y_i - y(x_i))^2}{2\sigma^2} \right]$

To maximize the probability we should minimize the exponent:

$\sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{2\sigma^2}$

Chi-squared. The exponent is referred to as the chi-squared statistic:

$\chi^2 \equiv \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_i^2}$

Chi-squared is not a unique metric, but it is commonly used. Mean: $\mu_{\chi^2} = \nu = N - M$. Variance: $\sigma^2_{\chi^2} = 2\nu$. Often, the reduced chi-squared $\chi^2/\nu$ is quoted. Mean: $\mu_{\chi^2/\nu} = 1$. Variance: $\sigma^2_{\chi^2/\nu} = 2/\nu$.
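As an illustration, a sketch of a weighted straight-line fit to synthetic data (NumPy's `polyfit` stands in for whatever fitter you prefer), followed by the chi-squared and reduced chi-squared:

```python
import numpy as np

def chi_squared(y, y_model, sigma):
    """chi^2 = sum_i (y_i - y(x_i))^2 / sigma_i^2"""
    return np.sum((y - y_model) ** 2 / sigma**2)

# synthetic data: straight line plus Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
sigma = 0.5 * np.ones_like(x)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

# fit M = 2 parameters (slope, intercept) by weighted least squares
a, b = np.polyfit(x, y, 1, w=1 / sigma)
chi2 = chi_squared(y, a * x + b, sigma)
nu = len(x) - 2                    # degrees of freedom, nu = N - M
print(chi2, chi2 / nu)             # reduced chi-squared should be near 1
```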

Expressing Confidence. A fit with 10 degrees of freedom would have a reduced chi-squared higher than 0.83 about 60% of the time, and is plausible for reduced chi-squared < 1.45. For ν = 100, the corresponding limit is reduced chi-squared < 1.15. (Figure from Bevington.)

Using chi-squared. How do we decide whether the model describes the data? Rule of thumb: the reduced chi-squared should be within 1-2 sigma of 1 for a valid model. Say I had a reduced chi-squared of 2 for ν = 10. The statement that you could make is: a reduced chi-squared > 2 would occur about 3% of the time. This suggests the data do not support the model. You need to decide how much you trust your error estimates in order to make this statement. See Wall and Jenkins Table A.2.6 for probability values.
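The 3% figure is the tail probability of the chi-squared distribution; with SciPy it can be computed directly (a sketch, assuming `scipy.stats` is available):

```python
from scipy.stats import chi2

nu = 10
reduced = 2.0
# probability of a chi-squared this large or larger arising by chance
p = chi2.sf(reduced * nu, df=nu)
print(p)   # roughly 0.03, i.e. about 3% of the time
```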

Parameter and error estimation. To estimate the uncertainty in parameters, you can vary each parameter until $\chi^2$ goes up by $2/\nu$. Beware of correlation between parameters! Joint variation should be carried out to avoid underestimating the uncertainty. Also, if you didn't know the errors in your data, you would have no way of determining whether the model was valid. You can still use this to derive errors for your data, if you are certain of the model:

$\sigma^2 = \frac{\chi^2}{\nu}$

(here $\chi^2$ is the sum of squared residuals about the model, i.e. computed with unit errors).
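When the errors are unknown but the model is trusted, the last formula amounts to estimating σ from the residual scatter about the best fit. A minimal sketch with synthetic data (the true scatter is 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

a, b = np.polyfit(x, y, 1)                       # M = 2 fitted parameters
residuals = y - (a * x + b)
nu = len(x) - 2                                  # nu = N - M
sigma_est = np.sqrt(np.sum(residuals**2) / nu)   # sigma^2 = chi^2 / nu
print(sigma_est)                                 # should come out near 0.5
```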

Statistics for Hypothesis Testing Hypothesis testing uses some metric to determine whether two data sets, or a data set and a model, are distinct. Typically, the problem is set up so that the hypothesis is that the data sets are consistent (the null hypothesis). A probability is calculated that the value found would be obtained again with another sample. Based on the required level of confidence, the hypothesis is rejected or accepted.

Are two data sets drawn from the same distribution? The t statistic quantifies the likelihood that the means are the same. The F statistic quantifies the likelihood that the variances of two data sets are the same. Consider two data sets, x and y, with n and m data points respectively:

$t = \frac{\bar{x} - \bar{y}}{s\sqrt{1/m + 1/n}}$

$F = \frac{\sum_i (x_i - \bar{x})^2 / (n-1)}{\sum_i (y_i - \bar{y})^2 / (m-1)}$

$s^2 = \frac{n S_x + m S_y}{n + m}$, where $S_x = \frac{\sum_i (x_i - \bar{x})^2}{n}$

Student's t test. Calculate the t statistic; perfect agreement is t = 0. Evaluate the probability for t > value, with $\nu = m + n - 2$ degrees of freedom.

$t = \frac{\bar{x} - \bar{y}}{s\sqrt{1/m + 1/n}}$, where $s^2 = \frac{n S_x + m S_y}{n + m}$
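In practice the t statistic and its probability can be obtained from SciPy's two-sample test, which uses the standard pooled variance with ν = m + n − 2; the data below are synthetic and purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=30)     # m = 30 points
y = rng.normal(0.2, 1.0, size=25)     # n = 25 points

# two-sample t test assuming equal variances
t, p = stats.ttest_ind(x, y, equal_var=True)
print(t, p)   # p = probability of a |t| this large if the means agree
```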

F test. Calculate the F statistic:

$F = \frac{\sum_i (x_i - \bar{x})^2 / (n-1)}{\sum_i (y_i - \bar{y})^2 / (m-1)}$

Calculate the probability that F > value.
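A sketch of the F test (the helper `f_test` is illustrative, not a library routine); SciPy's F distribution supplies the tail probability:

```python
import numpy as np
from scipy import stats

def f_test(x, y):
    """F = sample variance ratio, with a two-sided p-value for equal variances."""
    F = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    p = 2 * min(stats.f.sf(F, dfx, dfy), stats.f.cdf(F, dfx, dfy))
    return F, p

rng = np.random.default_rng(4)
print(f_test(rng.normal(0, 1.0, 40), rng.normal(0, 1.5, 35)))
```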

The Kolmogorov-Smirnov Test. Calculate the cumulative distribution function of your model, $C_\mathrm{model}(x)$. Calculate the cumulative distribution function of your data, $C_\mathrm{data}(x)$. Find the maximum of $|C_\mathrm{model}(x) - C_\mathrm{data}(x)|$. The variable x must be continuous to use the K-S test.
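Both the one-sample (data vs. model CDF) and two-sample (data vs. data) versions are available in SciPy; a sketch with synthetic Gaussian data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(0.0, 1.0, size=200)

# one-sample K-S: maximum distance between the empirical CDF of the data
# and the cumulative distribution of the model (here a unit Gaussian)
d, p = stats.kstest(data, "norm", args=(0.0, 1.0))
print(d, p)

# two-sample K-S: compare the empirical CDFs of two data sets
d2, p2 = stats.ks_2samp(data, rng.normal(0.0, 1.0, size=150))
print(d2, p2)
```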

K-S test example