Hypothesis Testing, Bayes Theorem, & Parameter Estimation

Similar documents
Statistics notes. A clear statistical framework formulates the logic of what we are doing and why. It allows us to make precise statements.

AST 418/518 Instrumentation and Statistics

ORF 245 Fundamentals of Statistics Practice Final Exam

Statistical Methods for Astronomy

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Physics 403. Segev BenZvi. Classical Hypothesis Testing: The Likelihood Ratio Test. Department of Physics and Astronomy University of Rochester

Statistical Methods for Astronomy

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

2. A Basic Statistical Toolbox

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

MULTIPLE EXPOSURES IN LARGE SURVEYS

Homework on Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, April 8 30 points Prof. Rieke & TA Melissa Halford

Learning Objectives for Stat 225

Homework #7: Properties of Galaxies in the Hubble Deep Field Name: Due: Friday, October points Profs. Rieke

Frequentist Statistics and Hypothesis Testing Spring

Physics Lab #9: Measuring the Hubble Constant

Statistics of Small Signals

Probability Distributions

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

Business Statistics. Lecture 10: Course Review

The Cosmological Redshift. Cepheid Variables. Hubble s Diagram

F & B Approaches to a simple model

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

Big Data Analysis with Apache Spark UC#BERKELEY

Robustness and Distribution Assumptions

Physics 403. Segev BenZvi. Credible Intervals, Confidence Intervals, and Limits. Department of Physics and Astronomy University of Rochester

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

PROJECT GLOBULAR CLUSTERS

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

ECE521 week 3: 23/26 January 2017

Statistical inference

Statistical Methods for Astronomy

The Cosmic Distance Ladder. Hubble s Law and the Expansion of the Universe!

Bayesian Model Diagnostics and Checking

Statistical Tools and Techniques for Solar Astronomers

Review of Statistics

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

Introduction to Bayesian Learning. Machine Learning Fall 2018

PRACTICAL ANALYTICS 7/19/2012. Tamás Budavári / The Johns Hopkins University

Practical Statistics

Intro to Bayesian Methods

Visual interpretation with normal approximation

Open Cluster Research Project

Exam 2 Practice Questions, 18.05, Spring 2014

A brief outline of the lab procedure, the steps will be walked through later on.

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Exam 4 Review EXAM COVERS LECTURES 22-29

AstroBITS: Open Cluster Project

Stat 5101 Lecture Notes

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Recursive Estimation

CS Homework 3. October 15, 2009

Frequentist-Bayesian Model Comparisons: A Simple Example

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann

Lecture 5: Bayes pt. 1

Statistical Distribution Assumptions of General Linear Models

Introduction to Probability and Statistics (Continued)

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

Physics 509: Bootstrap and Robust Parameter Estimation

Bayesian Inference in Astronomy & Astrophysics A Short Course

Probability - Lecture 4

Ch18 links / ch18 pdf links Ch18 image t-dist table

Monte Carlo Integration. Computer Graphics CMU /15-662, Fall 2016

Announcements. Proposals graded

STATISTICS OF OBSERVATIONS & SAMPLING THEORY. Parent Distributions

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

Deep Learning for Computer Vision

A Brief Review of Probability, Bayesian Statistics, and Information Theory

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

Observational Astronomy - Lecture 13 Evolution of the Universe and Final Review

Astr 598: Astronomy with SDSS. Spring Quarter 2004, University of Washington, Željko Ivezić. Lecture 6: Milky Way Structure I: Thin and Thick Disks

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Uncertain Knowledge and Bayes Rule. George Konidaris

Statistical Methods in Particle Physics. Lecture 2

Expression arrays, normalization, and error models

Comparison of Bayesian and Frequentist Inference

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

AUTOMATIC MORPHOLOGICAL CLASSIFICATION OF GALAXIES. 1. Introduction

Hubble s Law: Finding the Age of the Universe

Introduction to Simple Linear Regression

This does not cover everything on the final. Look at the posted practice problems for other topics.

Statistical Methods in Particle Physics

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

Physics 403. Segev BenZvi. Choosing Priors and the Principle of Maximum Entropy. Department of Physics and Astronomy University of Rochester

Lecture 1: Probability Fundamentals

CS-E3210 Machine Learning: Basic Principles

Practice Problems Section Problems

Review of Statistics 101

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF

Machine Learning. Bayes Basics. Marc Toussaint U Stuttgart. Bayes, probabilities, Bayes theorem & examples

Prelab Questions for Hubble Expansion Lab. 1. Why does the pitch of a firetruck s siren change as it speeds past you?

Math 50: Final. 1. [13 points] It was found that 35 out of 300 famous people have the star sign Sagittarius.

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

The Chi-Square Distributions

Galaxy Metallicity: What Oxygen Tells Us About The Lifecycles of Galaxies Designed by Prof Jess Werk, modified by Marie Wingyee Lau

The Hubble Law & The Structure of the Universe

Transcription:

Data Mining In Modern Astronomy Sky Surveys: Hypothesis Testing, Bayes Theorem, & Parameter Estimation Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518

Erratum of Last Lecture The Central Limit Theorem was proved by Bernoulli back in 17 th century. The Michelson-Morley speed-of-light experiment was carried out in 18 th century. Michelson & Morley could have accessed to the Central Limit Theorem and decided to carry out many, repeated measurements. (Need more research)

From Data to Information We don t just want data. We want information from the data. Information Database Sensors Data Analysis or Data Mining

From Data to Information We don t just want data. We want information from the data. Information Database Sensors Data Analysis or Data Mining

Probability Density Function (PDF) A function which tells the probability of an event, e.g.: Temperature T lies between 34F and 50F. Variable X lies between X 1 and X 2. A well-known PDF is the Standard Normal Distribution:

Number/Total Number PDFs Come in Many Shapes E.g., Color of galaxies Color (Redder ) (Yip, Connolly, Szalay, et al. 2004)

Properties of PDFs The total area under the function is 1 (= 100% probability): p x dx = 1 The probability of the variable x lying between a and b is: P x beween a and b b = p x dx a

2D Probability Density Function (PDF) A function which tells the probability of an event, just like the 1D PDF but with 2 variables: the variables X lies between X 1 and X 2 AND Y between Y 1 and Y 2 The total area under the curve is still 1. p x, y dxdy = 1

Example 2D PDF Seeing disk of stellar image. p(x, Y) Y X

Standard Normal Distribution Revisit: Some Terminologies Significance Level = Area at the tail of the distribution = = 2.5% = 0.025 Critical Values = Z = The value corresponds to a Significance Level

(Abridged, Taken from Dekking et al.)

If: = 0.025 (tabulated as 0250) We get: Z = 1.96 (Abridged, Taken from Dekking et al.)

Significance Level ( ) and its Critical No more tables! Values (Z)

Hypothesis Testing Goal: To test whether a hypothesis is true or false. The beginning hypothesis is our best knowledge for the problem (also called the Null Hypothesis). If Null Hypothesis is FALSE, Alternative Hypothesis is TRUE.

Hypothesis Testing Suppose we think the average of a sample is some number. We want to test this hypothesis. Null hypothesis: some # Alternative hypothesis: some # +

Hypothesis Testing: (Step 1) Set Significance Levels It is the area at the tail(s). +

Hypothesis Testing: (Step 2) Find Critical Values Use: - Look-up tables - Computational software +

Hypothesis Testing: (Step 3) Calculate Statistics From Data Also called Test Statistics. T.S. +

Hypothesis Testing: (Step 3) Calculate Statistics From Data T.S. + Also called Test Statistics. Test Statistics is usually mean-subtracted. Therefore, we can use mean=0 Normal Distribution to carry on the hypothesis testing.

Hypothesis Testing: (Step 4) Draw Conclusion Fail to reject the (null) hypothesis.

Hypothesis Testing: (Step 4) Draw Conclusion Reject the (null) hypothesis

Hypothesis Testing: (Step 4) Draw Conclusion Reject the (null) hypothesis

Hypothesis Testing Example A galaxy formation theory predicted that the average radius of galaxies in the nearby universe is 35 kpc (1pc = 1parsec = 10 16 m). A random sample of 225 galaxies has a mean radius x = 30 kpc, and the S.D. of radius = 20 kpc. Task: Set up an hypothesis test at 5% significance level. Null hypothesis: 35 kpc Alt. hypothesis: 35 kpc Use two-tailed test. (M. Calvo)

Details: Calculate Sampling Distribution of the mean: μ x = by Central Limit Thm = μ(theory) = 35 σ x = by Central Limit Thm = S.D.(theory) ~ S.D. = 20 = 4 n n 15 3 Calculate Test Statistics from data: Z = x μ x σ x = 30 35 4/3 = 3.75

n = 225 30; Almost Normal -3.75 ( = 35 kpc from theory)

n = 225 30; Almost Normal Conclusion: Reject the galaxy formation theory. Accept 35 kpc at 5% significance level. -3.75 ( = 35 kpc from theory)

Meaning of the Confidence Level It is the chance we will make Type I error. Type I error is when we reject Null Hypothesis which is actually true: There is 5% chance we are wrong by rejecting the theory (that average galaxy size being 35 kpc).

Hypothesis Testing Example: 28 Sources in IceCube (2013)

Null Hypothesis: Sources are uniformly distributed on the sky. Alternative Hypothesis: Sources are originated from the Milky Way center.

Conditional Probability: Discrete Drawing a random card from a stack of 52 playing cards, what is the probability of a card being a Jack given it is a Face card (J, Q, K)? P Jack Face = Number of Jack s Number of Face s = 4 12 = 1 3

Thinking Like Bayesian There will be a bicycle race tomorrow, what is the probability a particular athlete will win? Problem: The event only occurs once, we do not have a sample to calculate the probability of a particular athlete winning. Solution: Bayesian Statistics.

Bayes Theorem (1763) P A B = P B A P(A) P(B) Posterior = Probability of A after B occurs. Probability of A before B occurs. Likelihood Prior Normalization Factor (Thomas Bayes, 1701-1761)

Conditional Probability: Discrete Drawing a random card from a stack of 52 playing cards, what is the probability of a card being a Jack given it is a Face card (J, Q, K)? When I know nothing beforehand: P Jack = 4 52 If my friend told me it is a Face card (Added Evidence): P Jack Face = P Face Jack P(Jack) P(Face) = 1 4 52 12 52 = 1 3

Main Idea behind Bayes Theorem When we add a new piece of evidence, we change our outlook on the probability of an event.

Frequentist vs. Bayesian Frequentist: Bayesians assume a prior probability. Bayesian: Frequentist cannot assign a probability to a single event.

Parameter Estimation: The Problem Estimate parameter from data given model. 1 modeled parameter: θ Multi modeled parameters: θ = (θ 1, θ 2, θ 3,, θ N )

Quality of an Estimator: Bias and Variance Usually, the best estimator is somewhere in-between.

Least-Square Fitting (LSQ) A popular way to find linear model that bestfit the data. A linear model is a model in which there is no square, cube,, and higher order power terms in the variables. Example: straight lines Y = Slope * X + Constant Slope and Constant are the parameters in the model. A Parameter Estimation problem.

Least-Square Fitting (LSQ): Linear model Y = Slope * X + Constant X: Independent Variable, Input Variable, etc. Y: Dependent Variable, Output Variable, etc.

Example of LSQ Fitting in R

Goodness of Fit A goodness of fit measures how well the model fit the data. E.g., Chi-sq (the sum of square of the difference over all of the N data points) x 2 = N i=1 D M 2 σ 2

Graphical Meaning of Chi-sq Y Model Fit Data Points X

Graphical Meaning of Chi-sq Y Model Fit Data Points Chi-sq measures how far away the points are from the model. X

Graphical Meaning of Chi-sq Y M D Model Fit Data Points Chi-sq measures how far away the points are from the model. X

(Astronomy) Data are Imperfect Random Error Photon counts follow Poisson distribution Random error for Poisson = photon count Systematic Error Bad CCD pixels Cosmic Rays Sky Emissions Etc.

Sky Emission (or Skylines) Average Sky Emissions From SDSS DR6 (Yip)

Image Stacking: Central Limit Theorem Revisit Fruchter & Hook (2002): Stack images in order to remove cosmic ray (= systematic error in pixel flux)

Similarly, Spectra Stacking (Yip, Connolly et al. 2004)

whereas individual spectrum is noisy. (SDSS Data Release 6)

Outliers: Using Median vs. Mean When there are outliers, the median could be more robust than the mean as a measure for the average. Outliers are difficult to define, because we need to know the average distribution as well (Next Lecture: Unsupervised Machine Learning). Subfield of study: Robust Statistics.

Median as the Robust Average

Homework 2014 Jan 14 (due Monday noon, Jan 20) The data file (saved in the course website as hubbletable1.csv ) contains the Object Name, Distance (in 10 6 pc = Mpc), and Recession Velocity (in km/s) from Hubble s 1929 work. 1) Find the Hubble s Constant by using Least Square Fitting. 2) Plot the Velocity vs. Distance; and the best-fit model. 3) The Hubble s constant from WMAP survey is determined to be 71 km/s/mpc. Comment on the comparison between the calculated and the WMAP values. Read the article on Bayes theorem. A CCD records a signal of 100 photons. What is the signal-to-noise ratio (SNR)? If the human eyes can discern features with 100% certainty in an image which has SNR 5 (*), what is the minimum number of photons we need for 100% certainty? (*) This is a simplified version of the Rose Criterion, 1948. Hints: Use read.csv() in R to read Comma Separated Values. To extract a column from the data, use x$column. For example, x$distance_mpc.