Probability Distributions


02/07/07 PHY310: Statistical Data Analysis, Lecture 05: Probability Distributions

Road Map: The Gaussian; Describing Distributions; Expectation Value; Variance; Basic Distributions; Generating Random Numbers

The Gaussian or Normal Distribution

Also known as a bell curve. It is important because when you add enough continuous random variables (i.e. almost any measurement), the sum becomes a Gaussian.

    g(x; μ, σ²) dx = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²)) dx

The central value is given by μ, called the mean. The width is given by σ: the variance is σ² and the standard deviation is σ.

The Expectation Value

The expectation value of a p.d.f. is the value of a random variable that you expect to measure. It is also called the population mean, or just the mean. The expectation value is written E[x], an abbreviation meaning "the expectation value of x"; it depends on the p.d.f. and is not a function of x. If a random variable x has a p.d.f. f(x), then the expectation value is

    E[x] = ∫ x f(x) dx

If you want the expectation value of q(x) (q a function of x):

    E[q(x)] = ∫ q(x) f(x) dx

The average, <x>, and the expectation value, E[x], are different: the average is an estimator of the expectation.

Example: Gaussian Expectation Value

    E[x] = ∫ x g(x; μ, σ²) dx = ∫ x/√(2πσ²) · exp(−(x − μ)²/(2σ²)) dx = μ

The integral can be done in Maxima:

mcgrew@boxer:macros$ maxima
Maxima 5.10.0 http://maxima.sourceforge.net
(%i1) integrate((x/sqrt(2*%pi*sigma^2))*exp(-(x-mu)^2/(2*sigma^2)),x,minf,inf);
Is sigma positive or negative? positive;
(%o1) mu sigma / abs(sigma)
(%i2) quit();

which is μ for σ > 0.

The Variance

The variance describes the width of a distribution. It is also called the population variance, and the variance of x is written V[x]:

    V[x] = E[(x − E[x])²]
         = E[x² − 2x E[x] + E[x]²]        (multiply out the argument)
         = E[x²] − 2 E[x] E[x] + E[x]²    (if k is a constant, then E[k] = k)
         = E[x²] − E[x]²                  (simplify)

Example: Gaussian Variance

    E[x²] = ∫ x² g(x; μ, σ²) dx = ∫ x²/√(2πσ²) · exp(−(x − μ)²/(2σ²)) dx = σ² + μ²

    V[x] = E[x²] − E[x]² = σ² + μ² − μ² = σ²

Again with Maxima:

(%i1) integrate((x^2/sqrt(2*%pi*%sigma^2))*exp(-(x-%mu)^2/(2*%sigma^2)),x,minf,inf);
Is %sigma positive or negative? positive;
(%o1) (sqrt(2) sqrt(%pi) %sigma^3 + sqrt(2) sqrt(%pi) %mu^2 %sigma)
      / (sqrt(2) sqrt(%pi) abs(%sigma))
(%i2) quit();

which simplifies to σ² + μ² for σ > 0.

(How I Really Do Integrals)

Multi-Dimensional Correlations

Multi-dimensional p.d.f.s can have internal correlations between variables, and not all correlations are linear. (Figures: variables with a linear correlation; variables with a non-linear correlation.)

The Multi-Dimensional Variance: The Covariance

The covariance describes the width of a multi-dimensional distribution like f(x₁, x₂, ..., xₙ):

    Cov[xᵢ, xⱼ] = E[(xᵢ − μᵢ)(xⱼ − μⱼ)] = E[xᵢ xⱼ] − μᵢ μⱼ

The covariance is usually written as an n × n matrix, Vᵢⱼ = Cov[xᵢ, xⱼ]. The correlation coefficient is a dimensionless measure of the correlation between two random variables:

    ρᵢⱼ = Cov[xᵢ, xⱼ] / √(V[xᵢ] V[xⱼ]) = Vᵢⱼ / (σᵢ σⱼ)

Binomial Distribution (Discrete)

If the probability of a single positive outcome is p, what is the probability of n positive outcomes in a total of N measurements?

    P(n; N, p) = N!/(n!(N−n)!) · pⁿ (1−p)^(N−n)

(A "|" is too hard to get in math mode, so I'm going to use ";" from now on!)

Equivalently: if the probability of a single positive measurement is p, and of a negative measurement is q = 1−p, what is the probability of n positive and m = N−n negative outcomes in a total of N measurements?

    P(n, m; N, p) = N!/(n! m!) · pⁿ qᵐ

The expectation value is E[n] = Np:

    E[n] = Σ_{n=0}^{N} n · N!/(n!(N−n)!) · pⁿ (1−p)^(N−n) = Np

The variance is V[n] = Np(1−p) = Npq, and the standard deviation is σₙ = √V[n] = √(Np(1−p)).

Binomial Example

If there were 8 children born at the hospital today, what do we know (assume)? It's 50/50 that a given child will be a girl, and there were exactly 8 children born; the point is that the number of measurements is fixed.

What is the expected distribution of boys and girls?

    P(n; N, p) = N!/(n!(N−n)!) · pⁿ (1−p)^(N−n)
    P(n; 8, 0.5) = 8!/(n!(8−n)!) · 0.5ⁿ · 0.5^(8−n)

What is the expected number of girls? E[n] = Np = 8 × 0.5 = 4.
What is the expected variation in the number of girls? V[n] = Np(1−p) = 8 × 0.5 × (1 − 0.5) = 2, so σₙ = √V[n] = √2.

Multinomial Distribution (Discrete)

When there are several possible outcomes for a single measurement, what is the probability of nᵢ outcomes of each type in a total of N measurements? (e.g. what is the probability of drawing a spade and a diamond from a deck of cards?)

    P(n₁, ..., nₘ; N, p₁, ..., pₘ) = N!/(n₁! ··· nₘ!) · p₁^(n₁) ··· pₘ^(nₘ)

The trinomial can be used for any multinomial distribution where you are only interested in a pair of variables (e.g. n, m, and everything else):

    P(n, m; N, p, q) = N!/(n! m! (N−n−m)!) · pⁿ qᵐ (1−p−q)^(N−n−m)

Trinomial expectation values are E[n] = Np and E[m] = Nq. Trinomial variances are V[n] = Np(1−p) and V[m] = Nq(1−q). The trinomial covariance is Cov[n, m] = −Npq. Notice the negative correlation!

Poisson Distribution (Discrete)

The Poisson distribution is the limit of the binomial for μ = Np held constant as N → ∞ and p → 0. It describes processes like the number of radioactive decays per unit time.

    P(n; μ) = μⁿ e^(−μ) / n!

The expectation value is E[n] = μ, the variance is V[n] = μ, and the standard deviation is √μ.

Poisson Example

How many cosmic rays enter a detector in 10 seconds if the mean rate is 20 counts/second? At 20 counts/second for 10 seconds, we expect 200 cosmic rays. This is a Poisson process, so the uncertainty on the expectation is σ_CR = √200 ≈ 14.

How can we tell this is a Poisson process? Poisson statistics apply when you are counting events but there isn't an upper limit on the number you might measure (i.e. not binomial):
- The number of events could be zero.
- The number of events could be huge.
- The expected number might be less than one.
Since we live in a quantum world, many (most) counting processes are fundamentally Poissonian.

CHEAP TRICK: When the Poisson mean is large (>15), a Gaussian (evaluated at integer values) is a really good approximation to a Poissonian, and it's much easier to calculate.

Continuous Distributions

The Uniform Distribution: this is what most computer pseudo-random number generators approximate.

    f(x; α, β) = 1/(β − α) for α ≤ x ≤ β, and 0 otherwise

    E[x] = (α + β)/2,  V[x] = (β − α)²/12,  σ = (β − α)/√12

The Exponential Distribution: usually seen when you measure the time until a random process happens. It is the inverse view of a Poisson process: you measure seconds/count instead of counts/second.

    f(t; λ) = (1/λ) e^(−t/λ)

    E[t] = λ,  V[t] = λ²,  σ = λ

Gaussian Distribution (Continuous)

How do you get a Gaussian? It's the limiting distribution when you take a sum of continuous random variables (figure: the sum of uniform variables). If you add enough discrete variables together, they become continuous; if you add enough random variables together, you get a Gaussian.

    g(x; μ, σ²) dx = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²)) dx

The expectation value is E[x] = μ, the variance is V[x] = σ², and the standard deviation is σ.

Binomial Approaches Gaussian

As the number of measurements gets large, the binomial distribution gets more and more like a Gaussian.

Chi-Squared (χ²) Distribution

Chi-squared is the distribution of the sum of squares of Gaussian-distributed random variables. When you compare measurements to the average, the deviation is described by chi-squared. The expectation value is the number of degrees of freedom, the variance is twice the d.o.f., and the standard deviation is √(2 d.o.f.). We will usually use the cumulative distribution for chi-squared; it can be calculated with the ROOT TMath::Prob(chi2, ndf) function.

Deciding Which Distribution to Use

You are counting things (e.g. boys born today):
- If you have a fixed number of trials, use the binomial distribution. e.g. There were eight children born in the hospital; how many were boys?
- If you don't have a fixed number, use the Poisson distribution. e.g. How many children were born in New York City?

You have a continuous distribution:
- If you are measuring the distance between discrete events (in time or space), use the exponential distribution. e.g. How long do you have to wait until a cosmic-ray muon arrives? The time between radioactive decays?
- Almost everything else is Gaussian: anything that depends on a sum of multiple random processes will have a Gaussian distribution.

Pseudo-Random Numbers

MCs depend on a source of random numbers, but computers are deterministic. Pseudo-random number generators are algorithms that produce sequences of apparently uncorrelated numbers. Be very careful: the numbers are correlated, and the sequence is repeatable. There are about a half dozen good generators, and most computer generators return a uniform distribution between 0 and 1.

Example of a simple generator, the multiplicative linear congruential (MLC) generator:
- Start with an initial integer, n₁, called the seed.
- Generate a sequence of new integers: n_{i+1} = (A nᵢ + 1) mod M.
- The integers will lie between 0 and M − 1.

MLC generators are common and dangerous: they are very fast, but consecutive tuples of integers fall on a finite number of hyperplanes.

I'll use the term "random" (not "pseudo-random") unless I'm making a point.

Generating Non-Uniform Distributions

Start with a sequence of uniform random numbers and turn it into a sequence of non-uniform random numbers.

The Transform Method: you need to solve the equation

    ∫_{−∞}^{x(r)} f(x') dx' = r

for x(r). Usually it can't be done; the functions for which a solution exists are usually tabulated in libraries. More details are available in Statistical Data Analysis.

The Accept/Reject Method: always works! But it can be inefficient.

Basic Acceptance

Works for functions over a finite interval. To generate a random value from the p.d.f. f(x) over an interval [α, β]:
- Generate a uniform random value, x, between α and β.
- Generate a second random value, y, between 0 and 1 (this assumes f(x) ≤ 1 on the interval; in general, draw y between 0 and the maximum of f).
- If y < f(x), use x; otherwise reject x and start over.

General Acceptance

The basic acceptance method only works over a fixed interval. A more general method uses the ratio of two functions:
- f(x): the p.d.f. you want to generate.
- g(x): a p.d.f. that you have a transform generator for. It must satisfy ε f(x) < g(x) for all x, for some ε > 0.

The algorithm:
- Generate x according to g(x).
- Generate a uniform y between 0 and g(x).
- If y < ε f(x), accept x; otherwise reject x and start over.

General Acceptance Example

Generate random numbers for f(x) = A e^(−x²/2) sin²(2x):

    double x, g, y, f;
    x = gRandom->Gaus();
    g = exp(-x*x/2);
    y = gRandom->Uniform(0.0, g);
    f = exp(-x*x/2)*sin(2*x)*sin(2*x);
    if (y < f) h2->Fill(x);

- Generate a Gaussian-distributed number, x.
- Generate a number, y, between 0 and g(x).
- Check whether y is less than f(x).