Probability distributions. Probability Distribution Functions. Probability distributions (contd.) Binomial distribution

Similar documents
Probability Distributions Columns (a) through (d)

15 Discrete Distributions

Things to remember when learning probability distributions:

Probability and Distributions

Random variables. DS GA 1002 Probability and Statistics for Data Science.

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

Physics 403 Probability Distributions II: More Properties of PDFs and PMFs

Continuous Distributions

Guidelines for Solving Probability Problems

BMIR Lecture Series on Probability and Statistics Fall, 2015 Uniform Distribution

ECE 313 Probability with Engineering Applications Fall 2000

Ching-Han Hsu, BMES, National Tsing Hua University c 2015 by Ching-Han Hsu, Ph.D., BMIR Lab. = a + b 2. b a. x a b a = 12

Northwestern University Department of Electrical Engineering and Computer Science

Continuous Random Variables

3 Modeling Process Quality

3 Continuous Random Variables

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

EE/CpE 345. Modeling and Simulation. Fall Class 5 September 30, 2002

Common ontinuous random variables

1 Review of Probability and Distributions

Chapter 5. Chapter 5 sections

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.

This does not cover everything on the final. Look at the posted practice problems for other topics.

Random Variables and Their Distributions

Brief Review of Probability

Probability Density Functions

Chapter 5 continued. Chapter 5 sections

Basic concepts of probability theory

MFM Practitioner Module: Quantitative Risk Management. John Dodson. September 23, 2015

Discrete Distributions

Statistical Methods in Particle Physics

Sampling Distributions

ECE 302 Division 2 Exam 2 Solutions, 11/4/2009.

Continuous Random Variables and Continuous Distributions

Slides 8: Statistical Models in Simulation

Stat410 Probability and Statistics II (F16)

Sampling Distributions

STAT/MATH 395 A - PROBABILITY II UW Winter Quarter Moment functions. x r p X (x) (1) E[X r ] = x r f X (x) dx (2) (x E[X]) r p X (x) (3)

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

Basic concepts of probability theory

MAS113 Introduction to Probability and Statistics. Proofs of theorems

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Chapter 2: Random Variables

Lecture 2: Repetition of probability theory and statistics

MAS113 Introduction to Probability and Statistics. Proofs of theorems

Basic concepts of probability theory

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

2.3 Analysis of Categorical Data

Probability Distributions

ELEG 3143 Probability & Stochastic Process Ch. 2 Discrete Random Variables

Stat 5101 Notes: Brand Name Distributions

Mathematical Statistics 1 Math A 6330

STAT Chapter 5 Continuous Distributions

3. Probability and Statistics

Special Discrete RV s. Then X = the number of successes is a binomial RV. X ~ Bin(n,p).

Twelfth Problem Assignment

Week 1 Quantitative Analysis of Financial Markets Distributions A

MATH : EXAM 2 INFO/LOGISTICS/ADVICE

Stat 426 : Homework 1.

THE QUEEN S UNIVERSITY OF BELFAST

b. ( ) ( ) ( ) ( ) ( ) 5. Independence: Two events (A & B) are independent if one of the conditions listed below is satisfied; ( ) ( ) ( )

Lecture 3 Continuous Random Variable

3.4. The Binomial Probability Distribution

Expectations. Definition Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X ) or

PROBABILITY DISTRIBUTION

1 Probability and Random Variables

Topic 3: The Expectation of a Random Variable

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators.

Relationship between probability set function and random variable - 2 -

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2

Math Spring Practice for the Second Exam.

Chapter 1. Sets and probability. 1.3 Probability space

Continuous Distributions

STAT 3610: Review of Probability Distributions

Stat 5101 Notes: Brand Name Distributions

FINAL EXAM: Monday 8-10am

Continuous Probability Spaces

Introduction to Statistics. By: Ewa Paszek

Lecture 3. Biostatistics in Veterinary Science. Feb 2, Jung-Jin Lee Drexel University. Biostatistics in Veterinary Science Lecture 3

Continuous random variables

A Few Special Distributions and Their Properties

Mathematical Statistics 1 Math A 6330

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 18

1 Review of Probability

Statistics for scientists and engineers

STAT/MATH 395 PROBABILITY II

Statistics 1B. Statistics 1B 1 (1 1)

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Statistics for Economists. Lectures 3 & 4

Chapter 3 Single Random Variables and Probability Distributions (Part 1)

HW7 Solutions. f(x) = 0 otherwise. 0 otherwise. The density function looks like this: = 20 if x [10, 90) if x [90, 100]

DS-GA 1002 Lecture notes 2 Fall Random variables

Review for the previous lecture

Department of Mathematics

Ch3. Generating Random Variates with Non-Uniform Distributions

2 Random Variable Generation

Continuous Probability Distributions. Uniform Distribution

Transcription:

Probability distributions Probability Distribution Functions G. Jogesh Babu Department of Statistics Penn State University September 27, 2011 http://en.wikipedia.org/wiki/probability_distribution We discuss few probability distribution functions particularly relevant to astronomical research. Some are discrete with countable jumps while others are continuous. The mathematical properties of these distributions have been extensively studied, sometimes for over two centuries. These include moments, methods for parameter estimation, order statistics, interval and extrema estimation, two-sample tests, and computational algorithms. We pay particular attention to the Gaussian, Chi-square, Poisson and Pareto (or power law) distributions which are particularly important in astronomy. Probability distributions (contd.) The p.m.f. s and p.d.f. s have normalizations that may be unfamiliar to astronomers because their sum or total integral needs to be equal to unity For example, the astronomer s familiar power law distribution f (x) = bx α is not a p.d.f. for arbitrary choices of α and b The Pareto distribution { αb α /x α+1 for x b > 0 f (x) = 0 for x < b has the correct normalization for a p.d.f. Binomial distribution Let X be a binomial (n, p) random variable. If there are k successes, there must also be n k failures. By independence, any single outcome with k success and n k failures has probability p k (1 p) n k. There are ( n k) such outcomes. ( ) n Therefore, the p.m.f. is fx (k) = p k (1 p) n k. k The mean and variance of a binomial (n, p) random variable are np and np(1 p), respectively. In the special case of n = 1, we have a Bernoulli random variable. ( ) X np x 1 P x e 1 2 y 2 dy np(1 p) 2π A rule of thumb for normal approximation is np(1 p) > 9. http://en.wikipedia.org/wiki/binomial_distribution

Multinomial distribution The multinomial distribution is a k-dimensional generalization of the binomial that provides k > 2 possible outcomes for each trial. Many applications to astronomy are evident: Type Ia, Ib, Ic, and II supernovae; Class 0, I, II and III pre-main sequence stars; Flora, Themis, Vesta and other asteroid families; etc. A random vector N= (N1,..., Nk) follows multinomial distribution with parameters pi > 0 where k pi = 1 and the number of trials occurring in each class is Ni where k Ni = n (total number of trials). if k n! P (N1 = m1,..., Nk = mk) = m1!... mk! pm 1 1... pm k k, mi = n and zero otherwise. Poisson distribution Events follow approximately a Poisson distribution if they are produced by Bernouilli trials, where the probability p of occurrence is very small, the number n of trials is very large, but the product λ = np approaches a constant. This is quite natural for a variety of astronomical situations: A distant quasar may emit 10 64 photons s 1 in the X-ray band but the photon arrival rate at a telescope may only be 10 3 photons s 1. The signal from a typical 10 4 s observation may thus contain only 10 1 of the n 10 68 photons emitted by the quasar during the observation period, giving p 10 67 and λ 10 1. A 1 cm 2 detector is subject to bombardment of cosmic rays, producing an astronomically uninteresting background. As the detector geometrically subtends p 10 18 of the Earth s surface area, and that these cosmic rays should arrive in a random fashion in time and across the detector, the Poisson distribution can be used to characterize and statistically remove this instrumental background. Multinomial distribution (contd.) The estimators for the class probabilities pi are the same as with the binomial parameter, Ni/n, with variances npi(1 pi). For example, an optical photometric survey may obtain a sample of 43 supernovae consisting of 16 Type 1a, 3 Type 1b and 24 Type II supernovae. The sample estimator of the Type Ia fraction is then ˆp1 = 16/43. By using the multivariate Central Limit Theorem it can be proved that k χ 2 (Ni npi) 2 = npi has approximately a chi-square distribution with k 1 d.f. The χ 2 quantity is the sum of the ratio (Oi Ei) 2 /Ei where Oi is the observed frequency of the i-th class and Ei is the expected frequency given its probability pi. The accuracy of the χ 2 approximation can be poor for large k or if any pi is small. Poisson distribution If X is a Poisson (λ) random variable where λ > 0, its p.m.f. is exactly f (x) = e λ λ x for x {0, 1, 2,...} x! By the way, λ does not have to be an integer, but X does. A Poisson (λ) random variable has mean λ and variance λ. Computing the Poisson distribution function: Start with P(X = 0) = e λ... P(X = i + 1) = λ P(X = i) i + 1 http://en.wikipedia.org/wiki/poisson_distribution

Exponential distribution X is an exponential (λ) r.v. if the PDF of X is { λe λx if x > 0 f (x) = 0 if x 0, where the parameter λ > 0. E(X) = 1 λ, Var(X) = 1 λ 2. The exponential r.v. with mean θ has density and c.d.f. { { 1 f (x) = θ e x/θ if x > 0 1 e x/θ if x > 0 and F(x) = Erlang distribution X is Erlang (n, λ) r.v. if the PDF of X is { 1 (n 1)! f (x) = λn x n 1 e λx if x > 0 Agner Krarup Erlang, a Danish mathematician and engineer, developed it to examine the number of telephone calls which might be made at the same time to the operators of the switching stations. Erlang is a special case of Gamma distribution. http://en.wikipedia.org/wiki/erlang_distribution http://en.wikipedia.org/wiki/exponential_distribution Gamma distribution X is Gamma (α, λ) r.v if the PDF of X is { 1 Γ(α) f (x) = λα x α 1 e λx if x > 0 Γ(α) = x α 1 e x dx 0 http://en.wikipedia.org/wiki/gamma_distribution Chi-square distribution with k degrees of freedom is Gamma with α = k/2, λ = 1. http://en.wikipedia.org/wiki/chi-square_distribution Poisson Process The most common origin of the Poisson distribution in astronomy is a counting process, a subset of stochastic point processes which can be produced by dropping points randomly along a line. The number of points till time t follows Poisson distribution with intensity λt. The distance between any two points is exponential with parameter λ. Successive distances are independent Total distance to the n-th point has Gamma distribution with parameters (n, λ). http://en.wikipedia.org/wiki/poisson_process

Normal distribution The most celebrated distribution in probability is the standard normal. If Z is standard normal, then fz (z) = (1/ 2π)e z2 /2. The c.d.f. even has its own symbol: Φ(z) = FZ (z). In R, Φ(z) and Φ 1 (p) are pnorm(z) and qnorm(p). Let Z be standard normal. Then σz + µ is also normal for real numbers σ and µ. What is the density of σz + µ? The result is a normal (µ, σ 2 ) random variable. (We usually take σ > 0.) http://en.wikipedia.org/wiki/normal_distribution Normal distribution Let Z be standard normal. Find E(Z ) and Var (Z ). How about E(σZ + µ) and Var (σz + µ)? The sum of squares of independent standard normals X1, X2,..., Xn, n X i 2 follows chi-square distribution with n degrees of freedom. The sample variance follows a chi square distribution, 1 n σ 2 (Xi X) 2 χ 2 n 1, where X = 1 n n Xi. Mutlivariate normal distribution X= (X1,..., Xk) MVN(ν, Σ) if every linear combination is a normal distribution. http://en.wikipedia.org/wiki/multivariate_normal_distribution Power law Also known as Pareto distribution. ( ) x α P(X > x), α > 0 xmin The Pareto probability distribution function (p.d.f.) can be qualitatively expressed as P(x) = shape ( ) location shape+1. location x http://en.wikipedia.org/wiki/pareto_distribution

Power law in astronomy Power law in astronomy Power law appear in: the sizes of lunar impact craters intensities of solar flares energies of cosmic ray protons energies of synchrotron radio lobe electrons the masses of higher-mass stars luminosities of lower-mass galaxies brightness of extragalactic X-ray and radio sources brightness decays of gamma-ray burst afterglows turbulent structure of interstellar clouds sizes of interstellar dust particles Some of these distributions are named after the authors of seminal studies: de Vaucouleurs galaxy profile Salpeter stellar Initial Mass Function (IMF) Schechter galaxy luminosity function MRN dust size distribution Larson s relations of molecular cloud kinematics The power law behavior is often limited to a range of values: the spectrum of cosmic rays breaks around 10 15 ev and again at higher energies the Schechter function drops exponentially above a characteristic galaxy absolute magnitude M 20.5 the Salpeter IMF becomes approximately lognormal below 0.5 M See the notes for extensions of power law and multivariate Pareto.