Introduction to Probability I: Expectations, Bayes Theorem, Gaussians, and the Poisson Distribution [1]

Pankaj Mehta
February 25, 2019

[1] Read: This will introduce some elementary ideas in probability theory that we will make use of repeatedly.

Stochasticity plays a major role in biology. In these next few lectures, we will develop the mathematical tools to treat stochasticity in biological systems. In this lecture, we will review some basic concepts in probability theory. We will start with some basic ideas about probability and define various moments and cumulants. We will then discuss how moments and cumulants can be calculated using generating functions and cumulant generating functions. We will then focus on two widely occurring "universal distributions": the Gaussian and the Poisson distribution.

Probability distributions of one variable

Consider a probability distribution p(x) of a single variable, where X can take discrete values in some set x \in A or continuous values over the real numbers. [2] We can define the expectation value of a function f(X) of this random variable as

\langle f(X) \rangle = \sum_{x \in A} p(x) f(x)   (1)

for the discrete case and

\langle f(X) \rangle = \int dx \, f(x) p(x)   (2)

in the continuous case. For simplicity, we will write things assuming x is discrete, but it is straightforward to generalize to the continuous case by replacing sums by integrals. We can define the n-th moment of x as

\langle X^n \rangle = \sum_x p(x) x^n.   (3)

The first moment is often called the mean and we will denote it as \langle x \rangle = \mu_x. We can also define the n-th central moment of x as

\langle (X - \langle X \rangle)^n \rangle = \sum_x p(x) (x - \langle x \rangle)^n.   (4)

The second central moment is often called the variance of the distribution and in many cases characterizes the typical width of the distribution. We will often denote the variance of a distribution by \sigma_x^2 = \langle (X - \langle X \rangle)^2 \rangle.

[2] We will try to denote random variables by capital letters and the values they can take by the corresponding lower case letters.

Exercise: Show that the second central moment can also be written as \langle X^2 \rangle - \langle X \rangle^2.

Example: Binomial Distribution

Consider the Binomial distribution, which describes the probability of getting n heads if one tosses a coin N times. Denote this random variable by X. Note that X can take on N + 1 values n = 0, 1, ..., N. If the probability of getting a head on a coin toss is p, then it is clear that

p(n) = \binom{N}{n} p^n (1 - p)^{N - n}.   (5)
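To make eqs. (1)-(5) concrete, here is a minimal numerical sketch (ours, not from the notes) that evaluates the moments of the Binomial distribution; the helper functions moment and central_moment are names we made up, and scipy is used only for the binomial pmf.

```python
# Sketch: numerical moments of a discrete distribution (eqs. 1-5).
import numpy as np
from scipy.stats import binom

def moment(xs, ps, n):
    """n-th moment <X^n> = sum_x p(x) x^n."""
    return np.sum(ps * xs**n)

def central_moment(xs, ps, n):
    """n-th central moment <(X - <X>)^n>."""
    mu = moment(xs, ps, 1)
    return np.sum(ps * (xs - mu)**n)

N, p = 10, 0.3
xs = np.arange(N + 1)                 # X takes the N+1 values n = 0..N
ps = binom.pmf(xs, N, p)              # p(n) = C(N,n) p^n (1-p)^(N-n)

print(moment(xs, ps, 1), N * p)                    # mean = Np
print(central_moment(xs, ps, 2), N * p * (1 - p))  # variance = Np(1-p)
# Exercise check: <X^2> - <X>^2 equals the second central moment
print(moment(xs, ps, 2) - moment(xs, ps, 1)**2)
```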

Let us calculate the mean of this distribution. To do so, it is useful to define q = (1 - p):

\langle X \rangle = \sum_{n=0}^{N} n \, p(n) = \sum_n n \binom{N}{n} p^n q^{N-n}.

Now we will use a very clever trick. We can formally treat the above expression as a function of two independent variables p and q. Let us define

G(p, q) = \sum_n \binom{N}{n} p^n q^{N-n} = (p + q)^N   (6)

and take a partial derivative with respect to p. Formally, we know the above is the same as

\langle X \rangle = \sum_n n \binom{N}{n} p^n q^{N-n} = p \, \partial_p G(p, q) = p \, \partial_p (p + q)^N = pN(p + q)^{N-1} = pN,   (7)

where in the last line we have substituted q = 1 - p. This is exactly what we expect. [3] The mean number of heads is exactly equal to the number of times we toss the coin times the probability we get a head.

[3] When one first encounters this kind of formal trick, it seems like magic and cheating. However, as long as the probability distributions converge so that we can interchange sums/integrals with derivatives, there is nothing wrong with this, as strange as that seems.

Exercise: Show that the variance of the Binomial distribution is just \sigma^2 = Np(1 - p).

Probability distributions of multiple variables

In general, we will also consider probability distributions of M variables, p(X_1, ..., X_M). We can get the marginal distribution of a single variable by integrating (summing) over all other variables: [4]

p(x_j) = \int dx_1 ... dx_{j-1} dx_{j+1} ... dx_M \, p(x_1, x_2, ..., x_M).   (8)

Similarly, we can define the marginal distribution over any subset of variables by integrating over the remaining variables.

[4] In this section, we will use continuous notation as it is easier and more compact. Discrete variables can be treated by replacing integrals with sums.
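In the discrete case, the marginalization in eq. (8) is just summing a joint probability table over the unwanted indices. A minimal sketch (ours), with a made-up 2 x 2 joint distribution:

```python
# Sketch: marginals of a discrete joint distribution (discrete analogue
# of eq. 8, with sums in place of integrals). The joint table is made up.
import numpy as np

pxy = np.array([[0.10, 0.20],
                [0.30, 0.40]])        # p(x, y): rows index x, columns y

px = pxy.sum(axis=1)                  # p(x) = sum_y p(x, y)
py = pxy.sum(axis=0)                  # p(y) = sum_x p(x, y)
print(px, py, px.sum())               # marginals; total probability is 1
```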

We can also define the conditional probability of a variable X_i, which encodes the probability that X_i takes on a given value given the values of all the other random variables. We denote this probability by p(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_M). One important formula that we will make use of repeatedly is Bayes formula, which relates marginals and conditionals to the full probability distribution:

p(X_1, ..., X_M) = p(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_M) \, p(X_1, ..., X_{i-1}, X_{i+1}, ..., X_M).   (9)

For the special case of two variables, this reduces to the formula

p(X, Y) = p(X | Y) p(Y) = p(Y | X) p(X).   (10)

This is one of the fundamental results in probability and we will make use of it extensively when discussing inference in our discussion of early vision. [5]

[5] Bayes formula is crucial to understanding probability and I hope you have encountered it before. If you have not, please do some of the exercises that use Bayes formula that are everywhere on the web.

Exercise: Understanding test results. [6] Consider a mammogram to diagnose breast cancer. Consider the following facts (the numbers aren't exact but are reasonable):

- 1% of women have breast cancer.
- Mammograms miss 20% of cancers that are there.
- 10% of mammograms detect cancer even when it is not there.

Suppose you get a positive test result; what are the chances you have cancer? (A numerical sketch follows below.)

[6] This is from the nice website https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
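One way to work the mammogram exercise numerically, a sketch using only the three numbers stated above and Bayes formula in the form p(cancer | positive) = p(positive | cancer) p(cancer) / p(positive):

```python
# Sketch: the mammogram exercise via Bayes formula (eq. 10).
p_cancer = 0.01             # 1% of women have breast cancer
p_pos_given_cancer = 0.80   # mammograms miss 20% of real cancers
p_pos_given_healthy = 0.10  # 10% of mammograms are false positives

p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(p_cancer_given_pos)   # about 0.075: a positive test means ~7.5% chance
```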

Exercise: Consider a distribution of two variables, p(x, y), and define a variable Z = X + Y. [7]

- Show that \langle Z \rangle = \langle X \rangle + \langle Y \rangle.
- Show that \sigma_z^2 = \langle Z^2 \rangle - \langle Z \rangle^2 = \sigma_x^2 + \sigma_y^2 + 2 \, Cov(X, Y), where we have defined the covariance of X and Y as Cov(X, Y) = \langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle.
- Show that if X and Y are independent variables, then Cov(X, Y) = 0 and \sigma_z^2 = \sigma_x^2 + \sigma_y^2.

A Monte Carlo check of these identities is sketched below.

[7] We will make use of these results repeatedly. Please make sure you can derive them and understand them thoroughly.
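A quick Monte Carlo sanity check of these identities (ours); the correlated Gaussian pair below is an arbitrary choice, any joint distribution would do:

```python
# Sketch: Monte Carlo check of the Z = X + Y exercise.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(1.0, 2.0, n)
y = 0.5 * x + rng.normal(-1.0, 1.0, n)   # y is deliberately correlated with x
z = x + y

cov = np.mean((x - x.mean()) * (y - y.mean()))
print(z.mean(), x.mean() + y.mean())          # <Z> = <X> + <Y>
print(z.var(), x.var() + y.var() + 2 * cov)   # sigma_z^2 with covariance term
```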

Law of Total Variance

An important result we will make use of in trying to understand experiments is the law of total variance. Consider two random variables X and Y. We would like to know how much of the variance of Y is explained by the value of X. In other words, we would like to decompose the variance of Y into two terms: one term that captures how much of the variance of Y is explained by X, and another term that captures the unexplained variance. It will be helpful to define some notation to make the expectation values clearer:

E_X[f(X)] = \langle f(X) \rangle_x = \int dx \, p(x) f(x).   (11)

Notice that we can define the conditional expectation of a function of Y given that X = x by

E_Y[f(Y) | X = x] = \int dy \, p(y | X = x) f(y).   (12)

For the special case f(Y) = Y^2, we can define the conditional variance, which measures how much Y varies if we fix the value of X:

Var_Y[Y | X = x] = E_Y[Y^2 | X = x] - E_Y[Y | X = x]^2.   (13)

We can also define a different variance that measures the variance of E_Y[Y | X], viewed as a random function of X:

Var_X[E_Y[Y | X]] = E_X[E_Y[Y | X]^2] - E_X[E_Y[Y | X]]^2.   (14)

This is well defined since p(y | x) is a normalized probability distribution over y. Notice now that using Bayes rule we can write

E_Y[f(Y)] = \int dy \, p(y) f(y) = \int dx \, dy \, p(y|x) p(x) f(y) = \int dx \, p(x) \int dy \, p(y|x) f(y) = E_X[E_Y[f(Y) | X]].   (15)

So now notice that we can write

\sigma_Y^2 = E_Y[Y^2] - E_Y[Y]^2
= E_X[E_Y[Y^2 | X]] - E_X[E_Y[Y | X]]^2
= E_X[E_Y[Y^2 | X]] - E_X[E_Y[Y | X]^2] + E_X[E_Y[Y | X]^2] - E_X[E_Y[Y | X]]^2
= E_X[E_Y[Y^2 | X] - E_Y[Y | X]^2] + Var_X[E_Y[Y | X]]
= E_X[Var_Y[Y | X]] + Var_X[E_Y[Y | X]].   (16)

The first term is often called the unexplained variance and the second term the explained variance. To see why, consider the case where the random variables are related by a linear function plus noise, Y = aX + \epsilon, where \epsilon is a third random variable, independent of X, with mean \langle \epsilon \rangle = b and \langle \epsilon^2 \rangle = \sigma_\epsilon^2 + b^2. Furthermore, let E_X[X] = \mu_x. Notice that in this case all the randomness of Y beyond X is contained in \epsilon. Notice that

E[Y] = a\mu_x + b   (17)

and

E_\epsilon[Y | X] = aX + E_\epsilon[\epsilon] = aX + b.   (18)

We can also write

Var_X[E_Y[Y | X]] = E_X[E_Y[Y | X]^2] - E_X[E_Y[Y | X]]^2
= E_X[(aX + b)^2] - (a\mu_x + b)^2
= a^2 E_X[X^2] + 2ab\mu_x + b^2 - (a\mu_x + b)^2
= a^2 \sigma_X^2.   (19)

This is clearly the variance in Y explained by the variance in X. Using the law of total variance, we see that in this case

E_X[Var_Y[Y | X]] = \sigma_Y^2 - a^2 \sigma_X^2   (20)

is the unexplained variance.
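Here is a minimal Monte Carlo sketch (ours) of this decomposition for the linear-plus-noise model above; all parameter values are arbitrary choices:

```python
# Sketch: Monte Carlo check of the law of total variance (eq. 16) for
# the model Y = aX + eps used above.
import numpy as np

rng = np.random.default_rng(1)
a, b, sigma_eps = 2.0, 0.5, 1.5
n = 1_000_000
x = rng.normal(3.0, 1.2, n)               # so sigma_X^2 = 1.44
eps = rng.normal(b, sigma_eps, n)         # mean b, variance sigma_eps^2
y = a * x + eps

explained = a**2 * x.var()                # Var_X[E_Y[Y|X]] = a^2 sigma_X^2
unexplained = y.var() - explained         # should match sigma_eps^2
print(y.var(), explained + sigma_eps**2)  # total variance decomposes
print(unexplained, sigma_eps**2)          # E_X[Var_Y[Y|X]], eq. (20)
```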

Generating Functions and Cumulant Generating Functions

We are now in a position to introduce some more machinery: moment generating functions and cumulant generating functions. Consider a probability distribution p(x). We would like an easy and efficient way to calculate the moments of this distribution. It turns out that there is an easy way to do this, using ideas that will be familiar from Statistical Mechanics.

Generating Functions

Given a discrete probability distribution p(x), we can define a moment generating function G(t) for the probability distribution as

G(t) = \langle e^{tX} \rangle = \sum_x p(x) e^{tx}.   (21)

Alternatively, we can also define an (ordinary) generating function for the distribution,

G(z) = \langle z^X \rangle = \sum_x p(x) z^x.   (22)

If the probability distribution is continuous, we define the moment generating function as [8]

G(t) = \langle e^{tX} \rangle = \int dx \, p(x) e^{tx}.   (23)

[8] If x is defined over the reals, this is just the Laplace transform.

Notice that G(t = 0) = G(z = 1) = 1, since probability distributions are normalized. These functions have some really nice properties. Notice that the n-th moment of X can be calculated by taking the n-th partial derivative of the moment generating function evaluated at t = 0:

\langle X^n \rangle = \partial_t^n G(t) |_{t=0}.   (24)

Alternatively, we can write in terms of the ordinary generating function

\langle X^n \rangle = (z \partial_z)^n G(z) |_{z=1},   (25)

where (z \partial_z)^n means apply this operator n times.

Example: Generating Function of the Binomial Distribution

Let us return to the Binomial distribution above. The ordinary generating function is given by

G(z) = \sum_n \binom{N}{n} p^n q^{N-n} z^n = (pz + q)^N = (pz + (1 - p))^N.   (26)

Let us calculate the mean using this:

\mu_X = z \partial_z G(z) |_{z=1} = [zNp(pz + (1 - p))^{N-1}]_{z=1} = Np.   (27)

We can also calculate the second moment,

\langle X^2 \rangle = z \partial_z (z \partial_z G(z)) |_{z=1} = [zNp(pz + (1 - p))^{N-1}]_{z=1} + [z^2 N(N - 1)p^2 (pz + (1 - p))^{N-2}]_{z=1} = Np + N(N - 1)p^2 = Np(1 - p) + N^2 p^2,   (28)

and the variance

\sigma_X^2 = \langle X^2 \rangle - \langle X \rangle^2 = Np(1 - p).   (29)
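Since the manipulations in eqs. (26)-(29) are purely formal, they can be checked symbolically. A sketch (ours) with sympy, assuming only G(z) = (pz + 1 - p)^N:

```python
# Sketch: symbolic check of the generating-function tricks (eqs. 26-29).
import sympy as sp

z, p, N = sp.symbols('z p N', positive=True)
G = (p * z + 1 - p)**N                     # ordinary generating function

m1 = sp.simplify((z * sp.diff(G, z)).subs(z, 1))                  # <X>
m2 = sp.simplify((z * sp.diff(z * sp.diff(G, z), z)).subs(z, 1))  # <X^2>
var = sp.simplify(m2 - m1**2)
print(m1)    # N*p
print(var)   # should simplify to N*p*(1 - p)
```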

Example: Normal Distribution

We now consider a Normal random variable X whose probability density is given by

p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / 2\sigma^2}.   (30)

Let us calculate the moment generating function,

G(t) = \int dx \, \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / 2\sigma^2} e^{tx}.   (31)

It is useful to define a new variable u = (x - \mu)/\sigma. [9] In terms of this variable we have that

G(t) = \frac{1}{\sqrt{2\pi}} \int du \, e^{-u^2/2 + t(\sigma u + \mu)} = e^{\mu t + t^2 \sigma^2 / 2} \frac{1}{\sqrt{2\pi}} \int du \, e^{-(u - \sigma t)^2 / 2} = e^{\mu t + t^2 \sigma^2 / 2}.   (32)

[9] Notice that u is just the z-score in statistics.

Exercise: Show that the p-th moment of a Gaussian with mean zero (\mu = 0) can be written as

\langle X^p \rangle = 0 if p is odd, and \langle X^p \rangle = (p - 1)!! \, \sigma^p if p is even.   (33)

This is one of the most important properties of the Gaussian/Normal distribution: all higher order moments can be expressed solely in terms of the mean and variance! We will make use of this again.
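A quick numerical check of eq. (33) against scipy's exact normal moments; sigma = 1.7 is an arbitrary choice, and (p - 1)!! is the double factorial:

```python
# Sketch: checking the Gaussian moment formula (eq. 33),
# <X^p> = (p-1)!! sigma^p for even p and 0 for odd p.
from scipy.stats import norm
from scipy.special import factorial2

sigma = 1.7
for p_ord in range(1, 7):
    exact = norm(loc=0, scale=sigma).moment(p_ord)
    formula = factorial2(p_ord - 1) * sigma**p_ord if p_ord % 2 == 0 else 0.0
    print(p_ord, exact, formula)   # the two columns should agree
```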

The Cumulant Generating Function

Another function we can define is the cumulant generating function of a probability distribution,

K(t) = \log \langle e^{tX} \rangle = \log G(t).   (34)

The n-th derivative of K(t) evaluated at t = 0 is called the n-th cumulant:

\kappa_n = K^{(n)}(0).   (35)

Let us now look at the first few cumulants. Notice that

\kappa_1 = \frac{G'(0)}{G(0)} = \frac{\mu_X}{1} = \mu_X.   (36)

Similarly, it is easy to show that the second cumulant is just the variance:

\kappa_2 = \frac{G''(0)}{G(0)} - \frac{[G'(0)]^2}{G(0)^2} = \langle X^2 \rangle - \langle X \rangle^2 = \sigma_X^2.   (37)

Notice that the cumulant generating function is just the Free Energy in statistical mechanics, and the moment generating function is the partition function.

Central Limit Theorem and the Normal Distribution

One of the most ubiquitous distributions that we see in all of physics, statistics, and biology is the Normal distribution. This is because of the Central Limit Theorem. Suppose that we draw some random variables X_1, X_2, ..., X_N identically and independently from a distribution with mean \mu and variance \sigma^2. Let us define a variable

S_N = \frac{X_1 + X_2 + ... + X_N}{N}.   (38)

Then as N \to \infty, the distribution of S_N is well described by a Gaussian/Normal distribution with mean \mu and variance \sigma^2/N. We will denote such a normal distribution by N(\mu, \sigma^2/N). This is true regardless of the original distribution (see Figure 1). The main thing we should take away is that the variance decreases as 1/N with the number of samples N that we take!

Figure 1: "Whatever the form of the population distribution, the sampling distribution tends to a Gaussian, and its dispersion is given by the Central Limit Theorem." (figure and caption from Wikipedia)

Example: The Binomial Distribution

We can ask how the sample mean changes with the number of samples for a Bernoulli variable with probability p = 1/2 for various N. This is shown in Figure 2.

Figure 2: Means of N samples drawn from a Bernoulli distribution with p = 1/2. Notice that it converges more and more to a Gaussian as N grows. (figure and caption from Wikipedia)
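A minimal simulation of this statement (ours): sample means of deliberately non-Gaussian (exponential) draws have a variance that shrinks as \sigma^2 / N:

```python
# Sketch: the Central Limit Theorem by simulation; S_N is the mean of
# N exponential draws, and its variance should scale as sigma^2 / N.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 1.0            # exponential(1): mean 1, variance 1
for N in [1, 10, 100, 1000]:
    s = rng.exponential(mu, size=(100_000, N)).mean(axis=1)  # many S_N draws
    print(N, s.mean(), s.var(), sigma2 / N)  # variance shrinks as 1/N
```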

The Poisson Distribution

There is a second universal distribution that occurs often in Biology. This is the distribution that describes the number of rare events we expect in some time interval T. The Poisson distribution is applicable if the following assumptions hold:

- The number of events is discrete, k = 0, 1, 2, ....
- The events are independent, cannot occur simultaneously, and occur at a constant rate r per unit time.
- Generally, the number of trials is large and the probability of success is small.

Examples of things that can be described by the Poisson distribution include:

- Photons arriving in a microscope (especially at low intensity)
- The number of mutations per unit length of DNA
- The number of nuclear decays in a time interval
- The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was made famous by a book of Ladislaus Josephovich Bortkiewicz (1868–1931). (from Wikipedia)
- "The targeting of V-1 flying bombs on London during World War II investigated by R. D. Clarke in 1946." (from Wikipedia)

Let us denote the mean number of events that occur in a time t by \lambda = rt. Then the probability that k events occur is given by

p(k) = e^{-\lambda} \frac{\lambda^k}{k!}.   (39)
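A short numerical sketch (ours) checking eq. (39): normalization, and the fact (derived below via cumulants) that both the mean and the variance equal \lambda. Here lambda = 3.5 is arbitrary and the infinite sum is truncated:

```python
# Sketch: the Poisson pmf (eq. 39) from scratch.
import numpy as np
from math import exp, factorial

lam = 3.5
ks = np.arange(0, 60)                            # truncate the infinite sum
pk = np.array([exp(-lam) * lam**k / factorial(k) for k in ks])

print(pk.sum())                    # ~1: normalization via the Taylor series
print((ks * pk).sum(), lam)        # mean = lambda
print((ks**2 * pk).sum() - (ks * pk).sum()**2, lam)   # variance = lambda too
```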

Figure 3: Examples of the Poisson distribution for different means. (figure from Wikipedia)

It is easy to check that \sum_k p(k) = 1 using the Taylor series of the exponential. We can also calculate the mean and variance. Notice that we can define the moment generating function of the Poisson distribution as

G(t) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} e^{tk} = e^{-\lambda} \sum_k \frac{(\lambda e^t)^k}{k!} = e^{\lambda(e^t - 1)}.   (40)

The cumulant generating function is just

K(t) = \log G(t) = \lambda (e^t - 1).   (41)

From this we can easily get all the cumulants, since differentiating the expression above n times and evaluating at t = 0 gives

\kappa_n = K^{(n)}(0) = \lambda.   (42)

In other words, all the higher order cumulants of the Poisson distribution are the same and equal to the mean.

Another defining feature of the Poisson distribution is that it has no memory: since things happen at a constant rate, there is no memory. We will see that in general, to create non-Poisson distributions (with memory), we will have to burn energy! More on this cryptic statement later.
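This computation is easy to reproduce symbolically; a sketch (ours) differentiating K(t) from eq. (41) with sympy:

```python
# Sketch: symbolic check that every cumulant of the Poisson distribution
# equals lambda, by differentiating K(t) = lambda*(exp(t) - 1) (eq. 41).
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
K = lam * (sp.exp(t) - 1)
for n in range(1, 6):
    kappa_n = sp.diff(K, t, n).subs(t, 0)
    print(n, kappa_n)              # kappa_n = lambda for every n
```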