Introduction to Probability I: Expectations, Bayes Theorem, Gaussians, and the Poisson Distribution. 1

Size: px

Start display at page:

Download "Introduction to Probability I: Expectations, Bayes Theorem, Gaussians, and the Poisson Distribution. 1"

Deirdre Henderson
5 years ago
Views:

1 Itroductio to Probability I: Expectatios, Bayes Theorem, Gaussias, ad the Poisso Distributio. 1 Pakaj Mehta February 25, Read: This will itroduce some elemetary ideas i probability theory that we will make use of repeatedly. Stochasticity plays a major role i biology. I these ext few lectures, we will develop the mathematical tools to treat stochasticity i biological systems. I this lecture, we will review some basic cocepts i probability theory. We will start with some basic ideas about probability ad defie various momets ad cumulats. We will the discuss how momets ad cumulats ca be calculated usig geeratig fuctios ad cumulat fuctios. We will the focus o two widely occurrig uiversal distributios : the Gaussia ad the Poisso distributio.

2 distributio. 2 Probability distributios of oe variable Cosider a probability distributio p(x) of a sigle variable where X ca take discrete values i some set x A or cotiuous values over the real umbers 2. We ca defie the expectatio value of a fuctio of this radom variable f (X) as f (X) = p(x) (1) x A 2 We will try to deote radom variables by capital letters ad the values they ca take by the correspodig lower case letter. for the discrete case ad f (X) = dx f (x)p(x), (2) i the cotiuous case. For simplicity, we will write thigs assumig x is discrete but it is straight forward to geeralize to the cotiuous case by replacig by itegrals. We ca defie the -th momet of x as X = p(x)x. (3) x The first momet is ofte called the mea ad we will deote it as x = µ x. We ca also defie the -th cetral of x as (X X ) = p(x)(x x ). (4) x The secod cetral momets is ofte called the variace of the distributio ad i may cases characterizes the typical width of the distributio. We will ofte deote the variace of a distributio by σ 2 x = (X X ) 2. Exercise: Show that the secod cetral momet ca also be writte as X 2 X 2. Example: Biomial Distributio Cosider the Biomial distributio which describes the probability of gettig heads if oe tosses a coi N times. Deote this radom variable by X. Note that X ca take o N + 1 values = 0, 1,..., N. If probability of gettig a head o a coi toss is p, the it is clear that ( ) N p() = p (1 p) N (5)

3 distributio. 3 Let us calculate the mea of this distributio. To do so, it is useful to defie q = (1 p). X = N p() =0 ( N = ) p q N Now we will use a very clever trick. We ca formally treat the above expressio as fuctio of two idepedet variables p ad q. Let us defie G(p, q) = ( ) N p q ( N ) = (p + q) N (6) ad take a partial derivative with respect to p. Formally, we kow the above is the same as X = ( ) N p q N = p p G(p, q) = p p (p + q) N = pn(p + q) N 1 = pn, (7) where i the last lie we have substituted q = 1 p. This is exactly what we expect 3. The mea umber of heads is exactly equal to the umber of times we toss the coi times the umber probability we get a heads. Exercise: Show that the variace of the Biomial distributio is just σ 2 = Np(1 p). 3 Whe oe first ecouters this kid of formal trick, it seems like magic ad cheatig. However, as log as the probability distributios coverge so that we ca iterchage sums/itegrals with derivatives there is othig wrog with this as strage as that seems. Probability distributio of multiple variables I geeral, we will also cosider probability distributios of M variables p(x 1,..., X M ). We ca get the margial distributios of a sigle variable by itegratig (summig over) all other variables 4 We have that p(x j ) = dx 1... dx j 1 dx j+1... dx M p(x 1, x 2,..., x M ). (8) 4 I this sectio, we will use cotiuous otatio as it is easier ad more compact. Discrete variables ca be treated by replacig itegrals with sums. Similarly, we ca defie the margial distributio over ay subset of variables by itegratig over the remaiig variables.

4 distributio. 4 We ca also defie the coditioal probability of a variable x i which ecodes the probability that variable X i takes o a give value give the values of all the other radom variables. We deote this probability by p(x i X 1,... X i 1, X i+1,..., X M ). Oe importat formula that we will make use of repeatedly is Bayes formula which relates margials ad coditioals to the full probability distributio p(x 1,..., X M ) = p(x i x 1,... X i 1, X i+1,..., X M )p(x 1,... X i 1, X i+1,..., X M ). (9) For the special case of two variables, this reduces to the formula p(x, Y) = p(x Y)p(Y) = p(y X)p(Y). (10) This is oe of the fudametal results i probability ad we will make use of it extesively whe discussig iferece i our discussio of early visio. 5 Exercise: Uderstadig test results. 6. Cosider a mammogram to diagose breast cacer. Cosider the followig facts (the umbers are t exact but reasoable). 1% of wome have breast cacer. 5 Bayes formula is crucial to uderstadig probability ad I hope you have ecoutered it before. If you have ot, please do some exercises that use Bayes formula that are everywhere o the web. 6 This is from the ice website https: //betterexplaied.com/articles/ a-ituitive-ad-short-explaatio-of-bayes-theore Mammograms miss 20% of cacers whe that are there. 10% of mammograms detect cacer eve whe it is ot there. Suppose you get a positive test result, what are the chaces you have cacer? Exercise: Cosider a distributio of two variables p(x, y). Defie a variable Z = X + Y 7 Show that Z = X + Y 7 We will make use of these results repeatedly. Please make sure you ca derive them ad uderstad them thoroughly. Show that σ 2 z = Z 2 Z 2 = σ 2 x + σ 2 y + 2Cov(X, Y) where we have defied the covariace of X ad Y as Cov(x, y) = (X X )(Y Y ).

5 distributio. 5 Show that if x ad y are idepedet variables tha Cov(x, y) = 0 ad σ 2 z = σ 2 x + σ 2 y. Law of Total Variace A importat result we will make use of tryig to uderstad experimets is the law of total variace. Cosider two radom variables X ad Y. We would like to kow how much the variace of Y is explaied by the value of X. I other words, we would like to decompose the variace of Y ito two terms: oe term that captures how much of the variace of Y is explaied by X ad aother term that captures the uexplaied variace. It will be helpful to defie some otatio to make the expectatio value clearer E X [ f (X)] = X x = dxp(x) f (x) (11) Notice that we ca defie the coditioal expectatio of a fuctio Y give that X = x by E Y [ f (Y) X = x] = dyp(y X = x) f (y). (12) For the special case whe f (Y) = Y 2 we ca defie the coditioal variace which measures how much does Y vary is we fix the value of X Var Y [Y X = x] = E Y [Y 2 X = x] E Y [Y X = x] 2 (13) We ca also defie a differet variace that measures the variace of the E Y [Y X] viewed as a radom fuctio of X: Var X [E Y [Y X]] = E X [E Y [Y X] 2 ] E X [E Y [Y X]] 2. (14) This is a well defied probability distributio sice p(y x) is the marigal probability distributio over y. Notice ow, that usig Bayes rule we ca write E Y [ f (y)] = dyp(y) f (y) = dxdyp(y x)p(x) f (y) = dxp(x) dyp(y x) f (y) = E X [E Y [ f (Y) X]] (15)

6 distributio. 6 So ow otice that we ca write σ 2 Y = E Y [Y 2 ] E Y [Y] 2 = E X [E Y [Y 2 X]] E X [E Y [Y X]] 2 = E X [E Y [Y 2 X]] E X [E Y [Y X] 2 ] + E X [E Y [Y X] 2 ] E X [E Y [Y X]] 2 = E X [E Y [Y 2 X] E Y [Y X] 2 ] + Var X [E Y [Y X]] = E X [Var Y [Y X]] + Var X [E Y [Y X]] (16) The first term is ofte called the uexplaied variace ad the secod term is the explaied variace. To see why, cosider the case where the radom variable Y are related by a liear fuctio plus oise Y = ax + ɛ, where ɛ is a third radom variable with mea ɛ = b ad ɛ 2 = σ 2 ɛ + µ 2. Furthermore, let E X [X] = µ x. The otice, that i this case all the radomess of Y is cotaied i ɛ. Notice that E[Y] = aµ x + b (17) ad E ɛ [Y X] = ax + E ɛ [ɛ] = ax + b (18) We ca also write Var X [E Y [Y X]] = E X [E Y [Y X] 2 ] E X [E Y [Y X]] 2 = E X [(ax + b) 2 ] (aµ x + b) 2 = a 2 E X [X 2 ] + 2abµ x + b 2 (aµ x + b) 2 = a 2 σ 2 X (19) This is clearly the variace i Y explaied by the variace i X. Usig the law of total variace, we see that i this case is the uexplaied variace. E X [Var Y [Y X]] = σ 2 Y a2 σ 2 X (20) Geeratig Fuctios ad Cumulat Geeratig Fuctios We are ow i the positio to itroduce some more machiery. These are the momet geeratig fuctios ad cumulat geeratig fuctios. Cosider a probability distributio p(x). We would like a easy ad efficiet way to calculate the momets of this distributio. It turs out that there is a easy way to do this usig ideas that will be familiar from Statistical Mechaics.

7 distributio. 7 Geeratig Fuctios Give a discrete probability distributio, p(x) we ca defie a momet-geeratig fuctio G(t) for the probability distributio as G(t) = e tx = dxp(x)e tx (21) Alteratively, we ca also defie a (ordiary) geeratig fuctio for the distributios G(z) = Z X = p(x)z x (22) x If the probability distributio is cotiuous, we defie the mometgeeratig fuctio as 8 G(t) = e tx = dxp(x)e tx. (23) 8 If x is defied over the reals, this is just the Laplace trasform. Notice that G(t = 0) = G(z = 1) = 1 sice probability distributios are ormalized. These fuctios have some really ice properties. Notice that the th momet of X ca be calculated by takig the -th partial derivative of the momet geeratig fuctio evaluated at t = 0: X N = G(t) t t=0 (24) Alteratively, we ca write i terms of the ordiary geeratig fuctio X N = (z z ) G(z) z=0 (25) where (z z ) meas apply this operator times. Example: Geeratig Fuctio of the Biomial Distributio Let us retur to the Biomial distributio above.the ordiary geeratig fuctio is give by G(z) = ( ) N p q N z = (pz + q) N = (pz + (1 p)) N (26) Let us calculate the mea usig this µ X = z z G(z) = [znp(pz + (1 p)) N 1 ] z=1 = Np (27) We ca also calculate the secod momet X 2 = z z (z z G(z)) = [znp(pz + (1 p)) N 1 ] z=1 + z [Np(pz + (1 p)) N 1 ] z=1 = Np + N(N 1)p 2 = Np(1 p) + N 2 p 2 (28)

8 distributio. 8 ad the variace σ 2 X = X2 X 2 = Np(1 p). (29) Example: Normal Distributio We ow cosider a Normal Radom Variable X whose probability desity is give by 1 p(x) = e (x µ)2 σ 2. (30) 2πσ 2 Let us calculate the momet geeratig fuctio 1 G(t) = dx e (x µ)2 σ 2 e tx. (31) 2πσ 2 It is useful to defie a ew variable u = (x µ)/σ 9. I terms of this variable we have that 1 G(t) = due u2 /2+t(σu+µ) 2π = e µt+t2 σ 2 /2 1 2π due (u σu/2)2 2 9 Notice that u is just the z-score i statistics. = e µt+t2 σ 2 /2 (32) Exercise: Show that the th momet of a Gaussia with mea zero (µ = 0) ca be writte as X p 0 if p is odd = (33) (p 1)!σ p if p is eve This is oe of the most importat properties of the Gaussia/Normal distributio. All higher order momets ca be expressed solely i terms of the mea ad variace! We will make use of this agai. The Cumulat Geeratig Fuctio Aother fuctio we ca defie is the Cumulat geeratig fuctio of a probability distributio K(t) = log e tx = log G(t). (34) The th derivative of K(t) evaluated at t = 0 is called the -th cumulat: κ = K () (0). (35) Let us ot look the first few cumulats. Notice that κ 1 = G (0) G(0) = µ X 1 = µ X. (36)

9 distributio. 9 Figure 1: "Whatever the form of the populatio distributio, the samplig distributio teds to a Gaussia, ad its dispersio is give by the Cetral Limit Theorem" (figure ad captio from Wikipedia) Similarly, it is easy to show that the secod cumulat is just the variace: κ 2 = G (0) G(0) [G (0)] 2 G(0) 2 = X 2 X 2 = σx 2 (37) Notice that the cumulat geeratig fuctio is just the Free Eergy i statistical mechaics ad the momet-geeratig fuctio is the partitio fuctio. Cetral Limit Theorem ad the Normal Distributio Oe of the most ubiquitous distributios that we see i all of physics, statistics, ad biology is the Normal distributio. This is because of the Cetral Limit theorem. Suppose that we draw some radom variables X 1, X 2,..., X N idetically ad idepedetly from a distributio with mea µ ad variace σ 2. Let us defie a variable S N = X 1 + X X N. (38) N The as N, the distributio of S N is well described by a Gaussia/Normal distributio with mea µ ad variace σ 2 /N. We will deote such a ormal distributio by N (µ, σ 2 /N). This is true regardless of the origial distributio (see Figure 1). The mai thig we should take away is that the variace decreases as 1/N with the umber of samples N that we take! Example: The Biomial Distributio We ca ask how the sample mea chages with the umber of samples for a Beroulli variable with probability p = 1/2 for various N. This Figure 2 from Wikipedia. The Poisso Distributio There is a secod uiversal distributio that occurs ofte i Biology. This is the distributio that describes the umber of rare evets

10 distributio. 10 Figure 2: Meas from various N samples draw from Beroulli distributio with p = 1/2. Notice it coverges more ad more to a Gaussia. (figure ad captio from Wikipedia) we expect i some time iterval T. The Poisso distributio is applicable if the followig assumptios hold: The umber of evets is discrete k = 0, 1, 2,.... The evets are idepedet, caot occur simultaeously, occur at a costat rate per uit time r. Geerally the umber of trials is large ad probability of success is small. Examples of thigs that ca be described by the Poisso distributio iclude: Photos arrivig i a microscope (especially at low itesity) The umber of mutatios per uit legth of DNA The umber of uclear decays i a time iterval The umber of soldiers killed by horse-kicks each year i each corps i the Prussia cavalry. This example was made famous by a book of Ladislaus Josephovich Bortkiewicz (1868Ð1931) (from Wikipedia) "The targetig of V-1 flyig bombs o Lodo durig World War II ivestigated by R. D. Clarke i 1946". Let us deote the mea umber of evets that occur i a time t by λ = rt. The, the probability that k evets occurs is give by p(k) = e λ λk k!. (39)

distributio. 11 Figure 3: Examples of Poisso distributio for differet meas.. (figure from Wikipedia) It is easy to check that k p(k) = 1 usig the Taylor series of the expoetial.

11 distributio. 11 Figure 3: Examples of Poisso distributio for differet meas.. (figure from Wikipedia) It is easy to check that k p(k) = 1 usig the Taylor series of the expoetial. We ca also calculate the mea ad variace. Notice we ca defie the geeratig fuctio of the Poisso as G(t) = e λ etk (λ) k z k! The cumulat-geeratig fuctio is just = e λ(1+et). (40) K(t) = log G(t) = λ(1 + e t ) (41) From this we ca easily get the all cumulats sice this is just differetiatig the expressio above times κ = K () (0) = λ. (42) I other words, all the higher order cumulats of the Poisso distributio are the same ad equal to the mea. Aother defiig feature of the Poisso distributio is that it has o memory. Sice thigs happe at a costat rate, there is o memory. We will see that i geeral to create o-poisso distributios (with memory), we will have to bur eergy! More o this cryptic statemet later.

Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.