MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS

Contents

1. A few useful discrete random variables
2. Joint, marginal, and conditional PMFs
3. Independence of random variables
4. Expected values

1 A FEW USEFUL RANDOM VARIABLES

Recall that a random variable $X : \Omega \to \mathbb{R}$ is called discrete if its range (i.e., the set of values that it can take) is a countable set. The PMF of $X$ is a function $p_X : \mathbb{R} \to [0, 1]$, defined by $p_X(x) = \mathbb{P}(X = x)$, and completely determines the probability law of $X$.

The following are some important PMFs.

(a) Discrete uniform with parameters $a$ and $b$, where $a$ and $b$ are integers with $a < b$. Here,
$$p_X(k) = \frac{1}{b - a + 1}, \qquad k = a, a + 1, \ldots, b,$$
and $p_X(k) = 0$, otherwise. (In the remaining examples, the qualification "$p_X(k) = 0$, otherwise" will be omitted for brevity.)

(b) Bernoulli with parameter $p$, where $0 \le p \le 1$. Here, $p_X(1) = p$, $p_X(0) = 1 - p$.

(c) Binomial with parameters $n$ and $p$, where $n \in \mathbb{N}$ and $p \in [0, 1]$. Here,
$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$

A binomial random variable with parameters $n$ and $p$ represents the number of heads observed in $n$ independent tosses of a coin if the probability of heads at each toss is $p$.

(d) Geometric with parameter $p$, where $0 < p \le 1$. Here,
$$p_X(k) = (1 - p)^{k-1} p, \qquad k = 1, 2, \ldots.$$
A geometric random variable with parameter $p$ represents the number of independent tosses of a coin until heads are observed for the first time, if the probability of heads at each toss is $p$.

(e) Poisson with parameter $\lambda$, where $\lambda > 0$. Here,
$$p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, \ldots.$$
As will be seen shortly, a Poisson random variable can be thought of as a limiting case of a binomial random variable. Note that this is a legitimate PMF (i.e., it sums to one), because of the series expansion of the exponential function, $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^k / k!$.

(f) Power law with parameter $\alpha$, where $\alpha > 0$. Here,
$$p_X(k) = \frac{1}{k^{\alpha}} - \frac{1}{(k + 1)^{\alpha}}, \qquad k = 1, 2, \ldots.$$
An equivalent but more intuitive way of specifying this PMF is in terms of the formula
$$\mathbb{P}(X \ge k) = \frac{1}{k^{\alpha}}, \qquad k = 1, 2, \ldots.$$
Note that when $\alpha$ is small, the "tail" $\mathbb{P}(X \ge k)$ of the distribution decays slowly (slower than an exponential) as $k$ increases, and in some sense such a distribution has "heavy" tails.

Notation: Let us use the abbreviations $\mathrm{dU}(a, b)$, $\mathrm{Ber}(p)$, $\mathrm{Bin}(n, p)$, $\mathrm{Geo}(p)$, $\mathrm{Pois}(\lambda)$, and $\mathrm{Pow}(\alpha)$ to refer to the PMFs defined above. We will use notation such as $X \stackrel{d}{=} \mathrm{dU}(a, b)$ or $X \sim \mathrm{dU}(a, b)$ as a shorthand for the statement that $X$ is a discrete random variable whose PMF is discrete uniform with parameters $(a, b)$, and similarly for the other PMFs we defined. We will also use the notation $X \stackrel{d}{=} Y$ to indicate that two random variables have the same PMFs.
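As a quick sanity check on the PMFs listed above, the following Python sketch evaluates them and sums each one over its range. The function names, the parameter values, and the truncation points for the infinite ranges are illustrative assumptions, not part of the notes.

```python
import math

def uniform_pmf(k, a, b):
    """dU(a, b): mass 1/(b - a + 1) on each integer a, ..., b."""
    return 1.0 / (b - a + 1) if a <= k <= b else 0.0

def binomial_pmf(k, n, p):
    """Bin(n, p): probability of k heads in n independent tosses."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

def geometric_pmf(k, p):
    """Geo(p): first head occurs on toss k."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def poisson_pmf(k, lam):
    """Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k) if k >= 0 else 0.0

def power_law_pmf(k, alpha):
    """Pow(alpha): 1/k^alpha - 1/(k+1)^alpha."""
    return k ** (-alpha) - (k + 1) ** (-alpha) if k >= 1 else 0.0

# Sums over the range (truncated where the range is infinite; an approximation).
totals = {
    "dU(2, 7)": sum(uniform_pmf(k, 2, 7) for k in range(2, 8)),
    "Bin(10, 0.3)": sum(binomial_pmf(k, 10, 0.3) for k in range(11)),
    "Geo(0.2)": sum(geometric_pmf(k, 0.2) for k in range(1, 500)),
    "Pois(3)": sum(poisson_pmf(k, 3.0) for k in range(100)),
}
for name, total in totals.items():
    print(f"{name:14s} sums to {total:.12f}")

# The power-law sum telescopes: the first K terms add up to 1 - (K + 1)^(-alpha),
# so for small alpha a noticeable amount of mass sits beyond K (the heavy tail).
K = 10_000
power_partial = sum(power_law_pmf(k, 0.5) for k in range(1, K + 1))
print(f"Pow(0.5), first {K} terms: {power_partial:.6f}")
```

The power-law line shows why the tail is called heavy: with $\alpha = 0.5$, the mass not yet accounted for after $K$ terms is $(K+1)^{-1/2}$, which shrinks only polynomially in $K$.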

1.1 Poisson distribution as a limit of the binomial

To get a feel for the Poisson random variable, think of a binomial random variable with very small $p$ and very large $n$. For example, consider the number of typos in a book with a total of $n$ words, when the probability $p$ that any one word is misspelled is very small (associate a word with a coin toss that results in a head when the word is misspelled), or the number of cars involved in accidents in a city on a given day (associate a car with a coin toss that results in a head when the car has an accident). Such random variables can be well modeled with a Poisson PMF.

More precisely, the Poisson PMF with parameter $\lambda$ is a good approximation for a binomial PMF with parameters $n$ and $p$, i.e.,
$$e^{-\lambda} \frac{\lambda^k}{k!} \approx \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n,$$
provided $\lambda = np$, $n$ is large, and $p$ is small. In this case, using the Poisson PMF may result in simpler models and calculations. For example, let $n = 100$ and $p = 0.01$. Then the probability of $k = 5$ successes in $n = 100$ trials is calculated using the binomial PMF as
$$\frac{100!}{95!\,5!} \cdot 0.01^5\, (1 - 0.01)^{95} \approx 0.00290.$$
Using the Poisson PMF with $\lambda = np = 100 \cdot 0.01 = 1$, this probability is approximated by
$$e^{-1} \frac{1}{5!} \approx 0.00307.$$

Proposition 1. (Binomial convergence to Poisson) Let us fix some $\lambda > 0$, and suppose that $X_n \stackrel{d}{=} \mathrm{Bin}(n, \lambda/n)$, for every $n$. Let $X \stackrel{d}{=} \mathrm{Pois}(\lambda)$. Then, as $n \to \infty$, the PMF of $X_n$ converges to the PMF of $X$, in the sense that
$$\lim_{n \to \infty} \mathbb{P}(X_n = k) = \mathbb{P}(X = k), \qquad \text{for any } k \ge 0.$$

Proof: We have
$$\mathbb{P}(X_n = k) = \frac{n(n - 1) \cdots (n - k + 1)}{n^k} \cdot \frac{\lambda^k}{k!} \cdot \Big(1 - \frac{\lambda}{n}\Big)^{n-k}.$$
Fix $k$ and let $n \to \infty$. We have, for $j = 1, \ldots, k$,
$$\frac{n - k + j}{n} \to 1, \qquad \Big(1 - \frac{\lambda}{n}\Big)^{-k} \to 1, \qquad \Big(1 - \frac{\lambda}{n}\Big)^{n} \to e^{-\lambda}.$$
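The limits just stated, and the resulting convergence of the binomial PMF, can be observed numerically. The sketch below is only an illustration under assumed values ($\lambda = 1$, $k = 5$, and a handful of values of $n$); the helper names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 1.0, 5
target = poisson_pmf(k, lam)

# Gap between the Bin(n, lam/n) and Pois(lam) probabilities of {X = k} as n grows.
errors = []
for n in (10, 100, 1_000, 10_000):
    approx = binomial_pmf(k, n, lam / n)
    errors.append(abs(approx - target))
    print(f"n = {n:6d}   binomial: {approx:.6f}   Poisson: {target:.6f}")

# One of the factors in the proof: (1 - lam/n)^n approaches e^(-lam).
n = 10**6
print((1 - lam / n) ** n, math.exp(-lam))
```

The row with $n = 100$ corresponds to the numerical example in the text ($p = 0.01$, $k = 5$).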

Thus, for any fixed $k$, we obtain
$$\lim_{n \to \infty} \mathbb{P}(X_n = k) = e^{-\lambda} \frac{\lambda^k}{k!} = \mathbb{P}(X = k),$$
as claimed.

2 JOINT, MARGINAL, AND CONDITIONAL PMFS

In most applications, one typically deals with several random variables at once. In this section, we introduce a few concepts that are useful in such a context.

2.1 Marginal PMFs

Consider two discrete random variables $X$ and $Y$ associated with the same experiment. The probability law of each one of them is described by the corresponding PMF, $p_X$ or $p_Y$, called a marginal PMF. However, the marginal PMFs do not provide any information on possible relations between these two random variables. For example, suppose that the PMF of $X$ is symmetric around the origin. If we have either $Y = X$ or $Y = -X$, the PMF of $Y$ remains the same, and fails to capture the specifics of the dependence between $X$ and $Y$.

2.2 Joint PMFs

The statistical properties of two random variables $X$ and $Y$ are captured by a function $p_{X,Y} : \mathbb{R}^2 \to [0, 1]$, defined by
$$p_{X,Y}(x, y) = \mathbb{P}(X = x, Y = y),$$
called the joint PMF of $X$ and $Y$. Here and in the sequel, we will use the abbreviated notation $\mathbb{P}(X = x, Y = y)$ instead of the more precise notations $\mathbb{P}(\{X = x\} \cap \{Y = y\})$ or $\mathbb{P}(X = x \text{ and } Y = y)$.

More generally, the PMF of finitely many discrete random variables, $X_1, \ldots, X_n$, on the same probability space is defined by
$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \mathbb{P}(X_1 = x_1, \ldots, X_n = x_n).$$
Sometimes, we define a vector random variable $X$, by letting $X = (X_1, \ldots, X_n)$, in which case the joint PMF will be denoted simply as $p_X(x)$, where now the argument $x$ is an $n$-dimensional vector.
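The remark in Section 2.1, that marginal PMFs cannot distinguish $Y = X$ from $Y = -X$ when $p_X$ is symmetric, can be made concrete with a small sketch. The particular symmetric PMF below and the helper name are hypothetical choices made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical PMF of X, symmetric around the origin (illustrative values).
p_X = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

def joint_and_marginal(f):
    """Joint PMF of (X, Y) and PMF of Y when Y = f(X)."""
    joint = defaultdict(Fraction)
    p_Y = defaultdict(Fraction)
    for x, px in p_X.items():
        joint[(x, f(x))] += px   # P(X = x, Y = f(x))
        p_Y[f(x)] += px          # P(Y = f(x))
    return dict(joint), dict(p_Y)

joint_same, p_Y_same = joint_and_marginal(lambda x: x)    # Y = X
joint_flip, p_Y_flip = joint_and_marginal(lambda x: -x)   # Y = -X

print("PMFs of Y agree:  ", p_Y_same == p_Y_flip)
print("joint PMFs agree: ", joint_same == joint_flip)
```

The two constructions give identical PMFs for $Y$, while the joint PMFs place their mass on different pairs $(x, y)$.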

The joint PMF of $X$ and $Y$ determines the probability of any event that can be specified in terms of the random variables $X$ and $Y$. For example, if $A$ is the set of all pairs $(x, y)$ that have a certain property, then
$$\mathbb{P}\big((X, Y) \in A\big) = \sum_{(x, y) \in A} p_{X,Y}(x, y).$$
In fact, we can calculate the marginal PMFs of $X$ and $Y$ by using the formulas
$$p_X(x) = \sum_{y} p_{X,Y}(x, y), \qquad p_Y(y) = \sum_{x} p_{X,Y}(x, y).$$
The formula for $p_X(x)$ can be verified using the calculation
$$p_X(x) = \mathbb{P}(X = x) = \sum_{y} \mathbb{P}(X = x, Y = y) = \sum_{y} p_{X,Y}(x, y),$$
where the second equality follows by noting that the event $\{X = x\}$ is the union of the countably many disjoint events $\{X = x, Y = y\}$, as $y$ ranges over all the different values of $Y$. The formula for $p_Y(y)$ is verified similarly.

2.3 Conditional PMFs

Let $X$ and $Y$ be two discrete random variables, defined on the same probability space, with joint PMF $p_{X,Y}$. The conditional PMF of $X$ given $Y$ is a function $p_{X|Y}$, defined by
$$p_{X|Y}(x \mid y) = \mathbb{P}(X = x \mid Y = y), \qquad \text{if } \mathbb{P}(Y = y) > 0;$$
if $\mathbb{P}(Y = y) = 0$, the value of $p_{X|Y}(x \mid y)$ is left undefined. Using the definition of conditional probabilities, we obtain
$$p_{X|Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)},$$
whenever $p_Y(y) > 0$.

More generally, if we have random variables $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, defined on the same probability space, we define a conditional PMF by letting
$$p_{X_1, \ldots, X_n | Y_1, \ldots, Y_m}(x_1, \ldots, x_n \mid y_1, \ldots, y_m) = \mathbb{P}(X_1 = x_1, \ldots, X_n = x_n \mid Y_1 = y_1, \ldots, Y_m = y_m) = \frac{p_{X_1, \ldots, X_n, Y_1, \ldots, Y_m}(x_1, \ldots, x_n, y_1, \ldots, y_m)}{p_{Y_1, \ldots, Y_m}(y_1, \ldots, y_m)},$$

whenever $p_{Y_1, \ldots, Y_m}(y_1, \ldots, y_m) > 0$. Again, if we define $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$, the shorthand notation $p_{X|Y}(x \mid y)$ can be used.

Note that if $p_Y(y) > 0$, then $\sum_x p_{X|Y}(x \mid y) = 1$, where the sum is over all $x$ in the range of the random variable $X$. Thus, the conditional PMF is essentially the same as an ordinary PMF, but with redefined probabilities that take into account the conditioning event $Y = y$. Visually, if we fix $y$, the conditional PMF $p_{X|Y}(x \mid y)$, viewed as a function of $x$, is a "slice" of the joint PMF $p_{X,Y}$, renormalized so that its entries sum to one.

3 INDEPENDENCE OF RANDOM VARIABLES

We now define the important notion of independence of random variables. We start with a general definition that applies to all types of random variables, including discrete and continuous ones. We then specialize to the case of discrete random variables.

3.1 Independence of general random variables

Intuitively, two random variables are independent if any partial information on the realized value of one random variable does not change the distribution of the other. This notion is formalized in the following definition.

Definition 1. (Independence of random variables)
(a) Let $X_1, \ldots, X_n$ be random variables defined on the same probability space. We say that these random variables are independent if
$$\mathbb{P}(X_1 \in B_1, \ldots, X_n \in B_n) = \mathbb{P}(X_1 \in B_1) \cdots \mathbb{P}(X_n \in B_n),$$
for any Borel subsets $B_1, \ldots, B_n$ of the real line.
(b) Let $\{X_s \mid s \in S\}$ be a collection of random variables indexed by the elements of a (possibly infinite) index set $S$. We say that these random variables are independent if for every finite subset $\{s_1, \ldots, s_n\}$ of $S$, the random variables $X_{s_1}, \ldots, X_{s_n}$ are independent.

Recalling the definition of independent events, we see that independence of the random variables $X_s$, $s \in S$, is the same as requiring independence of the events $\{X_s \in B_s\}$, $s \in S$, for every choice of Borel subsets $B_s$, $s \in S$, of the real line.
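For two random variables with finite ranges, the product condition in Definition 1(a) can be checked by brute force, since $\mathbb{P}(X \in B)$ depends only on the part of $B$ that lies in the range of $X$. The sketch below makes that simplifying assumption and uses two hypothetical joint PMFs; all names are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations, product

def subsets(values):
    """All subsets of a finite collection, as tuples."""
    vals = list(values)
    return chain.from_iterable(combinations(vals, r) for r in range(len(vals) + 1))

def prob(joint, A, B):
    """P(X in A, Y in B) for a joint PMF given as {(x, y): probability}."""
    return sum(p for (x, y), p in joint.items() if x in A and y in B)

def satisfies_definition(joint):
    """Check P(X in A, Y in B) = P(X in A) P(Y in B) over all subsets of the ranges."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    return all(
        prob(joint, A, B) == prob(joint, A, ys) * prob(joint, xs, B)
        for A in subsets(xs)
        for B in subsets(ys)
    )

# Two fair coin tosses (independent by construction).
independent = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}
# One fair toss X with Y = 1 - X (dependent).
dependent = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}

print(satisfies_definition(independent), satisfies_definition(dependent))
```

Exact rational arithmetic (`Fraction`) is used so that the equality test is not affected by floating-point rounding.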

Verifying the independence of random variables using the above definition (which involves arbitrary Borel sets) is rather difficult. It turns out that one only needs to examine Borel sets of the form $(-\infty, x]$.

Proposition 2. Suppose that for every $n$, every $x_1, \ldots, x_n$, and every finite subset $\{s_1, \ldots, s_n\}$ of $S$, the events $\{X_{s_i} \le x_i\}$, $i = 1, \ldots, n$, are independent. Then, the random variables $X_s$, $s \in S$, are independent.

We omit the proof of the above proposition. However, we remark that ultimately, the proof relies on the fact that the collection of sets of the form $(-\infty, x]$ generates the Borel $\sigma$-field.

Let us define the joint CDF of the random variables $X_1, \ldots, X_n$ by
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \mathbb{P}(X_1 \le x_1, \ldots, X_n \le x_n).$$
In view of Proposition 2, independence of $X_1, \ldots, X_n$ is equivalent to the condition
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n), \qquad \forall\, x_1, \ldots, x_n.$$

Exercise 1. Consider a collection $\{A_s \mid s \in S\}$ of events, where $S$ is a (possibly infinite) index set. Prove that the events $A_s$ are independent if and only if the corresponding indicator functions $I_{A_s}$, $s \in S$, are independent random variables.

3.2 Independence of discrete random variables

For a finite number of discrete random variables, independence is equivalent to having a joint PMF which factors into a product of marginal PMFs.

Theorem 1. Let $X$ and $Y$ be discrete random variables defined on the same probability space. The following are equivalent.
(a) The random variables $X$ and $Y$ are independent.
(b) For any $x, y \in \mathbb{R}$, the events $\{X = x\}$ and $\{Y = y\}$ are independent.
(c) For any $x, y \in \mathbb{R}$, we have $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$.
(d) For any $x, y \in \mathbb{R}$ such that $p_Y(y) > 0$, we have $p_{X|Y}(x \mid y) = p_X(x)$.

Proof: The fact that (a) implies (b) is immediate from the definition of independence. The equivalence of (b), (c), and (d) is also an immediate consequence of our definitions. We complete the proof by verifying that (c) implies (a).

Suppose that condition (c) holds, and let $A$, $B$ be two Borel subsets of the real line. We then have
$$\begin{aligned}
\mathbb{P}(X \in A, Y \in B) &= \sum_{x \in A,\, y \in B} \mathbb{P}(X = x, Y = y) \\
&= \sum_{x \in A,\, y \in B} p_{X,Y}(x, y) \\
&= \sum_{x \in A,\, y \in B} p_X(x)\, p_Y(y) \\
&= \Big( \sum_{x \in A} p_X(x) \Big) \Big( \sum_{y \in B} p_Y(y) \Big) \\
&= \mathbb{P}(X \in A)\, \mathbb{P}(Y \in B).
\end{aligned}$$
Since this is true for any Borel sets $A$ and $B$, we conclude that $X$ and $Y$ are independent.

We note that Theorem 1 generalizes to the case of multiple, but finitely many, random variables. The generalization of conditions (a)-(c) should be obvious. As for condition (d), it can be generalized to a few different forms, one of which is the following: given any subset $S_0$ of the random variables under consideration, the conditional joint PMF of the random variables $X_s$, $s \in S_0$, given the values of the remaining random variables, is the same as the unconditional joint PMF of the random variables $X_s$, $s \in S_0$, as long as we are conditioning on an event with positive probability.

We finally note that functions $g(X)$ and $h(Y)$ of two independent random variables $X$ and $Y$ must themselves be independent. This should be expected on intuitive grounds: if $X$ is independent from $Y$, then the information provided by the value of $g(X)$ should not affect the distribution of $Y$, and consequently should not affect the distribution of $h(Y)$. Observe that when $X$ and $Y$ are discrete, then $g(X)$ and $h(Y)$ are random variables (the required measurability conditions are satisfied) even if the functions $g$ and $h$ are not measurable (why?).

Theorem 2. Let $X$ and $Y$ be independent discrete random variables. Let $g$ and $h$ be some functions from $\mathbb{R}$ into itself. Then, the random variables $g(X)$ and $h(Y)$ are independent.

The proof is left as an exercise.
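Conditions (c) and (d) of Theorem 1, and the claim of Theorem 2, are easy to test on a small joint PMF. The sketch below is a numerical illustration, not a proof; the joint PMF, the functions $g$ and $h$, and all helper names are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction

def marginals(joint):
    """Marginal PMFs: sum the joint PMF over the other variable."""
    p_X, p_Y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_X[x] += p
        p_Y[y] += p
    return dict(p_X), dict(p_Y)

def factorizes(joint):
    """Condition (c): p_{X,Y}(x, y) = p_X(x) p_Y(y) for all x, y."""
    p_X, p_Y = marginals(joint)
    return all(joint.get((x, y), 0) == p_X[x] * p_Y[y] for x in p_X for y in p_Y)

def conditional_given_y(joint, y):
    """Slice of the joint PMF at Y = y, renormalized by p_Y(y)."""
    _, p_Y = marginals(joint)
    return {x: p / p_Y[y] for (x, yy), p in joint.items() if yy == y}

def push_forward(joint, g, h):
    """Joint PMF of (g(X), h(Y))."""
    out = defaultdict(Fraction)
    for (x, y), p in joint.items():
        out[(g(x), h(y))] += p
    return dict(out)

# Hypothetical example: X uniform on {-1, 0, 1}, Y uniform on {1, 2}, independent.
joint = {(x, y): Fraction(1, 6) for x in (-1, 0, 1) for y in (1, 2)}
p_X, p_Y = marginals(joint)

print("(c) holds:", factorizes(joint))
print("(d) holds at y = 1:", conditional_given_y(joint, 1) == p_X)

# Theorem 2 with g(x) = x^2 and h(y) = y mod 2 (neither is one-to-one here).
joint_gh = push_forward(joint, lambda x: x * x, lambda y: y % 2)
print("g(X), h(Y) factorize:", factorizes(joint_gh))
```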

3.3 Examples

Example. Let $X_1, \ldots, X_n$ be independent Bernoulli random variables with the same parameter $p$. Then, the random variable $X$ defined by $X = X_1 + \cdots + X_n$ is binomial with parameters $n$ and $p$. To see this, consider $n$ independent tosses of a coin in which every toss has probability $p$ of resulting in a one, and let $X_i$ be the result of the $i$th coin toss. Then, $X$ is the number of ones observed in $n$ independent tosses, and is therefore a binomial random variable.

Example. Let $X$ and $Y$ be independent binomial random variables with parameters $(n, p)$ and $(m, p)$, respectively. Then, the random variable $Z$, defined by $Z = X + Y$, is binomial with parameters $(n + m, p)$. To see this, consider $n + m$ independent tosses of a coin in which every toss has probability $p$ of resulting in a one. Let $X$ be the number of ones in the first $n$ tosses, and let $Y$ be the number of ones in the last $m$ tosses. Then, $Z$ is the number of ones in $n + m$ independent tosses, which is binomial with parameters $(n + m, p)$.

Example. Consider $n$ independent tosses of a coin in which every toss has probability $p$ of resulting in a one. Let $X$ be the number of ones obtained, and let $Y = n - X$, which is the number of zeros. The random variables $X$ and $Y$ are not independent. For example, $\mathbb{P}(X = 0) = (1 - p)^n$ and $\mathbb{P}(Y = 0) = p^n$, but $\mathbb{P}(X = 0, Y = 0) = 0 \ne \mathbb{P}(X = 0)\,\mathbb{P}(Y = 0)$ whenever $0 < p < 1$. Intuitively, knowing that there was a small number of heads gives us information that the number of tails must be large.

However, in sharp contrast to the intuition from the preceding example, we obtain independence when the number of coin tosses is itself random, with a Poisson distribution. More precisely, let $N$ be a Poisson random variable with parameter $\lambda$. We assume that the conditional PMF $p_{X|N}(\cdot \mid n)$ is binomial with parameters $n$ and $p$ (representing the number of ones observed in $n$ coin tosses), and define $Y = N - X$, which represents the number of zeros obtained. We have the following surprising result. An intuitive justification will have to wait until we consider the Poisson process, later in this course. The proof is left as an exercise.

Theorem 3. (Splitting of a Poisson random variable) The random variables $X$ and $Y$ are independent. Moreover, $X \stackrel{d}{=} \mathrm{Pois}(\lambda p)$ and $Y \stackrel{d}{=} \mathrm{Pois}\big(\lambda (1 - p)\big)$.
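Theorem 3 can be checked numerically (this is not a proof) by computing the joint PMF of $(X, Y)$ from the model, $\mathbb{P}(X = i, Y = j) = \mathbb{P}(N = i + j)\, p_{X|N}(i \mid i + j)$, and comparing it with the product of the two claimed Poisson PMFs. The parameter values and the grid of $(i, j)$ pairs below are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def joint_xy(i, j, lam, p):
    """P(X = i, Y = j) = P(N = i + j) * P(X = i | N = i + j)."""
    n = i + j
    return poisson_pmf(n, lam) * math.comb(n, i) * p**i * (1 - p) ** j

lam, p = 3.0, 0.25

# Largest discrepancy between the joint PMF and the product Pois(lam p) x Pois(lam (1 - p)).
max_gap = max(
    abs(joint_xy(i, j, lam, p) - poisson_pmf(i, lam * p) * poisson_pmf(j, lam * (1 - p)))
    for i in range(15)
    for j in range(15)
)
print(f"largest gap on the grid: {max_gap:.3e}")
```

A gap at the level of floating-point rounding is consistent with the joint PMF factoring into the two Poisson marginals, which is exactly independence by Theorem 1(c).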

4 EXPECTED VALUES

4.1 Preliminaries: infinite sums

Consider a sequence $\{a_n\}$ of nonnegative real numbers and the infinite sum $\sum_{i=1}^{\infty} a_i$, defined as the limit, $\lim_{n \to \infty} \sum_{i=1}^{n} a_i$, of the partial sums. The infinite sum can be finite or infinite; in either case, it is well defined, as long as we allow the limit to be an extended real number. Furthermore, it can be verified that the value of the infinite sum is the same even if we reorder the elements of the sequence $\{a_n\}$ and carry out the summation according to this different order. Because the order of the summation does not matter, we can use the notation $\sum_{n \in \mathbb{N}} a_n$ for the infinite sum. More generally, if $C$ is a countable set and $g : C \to [0, \infty)$ is a nonnegative function, we can use the notation $\sum_{x \in C} g(x)$, which is unambiguous even without specifying a particular order in which the values $g(x)$ are to be summed.

When we consider a sequence of nonpositive real numbers, the discussion remains the same, and infinite sums can be unambiguously defined. However, when we consider sequences that involve both positive and negative numbers, the situation is more complicated. In particular, the order in which the elements of the sequence are added can make a difference.

Example. Let $a_n = (-1)^n / n$. It can be verified that the limit $\lim_{n \to \infty} \sum_{i=1}^{n} a_i$ exists, and is finite, but that the elements of the sequence $\{a_n\}$ can be reordered to form a new sequence $\{b_n\}$ for which the limit of $\sum_{i=1}^{n} b_i$ does not exist.

In order to deal with the general case, we proceed as follows. Let $S$ be a countable set, and consider a collection of real numbers $a_s$, $s \in S$. Let $S_+$ (respectively, $S_-$) be the set of indices $s$ for which $a_s \ge 0$ (respectively, $a_s < 0$). Let $S^+ = \sum_{s \in S_+} a_s$ and $S^- = \sum_{s \in S_-} |a_s|$. We distinguish four cases.

(a) If both $S^+$ and $S^-$ are finite (or equivalently, if $\sum_{s \in S} |a_s| < \infty$), we say that the sum $\sum_{s \in S} a_s$ is absolutely convergent, and is equal to $S^+ - S^-$. In this case, for every possible arrangement of the elements of $S$ in a sequence $\{s_n\}$, we have
$$\lim_{n \to \infty} \sum_{i=1}^{n} a_{s_i} = S^+ - S^-.$$

(b) If $S^+ = \infty$ and $S^- < \infty$, the sum $\sum_{s \in S} a_s$ is not absolutely convergent; we define it to be equal to $\infty$. In this case, for every possible arrangement of the elements of $S$ in a sequence $\{s_n\}$, we have
$$\lim_{n \to \infty} \sum_{i=1}^{n} a_{s_i} = \infty.$$

(c) If $S^+ < \infty$ and $S^- = \infty$, the sum $\sum_{s \in S} a_s$ is not absolutely convergent; we define it to be equal to $-\infty$. In this case, for every possible arrangement of the elements of $S$ in a sequence $\{s_n\}$, we have
$$\lim_{n \to \infty} \sum_{i=1}^{n} a_{s_i} = -\infty.$$

(d) If $S^+ = \infty$ and $S^- = \infty$, the sum $\sum_{s \in S} a_s$ is left undefined. In fact, in this case, different arrangements of the elements of $S$ in a sequence $\{s_n\}$ will result in different or even nonexistent values of the limit
$$\lim_{n \to \infty} \sum_{i=1}^{n} a_{s_i}.$$

To summarize, we consider a countable sum to be well defined in cases (a)-(c), and call it absolutely convergent only in case (a).

We close by recording a related useful fact. If we have a doubly indexed family of real numbers $a_{ij}$, $i, j \in \mathbb{N}$, and if either (i) the numbers are nonnegative, or (ii) the sum $\sum_{(i,j)} a_{ij}$ is absolutely convergent, then
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij} = \sum_{(i,j) \in \mathbb{N}^2} a_{ij}. \qquad (1)$$
More importantly, we stress that the first equality need not hold in the absence of conditions (i) or (ii) above.

4.2 Definition of the expectation

The PMF of a random variable $X$ provides us with several numbers, the probabilities of all the possible values of $X$. It is often desirable to summarize this information in a single representative number. This is accomplished by the expectation of $X$, which is a weighted (in proportion to probabilities) average of the possible values of $X$.

As motivation, suppose you spin a wheel of fortune many times. At each spin, one of the numbers $m_1, m_2, \ldots, m_n$ comes up with corresponding probability $p_1, p_2, \ldots, p_n$, and this is your monetary reward from that spin. What is the amount of money that you "expect" to get "per spin"? The terms "expect" and "per spin" are a little ambiguous, but here is a reasonable interpretation.

Suppose that you spin the wheel $k$ times, and that $k_i$ is the number of times that the outcome is $m_i$. Then, the total amount received is $m_1 k_1 + m_2 k_2 + \cdots + m_n k_n$. The amount received per spin is
$$M = \frac{m_1 k_1 + m_2 k_2 + \cdots + m_n k_n}{k}.$$
If the number of spins $k$ is very large, and if we are willing to interpret probabilities as relative frequencies, it is reasonable to anticipate that $m_i$ comes up a fraction of times that is roughly equal to $p_i$:
$$\frac{k_i}{k} \approx p_i, \qquad i = 1, \ldots, n.$$
Thus, the amount of money per spin that you "expect" to receive is
$$M = \frac{m_1 k_1 + m_2 k_2 + \cdots + m_n k_n}{k} \approx m_1 p_1 + m_2 p_2 + \cdots + m_n p_n.$$
Motivated by this example, we introduce the following definition.

Definition 2. (Expectation) We define the expected value (also called the expectation or the mean) of a discrete random variable $X$, with PMF $p_X$, as
$$\mathbb{E}[X] = \sum_{x} x\, p_X(x),$$
whenever the sum is well defined, and where the sum is taken over the countable set of values in the range of $X$.

4.3 Properties of the expectation

We start by pointing out an alternative formula for the expectation, and leave its proof as an exercise. In particular, if $X$ can only take nonnegative integer values, then
$$\mathbb{E}[X] = \sum_{n \ge 0} \mathbb{P}(X > n). \qquad (2)$$

Example. Using this formula, it is easy to give an example of a random variable for which the expected value is infinite. Consider $X \stackrel{d}{=} \mathrm{Pow}(\alpha)$, where $\alpha \le 1$. Since $\mathbb{P}(X > n) = \mathbb{P}(X \ge n + 1) = 1/(n + 1)^{\alpha}$, it can be verified, using the fact $\sum_{n=1}^{\infty} 1/n = \infty$, that
$$\mathbb{E}[X] = \sum_{n \ge 0} \frac{1}{(n + 1)^{\alpha}} = \infty.$$
On the other hand, if $\alpha > 1$, then $\mathbb{E}[X] < \infty$.
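Formula (2) and the power-law example can be explored numerically. The sketch below compares Definition 2 with formula (2) for a geometric random variable (whose mean is the standard value $1/p$), and then looks at partial sums of $\sum_{n \ge 0} \mathbb{P}(X > n)$ for $\mathrm{Pow}(\alpha)$. The parameter values and truncation levels are assumptions chosen for illustration; truncation can only suggest divergence, not establish it.

```python
def geometric_pmf(k, p):
    return (1 - p) ** (k - 1) * p

p, K = 0.2, 2000   # K truncates the infinite sums (an approximation)
pmf = [0.0] + [geometric_pmf(k, p) for k in range(1, K + 1)]   # pmf[k] = P(X = k)

# Definition 2: E[X] = sum over x of x * p_X(x).
mean_direct = sum(k * pmf[k] for k in range(K + 1))

# Formula (2): E[X] = sum over n >= 0 of P(X > n), with the tails built from the PMF.
tail = 0.0
mean_tails = 0.0
for n in range(K - 1, -1, -1):
    tail += pmf[n + 1]    # tail is now P(n < X <= K)
    mean_tails += tail

print(mean_direct, mean_tails, 1 / p)

def power_law_partial_mean(alpha, terms):
    """Partial sum of formula (2) for Pow(alpha), using P(X > n) = (n + 1)^(-alpha)."""
    return sum((n + 1) ** (-alpha) for n in range(terms))

for alpha in (1.0, 2.0):
    print(alpha, [round(power_law_partial_mean(alpha, t), 4) for t in (10**2, 10**3, 10**4)])
```

For $\alpha = 1$ the partial sums keep growing (logarithmically in the number of terms), while for $\alpha = 2$ they settle down, in line with the example.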

Here is another useful fact, whose proof is again left as an exercise.

Proposition 3. Given a discrete random variable $X$ and a function $g : \mathbb{R} \to \mathbb{R}$, we have
$$\mathbb{E}[g(X)] = \sum_{\{x \,\mid\, p_X(x) > 0\}} g(x)\, p_X(x). \qquad (3)$$
More generally, this formula remains valid given a vector $X = (X_1, \ldots, X_n)$ of random variables with joint PMF $p_X = p_{X_1, \ldots, X_n}$, and a function $g : \mathbb{R}^n \to \mathbb{R}$.

For example, suppose that $X$ is a discrete random variable and consider the function $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x^2$. Let $Y = g(X)$. In order to calculate the expectation $\mathbb{E}[Y]$ according to Definition 2, we need to first find the PMF of $Y$, and then use the formula $\mathbb{E}[Y] = \sum_y y\, p_Y(y)$. However, according to Proposition 3, we can work directly with the PMF of $X$, and write $\mathbb{E}[Y] = \mathbb{E}[X^2] = \sum_x x^2 p_X(x)$.

The quantity $\mathbb{E}[X^2]$ is called the second moment of $X$. More generally, if $r \in \mathbb{N}$, the quantity $\mathbb{E}[X^r]$ is called the $r$th moment of $X$. Furthermore, $\mathbb{E}\big[(X - \mathbb{E}[X])^r\big]$ is called the $r$th central moment of $X$. The second central moment, $\mathbb{E}[(X - \mathbb{E}[X])^2]$, is called the variance of $X$, and is denoted by $\mathrm{var}(X)$. The square root of the variance is called the standard deviation of $X$, and is often denoted by $\sigma_X$, or just $\sigma$. Note that for every even $r$, the $r$th moment and the $r$th central moment are always nonnegative; in particular, the standard deviation is always well defined.

We continue with a few more important properties of expectations. In the sequel, notations such as $X \ge 0$ or $X = c$ mean that $X(\omega) \ge 0$ or $X(\omega) = c$, respectively, for all $\omega \in \Omega$. Similarly, a statement such as "$X \ge 0$, almost surely" or "$X \ge 0$, a.s." means that $\mathbb{P}(X \ge 0) = 1$.
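The two routes to $\mathbb{E}[X^2]$ described above (via the PMF of $Y = g(X)$, or directly via Proposition 3) can be compared on a small example, together with the variance. The PMF below is a hypothetical choice and the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical PMF of X (illustrative values, not from the notes).
p_X = {
    -2: Fraction(1, 8),
    -1: Fraction(1, 4),
    0: Fraction(1, 4),
    1: Fraction(1, 4),
    2: Fraction(1, 8),
}

def g(x):
    return x * x

# Route 1: first build the PMF of Y = g(X), then apply the definition of E[Y].
p_Y = defaultdict(Fraction)
for x, px in p_X.items():
    p_Y[g(x)] += px
mean_via_pY = sum(y * py for y, py in p_Y.items())

# Route 2 (Proposition 3): sum g(x) p_X(x) directly over the values of X.
mean_via_pX = sum(g(x) * px for x, px in p_X.items())

# Mean, variance (second central moment), and standard deviation of X.
mean_X = sum(x * px for x, px in p_X.items())
variance = sum((x - mean_X) ** 2 * px for x, px in p_X.items())
std_dev = float(variance) ** 0.5

print(mean_via_pY, mean_via_pX, mean_X, variance, std_dev)
```

Route 2 avoids the intermediate step of collecting the values of $x$ that map to the same $y$, which is the practical appeal of Proposition 3.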

Proposition 4. Let $X$ and $Y$ be discrete random variables defined on the same probability space.
(a) If $X \ge 0$, a.s., then $\mathbb{E}[X] \ge 0$.
(b) If $X = c$, a.s., for some constant $c \in \mathbb{R}$, then $\mathbb{E}[X] = c$.
(c) For any $a, b \in \mathbb{R}$, we have $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$ (as long as the sum $a\mathbb{E}[X] + b\mathbb{E}[Y]$ is well defined).
(d) If $\mathbb{E}[X]$ is finite, then $\mathrm{var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$.
(e) For every $a \in \mathbb{R}$, we have $\mathrm{var}(aX) = a^2\, \mathrm{var}(X)$.
(f) If $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$, and $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$, provided all of the expectations involved are well defined.
(g) More generally, if $X_1, \ldots, X_n$ are independent, then
$$\mathbb{E}\Big[\prod_{i=1}^{n} X_i\Big] = \prod_{i=1}^{n} \mathbb{E}[X_i],$$
and
$$\mathrm{var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \mathrm{var}(X_i),$$
as long as all expectations involved are well defined.

Remark: We emphasize that property (c) does not require independence.

Proof: We only give the proof for the case where all expectations involved are well defined and finite, and leave it to the reader to verify that the results extend to the case where all expectations involved are well defined but possibly infinite.

Parts (a) and (b) are immediate consequences of the definitions. For part (c), we use the second part of Proposition 3, and then Eq. (1), to obtain
$$\begin{aligned}
\mathbb{E}[aX + bY] &= \sum_{x, y} (ax + by)\, p_{X,Y}(x, y) \\
&= \sum_{x} \Big( ax \sum_{y} p_{X,Y}(x, y) \Big) + \sum_{y} \Big( by \sum_{x} p_{X,Y}(x, y) \Big) \\
&= a \sum_{x} x\, p_X(x) + b \sum_{y} y\, p_Y(y) \\
&= a\mathbb{E}[X] + b\mathbb{E}[Y].
\end{aligned}$$

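For part (d), we have
$$\begin{aligned}
\mathrm{var}(X) &= \mathbb{E}\big[X^2 - 2X\mathbb{E}[X] + (\mathbb{E}[X])^2\big] \\
&= \mathbb{E}[X^2] - 2\mathbb{E}\big[X\mathbb{E}[X]\big] + (\mathbb{E}[X])^2 \\
&= \mathbb{E}[X^2] - 2(\mathbb{E}[X])^2 + (\mathbb{E}[X])^2 \\
&= \mathbb{E}[X^2] - (\mathbb{E}[X])^2,
\end{aligned}$$
where the second equality made use of property (c).

Part (e) follows easily from (d) and (c).

For part (f), we apply Proposition 3 and then use independence to obtain
$$\begin{aligned}
\mathbb{E}[XY] &= \sum_{x, y} xy\, p_{X,Y}(x, y) \\
&= \sum_{x, y} xy\, p_X(x)\, p_Y(y) \\
&= \Big( \sum_{x} x\, p_X(x) \Big) \Big( \sum_{y} y\, p_Y(y) \Big) \\
&= \mathbb{E}[X]\, \mathbb{E}[Y].
\end{aligned}$$
Furthermore, using property (d), we have
$$\begin{aligned}
\mathrm{var}(X + Y) &= \mathbb{E}[(X + Y)^2] - (\mathbb{E}[X + Y])^2 \\
&= \mathbb{E}[X^2] + \mathbb{E}[Y^2] + 2\mathbb{E}[XY] - (\mathbb{E}[X])^2 - (\mathbb{E}[Y])^2 - 2\mathbb{E}[X]\mathbb{E}[Y].
\end{aligned}$$
Using the equality $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$, the above expression becomes $\mathrm{var}(X) + \mathrm{var}(Y)$. The proof of part (g) is similar and is omitted.

Remark: The equalities in part (f) need not hold in the absence of independence. For example, consider a random variable $X$ that takes either value $1$ or $-1$, with probability $1/2$. Then, $\mathbb{E}[X] = 0$, but $\mathbb{E}[X^2] = 1$. If we let $Y = X$, we see that $\mathbb{E}[XY] = \mathbb{E}[X^2] = 1 \ne 0 = (\mathbb{E}[X])^2 = \mathbb{E}[X]\mathbb{E}[Y]$. Furthermore, $\mathrm{var}(X + Y) = \mathrm{var}(2X) = 4\,\mathrm{var}(X) = 4$, while $\mathrm{var}(X) + \mathrm{var}(Y) = 2$.

Exercise 2. Show that $\mathrm{var}(X) = 0$ if and only if there exists a constant $c$ such that $\mathbb{P}(X = c) = 1$.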
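The remark above, and the contrast with the independent case, can be reproduced with exact arithmetic. In this sketch, expectations are computed from a joint PMF using the vector form of Proposition 3; the helper names and the coefficients used to test linearity are illustrative assumptions.

```python
from fractions import Fraction

def expect(joint, f):
    """E[f(X, Y)] computed from the joint PMF."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def var(joint, f):
    """Variance of f(X, Y), as the second central moment."""
    m = expect(joint, f)
    return expect(joint, lambda x, y: (f(x, y) - m) ** 2)

half = Fraction(1, 2)
# Dependent case from the remark: X = +1 or -1 with probability 1/2 each, and Y = X.
dependent = {(1, 1): half, (-1, -1): half}
# Independent case with the same marginals: the joint PMF is the product.
independent = {(x, y): half * half for x in (1, -1) for y in (1, -1)}

for name, joint in (("dependent", dependent), ("independent", independent)):
    e_xy = expect(joint, lambda x, y: x * y)
    e_x_e_y = expect(joint, lambda x, y: x) * expect(joint, lambda x, y: y)
    v_sum = var(joint, lambda x, y: x + y)
    v_x_plus_v_y = var(joint, lambda x, y: x) + var(joint, lambda x, y: y)
    # Linearity, property (c), with arbitrary coefficients a = 3 and b = -2.
    lin_lhs = expect(joint, lambda x, y: 3 * x - 2 * y)
    lin_rhs = 3 * expect(joint, lambda x, y: x) - 2 * expect(joint, lambda x, y: y)
    print(f"{name}: E[XY]={e_xy}, E[X]E[Y]={e_x_e_y}, "
          f"var(X+Y)={v_sum}, var(X)+var(Y)={v_x_plus_v_y}, "
          f"linearity holds: {lin_lhs == lin_rhs}")
```

Linearity holds in both cases, while the product and variance identities of part (f) hold only for the independent joint PMF.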

MIT OpenCourseWare
http://ocw.mit.edu

6.436J / 15.085J Fundamentals of Probability
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.