7. Multivariate Probability
Chris Piech and Mehran Sahami
May 2017

Often you will work on problems where there are several random variables (often interacting with one another). We are going to start to formally look at how those interactions play out. For now we will think of joint probabilities with two random variables X and Y.

1 Discrete Joint Distributions

In the discrete case a joint probability mass function tells you the probability of any combination of events X = a and Y = b:

    p_{X,Y}(a, b) = P(X = a, Y = b)

This function tells you the probability of all combinations of events (the "," means "and"). If you want to back-calculate the probability of an event for only one variable you can compute a marginal from the joint probability mass function:

    p_X(a) = P(X = a) = \sum_y p_{X,Y}(a, y)

    p_Y(b) = P(Y = b) = \sum_x p_{X,Y}(x, b)

In the discrete case, we can define the function p_{X,Y} non-parametrically: instead of using a formula for p we simply state the probability of each possible outcome.

2 Continuous Joint Distributions

In the continuous case a joint probability density function tells you the relative probability of any combination of events X = a and Y = b. Random variables X and Y are jointly continuous if there exists a probability density function (PDF) f_{X,Y} such that:

    P(a_1 < X \le a_2, b_1 < Y \le b_2) = \int_{a_1}^{a_2} \int_{b_1}^{b_2} f_{X,Y}(x, y) \, dy \, dx

Using the PDF we can compute marginal probability densities:

    f_X(a) = \int_{-\infty}^{\infty} f_{X,Y}(a, y) \, dy

    f_Y(b) = \int_{-\infty}^{\infty} f_{X,Y}(x, b) \, dx

Let F(a, b) be the cumulative distribution function (CDF). Then:

    P(a_1 < X \le a_2, b_1 < Y \le b_2) = F(a_2, b_2) - F(a_1, b_2) + F(a_1, b_1) - F(a_2, b_1)
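Since a discrete joint PMF can be stated non-parametrically, computing a marginal is just a sum over the other variable. A minimal sketch (the joint PMF values below are made up for illustration):

```python
# A joint PMF stated non-parametrically, as a dict mapping (x, y) -> probability.
# The values are made up for illustration.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

def marginal_x(joint_pmf, a):
    """p_X(a) = sum over y of p_{X,Y}(a, y)."""
    return sum(prob for (x, y), prob in joint_pmf.items() if x == a)

def marginal_y(joint_pmf, b):
    """p_Y(b) = sum over x of p_{X,Y}(x, b)."""
    return sum(prob for (x, y), prob in joint_pmf.items() if y == b)

print(marginal_x(joint, 0))  # 0.10 + 0.20, i.e. about 0.3
print(marginal_y(joint, 1))  # 0.20 + 0.40, i.e. about 0.6
```

Note that summing a marginal over its whole support recovers 1, since the joint PMF sums to 1.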
3 Multinomial Distribution

Say you perform n independent trials of an experiment where each trial results in one of m outcomes, with respective probabilities p_1, p_2, ..., p_m (constrained so that \sum_i p_i = 1). Define X_i to be the number of trials with outcome i. A multinomial distribution is a closed-form function that answers the question: what is the probability that there are c_i trials with outcome i? Mathematically:

    P(X_1 = c_1, X_2 = c_2, ..., X_m = c_m) = \binom{n}{c_1, c_2, ..., c_m} p_1^{c_1} p_2^{c_2} \cdots p_m^{c_m}

Example 1

A 6-sided die is rolled 7 times. What is the probability that you roll: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes (disregarding order)?

    P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 2, X_5 = 0, X_6 = 3)
        = \frac{7!}{1! \, 1! \, 0! \, 2! \, 0! \, 3!} \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^0 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^0 \left(\frac{1}{6}\right)^3
        = 420 \left(\frac{1}{6}\right)^7

Federalist Papers

In class we wrote a program to decide whether James Madison or Alexander Hamilton wrote Federalist Paper 49. Both men claimed to have written it, and hence the authorship is in dispute. First we used historical essays to estimate p_i, the probability that Hamilton generates the word i (independent of all previous and future choices of words). Similarly we estimated q_i, the probability that Madison generates the word i. For each word i we observe the number of times that word occurs in Federalist Paper 49 (we call that count c_i). We assume that, given no evidence, the paper is equally likely to have been written by Madison or Hamilton. Define three events: H is the event that Hamilton wrote the paper, M is the event that Madison wrote the paper, and D is the event that a paper has the collection of words observed in Federalist Paper 49. We would like to know whether P(H|D) is larger than P(M|D). This is equivalent to trying to decide whether P(H|D)/P(M|D) is larger than 1. The event D|H is a multinomial parameterized by the values p_i. The event D|M is also a multinomial, this time parameterized by the values q_i. Using Bayes' rule we can simplify the desired probability.
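The multinomial PMF is easy to evaluate directly. A small sketch (the `multinomial_pmf` helper is ours, not from the course code) that reproduces Example 1:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = c_1, ..., X_m = c_m) for n = sum(c_i) independent trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p**c for p, c in zip(probs, counts))

# Example 1: a 6-sided die rolled 7 times.
answer = multinomial_pmf([1, 1, 0, 2, 0, 3], [1/6] * 6)
print(answer)  # 420 * (1/6)**7, about 0.0015
```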
    \frac{P(H|D)}{P(M|D)} = \frac{P(D|H) P(H) / P(D)}{P(D|M) P(M) / P(D)}
        = \frac{P(D|H) P(H)}{P(D|M) P(M)} = \frac{P(D|H)}{P(D|M)}
        = \frac{\binom{n}{c_1, c_2, ..., c_m} \prod_i p_i^{c_i}}{\binom{n}{c_1, c_2, ..., c_m} \prod_i q_i^{c_i}}
        = \frac{\prod_i p_i^{c_i}}{\prod_i q_i^{c_i}}

This seems great! We have our desired probability statement expressed in terms of a product of values we have already estimated. However, when we plug this into a computer, both the numerator and denominator come out to be zero. The product of many numbers close to zero is too hard for a computer to represent. To fix this problem, we use a standard trick in computational probability: we apply a log to both sides and apply some basic rules of logs:

    \log\left(\frac{P(H|D)}{P(M|D)}\right) = \log\left(\frac{\prod_i p_i^{c_i}}{\prod_i q_i^{c_i}}\right)
        = \log\left(\prod_i p_i^{c_i}\right) - \log\left(\prod_i q_i^{c_i}\right)
        = \sum_i \log\left(p_i^{c_i}\right) - \sum_i \log\left(q_i^{c_i}\right)
        = \sum_i c_i \log(p_i) - \sum_i c_i \log(q_i)

This expression is numerically stable, and my computer returned that the answer was a negative number. We can use exponentiation to solve for P(H|D)/P(M|D). Since e raised to a negative number is smaller than 1, this implies that P(H|D)/P(M|D) is smaller than 1. As a result, we conclude that Madison was more likely to have written Federalist Paper 49.

4 Expectation with Multiple RVs

Expectation over a joint isn't nicely defined because it is not clear how to compose the multiple variables. However, expectations over functions of random variables (for example sums or products) are nicely defined:

    E[g(X, Y)] = \sum_{x,y} g(x, y) \, p(x, y)

for any function g(X, Y). When you expand that result for the function g(X, Y) = X + Y you get a beautiful result:

    E[X + Y] = E[g(X, Y)]
        = \sum_{x,y} g(x, y) \, p(x, y)
        = \sum_{x,y} (x + y) \, p(x, y)
        = \sum_{x,y} x \, p(x, y) + \sum_{x,y} y \, p(x, y)
        = \sum_x x \sum_y p(x, y) + \sum_y y \sum_x p(x, y)
        = \sum_x x \, p(x) + \sum_y y \, p(y)
        = E[X] + E[Y]

This can be generalized to multiple variables:

    E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]

Expectations of Products Lemma

Unfortunately the expectation of the product of two random variables only has a nice decomposition in the case where the random variables are independent of one another:

    E[g(X) h(Y)] = E[g(X)] E[h(Y)]    if X and Y are independent

Example 3

A disk surface is a circle of radius R. A single point imperfection is uniformly distributed on the disk with joint PDF:

    f_{X,Y}(x, y) = \begin{cases} \frac{1}{\pi R^2} & \text{if } x^2 + y^2 \le R^2 \\ 0 & \text{else} \end{cases}

Let D be the distance from the origin: D = \sqrt{X^2 + Y^2}. What is E[D]? Hint: use the lemmas.
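A seeded Monte Carlo sketch (taking R = 1 for concreteness) lets you sanity-check whatever answer you derive for Example 3; the estimate lands near 2R/3, which is the value you should obtain analytically:

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible
R = 1.0         # illustrative radius

# Rejection-sample points uniformly from the disk x^2 + y^2 <= R^2,
# recording the distance of each accepted point from the origin.
samples = []
while len(samples) < 100_000:
    x, y = random.uniform(-R, R), random.uniform(-R, R)
    if x * x + y * y <= R * R:
        samples.append((x * x + y * y) ** 0.5)

estimate = sum(samples) / len(samples)
print(estimate)  # close to 2R/3, about 0.667
```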
5 Independence with Multiple RVs

Discrete

Two discrete random variables X and Y are called independent if:

    P(X = x, Y = y) = P(X = x) P(Y = y)    for all x, y

Intuitively: knowing the value of X tells us nothing about the distribution of Y. If two variables are not independent, they are called dependent. This is conceptually similar to independence of events, but here we are dealing with multiple variables. Make sure to keep your events and variables distinct.

Continuous

Two continuous random variables X and Y are called independent if:

    P(X \le a, Y \le b) = P(X \le a) P(Y \le b)    for all a, b

This can be stated equivalently as:

    F_{X,Y}(a, b) = F_X(a) F_Y(b)    for all a, b
    f_{X,Y}(a, b) = f_X(a) f_Y(b)    for all a, b

More generally, if you can factor the joint density function then your continuous random variables are independent:

    f_{X,Y}(x, y) = h(x) g(y)    where -\infty < x, y < \infty

Example 2

Let N be the number of requests to a web server per day, with N ~ Po(λ). Each request comes from a human (with probability p) or from a bot (with probability 1 - p), independently. Define X to be the number of requests from humans per day and Y to be the number of requests from bots per day. Since requests come in independently, the probability of X conditioned on knowing the number of requests is a binomial. Specifically:

    (X | N) ~ Bin(N, p)
    (Y | N) ~ Bin(N, 1 - p)

Calculate the probability of getting exactly i human requests and j bot requests. Start by expanding using the chain rule:

    P(X = i, Y = j) = P(X = i, Y = j | X + Y = i + j) \, P(X + Y = i + j)

We can calculate each term in this expression:

    P(X = i, Y = j | X + Y = i + j) = \binom{i + j}{i} p^i (1 - p)^j

    P(X + Y = i + j) = e^{-\lambda} \frac{\lambda^{i+j}}{(i + j)!}

Now we can put those together and simplify:

    P(X = i, Y = j) = \binom{i + j}{i} p^i (1 - p)^j \, e^{-\lambda} \frac{\lambda^{i+j}}{(i + j)!}

As an exercise you can simplify this expression into two independent Poisson distributions.
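As a numeric check of that exercise (assuming illustrative parameters λ = 4 and p = 0.3), the joint probability derived above factors into a product of two Poisson PMFs with rates λp and λ(1 − p):

```python
from math import comb, exp, factorial

lam, p = 4.0, 0.3  # illustrative choices for the rate and the human probability

def joint(i, j):
    """P(X = i, Y = j) as derived above: binomial split times the Poisson total."""
    n = i + j
    return comb(n, i) * p**i * (1 - p)**j * exp(-lam) * lam**n / factorial(n)

def poisson_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

# The exercise's claim: the joint factors into two independent Poissons,
# X ~ Po(lam * p) for humans and Y ~ Po(lam * (1 - p)) for bots.
for i in range(8):
    for j in range(8):
        expected = poisson_pmf(i, lam * p) * poisson_pmf(j, lam * (1 - p))
        assert abs(joint(i, j) - expected) < 1e-12
print("P(X = i, Y = j) factors into Po(lam*p) and Po(lam*(1-p))")
```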
Symmetry of Independence

Independence is symmetric. That means that if random variables X and Y are independent, X is independent of Y and Y is independent of X. This claim may seem meaningless but it can be very useful. Imagine a sequence of events X_1, X_2, .... Let A_i be the event that X_i is a record value (i.e. it is larger than all previous values). Is A_{n+1} independent of A_n? It is easier to answer that A_n is independent of A_{n+1}. By symmetry of independence both claims must be true.

6 Convolution of Distributions

Convolution describes the distribution that results from adding two different random variables together. For some particular random variables the convolution has an intuitive closed form. Importantly, convolution is the sum of the random variables themselves, not the addition of the probability density functions (PDFs) that correspond to the random variables.

Independent Binomials with equal p

For any two binomial random variables with the same success probability, X ~ Bin(n_1, p) and Y ~ Bin(n_2, p), the sum of those two random variables is another binomial: X + Y ~ Bin(n_1 + n_2, p). This does not hold when the two distributions have different parameters p.

Independent Poissons

For any two Poisson random variables X ~ Po(λ_1) and Y ~ Po(λ_2), the sum of those two random variables is another Poisson: X + Y ~ Po(λ_1 + λ_2). This holds even when λ_1 is not the same as λ_2.

Independent Normals

For any two normal random variables X ~ N(μ_1, σ_1^2) and Y ~ N(μ_2, σ_2^2), the sum of those two random variables is another normal: X + Y ~ N(μ_1 + μ_2, σ_1^2 + σ_2^2).

General Independent Case

For two general independent random variables (i.e. cases of independent random variables that don't fit the above special situations) you can calculate the CDF or the PDF of the sum of two random variables using the following formulas:

    F_{X+Y}(a) = P(X + Y \le a) = \int_{y=-\infty}^{\infty} F_X(a - y) f_Y(y) \, dy

    f_{X+Y}(a) = \int_{y=-\infty}^{\infty} f_X(a - y) f_Y(y) \, dy

There are direct analogies in the discrete case where you replace the integrals with sums and change notation for the CDF and PDF.
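The discrete analog of the convolution formula can be used to confirm the binomial rule above. A sketch with illustrative parameters (n_1 = 5, n_2 = 7, p = 0.4):

```python
from math import comb

def binomial_pmf(k, n, p):
    """PMF of Bin(n, p), zero outside the support."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def convolve_pmf(pmf_x, pmf_y, a, support_y):
    """Discrete analog of f_{X+Y}(a): sum over y of f_X(a - y) f_Y(y)."""
    return sum(pmf_x(a - y) * pmf_y(y) for y in support_y)

n1, n2, p = 5, 7, 0.4  # illustrative parameters

# Convolving Bin(5, 0.4) with Bin(7, 0.4) recovers Bin(12, 0.4) at every point.
for a in range(n1 + n2 + 1):
    conv = convolve_pmf(lambda k: binomial_pmf(k, n1, p),
                        lambda k: binomial_pmf(k, n2, p),
                        a, range(n2 + 1))
    assert abs(conv - binomial_pmf(a, n1 + n2, p)) < 1e-12
print("Bin(5, 0.4) + Bin(7, 0.4) = Bin(12, 0.4)")
```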
Example 1

Calculate the PDF of X + Y for independent uniform random variables X ~ Uni(0, 1) and Y ~ Uni(0, 1). First plug into the equation for the general convolution of independent random variables:

    f_{X+Y}(a) = \int_{y=0}^{1} f_X(a - y) f_Y(y) \, dy
               = \int_{y=0}^{1} f_X(a - y) \, dy    because f_Y(y) = 1

It turns out that this is not the easiest thing to integrate. By trying a few different values of a in the range [0, 2] we can observe that the PDF we are trying to calculate is discontinuous at the point a = 1, and thus it will be easier to think about as two cases: a < 1 and a > 1. If we calculate f_{X+Y} for both cases and correctly constrain the bounds of the integral we get simple closed forms for each case:

    f_{X+Y}(a) = \begin{cases} a & \text{if } 0 < a \le 1 \\ 2 - a & \text{if } 1 < a \le 2 \\ 0 & \text{else} \end{cases}

7 Conditional Distributions

Before, we looked at conditional probabilities for events. Here we formally go over conditional probabilities for random variables. The equations for both the discrete and continuous cases are intuitive extensions of our understanding of conditional probability.

Discrete

The conditional probability mass function (PMF) for the discrete case:

    p_{X|Y}(x | y) = P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X,Y}(x, y)}{p_Y(y)}

The conditional cumulative distribution function (CDF) for the discrete case:

    F_{X|Y}(a | y) = P(X \le a | Y = y) = \frac{\sum_{x \le a} p_{X,Y}(x, y)}{p_Y(y)} = \sum_{x \le a} p_{X|Y}(x | y)

Continuous

The conditional probability density function (PDF) for the continuous case:

    f_{X|Y}(x | y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}

The conditional cumulative distribution function (CDF) for the continuous case:

    F_{X|Y}(a | y) = P(X \le a | Y = y) = \int_{-\infty}^{a} f_{X|Y}(x | y) \, dx

Mixing Discrete and Continuous

These equations are straightforward once you have your head around the notation for probability density functions (f_X(x)) and probability mass functions (p_X(x)).
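We can sanity-check the piecewise closed form by evaluating the convolution integral numerically (a simple midpoint rule; the step count is an arbitrary choice):

```python
def f_uniform(x):
    """PDF of Uni(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_sum_numeric(a, steps=20_000):
    """Midpoint-rule evaluation of f_{X+Y}(a) = integral over [0, 1] of f_X(a - y) f_Y(y) dy.
    f_Y(y) = 1 on [0, 1], so only f_X(a - y) appears in the sum."""
    dy = 1.0 / steps
    return sum(f_uniform(a - (k + 0.5) * dy) * dy for k in range(steps))

def f_sum_closed(a):
    """The piecewise closed form derived above (a triangle on [0, 2])."""
    if 0.0 < a <= 1.0:
        return a
    if 1.0 < a <= 2.0:
        return 2.0 - a
    return 0.0

for a in (0.25, 0.5, 1.25, 1.75):
    assert abs(f_sum_numeric(a) - f_sum_closed(a)) < 1e-3
print("numeric convolution matches the triangle PDF")
```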
Let X be a continuous random variable and let N be a discrete random variable. The conditional probabilities of X given N and N given X respectively are:

    f_{X|N}(x | n) = \frac{p_{N|X}(n | x) f_X(x)}{p_N(n)}

    p_{N|X}(n | x) = \frac{f_{X|N}(x | n) p_N(n)}{f_X(x)}

Example 2

Let's say we have two independent Poisson random variables for requests received at a web server in a day: X = # requests from humans/day with X ~ Po(λ_1), and Y = # requests from bots/day with Y ~ Po(λ_2). Since the convolution of Poisson random variables is also a Poisson, we know that the total number of requests X + Y is also a Poisson: (X + Y) ~ Po(λ_1 + λ_2). What is the probability of having k human requests on a particular day given that there were n total requests?

    P(X = k | X + Y = n) = \frac{P(X = k, Y = n - k)}{P(X + Y = n)}
        = \frac{P(X = k) P(Y = n - k)}{P(X + Y = n)}
        = \frac{\frac{e^{-\lambda_1} \lambda_1^k}{k!} \cdot \frac{e^{-\lambda_2} \lambda_2^{n-k}}{(n - k)!}}{\frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^n}{n!}}
        = \binom{n}{k} \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^k \left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{n-k}

In other words, (X | X + Y = n) ~ Bin(n, λ_1 / (λ_1 + λ_2)).

8 Covariance and Correlation

Consider the two multivariate distributions shown below. In both images I have plotted one thousand samples drawn from the underlying joint distribution. Clearly the two distributions are different. However, the mean and variance are the same in both the x and the y dimension. What is different?

Covariance is a quantitative measure of the extent to which the deviation of one variable from its mean matches the deviation of the other from its mean. It is a mathematical relationship that is defined as:

    Cov(X, Y) = E[(X - E[X])(Y - E[Y])]

That is a little hard to wrap your mind around (but worth pushing on a bit). The outer expectation will be a weighted sum of the inner function evaluated at a particular (x, y), weighted by the probability of (x, y). If x and y are both above their respective means, or if x and y are both below their respective means, that term will be positive. If one is above its mean and the other is below, the term is negative. If the weighted sum of terms is positive, the two random variables will have a positive correlation. We can rewrite the above equation to get an equivalent equation:

    Cov(X, Y) = E[XY] - E[X]E[Y]

Using this equation (and the product lemma) it is easy to see that if two random variables are independent their covariance is 0. The reverse is not true in general.

Properties of Covariance

Say that X and Y are arbitrary random variables:

    Cov(X, Y) = Cov(Y, X)
    Cov(X, X) = E[X^2] - E[X]E[X] = Var(X)
    Cov(aX + b, Y) = a \, Cov(X, Y)

Let X = X_1 + X_2 + ... + X_n and let Y = Y_1 + Y_2 + ... + Y_m. The covariance of X and Y is:

    Cov(X, Y) = \sum_{i=1}^n \sum_{j=1}^m Cov(X_i, Y_j)

    Cov(X, X) = Var(X) = \sum_{i=1}^n \sum_{j=1}^n Cov(X_i, X_j)

That last property gives us a third way to calculate variance. You could use this definition to calculate the variance of the binomial.

Correlation

Covariance is interesting because it is a quantitative measurement of the relationship between two variables. Correlation between two random variables, ρ(X, Y), is the covariance of the two variables normalized by the standard deviation of each variable. This normalization cancels the units out and bounds the measure so that it is always in the range [-1, 1]:

    ρ(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}

Correlation measures linearity between X and Y:

    ρ(X, Y) = 1     →  Y = aX + b where a = σ_y / σ_x
    ρ(X, Y) = -1    →  Y = aX + b where a = -σ_y / σ_x
    ρ(X, Y) = 0     →  absence of linear relationship

If ρ(X, Y) = 0 we say that X and Y are uncorrelated. If two variables are independent, then their correlation will be 0. However, it doesn't go the other way: a correlation of 0 does not imply independence.

When people use the term correlation, they are actually referring to a specific type of correlation called Pearson correlation. It measures the degree to which there is a linear relationship between the two variables. An alternative measure is Spearman correlation, which has a formula almost identical to regular correlation, with the exception that the underlying random variables are first transformed into their ranks. Spearman correlation is outside the scope of CS109.
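Both covariance formulas, and the correlation definition, can be computed from a small non-parametric joint PMF (the values below are made up for illustration):

```python
# Covariance and Pearson correlation from a made-up joint PMF over {0, 1} x {0, 1}.
joint = {
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.5,
}

def expectation(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex = expectation(lambda x, y: x)
ey = expectation(lambda x, y: y)
cov = expectation(lambda x, y: (x - ex) * (y - ey))

# The two covariance formulas agree: E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y].
assert abs(cov - (expectation(lambda x, y: x * y) - ex * ey)) < 1e-12

var_x = expectation(lambda x, y: (x - ex) ** 2)
var_y = expectation(lambda x, y: (y - ey) ** 2)
rho = cov / (var_x * var_y) ** 0.5
print(round(cov, 3), round(rho, 3))  # cov = 0.14, rho about 0.583
```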
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationCOMPLEX NUMBERS AND QUADRATIC EQUATIONS
COMPLEX NUMBERS AND QUADRATIC EQUATIONS INTRODUCTION We know that x 0 for all x R e the square of a real number (whether postve, negatve or ero) s non-negatve Hence the equatons x, x, x + 7 0 etc are not
More information18.1 Introduction and Recap
CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng
More informationStatistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )
Ismor Fscher, 8//008 Stat 54 / -8.3 Summary Statstcs Measures of Center and Spread Dstrbuton of dscrete contnuous POPULATION Random Varable, numercal True center =??? True spread =???? parameters ( populaton
More informationMultiple Choice. Choose the one that best completes the statement or answers the question.
ECON 56 Homework Multple Choce Choose the one that best completes the statement or answers the queston ) The probablty of an event A or B (Pr(A or B)) to occur equals a Pr(A) Pr(B) b Pr(A) + Pr(B) f A
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationNotes prepared by Prof Mrs) M.J. Gholba Class M.Sc Part(I) Information Technology
Inverse transformatons Generaton of random observatons from gven dstrbutons Assume that random numbers,,, are readly avalable, where each tself s a random varable whch s unformly dstrbuted over the range(,).
More informationPES 1120 Spring 2014, Spendier Lecture 6/Page 1
PES 110 Sprng 014, Spender Lecture 6/Page 1 Lecture today: Chapter 1) Electrc feld due to charge dstrbutons -> charged rod -> charged rng We ntroduced the electrc feld, E. I defned t as an nvsble aura
More informationPropagation of error for multivariable function
Propagaton o error or multvarable uncton ow consder a multvarable uncton (u, v, w, ). I measurements o u, v, w,. All have uncertant u, v, w,., how wll ths aect the uncertant o the uncton? L tet) o (Equaton
More informationIntroduction to Random Variables
Introducton to Random Varables Defnton of random varable Defnton of random varable Dscrete and contnuous random varable Probablty functon Dstrbuton functon Densty functon Sometmes, t s not enough to descrbe
More information1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands
Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationChat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980
MT07: Multvarate Statstcal Methods Mke Tso: emal mke.tso@manchester.ac.uk Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teachng.htm. Introducton to multvarate data. Books Chat eld, C. and
More informationAppendix B. Criterion of Riemann-Stieltjes Integrability
Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More information/ n ) are compared. The logic is: if the two
STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora
prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable
More informationChapter 11: Simple Linear Regression and Correlation
Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests
More informationC/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1
C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned
More informationStanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7
Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More informationApplied Stochastic Processes
STAT455/855 Fall 23 Appled Stochastc Processes Fnal Exam, Bref Solutons 1. (15 marks) (a) (7 marks) The dstrbuton of Y s gven by ( ) ( ) y 2 1 5 P (Y y) for y 2, 3,... The above follows because each of
More information1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations
Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys
More informationTHE SUMMATION NOTATION Ʃ
Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the
More information8.6 The Complex Number System
8.6 The Complex Number System Earler n the chapter, we mentoned that we cannot have a negatve under a square root, snce the square of any postve or negatve number s always postve. In ths secton we want
More information6.4. RANDOM VARIABLES 233
6.4. RANDOM VARIABLES 2 6.4 Random Varables What are Random Varables? A random varable for an experment wth a sample space S s a functon that assgns a number to each element of S. Typcally nstead of usng
More information12 MATH 101A: ALGEBRA I, PART C: MULTILINEAR ALGEBRA. 4. Tensor product
12 MATH 101A: ALGEBRA I, PART C: MULTILINEAR ALGEBRA Here s an outlne of what I dd: (1) categorcal defnton (2) constructon (3) lst of basc propertes (4) dstrbutve property (5) rght exactness (6) localzaton
More informationCS 798: Homework Assignment 2 (Probability)
0 Sample space Assgned: September 30, 2009 In the IEEE 802 protocol, the congeston wndow (CW) parameter s used as follows: ntally, a termnal wats for a random tme perod (called backoff) chosen n the range
More informationA random variable is a function which associates a real number to each element of the sample space
Introducton to Random Varables Defnton of random varable Defnton of of random varable Dscrete and contnuous random varable Probablty blt functon Dstrbuton functon Densty functon Sometmes, t s not enough
More information10.34 Fall 2015 Metropolis Monte Carlo Algorithm
10.34 Fall 2015 Metropols Monte Carlo Algorthm The Metropols Monte Carlo method s very useful for calculatng manydmensonal ntegraton. For e.g. n statstcal mechancs n order to calculate the prospertes of
More informationSolutions to Problem Set 6
Solutons to Problem Set 6 Problem 6. (Resdue theory) a) Problem 4.7.7 Boas. n ths problem we wll solve ths ntegral: x sn x x + 4x + 5 dx: To solve ths usng the resdue theorem, we study ths complex ntegral:
More information