CS 70: Discrete Mathematics and Probability Theory, Fall 2009. Satish Rao, David Tse. Lecture 16: Multiple Random Variables and Applications to Inference


In many probability problems, we have to deal with multiple r.v.'s defined on the same probability space. We have already seen examples of that when we saw, for example, that in computing the expectation and variance of a binomial r.v. X, it is easier to write it as a sum X = Σ_{i=1}^n X_i, where X_i represents the result of the i-th trial. In inference problems, where we observe certain quantities and use the information to infer about other hidden quantities, multiple r.v.'s arise naturally in the modeling of the situation. We will see some examples of such problems after we go through some of the basics in the handling of multiple r.v.'s.

Joint Distributions

Consider two random variables X and Y defined on the same probability space. By linearity of expectation, we know that E(X + Y) = E(X) + E(Y). Since E(X) can be calculated if we know the distribution of X, and E(Y) can be calculated if we know the distribution of Y, this means that E(X + Y) can be computed knowing only the two individual distributions; no information is needed about the relationship between X and Y. This is not true if we need to compute, say, E((X + Y)^2), e.g. as when we computed the variance of a binomial r.v. This is because E((X + Y)^2) = E(X^2) + 2E(XY) + E(Y^2), and E(XY) depends on the relationship between X and Y. How can we capture such a relationship?

Recall that the distribution of a single random variable X is the collection of the probabilities of all events X = a, for all possible values of a that X can take on. When we have two random variables X and Y, we can think of (X, Y) as a two-dimensional random variable, in which case the events of interest are X = a ∧ Y = b for all possible values (a, b) that (X, Y) can take on. Thus, a natural generalization of the notion of distribution to multiple random variables is the following.
Definition 16.1 (joint distribution): The joint distribution of two discrete random variables X and Y is the collection of values {(a, b, Pr[X = a ∧ Y = b]) : a ∈ A, b ∈ B}, where A and B are the sets of all possible values taken by X and Y respectively.

This notion obviously generalizes to three or more random variables. Since we will write Pr[X = a ∧ Y = b] quite often, we will abbreviate it to Pr[X = a, Y = b]. Just like the distribution of a single random variable, the joint distribution is normalized, i.e.

Σ_{a∈A, b∈B} Pr[X = a, Y = b] = 1.

This follows from noticing that the events X = a ∧ Y = b, a ∈ A, b ∈ B, partition the sample space. The joint distribution of two random variables fully describes their statistical relationship, and provides enough information for computing any probabilities and expectations involving the two random variables. For example,

E(XY) = Σ_c c · Pr[XY = c] = Σ_a Σ_b ab · Pr[X = a, Y = b].
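As a small illustration, a joint distribution can be stored as a table of probabilities and used directly to check normalization and compute E(XY). The particular table below is a made-up example, not one from the note:

```python
# A hypothetical joint distribution of (X, Y), stored as {(a, b): Pr[X=a, Y=b]}.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# The joint distribution is normalized: the entries sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-12

# E(XY) = sum over all (a, b) of a * b * Pr[X = a, Y = b].
E_XY = sum(a * b * p for (a, b), p in joint.items())
print(E_XY)  # 0.4: only the (1, 1) entry contributes
```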

More generally, if f is any function on R × R,

E(f(X, Y)) = Σ_c c · Pr[f(X, Y) = c] = Σ_a Σ_b f(a, b) · Pr[X = a, Y = b].

Moreover, the individual distributions of X and Y can be recovered from the joint distribution as follows:

Pr[X = a] = Σ_{b∈B} Pr[X = a, Y = b]   for all a ∈ A,   (1)
Pr[Y = b] = Σ_{a∈A} Pr[X = a, Y = b]   for all b ∈ B.   (2)

The first follows from the fact that the events Y = b, b ∈ B, form a partition of the sample space Ω, and so the events X = a ∧ Y = b, b ∈ B, are disjoint and their union yields the event X = a. Similar logic applies to the second fact.

Pictorially, one can think of the joint distribution values as entries filling a table, with the columns indexed by the values that X can take on and the rows indexed by the values Y can take on (Figure 1). To get the distribution of X, all one needs to do is to sum the entries in each of the columns. To get the distribution of Y, just sum the entries in each of the rows. This process is sometimes called marginalization and the individual distributions are sometimes called marginal distributions to differentiate them from the joint distribution.

Figure 1: A tabular representation of a joint distribution.

Independent Random Variables

Independence for random variables is defined in analogous fashion to independence for events:

Definition 16.2 (independent r.v.'s): Random variables X and Y on the same probability space are said to be independent if the events X = a and Y = b are independent for all values a, b. Equivalently, the joint distribution of independent r.v.'s decomposes as

Pr[X = a, Y = b] = Pr[X = a] · Pr[Y = b]   for all a, b.

Note that for independent r.v.'s, the joint distribution is fully specified by the marginal distributions. Mutual independence of more than two r.v.'s is defined similarly. A very important example of independent r.v.'s is indicator r.v.'s for independent events. Thus, for example, if {X_i} are indicator r.v.'s for the i-th coin toss being Heads (as in example 2 in the last lecture note), then the X_i are mutually independent r.v.'s.

We saw that the expectation of a sum of r.v.'s is the sum of the expectations of the individual r.v.'s. This is not true in general for variance. However, it turns out to be true if the random variables are independent. To see that, first we look at the expectation of a product of independent r.v.'s (which is a quantity that frequently shows up in variance calculations, as we have seen).

Theorem 16.1: For independent random variables X, Y, we have E(XY) = E(X)E(Y).

Proof: We have

E(XY) = Σ_a Σ_b ab · Pr[X = a, Y = b]
      = Σ_a Σ_b ab · Pr[X = a] Pr[Y = b]
      = (Σ_a a · Pr[X = a]) · (Σ_b b · Pr[Y = b])
      = E(X) · E(Y),

as required. In the second line here we made crucial use of independence. For example, this theorem would have allowed us to conclude immediately in our random walk example at the beginning of the last lecture note that E(X_i X_j) = E(X_i)E(X_j) = 0, without the need for a calculation.

We now use the above theorem to conclude a nice property of the variance of independent random variables.

Theorem 16.2: For independent random variables X, Y, we have Var(X + Y) = Var(X) + Var(Y).

Proof: From the alternative formula for variance in Theorem 15.1, we have, using linearity of expectation extensively,

Var(X + Y) = E((X + Y)^2) − E(X + Y)^2
           = E(X^2) + E(Y^2) + 2E(XY) − (E(X) + E(Y))^2
           = (E(X^2) − E(X)^2) + (E(Y^2) − E(Y)^2) + 2(E(XY) − E(X)E(Y))
           = Var(X) + Var(Y) + 2(E(XY) − E(X)E(Y)).

Now because X, Y are independent, by Theorem 16.1 the final term in this expression is zero. Hence we get our result.

Note: The expression E(XY) − E(X)E(Y) appearing in the above proof is called the covariance of X and Y, and is a measure of the dependence between X and Y. It is zero when X and Y are independent.

Theorem 16.2 can be used to simplify several of our variance calculations in the last lecture. E.g., in example 1 of the last lecture note, since the X_i are independent r.v.'s with Var(X_i) = 1 for each i, we have Var(X) = Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) = n. And in example 2 the X_i are independent with Var(X_i) = p(1 − p), so we have Var(X) = Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) = np(1 − p). Note, however, that we don't get any simplification in example 4 because the X_i are not independent.

It is very important to remember that neither Theorem 16.1 nor Theorem 16.2 is true in general, without the assumption that X, Y are independent. As a simple example, note that even for a 0-1 r.v. X with Pr[X = 1] = p, E(X · X) = E(X^2) = p is not equal to E(X)^2 = p^2 (because of course X and X are not independent!). Note also that Theorem 16.2 does not quite say that variance is linear for independent random variables: it says only that variances sum. It is not true that Var(cX) = c · Var(X) for a constant c. In fact, the following is true:

Theorem 16.3: For any random variable X and constant c, Var(cX) = c^2 · Var(X).

The proof is left as a straightforward exercise.
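Theorems 16.1 and 16.2 can be checked by exact enumeration for a small pair of independent r.v.'s. The marginals below are made up for illustration:

```python
from itertools import product

# Hypothetical marginals for two independent random variables.
pX = {0: 0.3, 1: 0.7}   # a 0-1 r.v. with Pr[X = 1] = 0.7
pY = {1: 0.5, 2: 0.5}   # a fair choice between 1 and 2

# Under independence the joint factorizes: Pr[X=a, Y=b] = Pr[X=a] * Pr[Y=b].
joint = {(a, b): pX[a] * pY[b] for a, b in product(pX, pY)}

def E(f, dist):
    """Expectation of f under a distribution {value: prob}."""
    return sum(f(v) * p for v, p in dist.items())

EX = E(lambda a: a, pX)
EY = E(lambda b: b, pY)
E_XY = sum(a * b * p for (a, b), p in joint.items())
assert abs(E_XY - EX * EY) < 1e-12          # Theorem 16.1: E(XY) = E(X)E(Y)

varX = E(lambda a: a * a, pX) - EX ** 2
varY = E(lambda b: b * b, pY) - EY ** 2

# Distribution of X + Y, then its variance.
pSum = {}
for (a, b), p in joint.items():
    pSum[a + b] = pSum.get(a + b, 0.0) + p
varSum = E(lambda s: s * s, pSum) - E(lambda s: s, pSum) ** 2
assert abs(varSum - (varX + varY)) < 1e-12  # Theorem 16.2: Var(X+Y) = Var(X)+Var(Y)
print(varX, varY, varSum)
```

Replacing the product joint with any dependent joint having the same marginals will generally break both assertions, which is exactly the point of the independence hypothesis.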

Conditional Distribution and Expectation

In an earlier lecture, we discussed the concept of the conditional probability of an event A given an event B. This concept allows us to define a notion of the conditional distribution of a random variable given another random variable.

Definition 16.3 (conditional distribution): The conditional distribution of X given Y = b is the collection of values {(a, Pr[X = a | Y = b]) : a ∈ A}, where A is the set of all possible values taken by X.

The conditional distribution can be calculated from the joint and marginal distributions:

Pr[X = a | Y = b] = Pr[X = a, Y = b] / Pr[Y = b].

It follows from eq. (2) that

Σ_{a∈A} Pr[X = a | Y = b] = 1,

so the conditional distribution is normalized, just like an (unconditional) distribution. Note that if X and Y are independent r.v.'s, then Pr[X = a | Y = b] = Pr[X = a] for every a, b, i.e. the conditional and unconditional distributions of X are the same.

One can also naturally talk about the conditional distribution of multiple random variables. For example, the conditional distribution of X and Y given Z = c is simply given by {(a, b, Pr[X = a, Y = b | Z = c]) : a ∈ A, b ∈ B}.

Conditional distributions have exactly the same properties as (unconditional) distributions. Therefore whatever we do with distributions we can do with conditional distributions. For example, one can compute expectations of conditional distributions. This leads to the concept of conditional expectation.

Definition 16.4 (conditional expectation): Let X and Y be two r.v.'s defined on the same probability space. The conditional expectation of X given Y = b is defined to be:

E(X | Y = b) := Σ_{a∈A} a · Pr[X = a | Y = b].

Conditional probabilities often help us to calculate probabilities of events by means of the total probability law. Similarly, conditional expectations are often useful to compute expectations via the total expectation law:

E(X) = Σ_{a∈A} a · Pr[X = a]
     = Σ_{a∈A} a Σ_{b∈B} Pr[Y = b] Pr[X = a | Y = b]
     = Σ_{b∈B} Pr[Y = b] Σ_{a∈A} a · Pr[X = a | Y = b]
     = Σ_{b∈B} Pr[Y = b] · E(X | Y = b).

This formula is quite intuitive: to calculate the expectation of the r.v. X, first calculate the conditional expectation of X given each of the various values of Y. Then sum them, weighted by the probabilities that Y takes on the various values. The formula is another instantiation of the divide-into-cases strategy.
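A minimal numerical check of the total expectation law, using a made-up joint distribution of (X, Y):

```python
# Hypothetical joint distribution of (X, Y), as {(a, b): Pr[X=a, Y=b]}.
joint = {(1, 0): 0.2, (2, 0): 0.3, (1, 1): 0.4, (2, 1): 0.1}

# Direct computation: E(X) = sum_a a * Pr[X = a], marginalizing over Y.
E_X = sum(a * p for (a, b), p in joint.items())

# Via the total expectation law: E(X) = sum_b Pr[Y = b] * E(X | Y = b).
pY = {}
for (a, b), p in joint.items():
    pY[b] = pY.get(b, 0.0) + p

E_X_total = 0.0
for b, pb in pY.items():
    # Conditional expectation E(X | Y = b) = sum_a a * Pr[X=a, Y=b] / Pr[Y=b].
    E_X_total += pb * sum(a * p / pb for (a, bb), p in joint.items() if bb == b)

assert abs(E_X - E_X_total) < 1e-12
print(E_X)
```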

For example, suppose a coin is chosen uniformly at random from n coins, where coin i has Heads probability p_i (this is the setup of the multi-arm bandit example below), and we are interested in calculating the expectation of the number of flips (say Z) of the randomly chosen coin until we see the first Head. Using the total expectation rule,

E(Z) = Σ_{i=1}^n Pr[X = i] E(Z | X = i) = (1/n) Σ_{i=1}^n 1/p_i.

The last step follows from the fact that conditional on the identity of the coin, X = i, Z is geometrically distributed with parameter p_i.

For a slightly more interesting example, let us use the total expectation law to give an alternative way of computing the expectation of a geometrically distributed r.v. X. (In lecture note 14, we did it one way.) Recall that X is the number of independent trials until we get our first success. Let Y be the indicator r.v. of the event that the first trial is successful. Using the total expectation law,

E(X) = Pr[Y = 1] E(X | Y = 1) + Pr[Y = 0] E(X | Y = 0) = p E(X | Y = 1) + (1 − p) E(X | Y = 0).   (3)

Now, if Y = 1, the first trial is already successful, and X = 1 with probability 1. Hence, E(X | Y = 1) = 1. What about if Y = 0? If the first trial is unsuccessful, we are back to square one and have to continue trying. Hence the number of additional trials after the first trial is another geometric r.v. with the same parameter p, and E(X | Y = 0) = 1 + E(X). Substituting into (3), we get:

E(X) = p + (1 − p){1 + E(X)}.

Upon solving this equation, we get E(X) = 1/p.

It is interesting to see that while linearity of expectation was a useful tool to compute the expectation of a binomially distributed r.v., the total expectation rule is a more natural tool to compute the expectation of a geometrically distributed r.v. Both are tools that allow us to compute expectations without directly computing distributions.

Inference

One of the major uses of probability is to provide a systematic framework to perform inference under uncertainty. A few specific applications are:

- communications: Information bits are sent over a noisy physical channel (wireless, DSL phone line, etc.). From the received symbols, one wants to make a decision about what bits were transmitted.

- control: A spacecraft needs to be landed on the moon. From noisy measurements by motion sensors, one wants to estimate the current position of the spacecraft relative to the moon's surface so that appropriate controls can be applied.

- object recognition: From an image containing an object, one wants to recognize what type of object it is.

- speech recognition: From hearing noisy utterances, one wants to recognize what is being said.

- investing: By observing the past performance of a stock, one wants to estimate its intrinsic quality and hence make a decision on whether and how much to invest in it.

All of the above problems can be modeled with the following ingredients:

- a random variable X representing the hidden quantity, not directly observed but in which one is interested. X can be the value of an information bit in a communication scenario, the position of the spacecraft in the control application, or the object class in the recognition problem.

- random variables Y_1, Y_2, ..., Y_n representing the observations. They may be the outputs of a noisy channel at different times, pixel values of an image, values of the stocks on successive days, etc.

- the distribution of X, called the prior distribution. This can be interpreted as the knowledge about X before seeing the observations.

- the conditional distribution of Y_1, ..., Y_n given X. This models the noise or randomness in the observations.

Since the observations are noisy, there is in general no hope of knowing the exact value of X given the observations. Instead, all knowledge about X can be summarized by the conditional distribution of X given the observations. We don't know the exact value of X, but the conditional distribution tells us which values of X are more likely and which are less likely. Based on this information, intelligent decisions can be made.

Inference Example 1: Multi-arm Bandits

Question: You walk into a casino. There are several slot machines (bandits). You know some have odds very favorable to you, some have less favorable odds, and some have very poor odds. However, you don't know which are which. You start playing on some of them, and by observing the outcomes, you want to learn which is which so that you can intelligently figure out which machine to play on (or not play at all, which may be the most intelligent decision).

Stripped-down version: Suppose there are n biased coins. Coin i has probability p_i of coming up Heads; however, you don't know which is which. You randomly pick one coin and flip it. If the coin comes up Heads you win $1, and if it comes up Tails you lose $1. What is the probability of winning? What is the probability of winning on the next flip given you have observed a Heads with this coin? Given you have observed two Heads in a row? Would you bet on the next flip?

Modeling using Random Variables

Let X be the coin randomly chosen, and Y_i be the indicator r.v. for the event that the i-th flip of this randomly chosen coin comes up Heads. Since we don't know which coin we have chosen, X is the hidden quantity. The Y_i's are the observations.

Predicting the First Flip

The first question asks for Pr[Y_1 = 1]. First we calculate the joint distribution of X and Y_1:

Pr[X = i, Y_1 = 1] = Pr[X = i] Pr[Y_1 = 1 | X = i] = p_i / n.   (4)

Applying (2), we get:

Pr[Y_1 = 1] = Σ_{j=1}^n Pr[X = j, Y_1 = 1] = (1/n) Σ_{j=1}^n p_j.   (5)
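The calculations in this example are easy to carry out numerically. The sketch below uses exact rational arithmetic and the three coins p_1 = 2/3, p_2 = 1/2, p_3 = 1/5 that appear later in this note; it computes eq. (5) and, anticipating the next subsections, the win probabilities after observing one and two Heads:

```python
from fractions import Fraction as F

p = [F(2, 3), F(1, 2), F(1, 5)]   # Heads probabilities of the n = 3 coins
n = len(p)
prior = [F(1, n)] * n             # the coin is picked uniformly at random

# Eq. (4): joint distribution Pr[X = i, Y1 = 1] = p_i / n.
joint = [q * pi for q, pi in zip(prior, p)]

# Eq. (5): marginalize over the coin identity.
pr_Y1 = sum(joint)
print(float(pr_Y1))               # ≈ 0.456, the "0.46" quoted later in the note

def bayes_update(belief, likelihood):
    """Bayes' rule: posterior is proportional to prior * likelihood."""
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Posterior after one Heads, and the probability of winning the next flip.
post1 = bayes_update(prior, p)
win2 = sum(q * pi for q, pi in zip(post1, p))
print(float(win2))                # ≈ 0.537, the "0.54" quoted later

# Posterior after two Heads in a row, and the third-flip win probability.
post2 = bayes_update(post1, p)
win3 = sum(q * pi for q, pi in zip(post2, p))
print(float(win3))                # ≈ 0.585, the "0.58" quoted later
```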

Note that combining the above two equations, we are in effect using the fact that:

Pr[Y_1 = 1] = Σ_{i=1}^n Pr[X = i] Pr[Y_1 = 1 | X = i].   (6)

This is just the total probability rule for events applied to random variables. Once you get familiar with this type of calculation, you can bypass the intermediate calculation of the joint distribution and directly write this down.

Predicting the Second Flip after Observing the First

Now, given that we observed Y_1 = 1, we are learning something about the randomly chosen coin X. This knowledge is captured by the conditional distribution

Pr[X = i | Y_1 = 1] = Pr[X = i, Y_1 = 1] / Pr[Y_1 = 1] = p_i / Σ_{j=1}^n p_j,

using eqs. (4) and (5). Note that when we substitute eq. (4) into the above equation, we are in effect using:

Pr[X = i | Y_1 = 1] = Pr[X = i] Pr[Y_1 = 1 | X = i] / Pr[Y_1 = 1].

This is just Bayes' rule for events applied to random variables. Just like for events, this rule has the interpretation of updating knowledge based on the observation: {(i, Pr[X = i]) : i = 1, ..., n} is the prior distribution of the hidden X; {(i, Pr[X = i | Y_1 = 1]) : i = 1, ..., n} is the posterior distribution of X given the observation. Bayes' rule updates the prior distribution to yield the posterior distribution.

Now we can calculate the probability of winning using this coin in the second flip:

Pr[Y_2 = 1 | Y_1 = 1] = Σ_{j=1}^n Pr[X = j | Y_1 = 1] Pr[Y_2 = 1 | X = j, Y_1 = 1].   (7)

This can be interpreted as the total probability rule (6), but in a new probability space with all the probabilities under the additional condition Y_1 = 1. You are asked to verify this formula from first principles.

Now let us calculate the various probabilities on the right hand side of (7). The probability Pr[X = j | Y_1 = 1] is just the posterior distribution of X given the observation; we have already calculated it. What about the probability Pr[Y_2 = 1 | X = j, Y_1 = 1]? There are two conditioning events: X = j and Y_1 = 1. But here is the thing: once we know that the unknown coin is coin j, then knowing that the first flip is a Heads is redundant and provides no further statistical information about the outcome of the second flip: the probability of getting a Heads on the second flip is just p_j. In other words,

Pr[Y_2 = 1 | X = j, Y_1 = 1] = Pr[Y_2 = 1 | X = j] = p_j.   (8)

The events Y_1 = 1 and Y_2 = 1 are said to be independent conditional on the event X = j. Since in fact Y_1 = a and Y_2 = b are independent given X = j for all a, b, j, we say that the random variables Y_1 and Y_2 are independent given the random variable X.

Definition 16.5 (conditional independence): Two events A and B are said to be conditionally independent given a third event C if

Pr[A ∧ B | C] = Pr[A | C] · Pr[B | C].

Two random variables X and Y are said to be independent given a third random variable Z if for every a, b, c,

Pr[X = a, Y = b | Z = c] = Pr[X = a | Z = c] · Pr[Y = b | Z = c].

Note that the r.v.'s Y_1 and Y_2 are not independent. Knowing the outcome of Y_1 tells us some information about the identity of the coin (X) and hence allows us to infer something about Y_2. However, if we already know X, then the outcomes of the different flips Y_1 and Y_2 are independent.

Now substituting (8) into (7), we get the probability of winning using this coin in the second flip:

Pr[Y_2 = 1 | Y_1 = 1] = Σ_{j=1}^n Pr[X = j | Y_1 = 1] Pr[Y_2 = 1 | X = j] = Σ_{j=1}^n p_j^2 / Σ_{j=1}^n p_j.

Predicting the Third Flip After Observing the First Two

Using Bayes' rule and the total probability rule, we can compute the posterior distribution of X given that we observed two Heads in a row:

Pr[X = j | Y_1 = 1, Y_2 = 1] = Pr[X = j] Pr[Y_1 = 1, Y_2 = 1 | X = j] / Pr[Y_1 = 1, Y_2 = 1]
  = Pr[X = j] Pr[Y_1 = 1, Y_2 = 1 | X = j] / Σ_{i=1}^n Pr[X = i] Pr[Y_1 = 1, Y_2 = 1 | X = i]
  = Pr[X = j] Pr[Y_1 = 1 | X = j] Pr[Y_2 = 1 | X = j] / Σ_{i=1}^n Pr[X = i] Pr[Y_1 = 1 | X = i] Pr[Y_2 = 1 | X = i]
  = p_j^2 / Σ_{i=1}^n p_i^2.

The probability of getting a win on the third flip using the same coin is:

Pr[Y_3 = 1 | Y_1 = 1, Y_2 = 1] = Σ_{j=1}^n Pr[X = j | Y_1 = 1, Y_2 = 1] Pr[Y_3 = 1 | X = j, Y_1 = 1, Y_2 = 1]
  = Σ_{j=1}^n Pr[X = j | Y_1 = 1, Y_2 = 1] Pr[Y_3 = 1 | X = j]
  = Σ_{j=1}^n p_j^3 / Σ_{j=1}^n p_j^2.

Suppose n = 3 and the three coins have bias probabilities p_1 = 2/3, p_2 = 1/2, p_3 = 1/5. The conditional distributions of X after observing no flip, one Heads, and two Heads in a row are shown in Figure 2. Note that as more Heads are observed, the conditional distribution is increasingly concentrated on coin 1 with p_1 = 2/3: we are increasingly certain that the coin chosen is the good coin. The corresponding probabilities of winning on the next flip after observing no flip, one Heads, and two Heads in a row are 0.46, 0.54 and 0.58 respectively. The conditional probability of winning gets better and better.

Inference Example 2: Communication over a Noisy Channel

Question: I have one bit of information that I want to communicate over a noisy channel. The noisy channel flips each one of my transmitted symbols independently with probability p < 0.5. How much improvement in performance do I get by repeating my transmission n times?

Figure 2: The conditional distributions of X given no observations, 1 Heads, and 2 Heads.

Figure 3: The system diagram for the communication problem.

Comment: In an earlier lecture note, we also considered a communication problem and gave some examples of error-correcting codes. However, the models for the communication channel are different. There, we put a bound on the maximum number of flips the channel can make. Here, we do not put such bounds a priori but instead impose a probabilistic model. Since there is no bound on the maximum number of flips the channel can make, there is no guarantee that the receiver will always decode correctly. Instead, one has to be satisfied with being able to decode correctly with high probability, e.g., probability of error < 0.01.

Modeling

The situation is shown in Figure 3. Let X (= 0 or 1) be the value of the information bit I want to transmit. Assume that X is equally likely to be 0 or 1. The received symbol on the i-th repetition of X is

Y_i = X + Z_i mod 2,   i = 1, 2, ..., n,

with Z_i = 1 with probability p and Z_i = 0 with probability 1 − p. Note that Y_i is different from X if and only if Z_i = 1. Thus, the transmitted symbol is flipped with probability p. The Z_i's are assumed to be mutually independent across the different repetitions of X and also independent of X. The Z_i's can be interpreted as noise.

Note that the received symbols Y_i's are not independent; they all contain information about the transmitted bit X. However, given X, they are independent, since they then depend only on the noise Z_i's.
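The channel model above is straightforward to simulate (a sketch, with a fixed seed so the run is reproducible; p = 0.25 is the value used in the analysis at the end of this note):

```python
import random

def channel(x, n, p, rng):
    """Transmit bit x over the noisy channel n times: Y_i = (x + Z_i) mod 2,
    where the noise bits Z_i are i.i.d. with Pr[Z_i = 1] = p."""
    return [(x + (1 if rng.random() < p else 0)) % 2 for _ in range(n)]

rng = random.Random(0)
x, n, p = 0, 100_000, 0.25
ys = channel(x, n, p, rng)

# Each received symbol differs from x exactly when Z_i = 1,
# so the empirical flip rate should be close to p.
flip_rate = sum(y != x for y in ys) / n
print(round(flip_rate, 3))
```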

Decision Rule

First, we have to figure out what decision rule to use at the receiver, i.e. given each of the 2^n possible received sequences Y_1 = b_1, Y_2 = b_2, ..., Y_n = b_n, how should the receiver guess what value of X was transmitted? A natural rule is the maximum a posteriori (MAP) rule: guess the value a for which the conditional probability of X = a given the observations is the largest among all a. More explicitly:

a = 0  if Pr[X = 0 | Y_1 = b_1, ..., Y_n = b_n] ≥ Pr[X = 1 | Y_1 = b_1, ..., Y_n = b_n],
a = 1  otherwise.   (9)

Now, let's make some simplifications to this rule. By Bayes' rule,

Pr[X = 0 | Y_1 = b_1, ..., Y_n = b_n] = Pr[X = 0] Pr[Y_1 = b_1, ..., Y_n = b_n | X = 0] / Pr[Y_1 = b_1, ..., Y_n = b_n]
  = Pr[X = 0] Pr[Y_1 = b_1 | X = 0] Pr[Y_2 = b_2 | X = 0] · · · Pr[Y_n = b_n | X = 0] / Pr[Y_1 = b_1, ..., Y_n = b_n].   (10)

In the second step, we are using the fact that the observations Y_i's are conditionally independent given X. (Why?) Similarly,

Pr[X = 1 | Y_1 = b_1, ..., Y_n = b_n] = Pr[X = 1] Pr[Y_1 = b_1, ..., Y_n = b_n | X = 1] / Pr[Y_1 = b_1, ..., Y_n = b_n]   (11)
  = Pr[X = 1] Pr[Y_1 = b_1 | X = 1] Pr[Y_2 = b_2 | X = 1] · · · Pr[Y_n = b_n | X = 1] / Pr[Y_1 = b_1, ..., Y_n = b_n].   (12)

An equivalent way of describing the MAP rule is that it computes the ratio of these conditional probabilities and checks whether it is greater than or less than 1. If it is greater than 1, then guess that a 0 was transmitted; otherwise guess that a 1 was transmitted. (This ratio indicates how likely a 0 is compared to a 1, and is called the likelihood ratio.) Dividing (10) by (12), and using the fact that Pr[X = 0] = Pr[X = 1] so the priors cancel, the likelihood ratio L is:

L = Π_{i=1}^n Pr[Y_i = b_i | X = 0] / Pr[Y_i = b_i | X = 1].   (13)

Note that we didn't have to compute Pr[Y_1 = b_1, ..., Y_n = b_n], since it appears in both of the conditional probabilities and gets canceled out when computing the ratio. Now,

Pr[Y_i = b_i | X = 0] / Pr[Y_i = b_i | X = 1] = p/(1 − p)  if b_i = 1,
                                              = (1 − p)/p  if b_i = 0.

In other words, L has a factor of p/(1 − p) < 1 for every 1 received and a factor of (1 − p)/p > 1 for every 0 received. So the likelihood ratio L is greater than 1 if and only if the number of 1's is less than the number of 0's.

Thus, the decision rule is simply a majority rule: guess that a 0 was transmitted if the number of 0's in the received sequence is more than the number of 1's, and vice versa. Note that in deriving this rule, we assumed that Pr[X = 0] = Pr[X = 1] = 0.5. When the prior distribution is not uniform, the MAP rule is no longer a simple majority rule. You are asked to derive the MAP rule in the general case in the exercises.
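The equivalence between the likelihood-ratio form (13) and the majority rule can be checked exhaustively for small n. A sketch, assuming the uniform prior of the text:

```python
from itertools import product

def map_decode(received, p):
    """MAP decoding for a uniform prior via the likelihood ratio (13):
    L = prod_i Pr[Y_i = b_i | X = 0] / Pr[Y_i = b_i | X = 1].
    Guess 0 iff L >= 1."""
    L = 1.0
    for b in received:
        L *= (p / (1 - p)) if b == 1 else ((1 - p) / p)
    return 0 if L >= 1 else 1

def majority_decode(received):
    """Equivalent rule: guess the symbol occurring more often (a tie guesses 0)."""
    ones = sum(received)
    return 1 if ones > len(received) - ones else 0

# The two rules agree on every received sequence, e.g. for n = 5 and p = 0.25:
assert all(map_decode(seq, 0.25) == majority_decode(seq)
           for seq in product([0, 1], repeat=5))
print(majority_decode([0, 1, 0, 0, 1]))  # more 0s than 1s, so guess 0
```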

Error Probability Analysis

What is the probability that the guess is incorrect? This is just the probability of the event E that the number of flips by the noisy channel is greater than or equal to n/2. (This is a slight upper bound, since one could be correct when there are exactly n/2 flips, under some model of how one guesses an answer when there is a tie.) So the error probability of our majority rule is:

Pr[E] = Pr[Σ_{i=1}^n Z_i ≥ n/2] = Σ_{k ≥ n/2} C(n, k) p^k (1 − p)^{n−k},

recognizing that the random variable S := Σ_{i=1}^n Z_i has a binomial distribution with parameters n and p.

This gives an expression for the error probability that can be numerically evaluated for given values of n. Given a target error probability of, say, 0.01, one can then compute the smallest number of repetitions needed to achieve the target error probability.[1]

As in the hashing and load balancing applications we looked at earlier in the course, we are interested in a more explicit relationship between n and the error probability, to get a better intuition for the problem. The above expression is too cumbersome for this purpose. Instead, notice that n/2 is greater than the mean np of S, and hence the error event is related to the tail of the distribution of S. One can therefore apply Chebyshev's inequality from the last lecture note to bound the error probability:

Pr[S ≥ n/2] ≤ Pr[|S − np| ≥ n(1/2 − p)] ≤ Var(S) / (n(1/2 − p))^2 = p(1 − p) / (n(1/2 − p)^2).

The important thing to note is that the error probability decreases with n, so indeed by repeating more times, the performance improves (as one would expect!). For a given target error probability of say 0.01, one needs to repeat no more than

100 p(1 − p) / (1/2 − p)^2

times. For p = 0.25, this evaluates to about 300. In the exercises, you are asked to compare the bound with the actual error probability. You will see that the bound is not very good, and actually one can repeat many fewer times to get an error probability of 0.01. In an upper-division course like EECS 126, you can learn about much better bounds.

[1] Needless to say, one does not want to repeat more times than is necessary, as we are using more time to transmit each information bit and the rate of communication is slowed down.
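The exact binomial error probability and the Chebyshev bound can be compared numerically. A sketch, using p = 0.25 and the 0.01 target from the text:

```python
from math import comb

def error_prob(n, p):
    """Exact Pr[S >= n/2] for S ~ Binomial(n, p): the majority rule errs
    (counting ties as errors) when at least n/2 symbols are flipped."""
    k_min = (n + 1) // 2 if n % 2 else n // 2
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def chebyshev_bound(n, p):
    """Tail bound p(1-p) / (n * (1/2 - p)^2) from Chebyshev's inequality."""
    return p * (1 - p) / (n * (0.5 - p) ** 2)

p = 0.25
n_cheb = 100 * p * (1 - p) / (0.5 - p) ** 2   # repetitions guaranteed by the bound
print(n_cheb)                                  # 300.0, as stated in the text
assert chebyshev_bound(300, p) <= 0.01 + 1e-12

# The bound is loose: far fewer repetitions already reach the 0.01 target.
n = next(n for n in range(1, 301) if error_prob(n, p) < 0.01)
print(n, error_prob(n, p))
```

This makes concrete the comparison the exercises ask for: the Chebyshev guarantee of 300 repetitions versus the much smaller n found by evaluating the exact binomial tail.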


Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Final Review for MATH 3510

Final Review for MATH 3510 Fial Review for MATH 50 Calculatio 5 Give a fairly simple probability mass fuctio or probability desity fuctio of a radom variable, you should be able to compute the expected value ad variace of the variable

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes The Maximum-Lielihood Decodig Performace of Error-Correctig Codes Hery D. Pfister ECE Departmet Texas A&M Uiversity August 27th, 2007 (rev. 0) November 2st, 203 (rev. ) Performace of Codes. Notatio X,

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

f X (12) = Pr(X = 12) = Pr({(6, 6)}) = 1/36

f X (12) = Pr(X = 12) = Pr({(6, 6)}) = 1/36 Probability Distributios A Example With Dice If X is a radom variable o sample space S, the the probability that X takes o the value c is Similarly, Pr(X = c) = Pr({s S X(s) = c}) Pr(X c) = Pr({s S X(s)

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Lecture 6: Coupon Collector s problem

Lecture 6: Coupon Collector s problem Radomized Algorithms Lecture 6: Coupo Collector s problem Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Radomized Algorithms - Lecture 6 1 / 16 Variace: key features

More information

As stated by Laplace, Probability is common sense reduced to calculation.

As stated by Laplace, Probability is common sense reduced to calculation. Note: Hadouts DO NOT replace the book. I most cases, they oly provide a guidelie o topics ad a ituitive feel. The math details will be covered i class, so it is importat to atted class ad also you MUST

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1 PH 425 Quatum Measuremet ad Spi Witer 23 SPIS Lab Measure the spi projectio S z alog the z-axis This is the experimet that is ready to go whe you start the program, as show below Each atom is measured

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

Topic 8: Expected Values

Topic 8: Expected Values Topic 8: Jue 6, 20 The simplest summary of quatitative data is the sample mea. Give a radom variable, the correspodig cocept is called the distributioal mea, the epectatio or the epected value. We begi

More information

Pb ( a ) = measure of the plausibility of proposition b conditional on the information stated in proposition a. & then using P2

Pb ( a ) = measure of the plausibility of proposition b conditional on the information stated in proposition a. & then using P2 Axioms for Probability Logic Pb ( a ) = measure of the plausibility of propositio b coditioal o the iformatio stated i propositio a For propositios a, b ad c: P: Pb ( a) 0 P2: Pb ( a& b ) = P3: Pb ( a)

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments: Recall: STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Commets:. So far we have estimates of the parameters! 0 ad!, but have o idea how good these estimates are. Assumptio: E(Y x)! 0 +! x (liear coditioal

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

BHW #13 1/ Cooper. ENGR 323 Probabilistic Analysis Beautiful Homework # 13

BHW #13 1/ Cooper. ENGR 323 Probabilistic Analysis Beautiful Homework # 13 BHW # /5 ENGR Probabilistic Aalysis Beautiful Homework # Three differet roads feed ito a particular freeway etrace. Suppose that durig a fixed time period, the umber of cars comig from each road oto the

More information

Entropies & Information Theory

Entropies & Information Theory Etropies & Iformatio Theory LECTURE I Nilajaa Datta Uiversity of Cambridge,U.K. For more details: see lecture otes (Lecture 1- Lecture 5) o http://www.qi.damtp.cam.ac.uk/ode/223 Quatum Iformatio Theory

More information

( ) = p and P( i = b) = q.

( ) = p and P( i = b) = q. MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Random Models. Tusheng Zhang. February 14, 2013

Random Models. Tusheng Zhang. February 14, 2013 Radom Models Tusheg Zhag February 14, 013 1 Radom Walks Let me describe the model. Radom walks are used to describe the motio of a movig particle (object). Suppose that a particle (object) moves alog the

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Massachusetts Institute of Technology

Massachusetts Institute of Technology Solutios to Quiz : Sprig 006 Problem : Each of the followig statemets is either True or False. There will be o partial credit give for the True False questios, thus ay explaatios will ot be graded. Please

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

HOMEWORK 2 SOLUTIONS

HOMEWORK 2 SOLUTIONS HOMEWORK SOLUTIONS CSE 55 RANDOMIZED AND APPROXIMATION ALGORITHMS 1. Questio 1. a) The larger the value of k is, the smaller the expected umber of days util we get all the coupos we eed. I fact if = k

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

PROBABILITY LOGIC: Part 2

PROBABILITY LOGIC: Part 2 James L Bec 2 July 2005 PROBABILITY LOGIC: Part 2 Axioms for Probability Logic Based o geeral cosideratios, we derived axioms for: Pb ( a ) = measure of the plausibility of propositio b coditioal o the

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS Cotets 1. A few useful discrete radom variables 2. Joit, margial, ad

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.

More information

Lecture 1 Probability and Statistics

Lecture 1 Probability and Statistics Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

15-780: Graduate Artificial Intelligence. Density estimation

15-780: Graduate Artificial Intelligence. Density estimation 5-780: Graduate Artificial Itelligece Desity estimatio Coditioal Probability Tables (CPT) But where do we get them? P(B)=.05 B P(E)=. E P(A B,E) )=.95 P(A B, E) =.85 P(A B,E) )=.5 P(A B, E) =.05 A P(J

More information

Lecture 2: Probability, Random Variables and Probability Distributions. GENOME 560, Spring 2017 Doug Fowler, GS

Lecture 2: Probability, Random Variables and Probability Distributions. GENOME 560, Spring 2017 Doug Fowler, GS Lecture 2: Probability, Radom Variables ad Probability Distributios GENOME 560, Sprig 2017 Doug Fowler, GS (dfowler@uw.edu) 1 Course Aoucemets Problem Set 1 will be posted Due ext Thursday before class

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 12

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 12 CS 70 Discrete Mathematics ad Probability Theory Fall 2009 Satish Rao,David Tse Note 12 Two Killer Applicatios I this lecture, we will see two killer apps of elemetary probability i Computer Sciece. 1.

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Elementary manipulations of probabilities

Elementary manipulations of probabilities Elemetary maipulatios of probabilities Set probability of multi-valued r.v. {=Odd} = +3+5 = /6+/6+/6 = ½ X X,, X i j X i j Multi-variat distributio: Joit probability: X true true X X,, X X i j i j X X

More information

Probability and MLE.

Probability and MLE. 10-701 Probability ad MLE http://www.cs.cmu.edu/~pradeepr/701 (brief) itro to probability Basic otatios Radom variable - referrig to a elemet / evet whose status is ukow: A = it will rai tomorrow Domai

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Lecture 2 February 8, 2016

Lecture 2 February 8, 2016 MIT 6.854/8.45: Advaced Algorithms Sprig 206 Prof. Akur Moitra Lecture 2 February 8, 206 Scribe: Calvi Huag, Lih V. Nguye I this lecture, we aalyze the problem of schedulig equal size tasks arrivig olie

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Eco 35: Itroductio to Empirical Ecoomics Lecture 3 Discrete Radom Variables ad Probability Distributios Copyright 010 Pearso Educatio, Ic. Publishig as Pretice Hall Ch. 4-1 4.1 Itroductio to Probability

More information

Bertrand s Postulate

Bertrand s Postulate Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a

More information