Probability Review. Gonzalo Mateos

1 Probability Review Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester September 11, 2018 Introduction to Random Processes Probability Review 1

2 Sigma-algebras and probability spaces Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 2

3 Probability
An event is something that happens; a random event has an uncertain outcome
The probability of an event measures how likely it is to occur
Example: I've written a student's name on a piece of paper. Who is she/he?
Event: student x's name is written on the paper
Probability: P(x) measures how likely it is that x's name was written
Probability is a measurement tool, a mathematical language for quantifying uncertainty

4 Sigma-algebra
Given a sample space or universe S
Ex: all students in the class, S = {x_1, x_2, ..., x_N} (the x_n denote names)
Def: An outcome is an element or point in S, e.g., x_3
Def: An event E is a subset of S
Ex: {x_1}, the student with name x_1
Ex: also {x_1, x_4}, the students with names x_1 and x_4
Outcome x_3 and event {x_3} are different; the latter is a set
Def: A sigma-algebra F is a collection of events E ⊆ S such that
(i) The empty set belongs to F: ∅ ∈ F
(ii) Closed under complement: if E ∈ F, then E^c ∈ F
(iii) Closed under countable unions: if E_1, E_2, ... ∈ F, then ∪_{i=1}^∞ E_i ∈ F
F is a set of sets

5 Examples of sigma-algebras
Ex: no student and all students, i.e., F_0 := {∅, S}
Ex: empty set, women, men, everyone, i.e., F_1 := {∅, Women, Men, S}
Ex: F_2 including the empty set, plus
all events (sets) with one student {x_1}, ..., {x_N}, plus
all events with two students {x_1, x_2}, {x_1, x_3}, ..., {x_1, x_N}, {x_2, x_3}, ..., {x_2, x_N}, ..., {x_{N-1}, x_N}, plus
all events with three, four, ..., N students
F_2 is known as the power set of S, denoted 2^S

6 Axioms of probability
Define a function P(E) from a sigma-algebra F to the real numbers
P(E) qualifies as a probability if
A1) Non-negativity: P(E) ≥ 0
A2) Probability of universe: P(S) = 1
A3) Additivity: given a sequence of disjoint events E_1, E_2, ...
P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i)
Disjoint (mutually exclusive) events means E_i ∩ E_j = ∅ for i ≠ j
Union of countably infinitely many disjoint events
The triplet (S, F, P(·)) is called a probability space

7 Consequences of the axioms
Implications of axioms A1)-A3)
Impossible event: P(∅) = 0
Monotonicity: E_1 ⊆ E_2 ⇒ P(E_1) ≤ P(E_2)
Range: 0 ≤ P(E) ≤ 1
Complement: P(E^c) = 1 − P(E)
Finite disjoint union: for disjoint events E_1, ..., E_N
P(∪_{i=1}^N E_i) = Σ_{i=1}^N P(E_i)
Inclusion-exclusion: for any events E_1 and E_2
P(E_1 ∪ E_2) = P(E_1) + P(E_2) − P(E_1 ∩ E_2)

8 Probability example
Let's construct a probability space for our running example
Universe of all students in the class: S = {x_1, x_2, ..., x_N}
Sigma-algebra with all combinations of students, i.e., F = 2^S
Suppose names are equiprobable, P({x_n}) = 1/N for all n
Have to specify the probability for all E ∈ F; define P(E) = |E|/|S|
Q: Is this function a probability?
A1): P(E) = |E|/|S| ≥ 0
A2): P(S) = |S|/|S| = 1
A3): P(∪_{i=1}^N E_i) = |∪_{i=1}^N E_i|/|S| = Σ_{i=1}^N |E_i|/|S| = Σ_{i=1}^N P(E_i)
The P(·) just defined is called the uniform probability distribution
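The axiom checks above can be verified numerically; a minimal sketch in Python, with a hypothetical small roster standing in for S (names and N are made up for illustration):

```python
from fractions import Fraction

# Hypothetical small roster standing in for S = {x_1, ..., x_N}
S = {"x1", "x2", "x3", "x4"}
N = len(S)

def P(E):
    """Uniform distribution on the power set: P(E) = |E|/|S|."""
    return Fraction(len(E), N)

# A1) non-negativity and A2) P(S) = 1
assert P(set()) == 0 and P(S) == 1
# A3) additivity for disjoint events, e.g. {x1} and {x2, x3}
E1, E2 = {"x1"}, {"x2", "x3"}
assert P(E1 | E2) == P(E1) + P(E2)
```

Using Fraction keeps the arithmetic exact, so the axioms hold with equality rather than up to floating-point error.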

9 Conditional probability, total probability, Bayes rule Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 9

10 Conditional probability
Consider events E and F, and suppose we know F occurred
Q: What does this information imply about the probability of E?
Def: The conditional probability of E given F is (need P(F) > 0)
P(E|F) = P(E ∩ F) / P(F)
In general P(E|F) ≠ P(F|E)
Renormalize probabilities to the set F: discard the piece of S outside F, and possibly a piece of E as well (figure: events E_1, E_2 intersecting F inside S)
For given F with P(F) > 0, P(·|F) satisfies the axioms of probability

11 Conditional probability example
The name I wrote is male. What is the probability of name x_n?
Assume the male names are F = {x_1, ..., x_M}, so P(F) = M/N
If name x_n is male, x_n ∈ F and we have for event E = {x_n}
P(E ∩ F) = P({x_n}) = 1/N
Conditional probability is as you would expect
P(E|F) = P(E ∩ F)/P(F) = (1/N)/(M/N) = 1/M
If the name is female, x_n ∉ F, then P(E ∩ F) = P(∅) = 0
As you would expect, then P(E|F) = 0

12 Law of total probability
Consider an event E and events F and F^c
F and F^c form a partition of the space S (F ∪ F^c = S, F ∩ F^c = ∅)
Because F ∪ F^c = S covers the space S, can write the set E as
E = E ∩ S = E ∩ [F ∪ F^c] = [E ∩ F] ∪ [E ∩ F^c]
Because F ∩ F^c = ∅ are disjoint, so is [E ∩ F] ∩ [E ∩ F^c] = ∅
⇒ P(E) = P([E ∩ F] ∪ [E ∩ F^c]) = P(E ∩ F) + P(E ∩ F^c)
Use the definition of conditional probability
P(E) = P(E|F)P(F) + P(E|F^c)P(F^c)
Translates conditional information P(E|F) and P(E|F^c) into unconditional information P(E)

13 Law of total probability (continued)
In general, consider a (possibly infinite) partition F_i, i = 1, 2, ... of S
Sets are disjoint: F_i ∩ F_j = ∅ for i ≠ j
Sets cover the space: ∪_{i=1}^∞ F_i = S
(figure: sets F_1, F_2, F_3 partitioning S, with intersections E ∩ F_i)
As before, because ∪_{i=1}^∞ F_i = S covers the space, can write the set E as
E = E ∩ S = E ∩ [∪_{i=1}^∞ F_i] = ∪_{i=1}^∞ [E ∩ F_i]
Because F_i ∩ F_j = ∅ are disjoint, so is [E ∩ F_i] ∩ [E ∩ F_j] = ∅. Thus
P(E) = P(∪_{i=1}^∞ [E ∩ F_i]) = Σ_{i=1}^∞ P(E ∩ F_i) = Σ_{i=1}^∞ P(E|F_i)P(F_i)

14 Total probability example
Consider a probability class in some university
Seniors get an A with probability (w.p.) 0.9, juniors w.p. 0.8
An exchange student is a senior w.p. 0.7, and a junior w.p. 0.3
Q: What is the probability of the exchange student scoring an A?
Let A = "exchange student gets an A", S denote senior, and J junior
Use the law of total probability
P(A) = P(A|S)P(S) + P(A|J)P(J) = 0.9 × 0.7 + 0.8 × 0.3 = 0.87
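The arithmetic of the example above can be checked in one line; a minimal Python sketch with the slide's numbers:

```python
# Law of total probability: P(A) = P(A|S)P(S) + P(A|J)P(J)
p_A_given_senior, p_A_given_junior = 0.9, 0.8   # P(A|S), P(A|J)
p_senior, p_junior = 0.7, 0.3                   # P(S), P(J)

p_A = p_A_given_senior * p_senior + p_A_given_junior * p_junior  # ≈ 0.87
```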

15 Bayes rule
From the definition of conditional probability
P(E|F)P(F) = P(E ∩ F)
Likewise, for F conditioned on E we have
P(F|E)P(E) = P(F ∩ E)
The quantities above are equal, giving Bayes rule
P(E|F) = P(F|E)P(E) / P(F)
Bayes rule allows time reversion. If F (future) comes after E (past),
P(E|F) is the probability of the past (E) having seen the future (F)
P(F|E) is the probability of the future (F) having seen the past (E)
Models often describe future given past; interest is often in past given future

16 Bayes rule example
Consider the following partition of my email
E_1 = spam, w.p. P(E_1) = 0.7
E_2 = low priority, w.p. P(E_2) = 0.2
E_3 = high priority, w.p. P(E_3) = 0.1
Let F = "an email contains the word free"
From experience know P(F|E_1) = 0.9, P(F|E_2) = P(F|E_3) = 0.01
I got an email containing "free". What is the probability that it is spam?
Apply Bayes rule
P(E_1|F) = P(F|E_1)P(E_1)/P(F) = P(F|E_1)P(E_1) / Σ_{i=1}^3 P(F|E_i)P(E_i) = 0.63/0.633 ≈ 0.995
The law of total probability is very useful when applying Bayes rule
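The posterior in this email example follows from one line of arithmetic; a sketch with the slide's numbers, using total probability for the denominator:

```python
# Bayes rule with the law of total probability in the denominator
prior = {"spam": 0.7, "low": 0.2, "high": 0.1}       # P(E_i)
like = {"spam": 0.9, "low": 0.01, "high": 0.01}      # P(F | E_i)

p_F = sum(like[e] * prior[e] for e in prior)         # P(F) = 0.633
p_spam_given_F = like["spam"] * prior["spam"] / p_F  # ≈ 0.995
```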

17 Independence Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 17

18 Independence
Def: Events E and F are independent if P(E ∩ F) = P(E)P(F)
Events that are not independent are dependent
According to the definition of conditional probability
P(E|F) = P(E ∩ F)/P(F) = P(E)P(F)/P(F) = P(E)
Intuitive: knowing F does not alter our perception of E
F bears no information about E
The symmetric statement is also true: P(F|E) = P(F)
Whether E and F are independent relies strongly on P(·)
Avoid confusing with disjoint events, meaning E ∩ F = ∅
Q: Can disjoint events with P(E) > 0, P(F) > 0 be independent? No

19 Independence example
Wrote one name, asked a friend to write another (possibly the same)
Probability space (S, F, P(·)) for this experiment
S is the set of all pairs of names [x_n(1), x_n(2)], |S| = N^2
Sigma-algebra is the (Cartesian product) power set F = 2^S
Define P(E) = |E|/|S| as the uniform probability distribution
Consider the events E_1 = "I wrote x_1" and E_2 = "My friend wrote x_2"
Q: Are they independent? Yes, since
P(E_1 ∩ E_2) = P({(x_1, x_2)}) = |{(x_1, x_2)}|/|S| = 1/N^2 = P(E_1)P(E_2)
Dependent events: E_1 = "I wrote x_1" and E_3 = "Both names are male"

20 Independence for more than two events
Def: Events E_i, i = 1, 2, ... are called mutually independent if
P(∩_{i∈I} E_i) = Π_{i∈I} P(E_i)
for every finite subset I of at least two integers
Ex: Events E_1, E_2, and E_3 are mutually independent if all the following hold
P(E_1 ∩ E_2 ∩ E_3) = P(E_1)P(E_2)P(E_3)
P(E_1 ∩ E_2) = P(E_1)P(E_2)
P(E_1 ∩ E_3) = P(E_1)P(E_3)
P(E_2 ∩ E_3) = P(E_2)P(E_3)
If P(E_i ∩ E_j) = P(E_i)P(E_j) for all (i, j), the E_i are pairwise independent
Mutual independence ⇒ pairwise independence. Not the other way around

21 Random variables Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 21

22 Random variable (RV) definition
Def: A RV X(s) is a function that assigns a value to an outcome s ∈ S
Think of RVs as measurements associated with an experiment
Ex: Throw a ball inside a 1m × 1m square, interested in the ball's position
The uncertain outcome is the place s ∈ [0, 1]^2 where the ball falls
Random variables are the X(s) and Y(s) position coordinates
RV probabilities are inferred from probabilities of the underlying outcomes
P(X(s) = x) = P({s ∈ S : X(s) = x})
P(X(s) ∈ (−∞, x]) = P({s ∈ S : X(s) ∈ (−∞, x]})
X(s) is the random variable and x a particular value of X(s)

23 Example 1
Throw a coin for heads (H) or tails (T). The coin is fair, P(H) = 1/2, P(T) = 1/2. Pay $1 for H, charge $1 for T. Earnings?
Possible outcomes are H and T
To measure earnings define a RV X with values X(H) = 1, X(T) = −1
Probabilities of the RV are
P(X = 1) = P(H) = 1/2, P(X = −1) = P(T) = 1/2
Also have P(X = x) = 0 for all other x ≠ ±1

24 Example 2
Throw 2 coins. Pay $1 for each H, charge $1 for each T. Earnings?
Now the possible outcomes are HH, HT, TH, and TT
To measure earnings define a RV Y with values
Y(HH) = 2, Y(HT) = 0, Y(TH) = 0, Y(TT) = −2
Probabilities of the RV are
P(Y = 2) = P(HH) = 1/4, P(Y = 0) = P(HT) + P(TH) = 1/2, P(Y = −2) = P(TT) = 1/4

25 About Examples 1 and 2
RVs are easier to manipulate than events
Let s_1 ∈ {H, T} be the outcome of coin 1 and s_2 ∈ {H, T} of coin 2
Can relate Y and the X's as Y(s_1, s_2) = X_1(s_1) + X_2(s_2)
Throw N coins. Earnings? Enumeration becomes cumbersome
Alternatively, let s_n ∈ {H, T} be the outcome of the n-th toss and define
Y(s_1, s_2, ..., s_N) = Σ_{n=1}^N X_n(s_n)
Will usually abuse notation and write Y = Σ_{n=1}^N X_n

26 Example 3
Throw a coin until landing heads for the first time, P(H) = p
Number of throws until the first head?
Outcomes are H, TH, TTH, TTTH, ... Note that |S| = ∞
Stop tossing after the first H (thus THT is not a possible outcome)
Let N be a RV counting the number of throws
N = n if we land T in the first n − 1 throws and H in the n-th
P(N = 1) = P(H) = p
P(N = 2) = P(TH) = (1 − p)p
...
P(N = n) = P(TT...TH) = (1 − p)^{n−1} p   (n − 1 tails)

27 Example 3 (continued)
From A2) we should have P(S) = Σ_{n=1}^∞ P(N = n) = 1
Holds because Σ_{n=1}^∞ (1 − p)^{n−1} is a geometric series
Σ_{n=1}^∞ (1 − p)^{n−1} = 1 + (1 − p) + (1 − p)^2 + ... = 1/(1 − (1 − p)) = 1/p
Plug the sum of the geometric series into the expression for P(S)
Σ_{n=1}^∞ P(N = n) = Σ_{n=1}^∞ p(1 − p)^{n−1} = p × (1/p) = 1
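The normalization can also be verified numerically; a partial sum of the geometric pmf gets within machine precision of 1 (p = 0.3 as in the slides' plots):

```python
# Geometric pmf p(1-p)^(n-1): partial sums converge to 1
p = 0.3
partial = sum(p * (1 - p) ** (n - 1) for n in range(1, 200))
assert abs(partial - 1) < 1e-12  # tail beyond n = 200 is negligible
```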

28 Indicator function
The indicator function of an event is a random variable
Let s ∈ S be an outcome, and E ⊆ S be an event
I{E}(s) = 1 if s ∈ E, and 0 if s ∉ E
Indicates that outcome s belongs to set E, by taking value 1
Ex: Number of throws N until the first H. Interested in N exceeding N_0
Event is {N : N > N_0}. Possible outcomes are N = 1, 2, ...
Denote the indicator function as I_{N_0} = I{N : N > N_0}
Probability P(I_{N_0} = 1) = P(N > N_0) = (1 − p)^{N_0}
For N to exceed N_0 need N_0 consecutive tails; doesn't matter what happens afterwards

29 Discrete random variables Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 29

30 Probability mass and cumulative distribution functions
A discrete RV takes on, at most, a countable number of values
Probability mass function (pmf): p_X(x) = P(X = x)
If the RV is clear from context, just write p_X(x) = p(x)
If X is supported on {x_1, x_2, ...}, the pmf satisfies
(i) p(x_i) > 0 for i = 1, 2, ...
(ii) p(x) = 0 for all other x ≠ x_i
(iii) Σ_{i=1}^∞ p(x_i) = 1
Cumulative distribution function (cdf): F_X(x) = P(X ≤ x) = Σ_{i: x_i ≤ x} p(x_i)
A staircase function with jumps at the x_i
(figures: pmf and cdf for throws to first heads, p = 0.3)

31 Bernoulli
A trial/experiment/bet can succeed w.p. p or fail w.p. q := 1 − p
Ex: coin throws, any indicator of an event
A Bernoulli RV X can be 0 or 1. The pmf is p(x) = p^x q^{1−x}
The cdf is F(x) = 0 for x < 0, q for 0 ≤ x < 1, and 1 for x ≥ 1
(figures: pmf and cdf, p = 0.4)

32 Geometric
Count the number of Bernoulli trials needed to register the first success
Trials succeed w.p. p and are independent
The number of trials X until success is geometric with parameter p
Pmf is p(x) = p(1 − p)^{x−1}
One success after x − 1 failures; trials are independent
Cdf is F(x) = 1 − (1 − p)^x
Recall P(X > x) = (1 − p)^x; or just sum the geometric series
(figures: pmf and cdf, p = 0.3)

33 Binomial
Count the number of successes X in n Bernoulli trials
Trials succeed w.p. p and are independent
The number of successes X is binomial with parameters (n, p). Pmf is
p(x) = C(n, x) p^x (1 − p)^{n−x} = [n!/((n − x)! x!)] p^x (1 − p)^{n−x}
X = x for x successes (p^x) and n − x failures ((1 − p)^{n−x})
C(n, x) ways of drawing x successes and n − x failures
(figures: pmf and cdf, n = 9, p = 0.4)
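The binomial pmf is one line with Python's math.comb; a quick sketch, with n = 9, p = 0.4 mirroring the slide's plot:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The pmf sums to 1 over x = 0, ..., n (here n = 9, p = 0.4)
assert abs(sum(binom_pmf(x, 9, 0.4) for x in range(10)) - 1) < 1e-12
```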

34 Binomial (continued)
Let Y_i, i = 1, ..., n be Bernoulli RVs with parameter p
Y_i associated with independent events
Can write a binomial X with parameters (n, p) as X = Σ_{i=1}^n Y_i
Ex: Consider binomials Y and Z with parameters (n_Y, p) and (n_Z, p)
Q: Probability distribution of X = Y + Z?
Write Y = Σ_{i=1}^{n_Y} Y_i and Z = Σ_{i=1}^{n_Z} Z_i, thus X = Σ_{i=1}^{n_Y} Y_i + Σ_{i=1}^{n_Z} Z_i
X is binomial with parameters (n_Y + n_Z, p)

35 Poisson
Counts of rare events (radioactive decay, packet arrivals, accidents)
Usually modeled as Poisson with parameter λ and pmf
p(x) = e^{−λ} λ^x / x!
Q: Is this a properly defined pmf? Yes
Taylor's expansion of e^x = 1 + x + x^2/2! + ... = Σ_{i=0}^∞ x^i/i!
Then P(S) = Σ_{i=0}^∞ p(i) = Σ_{i=0}^∞ e^{−λ} λ^i/i! = e^{−λ} e^λ = 1
(figures: pmf and cdf, λ = 4)

36 Poisson approximation of the binomial
X is binomial with parameters (n, p)
Let n → ∞ while maintaining a constant product np = λ
If we just let n → ∞, the number of successes diverges. Boring
Compare with the Poisson distribution with parameter λ
(figure: binomial pmfs for λ = 5 and n = 6, 8, 10, 15, 20 against the Poisson pmf)

37 Poisson and binomial (continued)
This is, in fact, the motivation for the definition of a Poisson RV
Substituting p = λ/n in the pmf of a binomial RV
p_n(x) = [n!/((n − x)! x!)] (λ/n)^x (1 − λ/n)^{n−x}
       = [n(n − 1)...(n − x + 1)/n^x] × (λ^x/x!) × (1 − λ/n)^n / (1 − λ/n)^x
Used the factorial definitions, (1 − λ/n)^{n−x} = (1 − λ/n)^n / (1 − λ/n)^x, and reordered terms
In the limit, lim_{n→∞} (1 − λ/n)^n = e^{−λ}
The first and last terms converge to 1. From both observations
lim_{n→∞} p_n(x) = 1 × (λ^x/x!) × e^{−λ} × 1 = e^{−λ} λ^x/x!
The limit is the pmf of a Poisson RV
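The convergence can be seen numerically; a sketch comparing the two pmfs for growing n with np = λ = 5 held fixed (the choice of n values and the x-range cutoff are for illustration; for λ = 5 the pmfs are negligible beyond x ≈ 30):

```python
from math import comb, exp, factorial

lam = 5.0  # fixed product np = lambda

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# Largest pointwise gap between binomial(n, lam/n) and Poisson(lam) pmfs
gaps = {
    n: max(abs(binom_pmf(x, n, lam / n) - poisson_pmf(x, lam)) for x in range(30))
    for n in (10, 100, 1000)
}
assert gaps[1000] < gaps[100] < gaps[10]  # gap shrinks as n grows
```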

38 Closing remarks
The binomial distribution is motivated by counting successes
The Poisson is an approximation for a large number of trials n
The Poisson distribution is more tractable (compare pmfs)
Sometimes called the law of rare events
Individual events (successes) happen with small probability p = λ/n
The aggregate event (number of successes), though, need not be rare
Notice that all four RVs seen so far are related to coin tosses

39 Where did the probability space go?
Random variables are mappings X(s) : S → R
The underlying probability space often disappears
This is for notational convenience, but it's still there
Ex: Let's construct a probability space for a Bernoulli RV
Let S = [0, 1], F the Borel sigma-field, and P([a, b]) = b − a for a ≤ b
Fix a parameter p ∈ [0, 1] and define
X(s) = 1 if s ≤ p, and X(s) = 0 if s > p
P(X = 1) = P(s ≤ p) = P([0, p]) = p and P(X = 0) = 1 − p
Can do a similar construction for all distributions considered so far

40 Continuous random variables Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 40

41 Continuous RVs, probability density function
Possible values for a continuous RV X form a dense subset X ⊆ R
Uncountably infinite number of possible values
The probability density function (pdf) f_X(x) ≥ 0 is such that for any subset X ⊆ R
P(X ∈ X) = ∫_X f_X(x) dx
Will have P(X = x) = 0 for all x
Cdf defined as before and related to the pdf
F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(u) du
P(X ∈ R) = F_X(∞) = lim_{x→∞} F_X(x) = 1
(figures: Normal pdf and cdf)

42 More on cdfs and pdfs
When the set X = [a, b] is an interval of R
P(X ∈ [a, b]) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a)
In terms of the pdf it can be written as
P(X ∈ [a, b]) = ∫_a^b f_X(x) dx
For a small interval [x_0, x_0 + δx], in particular
P(X ∈ [x_0, x_0 + δx]) = ∫_{x_0}^{x_0+δx} f_X(x) dx ≈ f_X(x_0) δx
Probability is the area under the pdf (thus "density")
Another relationship between pdf and cdf is
f_X(x) = ∂F_X(x)/∂x
Fundamental theorem of calculus ("derivative inverse of integral")

43 Uniform
Models problems with equal probability of landing on an interval [a, b]
Pdf of a uniform RV is f(x) = 0 outside the interval [a, b] and
f(x) = 1/(b − a), for a ≤ x ≤ b
Cdf is F(x) = (x − a)/(b − a) in the interval [a, b] (0 before, 1 after)
Prob. of an interval [α, β] ⊆ [a, b] is ∫_α^β f(x) dx = (β − α)/(b − a)
Depends on the interval's width β − α only, not on its position
(figures: pdf and cdf, a = −1, b = 1)

44 Exponential
Models duration of phone calls, lifetime of electronic components
Pdf of an exponential RV is f(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0
As the parameter λ increases, height increases and width decreases
Cdf obtained by integrating the pdf
F(x) = ∫_{−∞}^x f(u) du = ∫_0^x λe^{−λu} du = [−e^{−λu}]_0^x = 1 − e^{−λx}
(figures: pdf and cdf, λ = 1)

45 Normal / Gaussian
Models randomness arising from a large number of random effects
Pdf of a normal RV is
f(x) = (1/(√(2π) σ)) e^{−(x−µ)^2 / (2σ^2)}
µ is the mean (center), σ^2 is the variance (width)
0.68 prob. between µ ± σ, 0.997 prob. in µ ± 3σ
A standard normal RV has µ = 0 and σ^2 = 1
The cdf F(x) cannot be expressed in terms of elementary functions
(figures: pdf and cdf, µ = 0, σ^2 = 1)

46 Expected values Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 46

47 Expected values
We are asked to summarize information about a RV in a single value
What should this value be?
If we are allowed a description with a few values, what should they be?
Expected (mean) values are convenient answers to these questions
Beware: expectations are condensed descriptions
They overlook some aspects of the random phenomenon
The whole story is told by the probability distribution (cdf)

48 Definition for discrete RVs
Discrete RV X taking on values x_i, i = 1, 2, ... with pmf p(x)
Def: The expected value of the discrete RV X is
E[X] := Σ_{i=1}^∞ x_i p(x_i) = Σ_{x: p(x)>0} x p(x)
Weighted average of the possible values x_i; probabilities are the weights
The common average if the RV takes values x_i, i = 1, ..., N equiprobably
E[X] = Σ_{i=1}^N x_i p(x_i) = Σ_{i=1}^N x_i (1/N) = (1/N) Σ_{i=1}^N x_i

49 Expected value of Bernoulli and geometric RVs
Ex: For a Bernoulli RV, p(x) = p^x q^{1−x} for x ∈ {0, 1}
E[X] = 1 × p + 0 × q = p
Ex: For a geometric RV, p(x) = p(1 − p)^{x−1} = pq^{x−1} for x ≥ 1
Note that ∂q^x/∂q = xq^{x−1} and that derivatives are linear operators
E[X] = Σ_{x=1}^∞ x p q^{x−1} = p Σ_{x=1}^∞ ∂q^x/∂q = p ∂(Σ_{x=1}^∞ q^x)/∂q
The sum inside the derivative is geometric; it sums to q/(1 − q), thus
E[X] = p ∂(q/(1 − q))/∂q = p/(1 − q)^2 = 1/p
Time to first success is the inverse of the success probability. Reasonable
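The geometric mean E[X] = 1/p can be checked with a truncated sum of the defining series (p = 0.25 is an arbitrary illustrative choice):

```python
# E[X] = sum_{x>=1} x p (1-p)^(x-1) should equal 1/p
p = 0.25
mean = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 400))
assert abs(mean - 1 / p) < 1e-9   # 1/p = 4; tail beyond x = 400 is negligible
```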

50 Expected value of a Poisson RV
Ex: For a Poisson RV, p(x) = e^{−λ} λ^x/x! for x ≥ 0
The first summand in the definition is 0; pull λ out and use x/x! = 1/(x − 1)!
E[X] = Σ_{x=0}^∞ x e^{−λ} λ^x/x! = λe^{−λ} Σ_{x=1}^∞ λ^{x−1}/(x − 1)!
The sum is Taylor's expansion of e^λ = 1 + λ + λ^2/2! + ... + λ^x/x! + ...
E[X] = λe^{−λ} e^λ = λ
Poisson is the limit of the binomial for a large number of trials n, with λ = np
Counts the number of successes in n trials that succeed w.p. p
Expected number of successes is λ = np
Number of trials × probability of individual success. Reasonable

51 Definition for continuous RVs
Continuous RV X taking values on R with pdf f(x)
Def: The expected value of the continuous RV X is
E[X] := ∫_{−∞}^∞ x f(x) dx
Compare with E[X] := Σ_{x: p(x)>0} x p(x) in the discrete RV case
Note that the integral or sum is assumed to be well defined
Otherwise we say the expectation does not exist

52 Expected value of a normal RV
Ex: For a normal RV, add and subtract µ and separate the integrals
E[X] = (1/(√(2π)σ)) ∫ x e^{−(x−µ)^2/(2σ^2)} dx
     = (1/(√(2π)σ)) ∫ (x + µ − µ) e^{−(x−µ)^2/(2σ^2)} dx
     = µ (1/(√(2π)σ)) ∫ e^{−(x−µ)^2/(2σ^2)} dx + (1/(√(2π)σ)) ∫ (x − µ) e^{−(x−µ)^2/(2σ^2)} dx
The first integral is 1 because it integrates a pdf over all of R
The second integral is 0 by symmetry. Both observations yield E[X] = µ
The mean of a RV with a symmetric pdf is the point of symmetry

53 Expected value of uniform and exponential RVs
Ex: For a uniform RV, f(x) = 1/(b − a) for a ≤ x ≤ b
E[X] = ∫ x f(x) dx = ∫_a^b x/(b − a) dx = (b^2 − a^2)/(2(b − a)) = (a + b)/2
Makes sense, since the pdf is symmetric around the midpoint (a + b)/2
Ex: For an exponential RV (non-symmetric), integrate by parts
E[X] = ∫_0^∞ x λe^{−λx} dx = [−xe^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + [−e^{−λx}/λ]_0^∞ = 1/λ

54 Expected value of a function of a RV
Consider a function g(X) of a RV X. Expected value of g(X)?
g(X) is also a RV, so it also has a pmf p_{g(X)}(g(x))
E[g(X)] = Σ_{g(x): p_{g(X)}(g(x))>0} g(x) p_{g(X)}(g(x))
Requires calculating the pmf of g(X). There is a simpler way
Theorem: Consider a function g(X) of a discrete RV X with pmf p_X(x). Then
E[g(X)] = Σ_{i=1}^∞ g(x_i) p_X(x_i)
Weighted average of functional values. No need to find the pmf of g(X)
The same can be proved for a continuous RV
E[g(X)] = ∫ g(x) f_X(x) dx

55 Expected value of a linear transformation
Consider a linear function (actually affine) g(X) = aX + b
E[aX + b] = Σ_{i=1}^∞ (ax_i + b) p_X(x_i)
          = Σ_{i=1}^∞ ax_i p_X(x_i) + Σ_{i=1}^∞ b p_X(x_i)
          = a Σ_{i=1}^∞ x_i p_X(x_i) + b Σ_{i=1}^∞ p_X(x_i)
          = a E[X] + b × 1
Can interchange expectation with additive/multiplicative constants
E[aX + b] = a E[X] + b
Again, the same holds for a continuous RV

56 Expected value of an indicator function
Let X be a RV and X be a set
I{X ∈ X} = 1 if X ∈ X, and 0 if X ∉ X
Expected value of I{X ∈ X} in the discrete case
E[I{X ∈ X}] = Σ_{x: p_X(x)>0} I{x ∈ X} p_X(x) = Σ_{x∈X} p_X(x) = P(X ∈ X)
Likewise in the continuous case
E[I{X ∈ X}] = ∫ I{x ∈ X} f_X(x) dx = ∫_X f_X(x) dx = P(X ∈ X)
Expected value of an indicator RV = probability of the indicated event
Recall E[X] = p for a Bernoulli RV (it indicates "success")

57 Moments, central moments and variance
Def: The n-th moment (n ≥ 0) of a RV is E[X^n] = Σ_{i=1}^∞ x_i^n p(x_i)
Def: The n-th central moment corrects for the mean, that is
E[(X − E[X])^n] = Σ_{i=1}^∞ (x_i − E[X])^n p(x_i)
The 0-th order moment is E[X^0] = 1; the 1-st moment is the mean E[X]
The 2-nd central moment is the variance. Measures the width of the pmf
var[X] = E[(X − E[X])^2] = E[X^2] − E^2[X]
Ex: For affine functions, var[aX + b] = a^2 var[X]

58 Variance of Bernoulli and Poisson RVs
Ex: For a Bernoulli RV X with parameter p, E[X] = E[X^2] = p
var[X] = E[X^2] − E^2[X] = p − p^2 = p(1 − p)
Ex: For a Poisson RV Y with parameter λ, the second moment is
E[Y^2] = Σ_{y=0}^∞ y^2 e^{−λ} λ^y/y! = Σ_{y=1}^∞ y e^{−λ} λ^y/(y − 1)!
       = Σ_{y=1}^∞ (y − 1) e^{−λ} λ^y/(y − 1)! + Σ_{y=1}^∞ e^{−λ} λ^y/(y − 1)!
       = e^{−λ} λ^2 Σ_{y=2}^∞ λ^{y−2}/(y − 2)! + e^{−λ} λ Σ_{y=1}^∞ λ^{y−1}/(y − 1)!
       = e^{−λ} λ^2 e^λ + e^{−λ} λ e^λ = λ^2 + λ
var[Y] = E[Y^2] − E^2[Y] = λ^2 + λ − λ^2 = λ
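The Poisson moments above can be checked with truncated sums of the pmf (λ = 4 as in the slides' plots; the cutoff at y = 100 is an arbitrary point past which the tail is negligible):

```python
from math import exp, factorial

lam = 4.0

def pmf(y):
    return exp(-lam) * lam**y / factorial(y)

# Truncated sums; the tail beyond y = 100 is negligible for lam = 4
m1 = sum(y * pmf(y) for y in range(100))       # E[Y]   -> lam
m2 = sum(y**2 * pmf(y) for y in range(100))    # E[Y^2] -> lam^2 + lam
var = m2 - m1**2                               # -> lam
assert abs(m1 - lam) < 1e-9 and abs(var - lam) < 1e-6
```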

59 Joint probability distributions Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 59

60 Joint cdf
Want to study problems with more than one RV. Say, e.g., X and Y
The probability distributions of X and Y alone are not sufficient
The joint probability distribution (cdf) of (X, Y) is defined as
F_XY(x, y) = P(X ≤ x, Y ≤ y)
If X, Y are clear from context, omit the subindex to write F_XY(x, y) = F(x, y)
Can recover F_X(x) by considering all possible values of Y
F_X(x) = P(X ≤ x) = P(X ≤ x, Y ≤ ∞) = F_XY(x, ∞)
F_X(x) and F_Y(y) = F_XY(∞, y) are called marginal cdfs

61 Joint pmf
Consider discrete RVs X and Y
X takes values in X := {x_1, x_2, ...} and Y in Y := {y_1, y_2, ...}
The joint pmf of (X, Y) is defined as
p_XY(x, y) = P(X = x, Y = y)
Possible values (x, y) are elements of the Cartesian product X × Y
(x_1, y_1), (x_1, y_2), ..., (x_2, y_1), (x_2, y_2), ..., (x_3, y_1), (x_3, y_2), ...
The marginal pmf p_X(x) is obtained by summing over all values of Y
p_X(x) = P(X = x) = Σ_{y∈Y} P(X = x, Y = y) = Σ_{y∈Y} p_XY(x, y)
Likewise p_Y(y) = Σ_{x∈X} p_XY(x, y). Marginalize by summing

63 Joint pdf
Consider continuous RVs X, Y and an arbitrary set A ⊆ R^2
The joint pdf is a function f_XY(x, y) : R^2 → R_+ such that
P((X, Y) ∈ A) = ∫∫_A f_XY(x, y) dx dy
Marginalization: there are two ways of writing P(X ∈ X)
P(X ∈ X) = P(X ∈ X, Y ∈ R) = ∫_X ∫_{−∞}^{+∞} f_XY(x, y) dy dx
Definition of f_X(x): P(X ∈ X) = ∫_X f_X(x) dx
Lipstick on a pig (same thing written differently is still the same thing)
f_X(x) = ∫_{−∞}^{+∞} f_XY(x, y) dy,   f_Y(y) = ∫_{−∞}^{+∞} f_XY(x, y) dx

64 Example
Consider two Bernoulli RVs B_1, B_2 with the same parameter p
Define X = B_1 and Y = B_1 + B_2
The pmf of X is p_X(0) = 1 − p, p_X(1) = p
Likewise, the pmf of Y is p_Y(0) = (1 − p)^2, p_Y(1) = 2p(1 − p), p_Y(2) = p^2
The joint pmf of X and Y is
p_XY(0, 0) = (1 − p)^2, p_XY(0, 1) = p(1 − p), p_XY(0, 2) = 0
p_XY(1, 0) = 0, p_XY(1, 1) = p(1 − p), p_XY(1, 2) = p^2
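The joint pmf in this example can be built by enumerating the four outcomes (b_1, b_2); a sketch with an arbitrary illustrative parameter p = 2/5 (exact arithmetic via Fraction):

```python
from fractions import Fraction
from itertools import product

p = Fraction(2, 5)   # arbitrary illustrative Bernoulli parameter
q = 1 - p

# Enumerate outcomes (b1, b2) and accumulate the joint pmf of (X, Y)
joint = {}
for b1, b2 in product((0, 1), repeat=2):
    x, y = b1, b1 + b2                       # X = B1, Y = B1 + B2
    w = (p if b1 else q) * (p if b2 else q)  # independent Bernoullis
    joint[(x, y)] = joint.get((x, y), 0) + w

assert joint[(0, 0)] == q**2 and joint[(1, 2)] == p**2
assert joint[(0, 1)] == p * q and joint[(1, 1)] == p * q
assert (0, 2) not in joint and (1, 0) not in joint   # zero-probability pairs
```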

65 Random vectors
For convenience, often arrange RVs in a vector
The probability distribution of the vector is the joint distribution of its entries
Consider, e.g., two RVs X and Y. The random vector is X = [X, Y]^T
If X and Y are discrete, the vector variable X is discrete with pmf
p_X(x) = p_X([x, y]^T) = p_XY(x, y)
If X, Y are continuous, X is continuous with pdf
f_X(x) = f_X([x, y]^T) = f_XY(x, y)
The vector cdf is F_X(x) = F_X([x, y]^T) = F_XY(x, y)
In general, can define n-dimensional RVs X := [X_1, X_2, ..., X_n]^T
Just notation; definitions carry over from the n = 2 case

66 Joint expectations Sigma-algebras and probability spaces Conditional probability, total probability, Bayes rule Independence Random variables Discrete random variables Continuous random variables Expected values Joint probability distributions Joint expectations Introduction to Random Processes Probability Review 66

67 Joint expectations
RVs X and Y and a function g(X, Y). The function g(X, Y) is also a RV
The expected value of g(X, Y) when X and Y are discrete can be written as
E[g(X, Y)] = Σ_{x,y: p_XY(x,y)>0} g(x, y) p_XY(x, y)
When X and Y are continuous
E[g(X, Y)] = ∫∫ g(x, y) f_XY(x, y) dx dy
Can have more than two RVs and use vector notation
Ex: Linear transformation of a vector RV X ∈ R^n: g(X) = a^T X
E[a^T X] = ∫_{R^n} a^T x f_X(x) dx

68 Expected value of a sum of random variables
Expected value of the sum of two continuous RVs
E[X + Y] = ∫∫ (x + y) f_XY(x, y) dx dy
         = ∫∫ x f_XY(x, y) dx dy + ∫∫ y f_XY(x, y) dx dy
Remove x (y) from the innermost integral in the first (second) summand
E[X + Y] = ∫ x [∫ f_XY(x, y) dy] dx + ∫ y [∫ f_XY(x, y) dx] dy
         = ∫ x f_X(x) dx + ∫ y f_Y(y) dy
         = E[X] + E[Y]
Used the marginal expressions
Expectation of a summation: E[Σ_i X_i] = Σ_i E[X_i]

69 Expected value is a linear operator
Combining with the earlier result E[aX + b] = aE[X] + b proves that
E[a_x X + a_y Y + b] = a_x E[X] + a_y E[Y] + b
Better yet, using vector notation (with a ∈ R^n, X ∈ R^n, b a scalar)
E[a^T X + b] = a^T E[X] + b
Also, if A is an m × n matrix with rows a_1^T, ..., a_m^T and b ∈ R^m a vector with elements b_1, ..., b_m, we can write
E[AX + b] = [E[a_1^T X + b_1]; E[a_2^T X + b_2]; ...; E[a_m^T X + b_m]] = [a_1^T E[X] + b_1; a_2^T E[X] + b_2; ...; a_m^T E[X] + b_m] = A E[X] + b
The expected value operator can be interchanged with linear operations

70 Independence of RVs

Events E and F are independent if P(E ∩ F) = P(E) P(F)

Def: RVs X and Y are independent if the events {X ≤ x} and {Y ≤ y} are independent for all x and y, i.e.,
P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y)

By definition, this is equivalent to F_XY(x, y) = F_X(x) F_Y(y)
For discrete RVs, equivalent to the analogous relation between pmfs: p_XY(x, y) = p_X(x) p_Y(y)
For continuous RVs, the analogous relation holds for pdfs: f_XY(x, y) = f_X(x) f_Y(y)

Independence ⇔ joint distribution factorizes into the product of the marginals

Introduction to Random Processes Probability Review 70
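For discrete RVs the factorization test is a finite check. A sketch with illustrative pmfs: one joint built as a product of marginals (hence independent), and one that fails the test:

```python
# Independence <=> joint pmf factors into the product of its marginals.
# Compare an independent joint with a dependent one (values illustrative).
p_X = {0: 0.3, 1: 0.7}
p_Y = {0: 0.6, 1: 0.4}

indep = {(x, y): px * py for x, px in p_X.items() for y, py in p_Y.items()}
dep = {(0, 0): 0.3, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.4}

def factorizes(p_joint):
    """Check p_XY(x, y) == p_X(x) p_Y(y) at every outcome."""
    mx, my = {}, {}
    for (x, y), p in p_joint.items():
        mx[x] = mx.get(x, 0.0) + p
        my[y] = my.get(y, 0.0) + p
    return all(abs(p - mx[x] * my[y]) < 1e-12
               for (x, y), p in p_joint.items())

assert factorizes(indep)       # product-form joint: independent
assert not factorizes(dep)     # p(0,0) = 0.3 != 0.3 * 0.6
```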

71 Sum of independent Poisson RVs

Independent Poisson RVs X and Y with parameters λ_x and λ_y
Q: Probability distribution of the sum RV Z := X + Y?

Z = n only if X = k, Y = n − k for some 0 ≤ k ≤ n. Using independence, the Poisson pmf, rearranging terms, and the binomial theorem:
p_Z(n) = Σ_{k=0}^{n} P(X = k, Y = n − k) = Σ_{k=0}^{n} P(X = k) P(Y = n − k)
       = Σ_{k=0}^{n} e^{−λ_x} (λ_x^k / k!) e^{−λ_y} (λ_y^{n−k} / (n − k)!)
       = (e^{−(λ_x + λ_y)} / n!) Σ_{k=0}^{n} (n! / ((n − k)! k!)) λ_x^k λ_y^{n−k}
       = (e^{−(λ_x + λ_y)} / n!) (λ_x + λ_y)^n

Z is Poisson with parameter λ_z := λ_x + λ_y
Sum of independent Poissons is Poisson (parameters add)

Introduction to Random Processes Probability Review 71
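The derivation above says the convolution of two Poisson pmfs is the Poisson pmf with the summed parameter. That identity can be checked numerically term by term (parameter values are illustrative):

```python
# Sum of independent Poissons is Poisson with parameters added:
# the convolution of the two pmfs matches the Poisson(lx + ly) pmf.
from math import exp, factorial

def poisson_pmf(lam, n):
    """Poisson pmf: e^{-lam} lam^n / n!"""
    return exp(-lam) * lam**n / factorial(n)

lx, ly = 2.0, 3.5   # illustrative parameters
for n in range(10):
    # p_Z(n) = sum_k P(X = k) P(Y = n - k), by independence
    conv = sum(poisson_pmf(lx, k) * poisson_pmf(ly, n - k)
               for k in range(n + 1))
    assert abs(conv - poisson_pmf(lx + ly, n)) < 1e-12
```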

72 Expected value of a binomial RV

Binomial RVs count the number of successes in n Bernoulli trials
Ex: Let X_i, i = 1, ..., n be n independent Bernoulli RVs
Can write the binomial as X = Σ_{i=1}^{n} X_i ⇒ E[X] = Σ_{i=1}^{n} E[X_i] = np
Expected nr. of successes = nr. of trials × prob. of individual success
Same interpretation that we observed for Poisson RVs

Ex: Dependent Bernoulli trials. Y = Σ_{i=1}^{n} X_i, but the X_i are not independent
Expected nr. of successes is still E[Y] = np
Linearity of expectation does not require independence
But Y is not binomially distributed

Introduction to Random Processes Probability Review 72
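To see the dependent case concretely, take the extreme (illustrative) example where all trials are identical copies of one Bernoulli RV: linearity still gives E[Y] = np, yet Y is clearly not binomial since it only takes the values 0 and n.

```python
# Dependent Bernoulli trials: E[Y] = np still holds by linearity,
# but Y is not binomial. Extreme illustrative case: X1 = X2 = ... = Xn,
# so Y = n * X1 takes only the values 0 and n.
n, p = 4, 0.3
p_Y = {0: 1 - p, n: p}           # pmf of Y = sum of n identical trials

E_Y = sum(y * prob for y, prob in p_Y.items())
assert abs(E_Y - n * p) < 1e-12  # E[Y] = np, same as for the binomial

# But P(Y = 1) = 0 here, whereas a binomial(n, p) has P(Y = 1) > 0.
assert p_Y.get(1, 0.0) == 0.0
```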

73 Expected value of a product of independent RVs

Theorem: For independent RVs X and Y, and arbitrary functions g(X) and h(Y):
E[g(X) h(Y)] = E[g(X)] E[h(Y)]
The expected value of the product is the product of the expected values.

Can show that g(X) and h(Y) are also independent. Intuitive.

Ex: the special case g(X) = X and h(Y) = Y yields E[XY] = E[X] E[Y]

Expectation and product can be interchanged if the RVs are independent
Different from the interchange with linear operations (always possible)

Introduction to Random Processes Probability Review 73
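A discrete check of the theorem by exact enumeration on a product-form joint pmf (marginals and functions below are illustrative):

```python
# For independent X, Y: E[g(X) h(Y)] = E[g(X)] E[h(Y)].
# Exact enumeration on a product-form joint pmf (values illustrative).
p_X = {1: 0.2, 2: 0.8}
p_Y = {-1: 0.5, 3: 0.5}
g = lambda x: x**2
h = lambda y: y + 1

E_gh = sum(g(x) * h(y) * px * py
           for x, px in p_X.items() for y, py in p_Y.items())
E_g = sum(g(x) * px for x, px in p_X.items())
E_h = sum(h(y) * py for y, py in p_Y.items())

assert abs(E_gh - E_g * E_h) < 1e-12
```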

74 Expected value of a product of independent RVs

Proof. Suppose X and Y are continuous RVs. Use the definition of independence:
E[g(X) h(Y)] = ∫∫ g(x) h(y) f_XY(x, y) dx dy = ∫∫ g(x) h(y) f_X(x) f_Y(y) dx dy

The integrand is the product of a function of x and a function of y:
E[g(X) h(Y)] = (∫ g(x) f_X(x) dx) (∫ h(y) f_Y(y) dy) = E[g(X)] E[h(Y)]

Introduction to Random Processes Probability Review 74

75 Variance of a sum of independent RVs

Let X_n, n = 1, ..., N be independent with E[X_n] = μ_n, var[X_n] = σ_n^2
Q: Variance of the sum X := Σ_{n=1}^{N} X_n?

Notice that the mean of X is E[X] = Σ_{n=1}^{N} μ_n. Then
var[X] = E[(Σ_{n=1}^{N} X_n − Σ_{n=1}^{N} μ_n)^2] = E[(Σ_{n=1}^{N} (X_n − μ_n))^2]

Expand the square and interchange summation and expectation:
var[X] = Σ_{n=1}^{N} Σ_{m=1}^{N} E[(X_n − μ_n)(X_m − μ_m)]

Introduction to Random Processes Probability Review 75

76 Variance of a sum of independent RVs (continued)

Separate the n = m terms in the sum. Then use independence and E[X_n − μ_n] = 0:
var[X] = Σ_{n ≠ m} E[(X_n − μ_n)(X_m − μ_m)] + Σ_{n=1}^{N} E[(X_n − μ_n)^2]
       = Σ_{n ≠ m} E[X_n − μ_n] E[X_m − μ_m] + Σ_{n=1}^{N} σ_n^2
       = Σ_{n=1}^{N} σ_n^2

If the RVs are independent ⇒ variance of the sum is the sum of the variances

A slightly more general result holds for independent X_i, i = 1, ..., n:
var[Σ_i (a_i X_i + b_i)] = Σ_i a_i^2 var[X_i]

Introduction to Random Processes Probability Review 76
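The "variance of sum = sum of variances" identity can be verified exactly for discrete RVs by building the pmf of the sum from the product of the marginals. A sketch with three illustrative pmfs:

```python
# For independent RVs, variance of the sum equals the sum of variances.
# Exact check: enumerate all joint outcomes of three independent
# (illustrative) discrete RVs with itertools.product.
from itertools import product

pmfs = [{0: 0.5, 1: 0.5}, {-1: 0.3, 2: 0.7}, {0: 0.9, 10: 0.1}]

def var(pmf):
    """Variance of a discrete RV given its pmf."""
    m = sum(x * p for x, p in pmf.items())
    return sum((x - m)**2 * p for x, p in pmf.items())

# Pmf of the sum; independence means joint probabilities multiply.
sum_pmf = {}
for outcomes in product(*(pmf.items() for pmf in pmfs)):
    s = sum(x for x, _ in outcomes)
    prob = 1.0
    for _, pi in outcomes:
        prob *= pi
    sum_pmf[s] = sum_pmf.get(s, 0.0) + prob

assert abs(var(sum_pmf) - sum(var(pmf) for pmf in pmfs)) < 1e-9
```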

77 Variance of binomial RV and sample mean

Ex: Let X_i, i = 1, ..., n be independent Bernoulli RVs
Recall E[X_i] = p and var[X_i] = p(1 − p)
Write the binomial X with parameters (n, p) as X = Σ_{i=1}^{n} X_i
Variance of the binomial is then var[X] = Σ_{i=1}^{n} var[X_i] = np(1 − p)

Ex: Let Y_i, i = 1, ..., n be independent RVs with E[Y_i] = μ, var[Y_i] = σ^2
The sample mean is Ȳ = (1/n) Σ_{i=1}^{n} Y_i. What about E[Ȳ] and var[Ȳ]?
Expected value: E[Ȳ] = (1/n) Σ_{i=1}^{n} E[Y_i] = μ
Variance: var[Ȳ] = (1/n^2) Σ_{i=1}^{n} var[Y_i] = σ^2/n (used independence)

Introduction to Random Processes Probability Review 77
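The sample-mean result E[Ȳ] = μ and var[Ȳ] = σ²/n can be confirmed exactly for a small discrete example by enumerating all n-tuples of iid draws (the pmf below is illustrative):

```python
# Sample mean of n iid RVs: E[Ybar] = mu and var[Ybar] = sigma^2 / n.
# Exact enumeration over all n-tuples (pmf values illustrative).
from itertools import product

pmf = {0: 0.2, 1: 0.5, 4: 0.3}   # one illustrative RV Y_i
mu = sum(y * p for y, p in pmf.items())
sigma2 = sum((y - mu)**2 * p for y, p in pmf.items())

n = 3
E_bar, E_bar2 = 0.0, 0.0
for ys in product(pmf, repeat=n):   # all 3^n joint outcomes
    prob = 1.0
    for y in ys:
        prob *= pmf[y]              # independence: probs multiply
    ybar = sum(ys) / n
    E_bar += prob * ybar
    E_bar2 += prob * ybar**2

assert abs(E_bar - mu) < 1e-9                      # E[Ybar] = mu
assert abs((E_bar2 - E_bar**2) - sigma2 / n) < 1e-9  # var = sigma^2/n
```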

78 Covariance

Def: The covariance of X and Y is (generalizes variance to pairs of RVs)
cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]

If cov(X, Y) = 0, the variables X and Y are said to be uncorrelated
If X, Y are independent then E[XY] = E[X] E[Y] and cov(X, Y) = 0
Independence implies uncorrelated RVs

The opposite is not true; we may have cov(X, Y) = 0 for dependent X, Y
Ex: X uniform in [−a, a] and Y = X^2
But uncorrelatedness does imply independence if X, Y are jointly normal

If cov(X, Y) > 0 then X and Y tend to move in the same direction ⇒ positive correlation
If cov(X, Y) < 0 then X and Y tend to move in opposite directions ⇒ negative correlation

Introduction to Random Processes Probability Review 78
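A discrete analog of the slide's uncorrelated-but-dependent example, with X uniform on {−1, 0, 1} and Y = X², computed exactly:

```python
# Uncorrelated does not imply independent. Discrete analog of the
# slide's example: X uniform on {-1, 0, 1} and Y = X^2.
support = [-1, 0, 1]
p = 1.0 / 3.0    # uniform pmf

E_X = sum(x * p for x in support)            # = 0 (symmetry)
E_Y = sum(x**2 * p for x in support)         # = 2/3
E_XY = sum(x * x**2 * p for x in support)    # E[X^3] = 0 (odd moment)
cov = E_XY - E_X * E_Y

assert abs(cov) < 1e-12   # uncorrelated...
# ...yet clearly dependent: P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3
```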

79 Covariance example

Let X be a zero-mean random signal and Z zero-mean noise
Signal X and noise Z are independent
Consider the received signals Y_1 = X + Z and Y_2 = −X + Z

(I) Y_1 and X are positively correlated (X, Y_1 move in the same direction)
cov(X, Y_1) = E[XY_1] − E[X] E[Y_1] = E[X(X + Z)] − E[X] E[X + Z]
The second term is 0 (E[X] = 0). For the first term, independence of X and Z gives
E[X(X + Z)] = E[X^2] + E[X] E[Z] = E[X^2]
Combining observations: cov(X, Y_1) = E[X^2] > 0

Introduction to Random Processes Probability Review 79

80 Covariance example (continued)

(II) Y_2 and X are negatively correlated (X, Y_2 move in opposite directions)
The same computations give cov(X, Y_2) = −E[X^2] < 0

(III) Can also compute the correlation between Y_1 and Y_2:
cov(Y_1, Y_2) = E[(X + Z)(−X + Z)] − E[X + Z] E[−X + Z] = −E[X^2] + E[Z^2]
Negative correlation if E[X^2] > E[Z^2] (small noise)
Positive correlation if E[X^2] < E[Z^2] (large noise)

Correlation between X and Y_1, or X and Y_2, comes from causality
Correlation between Y_1 and Y_2 does not; it comes from the latent variables X and Z
Correlation does not imply causation

Plausible, indeed commonly used, model of a communication channel

Introduction to Random Processes Probability Review 80
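The sign of cov(Y_1, Y_2) above depends on the signal-to-noise ratio. A minimal exact check with simple (illustrative) discrete zero-mean choices for X and Z in the "large noise" regime:

```python
# Signal/noise covariance example with illustrative discrete zero-mean
# X and Z: Y1 = X + Z, Y2 = -X + Z, so cov(Y1, Y2) = E[Z^2] - E[X^2].
p_X = {-1.0: 0.5, 1.0: 0.5}   # E[X] = 0, E[X^2] = 1
p_Z = {-2.0: 0.5, 2.0: 0.5}   # E[Z] = 0, E[Z^2] = 4 (large noise)

cov = 0.0
for x, px in p_X.items():
    for z, pz in p_Z.items():                 # independence: probs multiply
        cov += px * pz * (x + z) * (-x + z)   # E[Y1 Y2]; both means are 0

E_X2 = sum(x**2 * p for x, p in p_X.items())
E_Z2 = sum(z**2 * p for z, p in p_Z.items())

assert abs(cov - (E_Z2 - E_X2)) < 1e-12
assert cov > 0   # large noise => Y1 and Y2 positively correlated
```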

81 Glossary

- Sample space
- Outcome and event
- Sigma-algebra
- Countable union
- Axioms of probability
- Probability space
- Conditional probability
- Law of total probability
- Bayes rule
- Independent events
- Random variable (RV)
- Discrete RV
- Bernoulli, binomial, Poisson
- Continuous RV
- Uniform, normal, exponential
- Indicator RV
- Pmf, pdf and cdf
- Law of rare events
- Expected value
- Variance and standard deviation
- Joint probability distribution
- Marginal distribution
- Random vector
- Independent RVs
- Covariance
- Uncorrelated RVs

Introduction to Random Processes Probability Review 81


More information

Poisson approximations

Poisson approximations Chapter 9 Poisson approximations 9.1 Overview The Binn, p) can be thought of as the distribution of a sum of independent indicator random variables X 1 + + X n, with {X i = 1} denoting a head on the ith

More information

Introduction to Information Entropy Adapted from Papoulis (1991)

Introduction to Information Entropy Adapted from Papoulis (1991) Introduction to Information Entropy Adapted from Papoulis (1991) Federico Lombardo Papoulis, A., Probability, Random Variables and Stochastic Processes, 3rd edition, McGraw ill, 1991. 1 1. INTRODUCTION

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES Contents 1. Continuous random variables 2. Examples 3. Expected values 4. Joint distributions

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Lecture Notes 2 Random Variables. Random Variable

Lecture Notes 2 Random Variables. Random Variable Lecture Notes 2 Random Variables Definition Discrete Random Variables: Probability mass function (pmf) Continuous Random Variables: Probability density function (pdf) Mean and Variance Cumulative Distribution

More information

MATH 151, FINAL EXAM Winter Quarter, 21 March, 2014

MATH 151, FINAL EXAM Winter Quarter, 21 March, 2014 Time: 3 hours, 8:3-11:3 Instructions: MATH 151, FINAL EXAM Winter Quarter, 21 March, 214 (1) Write your name in blue-book provided and sign that you agree to abide by the honor code. (2) The exam consists

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES

IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS LECTURE 3-RANDOM VARIABLES VARIABLE Studying the behavior of random variables, and more importantly functions of random variables is essential for both the

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

1 INFO Sep 05

1 INFO Sep 05 Events A 1,...A n are said to be mutually independent if for all subsets S {1,..., n}, p( i S A i ) = p(a i ). (For example, flip a coin N times, then the events {A i = i th flip is heads} are mutually

More information

Chapter 3: Random Variables 1

Chapter 3: Random Variables 1 Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

2.1 Elementary probability; random sampling

2.1 Elementary probability; random sampling Chapter 2 Probability Theory Chapter 2 outlines the probability theory necessary to understand this text. It is meant as a refresher for students who need review and as a reference for concepts and theorems

More information

Tom Salisbury

Tom Salisbury MATH 2030 3.00MW Elementary Probability Course Notes Part V: Independence of Random Variables, Law of Large Numbers, Central Limit Theorem, Poisson distribution Geometric & Exponential distributions Tom

More information

Set Theory Digression

Set Theory Digression 1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Probability Theory Review

Probability Theory Review Cogsci 118A: Natural Computation I Lecture 2 (01/07/10) Lecturer: Angela Yu Probability Theory Review Scribe: Joseph Schilz Lecture Summary 1. Set theory: terms and operators In this section, we provide

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information