Lecture 10: Midterm #1 and the Law of Large Numbers
Statistics 104
Colin Rundel
February 20, 2012

Midterm #1

Exams will be passed back at the end of class. The exam was hard, but on the whole the class did well:

Mean: 75
Median: 81
SD: 1.8
Max: 105

Final grades will be curved; midterm grades will be posted this week.

Example

Draw two socks at random, without replacement, from a drawer full of twelve colored socks: 6 black, 4 white, 2 purple.

Let B be the number of black socks and W the number of white socks drawn. Then the distributions of B and W are given by:

k          0       1       2
P(B = k)   15/66   36/66   15/66
P(W = k)   28/66   32/66   6/66

Note: B ~ HyperGeo(12, 6, 2) and W ~ HyperGeo(12, 4, 2), i.e.

P(B = k) = \binom{6}{k} \binom{6}{2-k} / \binom{12}{2}
P(W = k) = \binom{4}{k} \binom{8}{2-k} / \binom{12}{2}

Example, cont.

The joint distribution of B and W is given by

P(B = b, W = w) = \binom{6}{b} \binom{4}{w} \binom{2}{2-b-w} / \binom{12}{2}

         W = 0   W = 1   W = 2
B = 0    1/66    8/66    6/66
B = 1    12/66   24/66   0
B = 2    15/66   0       0
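As a sanity check on these marginal pmfs, here is a short stdlib-only sketch (the helper name hypergeom_pmf is ours, not from the slides); exact fractions avoid any floating-point fuzz:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(X = k) for X ~ HyperGeo(N, K, n): k successes in n draws
    without replacement from N items, K of which are successes."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

# B = number of black socks drawn (6 black out of 12 socks, 2 draws)
pB = [hypergeom_pmf(12, 6, 2, k) for k in range(3)]
# W = number of white socks drawn (4 white out of 12 socks, 2 draws)
pW = [hypergeom_pmf(12, 4, 2, k) for k in range(3)]

print(pB)  # [Fraction(5, 22), Fraction(6, 11), Fraction(5, 22)]
print(pW)  # [Fraction(14, 33), Fraction(16, 33), Fraction(1, 11)]
```

Fraction reduces automatically, so 15/66 prints as 5/22 and 36/66 as 6/11; both pmfs sum to exactly 1.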
Marginal Distribution

Note that the column and row sums of the joint distribution are the distributions of B and W respectively:

P(B = b) = P(B = b, W = 0) + P(B = b, W = 1) + P(B = b, W = 2)
P(W = w) = P(B = 0, W = w) + P(B = 1, W = w) + P(B = 2, W = w)

These are the marginal distributions of B and W. In general,

P(X = x) = \sum_y P(X = x, Y = y) = \sum_y P(X = x | Y = y) P(Y = y)

Conditional Distribution

Conditional distributions are defined as we have seen previously:

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = joint pmf / marginal pmf

Therefore the pmf for white socks given that no black socks were drawn is

P(W = w | B = 0) = P(W = w, B = 0) / P(B = 0)
                 = 1/15  if w = 0
                 = 8/15  if w = 1
                 = 6/15  if w = 2

Expectation of Joint Distributions

E[g(X, Y)] = \sum_x \sum_y g(x, y) P(X = x, Y = y)

For example, we can define g(x, y) = xy; then

E(BW) = (0·0·1/66) + (0·1·8/66) + (0·2·6/66)
      + (1·0·12/66) + (1·1·24/66) + (1·2·0/66)
      + (2·0·15/66) + (2·1·0/66) + (2·2·0/66)
      = 24/66 = 4/11

Independence, cont.

Remember that Cov(X, Y) = 0 when X and Y are independent.

Cov(B, W) = E[(B − E[B])(W − E[W])] = E(BW) − E(B)E(W)
          = 4/11 − 2/3 = −10/33 ≈ −0.30303

since

E(B)E(W) = (0·15/66 + 1·36/66 + 2·15/66)(0·28/66 + 1·32/66 + 2·6/66)
         = (66/66)(44/66) = 2/3

Note that E(BW) ≠ E(B)E(W), which implies that B and W are not independent.
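The marginal, conditional, and covariance computations above can be checked mechanically from the joint pmf alone (the function name joint is ours, just a sketch of the slide's table):

```python
from fractions import Fraction
from math import comb

# Joint pmf of (B, W): 6 black, 4 white, 2 purple socks, 2 draws
def joint(b, w):
    if b + w > 2:
        return Fraction(0)  # impossible: only 2 socks are drawn
    return Fraction(comb(6, b) * comb(4, w) * comb(2, 2 - b - w), comb(12, 2))

support = [(b, w) for b in range(3) for w in range(3)]

# Marginal of B at 0 (row sum), then the conditional pmf P(W = w | B = 0)
pB0 = sum(joint(0, w) for w in range(3))      # 15/66
cond = [joint(0, w) / pB0 for w in range(3)]  # 1/15, 8/15, 6/15

EB = sum(b * joint(b, w) for b, w in support)        # 1
EW = sum(w * joint(b, w) for b, w in support)        # 2/3
EBW = sum(b * w * joint(b, w) for b, w in support)   # 4/11
cov = EBW - EB * EW                                  # -10/33
```

Since every value is a Fraction, the result Cov(B, W) = −10/33 is exact rather than a rounded decimal.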
Expectation of Conditional Probability

Conditional expectation works like any other distribution:

E(X | Y = y) = \sum_x x P(X = x | Y = y)

Therefore we can calculate things like the conditional mean and variance:

E(W | B = 0) = 0·1/15 + 1·8/15 + 2·6/15 = 20/15 = 4/3 ≈ 1.3333
E(W² | B = 0) = 0²·1/15 + 1²·8/15 + 2²·6/15 = 32/15 ≈ 2.1333
Var(W | B = 0) = E(W² | B = 0) − [E(W | B = 0)]² = 32/15 − (4/3)² = 16/45 ≈ 0.3556

Multinomial Distribution

Let X_1, X_2, ..., X_k be the k random variables that count the number of outcomes belonging to each of k categories in n trials, with p_i the probability that a trial falls in category i:

X_1, ..., X_k ~ Multinom(n, p_1, ..., p_k)

P(X_1 = x_1, ..., X_k = x_k) = f(x_1, ..., x_k | n, p_1, ..., p_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}

where \sum_{i=1}^k x_i = n and \sum_{i=1}^k p_i = 1.

E(X_i) = n p_i
Var(X_i) = n p_i (1 − p_i)
Cov(X_i, X_j) = −n p_i p_j  (for i ≠ j)

Multinomial Example

Some regions of DNA have an elevated amount of GC relative to AT base pairs. In a normal region of DNA we expect equal amounts of A, C, G, and T, while a GC-rich region has twice as much GC as AT. If we observe the sequence

ACTGACTTGGACCCGACGGA

what is the probability that it came from a normal region versus a GC-rich region?

Markov's Inequality

For any random variable X ≥ 0 and constant a > 0,

P(X ≥ a) ≤ E(X) / a

Corollary (Chebyshev's Inequality):

P(|X − E(X)| ≥ a) ≤ Var(X) / a²
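A sketch of how the DNA example could be worked, assuming "twice as much GC as AT" means p_A = p_T = 1/6 and p_C = p_G = 1/3, and putting equal prior weight on the two region types (both assumptions are ours, not stated on the slide):

```python
from collections import Counter
from math import factorial, prod

seq = "ACTGACTTGGACCCGACGGA"
counts = Counter(seq)  # A: 5, C: 6, G: 6, T: 3

def multinom_lik(counts, p):
    """Multinomial likelihood n!/(x_1!...x_k!) * prod p_i^x_i."""
    coef = factorial(sum(counts.values()))
    for x in counts.values():
        coef //= factorial(x)
    return coef * prod(p[base] ** counts[base] for base in p)

p_normal = {base: 1 / 4 for base in "ACGT"}                 # equal base frequencies
p_gc = {"A": 1 / 6, "T": 1 / 6, "C": 1 / 3, "G": 1 / 3}     # assumed GC-rich model

L_normal = multinom_lik(counts, p_normal)
L_gc = multinom_lik(counts, p_gc)

# Posterior probability of the GC-rich model under equal priors;
# the multinomial coefficient cancels in the ratio
post_gc = L_gc / (L_gc + L_normal)
```

Under these assumptions the GC-rich model is favored only slightly (likelihood ratio 4^20 / (6^8 · 3^12) ≈ 1.23), so the posterior lands just above 1/2.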
Derivation of Markov's Inequality

Let X be a random variable such that X ≥ 0, and define the indicator

I_{X ≥ a} = 1 if X ≥ a, 0 if X < a

Then a I_{X ≥ a} ≤ X, and taking expectations of both sides:

E(a I_{X ≥ a}) ≤ E(X)
a E(I_{X ≥ a}) ≤ E(X)
a P(X ≥ a) ≤ E(X)
P(X ≥ a) ≤ E(X) / a

Derivation of Chebyshev's Inequality

Proposition: for a non-decreasing function f(x),

P(X ≥ a) ≤ P(f(X) ≥ f(a)) ≤ E(f(X)) / f(a)

If we apply this to the non-negative random variable |X − E(X)| with f(x) = x², then

P(|X − E(X)| ≥ a) = P((X − E(X))² ≥ a²) ≤ E[(X − E(X))²] / a² = Var(X) / a²

If we define a = kσ where σ² = Var(X), then

P(|X − E(X)| ≥ kσ) ≤ Var(X) / (k²σ²) = 1/k²

Independent and Identically Distributed (iid)

A collection of random variables is iid if all of the variables share the same probability distribution and all are mutually independent.

Example: if X ~ Binom(n, p), then X = \sum_{i=1}^n Y_i where Y_1, ..., Y_n iid~ Bern(p).

Sums of iid Random Variables

Let X_1, X_2, ..., X_n iid~ D, where D is some probability distribution with E(X_i) = µ and Var(X_i) = σ². Define

S_n = X_1 + X_2 + ... + X_n

E(S_n) = E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n) = µ + µ + ... + µ = nµ

Var(S_n) = E[((X_1 + X_2 + ... + X_n) − (µ + µ + ... + µ))²]
         = E[((X_1 − µ) + (X_2 − µ) + ... + (X_n − µ))²]
         = \sum_{i=1}^n E[(X_i − µ)²] + \sum_{i=1}^n \sum_{j ≠ i} E[(X_i − µ)(X_j − µ)]
         = \sum_{i=1}^n Var(X_i) + \sum_{i=1}^n \sum_{j ≠ i} Cov(X_i, X_j) = nσ² + 0 = nσ²

since Cov(X_i, X_j) = 0 for i ≠ j by independence.
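Both inequalities, and the Var(S_n) = nσ² identity, can be verified exactly on a concrete case. A Binom(100, 1/2) variable is a sum of 100 iid Bern(1/2) variables, so its variance should be 100 · (1/4) = 25; the example parameters a = 60 and k = 2 below are our choices:

```python
from math import comb

# X ~ Binom(100, 1/2): a sum of 100 iid Bernoulli(1/2) draws
n, p = 100, 0.5
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = n * p              # nµ = 50
var = n * p * (1 - p)     # nσ² = 25, so sd = 5

# Markov: P(X >= a) <= E(X)/a, here with a = 60
tail = sum(pmf[60:])
markov_bound = mean / 60  # 5/6

# Chebyshev: P(|X - mean| >= k·sd) <= 1/k², here with k = 2 (|X - 50| >= 10)
two_sided = sum(pr for k, pr in enumerate(pmf) if abs(k - mean) >= 2 * var**0.5)
cheb_bound = 1 / 2**2     # 1/4
```

Both exact tail probabilities come in well under their bounds, which illustrates that Markov and Chebyshev are loose but fully general guarantees.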
Average of iid Random Variables

Let X_1, X_2, ..., X_n iid~ D, where D is some probability distribution with E(X_i) = µ and Var(X_i) = σ². Define

X̄_n = (X_1 + X_2 + ... + X_n)/n = S_n/n

E(X̄_n) = E(S_n/n) = E(S_n)/n = nµ/n = µ
Var(X̄_n) = Var(S_n/n) = Var(S_n)/n² = nσ²/n² = σ²/n

Weak Law of Large Numbers

Based on these results and Markov's inequality we can show the following:

P(|X̄_n − µ| > ε) = P(|S_n − nµ| ≥ nε) = P((S_n − nµ)² ≥ n²ε²)
                 ≤ E[(S_n − nµ)²]/(n²ε²) = nσ²/(n²ε²) = σ²/(nε²)

Therefore, given σ² < ∞,

lim_{n→∞} P(|X̄_n − µ| ≥ ε) = 0

LLN and CLT

Weak Law of Large Numbers (X̄_n converges in probability to µ):

lim_{n→∞} P(|X̄_n − µ| > ε) = 0

Strong Law of Large Numbers (X̄_n converges almost surely to µ):

P(lim_{n→∞} X̄_n = µ) = 1

The Strong LLN is the more powerful result (the Strong LLN implies the Weak LLN), but its proof is more complicated. These results justify the long-term frequency definition of probability.

The law of large numbers shows us that

lim_{n→∞} (S_n − nµ)/n = lim_{n→∞} (X̄_n − µ) = 0

which shows that n grows much faster than S_n − nµ. What happens if we divide by something that grows more slowly than n, like √n?

lim_{n→∞} (S_n − nµ)/√n = lim_{n→∞} √n (X̄_n − µ) →d N(0, σ²)

This is the Central Limit Theorem, of which the DeMoivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove this.
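The Weak LLN can be watched in action with a small simulation. A sketch, assuming X_i ~ Uniform(0, 1) (so µ = 1/2, σ² = 1/12) and our choice of ε = 0.1; the seed and trial count are arbitrary:

```python
import random

random.seed(104)
mu, eps = 0.5, 0.1  # Uniform(0,1): mu = 1/2, sigma^2 = 1/12

def tail_prob(n, trials=2000):
    """Empirical P(|Xbar_n - mu| > eps) over repeated experiments."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            hits += 1
    return hits / trials

p10, p100, p1000 = tail_prob(10), tail_prob(100), tail_prob(1000)
# Chebyshev bounds sigma^2/(n eps^2): 0.833, 0.0833, 0.00833;
# the empirical tail probabilities shrink toward 0 as n grows
```

The empirical probabilities fall well below the Chebyshev bounds, consistent with the bound being a worst-case guarantee rather than an approximation.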