Basics on Probability Jingrui He 09/11/2007
Coin Flips
- You flip a coin: heads with probability 0.5.
- You flip 100 coins.
- How many heads would you expect?
Coin Flips cont.
- You flip a coin: heads with probability $p$.
  - Binary random variable: a Bernoulli trial with success probability $p$.
- You flip $k$ coins. How many heads would you expect?
  - The number of heads $X$ is a discrete random variable.
  - $X$ follows a Binomial distribution with parameters $k$ and $p$.
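The expected number of heads, $kp$, can be checked with a quick simulation (a Python sketch for illustration; the function name and parameters are not from the slides):

```python
import random

def simulate_heads(k, p, trials=10000, seed=0):
    """Estimate the average number of heads over many runs of k flips,
    where each flip comes up heads with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(k) if rng.random() < p)
    return total / trials

# For X ~ Bin(k, p), E(X) = k * p, so this should be close to 50.
print(simulate_heads(100, 0.5))
```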
Discrete Random Variables
- Random variables (RVs) which may take on only a countable number of distinct values.
  - E.g. the total number of heads $X$ you get if you flip 100 coins.
- $X$ is an RV with arity $k$ if it can take on exactly one value out of $\{x_1, \ldots, x_k\}$.
  - E.g. the possible values that $X$ can take on are $0, 1, 2, \ldots, 100$.
Probability of Discrete RV
- Probability mass function (pmf): $P(X = x_i)$
- Easy facts about pmfs:
  - $0 \le P(X = x_i) \le 1$
  - $P(X = x_i \wedge X = x_j) = 0$ if $i \ne j$
  - $P(X = x_i \vee X = x_j) = P(X = x_i) + P(X = x_j)$ if $i \ne j$
  - $P(X = x_1 \vee X = x_2 \vee \cdots \vee X = x_k) = 1$
Common Distributions
- Uniform: $X \sim U\{1, \ldots, N\}$
  - $X$ takes values $1, 2, \ldots, N$
  - $P(X = i) = 1/N$
  - E.g. picking balls of different colors from a box.
- Binomial: $X \sim \mathrm{Bin}(n, p)$
  - $X$ takes values $0, 1, \ldots, n$
  - $P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}$
  - E.g. coin flips.
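These two pmfs are easy to evaluate directly; a small Python sketch (function names are illustrative) confirms they sum to 1:

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Bin(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def uniform_pmf(i, N):
    """P(X = i) for X uniform on {1, ..., N}."""
    return 1 / N if 1 <= i <= N else 0.0

# Each pmf sums to 1 over its support.
print(sum(binomial_pmf(i, 10, 0.3) for i in range(11)))  # ~1.0
print(sum(uniform_pmf(i, 6) for i in range(1, 7)))       # ~1.0
```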
Coin Flips of Two Persons
- Your friend and you both flip coins: heads with probability 0.5.
- You flip 50 times; your friend flips 100 times.
- How many heads will each of you get?
Joint Distribution
- Given two discrete RVs $X$ and $Y$, their joint distribution is the distribution of $X$ and $Y$ together.
  - E.g. $P(\text{you get 21 heads AND your friend gets 70 heads})$
- $\sum_x \sum_y P(X = x \wedge Y = y) = 1$
  - E.g. $\sum_{i=0}^{50} \sum_{j=0}^{100} P(\text{you get } i \text{ heads AND your friend gets } j \text{ heads}) = 1$
Conditional Probability
- $P(X = x \mid Y = y)$ is the probability of $X = x$, given the occurrence of $Y = y$.
  - E.g. you get 0 heads, given that your friend gets 61 heads.
- $P(X = x \mid Y = y) = \dfrac{P(X = x \wedge Y = y)}{P(Y = y)}$
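The definition can be checked numerically on the running coin example (a Python sketch; since the two people's flips are independent, conditioning on the friend's count should change nothing):

```python
from math import comb

def binom(i, n, p):
    """P(X = i) for X ~ Bin(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Joint pmf of your heads X (50 flips) and your friend's heads Y (100 flips);
# the flips are independent, so the joint pmf factorises.
def joint(x, y):
    return binom(x, 50, 0.5) * binom(y, 100, 0.5)

def conditional(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(joint(i, y) for i in range(51))
    return joint(x, y) / p_y

# Conditioning on Y = 61 leaves the distribution of X unchanged here.
print(conditional(25, 61), binom(25, 50, 0.5))
```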
Law of Total Probability
- Given two discrete RVs $X$ and $Y$, which take values in $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$, we have
  - $P(X = x_i) = \sum_j P(X = x_i \wedge Y = y_j) = \sum_j P(X = x_i \mid Y = y_j) P(Y = y_j)$
Marginalization
- Marginal probability from joint probability:
  - $P(X = x_i) = \sum_j P(X = x_i \wedge Y = y_j)$
- Marginal probability from conditional probability:
  - $P(X = x_i) = \sum_j P(X = x_i \mid Y = y_j) P(Y = y_j)$
Bayes Rule
- $X$ and $Y$ are discrete RVs:
  - $P(X = x \mid Y = y) = \dfrac{P(Y = y \mid X = x) P(X = x)}{P(Y = y)}$
  - $P(X = x_i \mid Y = y_j) = \dfrac{P(Y = y_j \mid X = x_i) P(X = x_i)}{\sum_k P(Y = y_j \mid X = x_k) P(X = x_k)}$
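The second form, with the denominator expanded by the law of total probability, translates directly to code. A Python sketch with a hypothetical diagnostic-test example (the numbers are illustrative, not from the slides):

```python
def bayes(prior, likelihood, i):
    """P(X = x_i | Y = y), given prior P(X = x_k) and likelihood P(Y = y | X = x_k).
    The denominator sums P(Y = y | X = x_k) P(X = x_k) over all k."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))
    return likelihood[i] * prior[i] / evidence

# Hypothetical example: a test with 95% sensitivity and a 10% false-positive
# rate, disease prevalence 1%. What is P(disease | positive test)?
prior = [0.01, 0.99]        # [disease, no disease]
likelihood = [0.95, 0.10]   # P(positive | each hypothesis)
print(bayes(prior, likelihood, 0))  # roughly 0.088: still unlikely
```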
Independent RVs
- Intuition: $X$ and $Y$ are independent means that knowing $X = x$ neither makes $Y = y$ more nor less probable.
- Definition: $X$ and $Y$ are independent iff
  - $P(X = x \wedge Y = y) = P(X = x) P(Y = y)$
More on Independence
- $P(X = x \wedge Y = y) = P(X = x) P(Y = y)$
- Equivalently, $P(X = x \mid Y = y) = P(X = x)$ and $P(Y = y \mid X = x) = P(Y = y)$
- E.g. no matter how many heads you get, your friend will not be affected, and vice versa.
Conditionally Independent RVs
- Intuition: $X$ and $Y$ are conditionally independent given $Z$ means that once $Z$ is known, the value of $X$ does not add any additional information about $Y$.
- Definition: $X$ and $Y$ are conditionally independent given $Z$ iff
  - $P(X = x \wedge Y = y \mid Z = z) = P(X = x \mid Z = z) P(Y = y \mid Z = z)$
More on Conditional Independence
- $P(X = x \wedge Y = y \mid Z = z) = P(X = x \mid Z = z) P(Y = y \mid Z = z)$
- $P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$
- $P(Y = y \mid X = x, Z = z) = P(Y = y \mid Z = z)$
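One way to see the definition concretely is to build a joint distribution that is conditionally independent by construction and check it numerically (a Python sketch with a toy distribution, not taken from the slides):

```python
# P(Z), P(X | Z), P(Y | Z) for binary toy variables (values chosen arbitrarily).
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

def joint(x, y, z):
    """P(X = x, Y = y, Z = z) built so X and Y are conditionally independent given Z."""
    return p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]

# Check: P(X = x, Y = y | Z = z) equals P(X = x | Z = z) * P(Y = y | Z = z).
z, x, y = 1, 0, 1
lhs = joint(x, y, z) / p_z[z]
rhs = p_x_given_z[z][x] * p_y_given_z[z][y]
print(lhs, rhs)
```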
Monty Hall Problem
- You're given the choice of three doors: behind one door is a car; behind the others, goats.
- You pick a door, say No. 1.
- The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
- Do you want to pick door No. 2 instead?
[Figure: the three cases. If the car is behind your door, the host reveals Goat A or Goat B; if Goat A is behind your door, the host must reveal Goat B; if Goat B is behind your door, the host must reveal Goat A.]
Monty Hall Problem: Bayes Rule
- $C_i$: the car is behind door $i$, $i = 1, 2, 3$; $P(C_i) = \dfrac{1}{3}$
- $H_{ij}$: the host opens door $j$ after you pick door $i$
- $P(H_{ij} \mid C_k) = \begin{cases} 0 & \text{if } i = j \\ 0 & \text{if } j = k \\ 1/2 & \text{if } i = k \\ 1 & \text{if } i \ne k,\ j \ne k \end{cases}$
Monty Hall Problem: Bayes Rule cont.
- WLOG, $i = 1$, $j = 3$:
  - $P(C_1 \mid H_{13}) = \dfrac{P(H_{13} \mid C_1) P(C_1)}{P(H_{13})}$
  - $P(H_{13} \mid C_1) P(C_1) = \dfrac{1}{2} \cdot \dfrac{1}{3} = \dfrac{1}{6}$
Monty Hall Problem: Bayes Rule cont.
- $P(H_{13}) = P(H_{13}, C_1) + P(H_{13}, C_2) + P(H_{13}, C_3)$
  $= P(H_{13} \mid C_1) P(C_1) + P(H_{13} \mid C_2) P(C_2) = \dfrac{1}{6} + \dfrac{1}{3} = \dfrac{1}{2}$
  (the third term vanishes, since the host never opens the door hiding the car)
- $P(C_1 \mid H_{13}) = \dfrac{1/6}{1/2} = \dfrac{1}{3}$
Monty Hall Problem: Bayes Rule cont.
- $P(C_1 \mid H_{13}) = \dfrac{1/6}{1/2} = \dfrac{1}{3}$
- $P(C_2 \mid H_{13}) = 1 - \dfrac{1}{3} = \dfrac{2}{3} > P(C_1 \mid H_{13})$
- You should switch!
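The 2/3 vs. 1/3 answer can also be checked by simulation (a Python sketch; door indices and trial count are arbitrary choices):

```python
import random

def monty_hall(switch, trials=20000, seed=0):
    """Estimate the win probability when always switching (or always staying)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens some door that is neither your pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```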
Continuous Random Variables
- What if $X$ is continuous?
- Probability density function (pdf) instead of probability mass function (pmf).
- A pdf is any function $f(x)$ that describes the probability density in terms of the input variable $x$.
PDF
- Properties of a pdf:
  - $f(x) \ge 0,\ \forall x$
  - $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
  - $f(x) \le 1$??? No: a density can exceed 1, since only its integral must equal 1.
- Actual probability is obtained by taking the integral of the pdf.
  - E.g. the probability of $X$ being between 0 and 1 is $P(0 \le X \le 1) = \int_0^1 f(x)\,dx$
Cumulative Distribution Function
- $F_X(v) = P(X \le v)$
- Discrete RVs: $F_X(v) = \sum_{v_i \le v} P(X = v_i)$
- Continuous RVs: $F_X(v) = \int_{-\infty}^{v} f(x)\,dx$, and $\dfrac{d}{dx} F_X(x) = f(x)$
Common Distributions
- Normal: $X \sim N(\mu, \sigma^2)$
  - $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right),\ \forall x$
  - E.g. the height of the entire population.
[Figure: the standard normal pdf $f(x)$ plotted for $x \in [-5, 5]$, peaking near 0.4 at $x = 0$.]
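The relation $F_X(v) = \int_{-\infty}^{v} f(x)\,dx$ from the previous slide can be demonstrated on the normal pdf with a simple Riemann sum (a Python sketch; the integration bounds and step count are arbitrary assumptions):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf_numeric(v, lo=-10.0, n=100000):
    """F(v): integrate the pdf from (effectively) -infinity to v by a midpoint sum."""
    h = (v - lo) / n
    return sum(normal_pdf(lo + (k + 0.5) * h) for k in range(n)) * h

# Compare against the closed form F(v) = (1 + erf(v / sqrt(2))) / 2.
print(normal_cdf_numeric(1.0), (1 + math.erf(1 / math.sqrt(2))) / 2)
```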
Common Distributions cont.
- Beta: $X \sim \mathrm{Beta}(\alpha, \beta)$
  - $f(x; \alpha, \beta) = \dfrac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\ x \in [0, 1]$
  - $\alpha = \beta = 1$: uniform distribution between 0 and 1.
  - E.g. the conjugate prior for the parameter $p$ in the Binomial distribution.
[Figure: a Beta pdf plotted for $x \in [0, 1]$.]
Joint Distribution
- Given two continuous RVs $X$ and $Y$, the joint pdf can be written as $f_{X,Y}(x, y)$
- $\int_x \int_y f_{X,Y}(x, y)\,dx\,dy = 1$
Multivariate Normal
- Generalization to higher dimensions of the one-dimensional normal:
  - $f_{\vec{X}}(x_1, \ldots, x_d) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$
  - $\Sigma$: covariance matrix; $\vec{\mu}$: mean.
Moments
- Mean (expectation): $\mu = E(X)$
  - Discrete RVs: $E(X) = \sum_{v_i} v_i P(X = v_i)$
  - Continuous RVs: $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$
- Variance: $V(X) = E\left((X - \mu)^2\right)$
  - Discrete RVs: $V(X) = \sum_{v_i} (v_i - \mu)^2 P(X = v_i)$
  - Continuous RVs: $V(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx$
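The discrete formulas translate directly to code. A Python sketch (the die example is an illustration, not from the slides):

```python
def mean(pmf):
    """E(X) = sum_i v_i * P(X = v_i), with pmf given as {value: probability}."""
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    """V(X) = E((X - mu)^2) = sum_i (v_i - mu)^2 * P(X = v_i)."""
    mu = mean(pmf)
    return sum((v - mu)**2 * p for v, p in pmf.items())

# A fair six-sided die: mean 3.5, variance 35/12.
die = {v: 1 / 6 for v in range(1, 7)}
print(mean(die), variance(die))
```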
Properties of Moments
- Mean:
  - $E(X + Y) = E(X) + E(Y)$
  - $E(aX) = a E(X)$
  - If $X$ and $Y$ are independent, $E(XY) = E(X) E(Y)$
- Variance:
  - $V(aX + b) = a^2 V(X)$
  - If $X$ and $Y$ are independent, $V(X + Y) = V(X) + V(Y)$
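The two independence properties can be verified exactly on small pmfs by constructing the distribution of $XY$ and $X + Y$ from the product distribution (a Python sketch; the toy pmfs are arbitrary):

```python
def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((v - mu)**2 * p for v, p in pmf.items())

x = {0: 0.3, 1: 0.7}   # toy pmfs; X and Y are independent by construction
y = {1: 0.5, 2: 0.5}

def combine(f):
    """pmf of f(X, Y) when X and Y are independent: weight each pair by P(X)P(Y)."""
    out = {}
    for vx, px in x.items():
        for vy, py in y.items():
            out[f(vx, vy)] = out.get(f(vx, vy), 0) + px * py
    return out

print(mean(combine(lambda a, b: a * b)), mean(x) * mean(y))              # equal
print(variance(combine(lambda a, b: a + b)), variance(x) + variance(y))  # equal
```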
Moments of Common Distributions
- Uniform $X \sim U\{1, \ldots, N\}$: mean $\dfrac{1 + N}{2}$; variance $\dfrac{N^2 - 1}{12}$
- Binomial $X \sim \mathrm{Bin}(n, p)$: mean $np$; variance $np(1 - p)$
- Normal $X \sim N(\mu, \sigma^2)$: mean $\mu$; variance $\sigma^2$
- Beta $X \sim \mathrm{Beta}(\alpha, \beta)$: mean $\dfrac{\alpha}{\alpha + \beta}$; variance $\dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
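The binomial entries can be checked against the pmf-based formulas from the Moments slide (a Python sketch; $n = 10$, $p = 0.3$ are arbitrary):

```python
from math import comb

n, p = 10, 0.3
pmf = {i: comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)}

mu = sum(i * q for i, q in pmf.items())          # E(X) from the pmf
var = sum((i - mu)**2 * q for i, q in pmf.items())  # V(X) from the pmf

print(mu, n * p)              # both 3.0
print(var, n * p * (1 - p))   # both 2.1
```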
Probability of Events
- $X$ denotes an event that could possibly happen.
  - E.g. $X$ = you will fail in this course.
- $P(X)$ denotes the likelihood that $X$ happens, i.e. that $X = \text{true}$.
  - What's the probability that you will fail in this course?
- $\Omega$ denotes the entire event set: $\Omega = \{X, \neg X\}$
The Axioms of Probability
- $0 \le P(X) \le 1$
- $P(\Omega) = 1$
- $P(X_1 \vee X_2 \vee \cdots) = \sum_i P(X_i)$, where the $X_i$ are disjoint events.
- Useful rules:
  - $P(X_1 \vee X_2) = P(X_1) + P(X_2) - P(X_1 \wedge X_2)$
  - $P(\neg X) = 1 - P(X)$
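Both rules can be verified exactly on a finite sample space by treating events as sets of outcomes (a Python sketch with a two-dice example, not from the slides):

```python
from fractions import Fraction

# Sample space: all outcomes of rolling two fair dice.
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def prob(event):
    """P(event) under the uniform measure on omega, as an exact fraction."""
    return Fraction(len(event), len(omega))

x1 = {w for w in omega if w[0] == 6}          # first die shows 6
x2 = {w for w in omega if w[0] + w[1] == 7}   # the dice sum to 7

# P(X1 or X2) = P(X1) + P(X2) - P(X1 and X2)
print(prob(x1 | x2) == prob(x1) + prob(x2) - prob(x1 & x2))  # True
# P(not X1) = 1 - P(X1)
print(prob(set(omega) - x1) == 1 - prob(x1))  # True
```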
Interpreting the Axioms
[Figure: Venn diagram of the sample space $\Omega$ containing events $X_1$ and $X_2$.]