Probability Review
Yutian Li, Stanford University
January 18, 2018
Outline

1. Elements of probability
2. Random variables
3. Multiple random variables
4. Common inequalities
Elements of probability

Definition (Sample space $\Omega$)
The set of all possible outcomes of a random experiment.
Definition (Event space $\mathcal{F}$)
A set whose elements $A \in \mathcal{F}$ (called events) are subsets of $\Omega$.
Definition (Probability measure)
A function $P : \mathcal{F} \to \mathbb{R}$ that satisfies the following properties:
  1. $P(A) \ge 0$, for all $A \in \mathcal{F}$.
  2. $P(\Omega) = 1$.
  3. If $A_1, A_2, \ldots$ are disjoint events, then $P(\bigcup_i A_i) = \sum_i P(A_i)$.
These three properties are called the Axioms of Probability.
Properties:
  1. $A \subseteq B \implies P(A) \le P(B)$.
  2. $P(A \cap B) \le \min(P(A), P(B))$.
  3. Union bound: $P(A \cup B) \le P(A) + P(B)$.
  4. $P(\Omega \setminus A) = 1 - P(A)$.
  5. Law of total probability: if $A_1, \ldots, A_k$ are disjoint events such that $\bigcup_{i=1}^k A_i = \Omega$, then $\sum_{i=1}^k P(A_i) = 1$.
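On a finite sample space these properties can be checked exactly by enumeration. A minimal sketch in Python, using exact rational arithmetic; the die example and event names are illustrative, not from the slides:

```python
from fractions import Fraction

# Hypothetical finite sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
prob = {w: Fraction(1, 6) for w in omega}  # uniform probability measure

def P(event):
    """Probability of an event (a subset of omega) under the measure."""
    return sum(prob[w] for w in event)

A = {1, 2, 3}   # "roll at most 3"
B = {2, 4, 6}   # "roll an even number"

# Intersection bound, union bound, and complement rule:
assert P(A & B) <= min(P(A), P(B))
assert P(A | B) <= P(A) + P(B)
assert P(omega - A) == 1 - P(A)
```

Because `Fraction` avoids floating-point rounding, the assertions verify the properties exactly rather than approximately.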
Definition (Conditional probability)
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0.$$
Theorem (Chain rule)
Let $S_1, \ldots, S_k$ be events with $P(S_1 \cap \cdots \cap S_{k-1}) > 0$. Then
$$P(S_1 \cap S_2 \cap \cdots \cap S_k) = P(S_1)\,P(S_2 \mid S_1)\,P(S_3 \mid S_2 \cap S_1) \cdots P(S_k \mid S_1 \cap S_2 \cap \cdots \cap S_{k-1}).$$
Definition (Independence)
Two events $A$ and $B$ are called independent if and only if $P(A \cap B) = P(A)P(B)$.
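Independence can likewise be tested by the product rule on a finite space. A small illustrative check (the die and the chosen events are not from the slides):

```python
from fractions import Fraction

# Fair six-sided die under the uniform measure.
omega = {1, 2, 3, 4, 5, 6}
def P(event):
    return Fraction(len(event), len(omega))

A = {1, 2}       # "roll at most 2"  -> P(A) = 1/3
B = {2, 4, 6}    # "roll even"       -> P(B) = 1/2

# P(A ∩ B) = P({2}) = 1/6 = (1/3)(1/2), so A and B are independent.
assert P(A & B) == P(A) * P(B)
```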
Random variables

Definition (Random variable)
A random variable $X$ is a function $X : \Omega \to \mathbb{R}$. Typically we denote random variables using upper-case letters, $X(\omega)$ or more simply $X$, and the values a random variable may take on using lower-case letters, $x$.
Definition (Cumulative distribution function)
A cumulative distribution function (CDF) is a function $F_X : \mathbb{R} \to [0, 1]$ defined as $F_X(x) = P(X \le x)$.

Properties:
  1. $0 \le F_X(x) \le 1$.
  2. $\lim_{x \to -\infty} F_X(x) = 0$.
  3. $\lim_{x \to +\infty} F_X(x) = 1$.
  4. $x \le y \implies F_X(x) \le F_X(y)$.
  5. Right-continuity: $\lim_{x \to a^+} F_X(x) = F_X(a)$.
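These properties can be sanity-checked numerically against any concrete CDF. A sketch using $F_X(x) = 1 - e^{-x}$ for $x \ge 0$ (the Exponential(1) CDF; the choice is purely illustrative):

```python
import math

# A concrete CDF to check the properties against:
# F(x) = 1 - e^(-x) for x >= 0, else 0.
def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

xs = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0, 50.0]
assert all(0.0 <= F(x) <= 1.0 for x in xs)            # bounded in [0, 1]
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))  # non-decreasing
assert F(-1e9) == 0.0 and abs(F(1e9) - 1.0) < 1e-12   # tail limits
```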
Definition (Probability density function)
If the cumulative distribution function $F_X$ is differentiable everywhere, we define the probability density function (PDF) as the derivative of the CDF,
$$f_X(x) = \frac{dF_X(x)}{dx}.$$
Note that the PDF of a continuous random variable need not exist.

Properties:
  1. $f_X(x) \ge 0$.
  2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
  3. $\int_{x \in A} f_X(x)\,dx = P(X \in A)$.
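A quick numerical sketch of properties 2 and 3 for a concrete density, $f(x) = \lambda e^{-\lambda x}$ with $\lambda = 2$ (the rate and the truncation point are illustrative choices, not from the slides):

```python
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)  # Exponential(2) density on [0, inf)

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Total mass ~ 1 (truncated at x = 20, where the tail is negligible):
assert abs(integrate(f, 0.0, 20.0) - 1.0) < 1e-6

# P(X in [a, b]) equals F(b) - F(a), with F the matching CDF:
F = lambda x: 1.0 - math.exp(-lam * x)
assert abs(integrate(f, 0.5, 1.5) - (F(1.5) - F(0.5))) < 1e-6
```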
Definition (Expectation)
Let $X$ be a continuous random variable with PDF $f_X(x)$ and let $g : \mathbb{R} \to \mathbb{R}$ be an arbitrary function. The expectation or expected value of $g(X)$ is defined as
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.$$

Properties:
  1. $E[a] = a$ for any constant $a \in \mathbb{R}$.
  2. $E[a\,g(X)] = a\,E[g(X)]$ for any constant $a \in \mathbb{R}$.
  3. Linearity: $E[f(X) + g(X)] = E[f(X)] + E[g(X)]$.
  4. For a discrete random variable $X$, $E[\mathbf{1}\{X = k\}] = P(X = k)$.
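For a discrete random variable the integral becomes a sum, and the properties can be verified exactly. A sketch on an illustrative fair-die pmf:

```python
from fractions import Fraction

# Fair die: P(X = k) = 1/6 for k = 1..6 (illustrative example).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def E(g):
    """E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

assert E(lambda x: x) == Fraction(7, 2)          # mean of a fair die is 3.5
assert E(lambda x: 3 * x) == 3 * E(lambda x: x)  # scaling by a constant
assert E(lambda x: x + x**2) == E(lambda x: x) + E(lambda x: x**2)  # linearity
```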
Definition (Variance)
The variance of a random variable $X$ measures how concentrated the distribution of $X$ is around its mean. Formally,
$$\mathrm{Var}[X] = E[(X - E[X])^2].$$
Alternatively, $\mathrm{Var}[X] = E[X^2] - (E[X])^2$.

Properties:
  1. $\mathrm{Var}[a] = 0$ for any constant $a \in \mathbb{R}$.
  2. $\mathrm{Var}[a f(X)] = a^2\,\mathrm{Var}[f(X)]$ for any constant $a \in \mathbb{R}$.
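The two variance formulas and the scaling property can be checked exactly on the same illustrative fair-die pmf:

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # illustrative fair die
E = lambda g: sum(g(x) * p for x, p in pmf.items())

def Var(g):
    """Var[g(X)] = E[(g(X) - E[g(X)])^2], computed exactly."""
    mu = E(g)
    return E(lambda x: (g(x) - mu) ** 2)

# The definition agrees with the shortcut E[X^2] - (E[X])^2:
assert Var(lambda x: x) == E(lambda x: x**2) - E(lambda x: x) ** 2
# Var[aX] = a^2 Var[X]:
assert Var(lambda x: 5 * x) == 25 * Var(lambda x: x)
```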
Discrete random variables:

$X \sim \mathrm{Bernoulli}(p)$ (where $0 \le p \le 1$): one if a coin with heads probability $p$ comes up heads, zero otherwise.
$$p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

$X \sim \mathrm{Binomial}(n, p)$ (where $0 \le p \le 1$): the number of heads in $n$ independent flips of a coin with heads probability $p$.
$$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
$X \sim \mathrm{Geometric}(p)$ (where $p > 0$): the number of flips of a coin with heads probability $p$ until the first heads.
$$p(x) = p(1 - p)^{x - 1}$$

$X \sim \mathrm{Poisson}(\lambda)$ (where $\lambda > 0$): a probability distribution over the nonnegative integers, used for modeling the frequency of rare events.
$$p(x) = e^{-\lambda} \frac{\lambda^x}{x!}$$
Continuous random variables:

$X \sim \mathrm{Uniform}(a, b)$ (where $a < b$): equal probability density at every value between $a$ and $b$ on the real line.
$$f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

$X \sim \mathrm{Exponential}(\lambda)$ (where $\lambda > 0$): decaying probability density over the nonnegative reals.
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$X \sim \mathrm{Normal}(\mu, \sigma^2)$: also known as the Gaussian distribution.
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
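One standard way to draw samples from such distributions (not covered on the slides, but a direct application of the CDF) is inverse-transform sampling: if $U \sim \mathrm{Uniform}(0, 1)$, then $F^{-1}(U)$ has CDF $F$. A sketch for the Exponential, with an illustrative rate and a fixed seed for reproducibility:

```python
import math
import random

# Inverse-transform sampling: solving F(x) = 1 - e^(-lam*x) = u for x
# gives x = -ln(1 - u) / lam, so this maps Uniform(0,1) draws to
# Exponential(lam) draws.
rng = random.Random(0)  # fixed seed so the run is reproducible
lam = 2.0
samples = [-math.log(1.0 - rng.random()) / lam for _ in range(100_000)]

# The empirical mean should be close to E[X] = 1/lam = 0.5.
mean = sum(samples) / len(samples)
assert abs(mean - 1.0 / lam) < 0.01
```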
Multiple random variables

Definition (Joint cumulative distribution function)
Suppose that we have two random variables $X$ and $Y$. Their joint CDF is
$$F_{XY}(x, y) = P(X \le x, Y \le y).$$
The joint CDF $F_{XY}(x, y)$ and the marginal cumulative distribution functions $F_X(x)$ and $F_Y(y)$ of each variable separately are related by
$$F_X(x) = \lim_{y \to \infty} F_{XY}(x, y), \qquad F_Y(y) = \lim_{x \to \infty} F_{XY}(x, y).$$
In the discrete case, the conditional probability mass function of $Y$ given $X$ is simply
$$p_{Y|X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)},$$
assuming that $p_X(x) \ne 0$. A useful formula that often arises when trying to derive an expression for the conditional probability of one variable given another is Bayes' rule:
$$p_{Y|X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)} = \frac{p_{X|Y}(x \mid y)\,p_Y(y)}{\sum_{y'} p_{X|Y}(x \mid y')\,p_Y(y')}.$$
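Bayes' rule can be verified exactly on a small joint pmf table. The table values below are illustrative, not from the slides:

```python
from fractions import Fraction

# A small joint pmf: p_XY[(x, y)] = P(X = x, Y = y).
p_XY = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
p_X = lambda x: sum(p for (xx, _), p in p_XY.items() if xx == x)  # marginal of X
p_Y = lambda y: sum(p for (_, yy), p in p_XY.items() if yy == y)  # marginal of Y

def p_Y_given_X(y, x):
    return p_XY[(x, y)] / p_X(x)

def p_X_given_Y(x, y):
    return p_XY[(x, y)] / p_Y(y)

# Bayes' rule: both routes to p(y | x) give the same answer.
x, y = 1, 1
bayes = (p_X_given_Y(x, y) * p_Y(y)
         / sum(p_X_given_Y(x, yy) * p_Y(yy) for yy in (0, 1)))
assert bayes == p_Y_given_X(y, x)
```

The denominator in `bayes` is exactly the marginal $p_X(x)$ rebuilt via the law of total probability, which is why the two expressions agree.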
Common inequalities

Theorem (Markov's inequality)
Let $X$ be a non-negative random variable and $a > 0$. Then
$$P(X \ge a) \le \frac{E[X]}{a}.$$
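The bound can be checked exactly at every threshold on a small discrete distribution; the fair die is an illustrative choice:

```python
from fractions import Fraction

# Fair die: X takes values 1..6 with probability 1/6 each; E[X] = 7/2.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())

for a in range(1, 13):
    tail = sum(p for x, p in pmf.items() if x >= a)  # P(X >= a)
    assert tail <= EX / a                             # Markov's bound holds
```

Note the bound is loose for small $a$ (e.g. at $a = 1$ it only says $P(X \ge 1) \le 7/2$) and only becomes informative once $a > E[X]$.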
Theorem (Chebyshev's inequality)
Let $X$ be a random variable and $a > 0$. Then
$$P(|X - E[X]| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}.$$
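The same exact-enumeration check works for Chebyshev's inequality (again on the illustrative fair die, whose variance is $35/12$):

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}          # fair die
EX = sum(x * p for x, p in pmf.items())                  # 7/2
VarX = sum((x - EX) ** 2 * p for x, p in pmf.items())    # 35/12

for a in (1, 2, 3, 4):
    tail = sum(p for x, p in pmf.items() if abs(x - EX) >= a)
    assert tail <= VarX / Fraction(a) ** 2               # Chebyshev's bound
```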
Theorem (Jensen's inequality)
Let $\varphi$ be a convex function and $X$ a random variable. Then
$$E[\varphi(X)] \ge \varphi(E[X]).$$
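A concrete instance: with the convex function $\varphi(x) = x^2$, Jensen's inequality says $E[X^2] \ge (E[X])^2$, which is exactly the statement that variance is non-negative. Checked on the illustrative fair die:

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # illustrative fair die
E = lambda g: sum(g(x) * p for x, p in pmf.items())

phi = lambda x: x ** 2  # a convex function
# Jensen: E[phi(X)] >= phi(E[X]), here 91/6 >= 49/4.
assert E(lambda x: phi(x)) >= phi(E(lambda x: x))
```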