Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Similar documents
Quick Tour of Basic Probability Theory and Linear Algebra

Review of Probability Theory

More on Distribution Function

1 Random Variable: Topics

Random Variables. Will Perkins. January 11, 2013

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

2 Continuous Random Variables and their Distributions

Recitation 2: Probability

Lecture 11. Probability Theory: an Overveiw

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

SDS 321: Introduction to Probability and Statistics

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

STAT Chapter 5 Continuous Distributions

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.

Probability Review. Chao Lan

1 Presessional Probability

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

Random Variables and Their Distributions

EE514A Information Theory I Fall 2013

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

STT 441 Final Exam Fall 2013

Chapter 3: Random Variables 1

Lecture 3. Discrete Random Variables

Random variables. DS GA 1002 Probability and Statistics for Data Science.

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Chapter 4. Continuous Random Variables 4.1 PDF

Lecture 2: Repetition of probability theory and statistics

Joint Distribution of Two or More Random Variables

Deep Learning for Computer Vision

Probability Theory. Patrick Lam

Chapter 4. Chapter 4 sections

Northwestern University Department of Electrical Engineering and Computer Science

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay

Probability and Distributions

Motivation and Applications: Why Should I Study Probability?

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Continuous Random Variables

Lecture 1. ABC of Probability

Fundamental Tools - Probability Theory II

Lecture 1: Review on Probability and Statistics

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Probability reminders

Algorithms for Uncertainty Quantification

CS145: Probability & Computing

Order Statistics. The order statistics of a set of random variables X 1, X 2,, X n are the same random variables arranged in increasing order.

Math Review Sheet, Fall 2008

Continuous Probability Spaces

Chapter 2. Continuous random variables

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.

Chapter 3: Random Variables 1

1.1 Review of Probability Theory

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

1 Review of Probability and Distributions

Introduction to Probability

Review: mostly probability and some statistics

6.1 Moment Generating and Characteristic Functions

V7 Foundations of Probability Theory

Introduction to Machine Learning

Introduction to Probability Theory

Set Theory Digression

Chapter 3. Chapter 3 sections

Mathematical Preliminaries

Continuous Random Variables and Continuous Distributions

CME 106: Review Probability theory

Lecture 1. ABC of Probability

Single Maths B: Introduction to Probability

Probability and Statistics

Appendix A : Introduction to Probability and stochastic processes

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Mathematical Statistics 1 Math A 6330

1 Variance of a Random Variable

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

Review of Probability. CS1538: Introduction to Simulations

CS 246 Review of Proof Techniques and Probability 01/14/19

SDS 321: Introduction to Probability and Statistics

Some Basic Concepts of Probability and Information Theory: Pt. 2

Introduction to Stochastic Processes

Statistics STAT:5100 (22S:193), Fall Sample Final Exam B

Review for Exam Spring 2018

Formulas for probability theory and linear models SF2941

Continuous Distributions

1 Basic continuous random variable problems

General Random Variables

Random Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Probability Models. 4. What is the definition of the expectation of a discrete random variable?

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Introduction to Probability and Stocastic Processes - Part I

F X (x) = P [X x] = x f X (t)dt. 42 Lebesgue-a.e, to be exact 43 More specifically, if g = f Lebesgue-a.e., then g is also a pdf for X.

Two hours. Statistical Tables to be provided THE UNIVERSITY OF MANCHESTER. 14 January :45 11:45

Math-Stat-491-Fall2014-Notes-I

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Dept. of Linguistics, Indiana University Fall 2015

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

CMPSCI 240: Reasoning about Uncertainty

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES

Math Bootcamp 2012 Miscellaneous

Transcription:

Probability Review Yutian Li Stanford University January 18, 2018 Yutian Li (Stanford University) Probability Review January 18, 2018 1 / 27

Outline 1 Elements of probability 2 Random variables 3 Multiple random variables 4 Common inequalities Yutian Li (Stanford University) Probability Review January 18, 2018 2 / 27

Outline 1 Elements of probability 2 Random variables 3 Multiple random variables 4 Common inequalities Yutian Li (Stanford University) Probability Review January 18, 2018 3 / 27

Elements of probability Definition (Sample space Ω) The set of all the outcomes of a random experiment. Yutian Li (Stanford University) Probability Review January 18, 2018 4 / 27

Elements of probability Definition (Event space F) A set whose elements A F (called events) are subsets of Ω. Yutian Li (Stanford University) Probability Review January 18, 2018 5 / 27

Elements of probability Definition (Probability measure) A function P : F R that satisfies the following properties. P(A) 0, for all A F. P(Ω) = 1. If A 1, A 2,... are disjoint events, then P( i A i ) = i P(A i ). These three properties are called the the Axioms of Probability. Yutian Li (Stanford University) Probability Review January 18, 2018 6 / 27

Elements of probability Definition (Probability measure) A function P : F R that satisfies the following properties. P(A) 0, for all A F. P(Ω) = 1. If A 1, A 2,... are disjoint events, then P( i A i ) = i P(A i ). These three properties are called the the Axioms of Probability. Yutian Li (Stanford University) Probability Review January 18, 2018 6 / 27

Elements of probability Properties: 1 If A B = P(A) P(B). 2 P(A B) min(p(a), P(B)). 3 Union bound: P(A B) P(A) + P(B). 4 P(Ω \ A) = 1 P(A). 5 Law of total probability: If A 1,..., A k are a set of disjoint events such that k i=1 A i = Ω, then k P(A i ) = 1. i=1 Yutian Li (Stanford University) Probability Review January 18, 2018 7 / 27

Elements of probability Properties: 1 If A B = P(A) P(B). 2 P(A B) min(p(a), P(B)). 3 Union bound: P(A B) P(A) + P(B). 4 P(Ω \ A) = 1 P(A). 5 Law of total probability: If A 1,..., A k are a set of disjoint events such that k i=1 A i = Ω, then k P(A i ) = 1. i=1 Yutian Li (Stanford University) Probability Review January 18, 2018 7 / 27

Elements of probability Properties: 1 If A B = P(A) P(B). 2 P(A B) min(p(a), P(B)). 3 Union bound: P(A B) P(A) + P(B). 4 P(Ω \ A) = 1 P(A). 5 Law of total probability: If A 1,..., A k are a set of disjoint events such that k i=1 A i = Ω, then k P(A i ) = 1. i=1 Yutian Li (Stanford University) Probability Review January 18, 2018 7 / 27

Elements of probability Properties: 1 If A B = P(A) P(B). 2 P(A B) min(p(a), P(B)). 3 Union bound: P(A B) P(A) + P(B). 4 P(Ω \ A) = 1 P(A). 5 Law of total probability: If A 1,..., A k are a set of disjoint events such that k i=1 A i = Ω, then k P(A i ) = 1. i=1 Yutian Li (Stanford University) Probability Review January 18, 2018 7 / 27

Elements of probability Properties: 1 If A B = P(A) P(B). 2 P(A B) min(p(a), P(B)). 3 Union bound: P(A B) P(A) + P(B). 4 P(Ω \ A) = 1 P(A). 5 Law of total probability: If A 1,..., A k are a set of disjoint events such that k i=1 A i = Ω, then k P(A i ) = 1. i=1 Yutian Li (Stanford University) Probability Review January 18, 2018 7 / 27

Elements of probability Definition (Conditional probability) P(A B) = P(A B). P(B) Yutian Li (Stanford University) Probability Review January 18, 2018 8 / 27

Elements of probability Theorem (Chain rule) Let S 1,..., S k be events, P(S i ) > 0. Then P(S 1 S 2 S k ) =P(S 1 )P(S 2 S 1 )P(S 3 S 2 S 1 ) P(S k S 1 S 2 S k 1 ). Yutian Li (Stanford University) Probability Review January 18, 2018 9 / 27

Elements of probability Definition (Independence) Two events are called independent if and only if P(A B) = P(A)P(B). Yutian Li (Stanford University) Probability Review January 18, 2018 10 / 27

Outline 1 Elements of probability 2 Random variables 3 Multiple random variables 4 Common inequalities Yutian Li (Stanford University) Probability Review January 18, 2018 11 / 27

Random variables Definition (Random variable) A random variable X is a function X : Ω R. Typically we denote random variables using upper case letters X (ω) or more simply X. We denote the value that a random variable may take on using lower case letters x. Yutian Li (Stanford University) Probability Review January 18, 2018 12 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Cumulative distribution function) A cumulative distribution function (CDF) is a function F X : R [0, 1] defined as F X (x) = P(X x). Properties: 1 0 F X (x) 1. 2 lim x F X (x) = 0. 3 lim x F X (x) = 1. 4 x y = F X (x) F X (y). 5 Right-continuous: lim x +a F X (x) = F x (a). Yutian Li (Stanford University) Probability Review January 18, 2018 13 / 27

Random variables Definition (Probability density function) If the cumulative distribution function F X is differentiable everywhere, we define the probability density function (PDF) as the derivative of the CDF, f X (x) = df X (x). dx Note that the PDF for a continuous random variable may not always exist. Properties: 1 f X (x) 0. 2 f X (x)dx = 1. 3 x A f X (x)dx = P(X A). Yutian Li (Stanford University) Probability Review January 18, 2018 14 / 27

Random variables Definition (Probability density function) If the cumulative distribution function F X is differentiable everywhere, we define the probability density function (PDF) as the derivative of the CDF, f X (x) = df X (x). dx Note that the PDF for a continuous random variable may not always exist. Properties: 1 f X (x) 0. 2 f X (x)dx = 1. 3 x A f X (x)dx = P(X A). Yutian Li (Stanford University) Probability Review January 18, 2018 14 / 27

Random variables Definition (Probability density function) If the cumulative distribution function F X is differentiable everywhere, we define the probability density function (PDF) as the derivative of the CDF, f X (x) = df X (x). dx Note that the PDF for a continuous random variable may not always exist. Properties: 1 f X (x) 0. 2 f X (x)dx = 1. 3 x A f X (x)dx = P(X A). Yutian Li (Stanford University) Probability Review January 18, 2018 14 / 27

Random variables Definition (Probability density function) If the cumulative distribution function F X is differentiable everywhere, we define the probability density function (PDF) as the derivative of the CDF, f X (x) = df X (x). dx Note that the PDF for a continuous random variable may not always exist. Properties: 1 f X (x) 0. 2 f X (x)dx = 1. 3 x A f X (x)dx = P(X A). Yutian Li (Stanford University) Probability Review January 18, 2018 14 / 27

Random variables Definition (Expectation) If X is a continuous random variable with PDF f X (x) and g : R R is an arbitrary function. The expectation or expected value of g(x ) is defined as E[g(X )] = Properties: g(x)f X (x)dx. 1 E[a] = a for any constant a R. 2 E[ag(X )] = ae[g(x )] for any constant a R. 3 Linearity: E[f (X ) + g(x )] = E[f (X )] + E[g(X )]. 4 For a discrete random variable X, E[1{X = k}] = P(X = k). Yutian Li (Stanford University) Probability Review January 18, 2018 15 / 27

Random variables Definition (Expectation) If X is a continuous random variable with PDF f X (x) and g : R R is an arbitrary function. The expectation or expected value of g(x ) is defined as E[g(X )] = Properties: g(x)f X (x)dx. 1 E[a] = a for any constant a R. 2 E[ag(X )] = ae[g(x )] for any constant a R. 3 Linearity: E[f (X ) + g(x )] = E[f (X )] + E[g(X )]. 4 For a discrete random variable X, E[1{X = k}] = P(X = k). Yutian Li (Stanford University) Probability Review January 18, 2018 15 / 27

Random variables Definition (Expectation) If X is a continuous random variable with PDF f X (x) and g : R R is an arbitrary function. The expectation or expected value of g(x ) is defined as E[g(X )] = Properties: g(x)f X (x)dx. 1 E[a] = a for any constant a R. 2 E[ag(X )] = ae[g(x )] for any constant a R. 3 Linearity: E[f (X ) + g(x )] = E[f (X )] + E[g(X )]. 4 For a discrete random variable X, E[1{X = k}] = P(X = k). Yutian Li (Stanford University) Probability Review January 18, 2018 15 / 27

Random variables Definition (Expectation) If X is a continuous random variable with PDF f X (x) and g : R R is an arbitrary function. The expectation or expected value of g(x ) is defined as E[g(X )] = Properties: g(x)f X (x)dx. 1 E[a] = a for any constant a R. 2 E[ag(X )] = ae[g(x )] for any constant a R. 3 Linearity: E[f (X ) + g(x )] = E[f (X )] + E[g(X )]. 4 For a discrete random variable X, E[1{X = k}] = P(X = k). Yutian Li (Stanford University) Probability Review January 18, 2018 15 / 27

Random variables Definition (Expectation) If X is a continuous random variable with PDF f X (x) and g : R R is an arbitrary function. The expectation or expected value of g(x ) is defined as E[g(X )] = Properties: g(x)f X (x)dx. 1 E[a] = a for any constant a R. 2 E[ag(X )] = ae[g(x )] for any constant a R. 3 Linearity: E[f (X ) + g(x )] = E[f (X )] + E[g(X )]. 4 For a discrete random variable X, E[1{X = k}] = P(X = k). Yutian Li (Stanford University) Probability Review January 18, 2018 15 / 27

Random variables Definition (Variance) The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as Var[X ] = E[(X E[X ]) 2 ]. Alternatively, Var[X ] = E[X 2 ] E[X ] 2. Properties: 1 Var[a] = 0 for any constant a R. 2 Var[af (X )] = a 2 Var[f (X )] for any constant a R. Yutian Li (Stanford University) Probability Review January 18, 2018 16 / 27

Random variables Definition (Variance) The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as Var[X ] = E[(X E[X ]) 2 ]. Alternatively, Var[X ] = E[X 2 ] E[X ] 2. Properties: 1 Var[a] = 0 for any constant a R. 2 Var[af (X )] = a 2 Var[f (X )] for any constant a R. Yutian Li (Stanford University) Probability Review January 18, 2018 16 / 27

Random variables Definition (Variance) The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as Var[X ] = E[(X E[X ]) 2 ]. Alternatively, Var[X ] = E[X 2 ] E[X ] 2. Properties: 1 Var[a] = 0 for any constant a R. 2 Var[af (X )] = a 2 Var[f (X )] for any constant a R. Yutian Li (Stanford University) Probability Review January 18, 2018 16 / 27

Random variables Definition (Variance) The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as Var[X ] = E[(X E[X ]) 2 ]. Alternatively, Var[X ] = E[X 2 ] E[X ] 2. Properties: 1 Var[a] = 0 for any constant a R. 2 Var[af (X )] = a 2 Var[f (X )] for any constant a R. Yutian Li (Stanford University) Probability Review January 18, 2018 16 / 27

Random variables Discrete random variables: X Bernoulli(p) (where 0 p 1): one if a coin with heads probability p comes up heads, zero otherwise. { p if p = 1 p(x) = 1 p if p = 0 X Binomial(n, p) (where 0 p 1): the number of heads in n independent flips of a coin with heads probability p. ( ) n p(x) = p x (1 p) n x x Yutian Li (Stanford University) Probability Review January 18, 2018 17 / 27

Random variables Discrete random variables: X Geometric(p) (where p > 0): the number of flips of a coin with heads probability p until the first heads. p(x) = p(1 p) x 1 X Poisson(λ) (where λ > 0): a probability distribution over the nonnegative integers used for modeling the frequency of rare events. λ λx p(x) = e x! Yutian Li (Stanford University) Probability Review January 18, 2018 18 / 27

Random variables Continuous random variables: X Uniform(a, b) (where a < b): equal probability density to every value between a and b on the real line. f (x) = { 1 b a if a x b 0 otherwise X Exponential(λ) (where λ > 0): decaying probability density over the negative reals. { λe λx if x 0 f (x) = 0 otherwise Yutian Li (Stanford University) Probability Review January 18, 2018 19 / 27

Random variables Continuous random variables: X Normal(µ, σ 2 ): also known as the Gaussian distribution. f (x) = 1 2πσ 2 e 1 2σ 2 (x µ)2 Yutian Li (Stanford University) Probability Review January 18, 2018 20 / 27

Outline 1 Elements of probability 2 Random variables 3 Multiple random variables 4 Common inequalities Yutian Li (Stanford University) Probability Review January 18, 2018 21 / 27

Multiple random variables Definition (Joint cumulative distribution function) Suppose that we have two random variables X and Y, F XY (x, y) = P(X x, Y y). The joint CDF F XY (x, y) and the marginal cumulative distribution functions F X (x) and F Y (y) of each variable separately are related by F X (x) = lim y F XY (x, y), F Y (y) = lim x F XY (x, y). Yutian Li (Stanford University) Probability Review January 18, 2018 22 / 27

Multiple random variables Definition (Joint cumulative distribution function) Suppose that we have two random variables X and Y, F XY (x, y) = P(X x, Y y). The joint CDF F XY (x, y) and the marginal cumulative distribution functions F X (x) and F Y (y) of each variable separately are related by F X (x) = lim y F XY (x, y), F Y (y) = lim x F XY (x, y). Yutian Li (Stanford University) Probability Review January 18, 2018 22 / 27

Multiple random variables In the discrete case, the conditional probability mass function of X given Y is simply p Y X (y x) = p XY (x, y), p X (x) assuming that p X (x) 0. A useful formula that often arises when trying to derive expression for the conditional probability of one variable given another, is Bayes rule: p Y X (y x) = p XY (x, y) p X (x) = p X Y (x y)p Y (y) y p X Y (x y )p Y (y ). Yutian Li (Stanford University) Probability Review January 18, 2018 23 / 27

Multiple random variables In the discrete case, the conditional probability mass function of X given Y is simply p Y X (y x) = p XY (x, y), p X (x) assuming that p X (x) 0. A useful formula that often arises when trying to derive expression for the conditional probability of one variable given another, is Bayes rule: p Y X (y x) = p XY (x, y) p X (x) = p X Y (x y)p Y (y) y p X Y (x y )p Y (y ). Yutian Li (Stanford University) Probability Review January 18, 2018 23 / 27

Outline 1 Elements of probability 2 Random variables 3 Multiple random variables 4 Common inequalities Yutian Li (Stanford University) Probability Review January 18, 2018 24 / 27

Common inequalities Theorem (Markov inequality) Let X be a non-negative random variable, then P(X a) E[X ] a. Yutian Li (Stanford University) Probability Review January 18, 2018 25 / 27

Common inequalities Theorem (Chebyshev inequality) Let X be a random variable, then P( X E[X ] a) Var(x) a 2. Yutian Li (Stanford University) Probability Review January 18, 2018 26 / 27

Common inequalities Theorem (Jensen inequality) Let φ be a convex function and X be a random variable, then E[φ(X )] φ(e[x ]). Yutian Li (Stanford University) Probability Review January 18, 2018 27 / 27