Random variables. DS GA 1002 Probability and Statistics for Data Science.


Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda

Motivation. Random variables model numerical quantities that are uncertain. They allow us to structure the information we have about these quantities in a principled way.

Definition. Given a probability space (Ω, F, P), a random variable X is a function from the sample space Ω to the real numbers R. We use uppercase letters to denote random variables: X, Y, ... Once the outcome ω ∈ Ω is revealed, X(ω) is the realization of X. We use lowercase letters to denote numerical values: x, y, ...

Characterization. Given a probability space (Ω, F, P), for any set S, P(X ∈ S) = P({ω | X(ω) ∈ S}). We will almost never construct probabilistic models like this!

Discrete random variables. Discrete random variables take values on a finite or countably infinite subset of R, such as the integers. The probability mass function (pmf) of X is defined as p_X(x) := P({ω | X(ω) = x}). In words, p_X(x) is the probability that X equals x. The pmf completely specifies a random variable.

Probability mass function. If D is the range of X, then (D, 2^D, p_X) is a valid probability space. Any pmf satisfies: p_X(x) ≥ 0 for any x ∈ D; Σ_{x ∈ D} p_X(x) = 1; and P(X ∈ S) = Σ_{x ∈ S} p_X(x) for any S ⊆ D.
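As a sketch, these properties can be checked numerically. The pmf values below are an assumption, chosen so that the probabilities match the example that follows (p_X(1) + p_X(4) = 0.5 and p_X(4) + p_X(5) = 0.6):

```python
# Hypothetical pmf on D = {1, ..., 5}; the values are assumed for illustration.
pmf = {1: 0.2, 2: 0.1, 3: 0.1, 4: 0.3, 5: 0.3}

# Nonnegativity and normalization: p_X(x) >= 0 and the pmf sums to 1 over D.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def prob(S):
    """P(X in S) = sum of p_X(x) over x in S."""
    return sum(pmf.get(x, 0.0) for x in S)
```

With these assumed values, prob({1, 4}) returns 0.5 and prob({4, 5}) returns 0.6, matching the example below.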

[Figure: bar plot of an example pmf p_X(x) over x = 1, ..., 5]

Example. P(X ∈ {1, 4}) = p_X(1) + p_X(4) = 0.5; P(X > 3) = p_X(4) + p_X(5) = 0.6.

Defining a discrete random variable. To define a discrete random variable X we just need: a discrete range D, and a nonnegative function p_X satisfying Σ_{x ∈ D} p_X(x) = 1.

Bernoulli random variable. Experiment with two possible outcomes (coin flip with bias p): p_X(0) = 1 - p, p_X(1) = p. Special case: the indicator random variable of an event S, defined by 1_S(ω) = 1 if ω ∈ S and 0 otherwise, is Bernoulli with parameter P(S). This allows us to represent an event by a random variable.

Example: Coin flips. You flip a coin with bias p until you obtain heads (flips are independent). If you model the number of flips as a random variable X, what is p_X?

Example: Coin flips.
p_X(k) = P(k flips)
= P(1st flip = tails, ..., (k-1)th flip = tails, kth flip = heads)
= P(1st flip = tails) · · · P((k-1)th flip = tails) · P(kth flip = heads)
= (1 - p)^{k-1} p

Geometric random variable. The pmf of a geometric random variable with parameter p is p_X(k) = (1 - p)^{k-1} p, for k = 1, 2, ...
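A minimal sketch of the geometric pmf, with a Monte Carlo check against simulated coin flips (the parameter value p = 0.2 is assumed for illustration):

```python
import random

def geometric_pmf(k, p):
    """p_X(k) = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.2
# The pmf sums to 1 (the infinite sum is truncated at a large K).
total = sum(geometric_pmf(k, p) for k in range(1, 200))

# Simulate flipping a biased coin until heads and compare frequencies to the pmf.
random.seed(0)
def flips_until_heads(p):
    k = 1
    while random.random() >= p:  # random() < p counts as heads
        k += 1
    return k

samples = [flips_until_heads(p) for _ in range(100_000)]
freq_3 = samples.count(3) / len(samples)  # compare to (1 - p)**2 * p = 0.128
```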

[Figures: geometric pmf p_X(k) for k = 1, ..., 10, with p = 0.2, p = 0.5, and p = 0.8]

Example: Coin flips. You flip a coin with bias p n times (flips are independent). If you model the number of heads as a random variable X, what is p_X?

Example: Coin flips. What is the probability of getting k heads and then n - k tails?
P(k heads, then n - k tails)
= P(1st = heads, ..., kth = heads, (k+1)th = tails, ..., nth = tails)
= P(1st = heads) · · · P(kth = heads) · P((k+1)th = tails) · · · P(nth = tails)
= p^k (1 - p)^{n-k}

Example: Coin flips. Any fixed order of k heads and n - k tails has the same probability. We are interested in the union of these events. Can we just add their probabilities? How many possible orders are there?
(n choose k) := n! / (k! (n - k)!)
p_X(k) = (n choose k) p^k (1 - p)^{n-k}

Binomial random variable. The pmf of a binomial random variable with parameters n and p is p_X(k) = (n choose k) p^k (1 - p)^{n-k}, for k = 0, 1, 2, ..., n.
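The binomial pmf translates directly into code; a sketch using Python's math.comb:

```python
from math import comb

def binomial_pmf(k, n, p):
    """p_X(k) = C(n, k) * p**k * (1 - p)**(n - k), for k = 0, ..., n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The pmf sums to 1, and for p = 0.5 it is symmetric around n/2.
total = sum(binomial_pmf(k, 20, 0.5) for k in range(21))
```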

[Figures: binomial pmf p_X(k) for k = 0, ..., 20, with n = 20 and p = 0.2, p = 0.5, and p = 0.8]

Example: Call center. Model the number of calls received per day. Assumptions: 1. Each call occurs independently of every other call. 2. A given call has the same probability of occurring at any given time of the day. 3. Calls occur at a rate of λ calls per day.

Example: Call center. Discretize the day into n slots. Probability of receiving m calls in one slot? (λ/n)^m. If n is large enough, λ/n ≫ (λ/n)^m for all m > 1. Assume that in each slot we either receive one call or none at all. What is the probability of k calls in a day? Binomial with parameters n and p := λ/n!

Example: Call center.
P(k calls during the day) = lim_{n→∞} P(k calls in n small intervals)
= lim_{n→∞} (n choose k) p^k (1 - p)^{n-k}
= lim_{n→∞} (n choose k) (λ/n)^k (1 - λ/n)^{n-k}
= lim_{n→∞} [n! / (k! (n - k)!)] · [λ^k / (n - λ)^k] · (1 - λ/n)^n
= λ^k e^{-λ} / k!
Identity proved in the notes: lim_{n→∞} [n! / ((n - k)! (n - λ)^k)] (1 - λ/n)^n = e^{-λ}

Poisson random variable. The pmf of a Poisson random variable with parameter λ is p_X(k) = λ^k e^{-λ} / k!, for k = 0, 1, 2, ...

[Figures: Poisson pmf p_X(k) for k = 0, ..., 50, with λ = 10, λ = 20, and λ = 30]

Example: Call center. The pmf of a binomial with parameters n and p = λ/n converges to the pmf of a Poisson with parameter λ. This is an example of convergence in distribution.

[Figures: binomial pmf with p = 20/n for n = 40, 80, and 400, converging to the Poisson pmf with λ = 20]

Call-center data. The assumptions do not hold over the whole day (why?), but they do hold (approximately) for intervals of time. Example: data from a call center in Israel. We compare the histogram of the number of calls received in an interval of 4 hours over 2 months with the pmf of a Poisson random variable fitted to the data.

[Figure: histogram of the number of calls (real data) vs. the fitted Poisson pmf]

Continuous random variables. Useful to model continuous quantities without discretizing. Assigning nonzero probabilities to events of the form {X = x} for x ∈ R doesn't work! Instead, we only consider events of the form {X ∈ S}, where S is a union of intervals (formally, a Borel set). We cannot consider every possible subset of R for technical reasons.

Cumulative distribution function. The cumulative distribution function (cdf) of X is defined as F_X(x) := P({ω ∈ Ω : X(ω) ≤ x}) = P(X ≤ x). In words, F_X(x) is the probability of X being smaller than or equal to x. The cdf can be defined for both continuous and discrete random variables.

Cumulative distribution function. The cdf completely specifies the distribution of the random variable. The probability of any interval (a, b] is given by P(a < X ≤ b) = P(X ≤ b) - P(X ≤ a) = F_X(b) - F_X(a). To define a continuous random variable we just need a valid cdf! A valid underlying probability space exists, but we don't need to worry about it.

Properties of the cdf: lim_{x→-∞} F_X(x) = 0; lim_{x→∞} F_X(x) = 1; F_X(b) ≥ F_X(a) if b > a, i.e. F_X is nondecreasing.

Example.
F_X(x) := 0 for x < 0,
0.5 x for 0 ≤ x ≤ 1,
0.5 for 1 ≤ x ≤ 2,
0.5 (1 + (x - 2)^2) for 2 ≤ x ≤ 3,
1 for x > 3.

[Figure: plot of the cdf F_X(x) for x between 0 and 4]

Example. P(0.5 < X ≤ 2.5) = F_X(2.5) - F_X(0.5) = 0.375.
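A sketch of this computation, implementing the piecewise cdf from the example:

```python
def cdf(x):
    """Piecewise cdf from the example."""
    if x < 0:
        return 0.0
    if x <= 1:
        return 0.5 * x
    if x <= 2:
        return 0.5
    if x <= 3:
        return 0.5 * (1 + (x - 2) ** 2)
    return 1.0

prob = cdf(2.5) - cdf(0.5)  # P(0.5 < X <= 2.5) = 0.625 - 0.25 = 0.375
```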

[Figure: cdf F_X(x) with P(X ∈ (0.5, 2.5]) marked as the difference F_X(2.5) - F_X(0.5)]

Probability density function. When the cdf is differentiable, its derivative can be interpreted as a density: f_X(x) := dF_X(x)/dx. The pdf is not a probability measure! (It can be greater than 1.)

Probability density function. By the fundamental theorem of calculus, P(a < X ≤ b) = F_X(b) - F_X(a) = ∫_a^b f_X(x) dx. Intuitively, lim_{Δ→0} P(X ∈ (x, x + Δ)) / Δ = f_X(x).

Properties of the pdf. For any union of intervals (any Borel set) S, P(X ∈ S) = ∫_S f_X(x) dx. In particular, ∫_{-∞}^{∞} f_X(x) dx = 1. From the monotonicity of the cdf, f_X(x) ≥ 0.

Example. For the cdf
F_X(x) := 0 for x < 0, 0.5 x for 0 ≤ x ≤ 1, 0.5 for 1 ≤ x ≤ 2, 0.5 (1 + (x - 2)^2) for 2 ≤ x ≤ 3, 1 for x > 3,
the pdf is
f_X(x) = 0 for x < 0, 0.5 for 0 ≤ x ≤ 1, 0 for 1 ≤ x ≤ 2, x - 2 for 2 ≤ x ≤ 3, 0 for x > 3.

[Figure: the cdf F_X(x) and the corresponding pdf f_X(x)]

Example. P(0.5 < X ≤ 2.5) = ∫_{0.5}^{2.5} f_X(x) dx = ∫_{0.5}^{1} 0.5 dx + ∫_{2}^{2.5} (x - 2) dx = 0.375.

[Figure: pdf f_X(x) with P(X ∈ (0.5, 2.5]) marked as the shaded area]

Uniform random variable. Pdf of a uniform random variable with domain [a, b]: f_X(x) = 1/(b - a) if a ≤ x ≤ b, and 0 otherwise.

[Figure: pdf (height 1/(b - a) on [a, b]) and cdf of a uniform random variable on [a, b]]

Exponential random variable. Used to model waiting times (time until a certain event occurs). Examples: decay of a radioactive particle, telephone call, mechanical failure of a device. Pdf of an exponential random variable with parameter λ: f_X(x) = λ e^{-λx} if x ≥ 0, and 0 otherwise.

[Figure: exponential pdfs f_X(x) for λ = 0.5, 1.0, and 1.5]

Call-center data Example: Data from a call center in Israel We compare the histogram of the inter-arrival times between calls occurring between 8 pm and midnight over two days and the pdf of an exponential random variable fitted to the data

[Figure: histogram of inter-arrival times in seconds (real data) vs. the fitted exponential pdf]

Gaussian or normal random variable. Extremely popular in probabilistic models and statistics. Sums of independent random variables converge to Gaussian distributions under certain assumptions. Pdf of a Gaussian random variable with mean µ and standard deviation σ: f_X(x) = (1 / (√(2π) σ)) e^{-(x - µ)^2 / (2σ^2)}.

[Figures: Gaussian pdfs for (µ = 2, σ = 1), (µ = 0, σ = 2), and (µ = 0, σ = 4)]

Height data Example: Data from a population of 25 000 people We compare the histogram of the heights and the pdf of a Gaussian random variable fitted to the data

[Figure: histogram of heights in inches (real data) vs. the fitted Gaussian pdf]

Problem. The Gaussian cdf does not have a closed-form expression. This complicates computing the probability that a Gaussian belongs to a set.

Standard Gaussian. If X is Gaussian with mean µ and standard deviation σ, then U := (X - µ)/σ is a standard Gaussian, with mean zero and unit standard deviation.
P(X ∈ [a, b]) = P((X - µ)/σ ∈ [(a - µ)/σ, (b - µ)/σ]) = Φ((b - µ)/σ) - Φ((a - µ)/σ),
where Φ is the cdf of a standard Gaussian.
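Although Φ has no closed form, it can be evaluated through the error function; a sketch using Python's math.erf:

```python
from math import erf, sqrt

def Phi(x):
    """Standard Gaussian cdf: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def gaussian_interval_prob(a, b, mu, sigma):
    """P(X in [a, b]) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

# Probability that a Gaussian lies within one standard deviation of its mean.
p_one_sigma = gaussian_interval_prob(-1.0, 1.0, 0.0, 1.0)  # about 0.683
```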

Beta random variable. Useful in Bayesian statistics. Unimodal continuous distribution on the unit interval. The pdf of a beta distribution with parameters a and b is defined as f_β(θ; a, b) := θ^{a-1} (1 - θ)^{b-1} / β(a, b) if 0 ≤ θ ≤ 1, and 0 otherwise, where β(a, b) := ∫_0^1 u^{a-1} (1 - u)^{b-1} du.

[Figure: beta pdfs for (a = 1, b = 1), (a = 1, b = 2), (a = 3, b = 3), (a = 6, b = 2), and (a = 3, b = 15)]

Conditioning on an event. We usually define random variables using their pmf, cdf or pdf. How can we incorporate the information that X ∈ S for some set S?

Conditional pmf. If X has pmf p_X, the conditional pmf of X given X ∈ S is p_{X | X∈S}(x) := P(X = x | X ∈ S) = p_X(x) / Σ_{s ∈ S} p_X(s) if x ∈ S, and 0 otherwise. This is a valid pmf in the new probability space restricted to the event {X ∈ S}.
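A sketch of conditioning a pmf on {X ∈ S}; the geometric pmf with p = 0.3 (truncated at a large K) is an assumed example:

```python
# Assumed example: geometric pmf with p = 0.3, truncated at a large K.
p = 0.3
pmf = {k: (1 - p) ** (k - 1) * p for k in range(1, 80)}

def conditional_pmf(pmf, S):
    """p_{X|X in S}(x) = p_X(x) / sum_{s in S} p_X(s) if x in S, else 0."""
    norm = sum(pmf[s] for s in S if s in pmf)
    return {x: (px / norm if x in S else 0.0) for x, px in pmf.items()}

cond = conditional_pmf(pmf, {2, 3, 4})
# cond is a valid pmf: it sums to 1 and vanishes outside S.
```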

Conditional cdf. If X has pdf f_X, the conditional cdf of X given X ∈ S is F_{X | X∈S}(x) := P(X ≤ x | X ∈ S) = P(X ≤ x, X ∈ S) / P(X ∈ S) = ∫_{u ≤ x, u ∈ S} f_X(u) du / ∫_{u ∈ S} f_X(u) du. This is a valid cdf in the new probability space restricted to the event {X ∈ S}.

Example: Geometric random variables are memoryless. We flip a coin repeatedly until we obtain heads, but pause after k_0 flips (which were tails). What is the probability of obtaining heads in k more flips?

Example: Geometric random variables are memoryless.
P(k more flips) = p_{X | X > k_0}(k)
= p_X(k) / Σ_{m = k_0+1}^{∞} p_X(m)
= (1 - p)^{k-1} p / Σ_{m = k_0+1}^{∞} (1 - p)^{m-1} p
= (1 - p)^{k - k_0 - 1} p, for k > k_0.
Geometric series: Σ_{m = k_0+1}^{∞} α^{m-1} = α^{k_0} / (1 - α) for any α < 1.
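The memoryless property can be verified numerically; a sketch with assumed values p = 0.3 and k_0 = 4:

```python
p, k0 = 0.3, 4

def geom_pmf(k):
    return (1 - p) ** (k - 1) * p

# Tail sum: sum_{m > k0} p_X(m) = (1 - p)**k0 (truncated at a large M).
tail = sum(geom_pmf(m) for m in range(k0 + 1, 500))

# Probability that heads arrives on flip k0 + 3, given k0 tails so far ...
cond = geom_pmf(k0 + 3) / tail
# ... equals the probability that a fresh sequence needs exactly 3 flips.
fresh = geom_pmf(3)
```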

Example: Exponential random variables are memoryless. Assume email inter-arrival times are exponential with parameter λ. You get an email, then no email for t_0 minutes. How is the waiting time until the next email distributed now?

Example: Exponential random variables are memoryless.
F_{T | T > t_0}(t) = ∫_{t_0}^{t} f_T(u) du / ∫_{t_0}^{∞} f_T(u) du
= ∫_{t_0}^{t} λ e^{-λu} du / ∫_{t_0}^{∞} λ e^{-λu} du
= (e^{-λt_0} - e^{-λt}) / e^{-λt_0}
= 1 - e^{-λ(t - t_0)}, for t > t_0.
Differentiating with respect to t: f_{T | T > t_0}(t) = λ e^{-λ(t - t_0)} for t > t_0.
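The same check for the exponential case; a sketch with assumed values λ = 1.5 and t_0 = 2:

```python
from math import exp

lam, t0 = 1.5, 2.0

def F(t):
    """Exponential cdf: F_T(t) = 1 - exp(-lam * t) for t >= 0."""
    return 1 - exp(-lam * t) if t >= 0 else 0.0

t = 3.2
# Conditional cdf given T > t0 ...
cond = (F(t) - F(t0)) / (1 - F(t0))
# ... equals the cdf of a fresh waiting time of length t - t0.
fresh = F(t - t0)
```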

Functions of random variables. For any deterministic function g and random variable X, Y := g(X) is a random variable. Formally, X maps elements of Ω to R, so Y does too, since Y(ω) = g(X(ω)).

Discrete random variables. If X is discrete, p_Y(y) = P(Y = y) = P(g(X) = y) = Σ_{x : g(x) = y} p_X(x).

Continuous random variables. If X is continuous, F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = ∫_{x : g(x) ≤ y} f_X(x) dx. Then we can differentiate to obtain the pdf f_Y.

Gaussian random variable. If X is a Gaussian random variable with mean µ and standard deviation σ, derive the distribution of U := (X - µ)/σ.

Gaussian random variable.
F_U(u) = P((X - µ)/σ ≤ u)
= ∫_{(x-µ)/σ ≤ u} (1 / (√(2π) σ)) e^{-(x - µ)^2 / (2σ^2)} dx
= ∫_{-∞}^{u} (1 / √(2π)) e^{-w^2 / 2} dw, by the change of variables w = (x - µ)/σ.
To obtain the pdf we differentiate with respect to u: f_U(u) = (1 / √(2π)) e^{-u^2 / 2}.
U is a standard Gaussian random variable.

Generating random variables. Simulation is crucial to leverage probabilistic models effectively (life is not a homework problem!). It requires being able to sample from arbitrary distributions. General approach: 1. Generate samples uniformly from the unit interval [0, 1]. 2. Transform the samples so that they have the desired distribution.

Sampling from a discrete distribution. Aim: generate a discrete random variable X with pmf p_X using samples from a uniform random variable U. Possible values of X: x_1 < x_2 < ... How can we assign samples u_1, u_2, ... from U to x_1, x_2, ...?

[Figure: samples u_1, ..., u_5 on the unit interval, to be assigned to the values x_1, x_2, x_3]

Sampling from a discrete distribution. Idea: assign samples in an interval of length p_X(x_i) to x_i.

[Figure: the unit interval partitioned into subintervals of lengths p_X(x_1), p_X(x_2), p_X(x_3), with each sample u_j assigned to the corresponding value x_i]

Sampling from a discrete distribution.
X = x_1 if 0 ≤ U ≤ p_X(x_1),
x_2 if p_X(x_1) ≤ U ≤ p_X(x_1) + p_X(x_2),
...,
x_i if Σ_{j=1}^{i-1} p_X(x_j) ≤ U ≤ Σ_{j=1}^{i} p_X(x_j),
...

Sampling from a discrete distribution. Equivalently, in terms of the cdf:
X = x_1 if 0 ≤ U ≤ F_X(x_1),
x_2 if F_X(x_1) ≤ U ≤ F_X(x_2),
...,
x_i if F_X(x_{i-1}) ≤ U ≤ F_X(x_i),
...
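These assignment rules amount to locating a uniform sample among the cdf values; a minimal sketch with an assumed three-value pmf:

```python
import random
from bisect import bisect_left
from itertools import accumulate

values = [1, 2, 3]            # x_1 < x_2 < x_3 (assumed example)
pmf = [0.2, 0.5, 0.3]
cdf = list(accumulate(pmf))   # cumulative sums: F_X(x_1), F_X(x_2), F_X(x_3)

def sample():
    u = random.random()
    # Smallest x_i with u <= F_X(x_i); min() guards against rounding in cdf[-1].
    i = min(bisect_left(cdf, u), len(values) - 1)
    return values[i]

random.seed(1)
draws = [sample() for _ in range(100_000)]
freq_2 = draws.count(2) / len(draws)  # close to p_X(2) = 0.5
```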

[Figure: the unit interval partitioned at the cdf values F_X(x_1) and F_X(x_2), with samples u_1, ..., u_5 assigned accordingly]

Inverse-transform sampling. Aim: generate a continuous random variable X with cdf F_X using samples from a uniform random variable U. Algorithm: 1. Obtain a sample u of U. 2. Set x := F_X^{-1}(u).

Inverse-transform sampling. For Y := F_X^{-1}(U):
F_Y(y) = P(Y ≤ y) = P(F_X^{-1}(U) ≤ y) = P(U ≤ F_X(y)) = ∫_{0}^{F_X(y)} du = F_X(y).

Generating an exponential random variable. Aim: generate an exponential random variable X with parameter λ. Since F_X(x) := 1 - e^{-λx}, we have F_X^{-1}(u) = (1/λ) log(1 / (1 - u)). Then F_X^{-1}(U) is an exponential random variable with parameter λ.
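A sketch of the full inverse-transform pipeline for the exponential distribution (the value λ = 2 is assumed for illustration):

```python
import math
import random

lam = 2.0

def inv_cdf(u, lam):
    """F_X^{-1}(u) = (1/lam) * log(1 / (1 - u)) for u in [0, 1)."""
    return math.log(1 / (1 - u)) / lam

random.seed(0)
xs = [inv_cdf(random.random(), lam) for _ in range(200_000)]

mean = sum(xs) / len(xs)                          # the exponential mean is 1/lam = 0.5
frac_below_1 = sum(x <= 1 for x in xs) / len(xs)  # F_X(1) = 1 - e^{-2}, about 0.865
```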

[Figure: inverse-transform sampling for the exponential cdf, mapping samples u_1, ..., u_5 to F_X^{-1}(u_1), ..., F_X^{-1}(u_5)]