Convergence of Random Processes


Convergence of Random Processes
DS GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17
Carlos Fernandez-Granda

Aim
Define convergence for random processes. Describe two convergence phenomena: the law of large numbers and the central limit theorem.

Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

Convergence of deterministic sequences
A deterministic sequence of real numbers x_1, x_2, ... converges to x ∈ ℝ, written lim_{i→∞} x_i = x, if x_i is arbitrarily close to x as i grows: for any ε > 0 there is an i_0 such that for all i > i_0, |x_i − x| < ε.
Problem: Random sequences do not have fixed values

Convergence with probability one
Consider a discrete random process X̃ and a random variable X defined on the same probability space. If we fix the outcome ω, X̃(i, ω) is a deterministic sequence and X(ω) is a constant, so we can determine whether for that particular ω
lim_{i→∞} X̃(i, ω) = X(ω)

Convergence with probability one
X̃ converges with probability one to X if
P({ω | ω ∈ Ω, lim_{i→∞} X̃(ω, i) = X(ω)}) = 1
Deterministic convergence occurs with probability one

Puddle
Initial amount of water is uniform between 0 and 1 gallon. After a time interval i there is i times less water:
D̃(ω, i) := ω / i,  i = 1, 2, ...

[Figure: sample paths of D̃(ω, i) for ω = 0.31, 0.89, and 0.52, plotted for i = 1, ..., 10]

Puddle
If we fix ω ∈ (0, 1),
lim_{i→∞} D̃(ω, i) = lim_{i→∞} ω / i = 0
so D̃ converges to zero with probability one

[Figure: sample paths of D̃(ω, i) for i up to 50]
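The puddle process is easy to reproduce numerically. Below is a minimal simulation sketch (the seed, number of paths, and horizon are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample paths of the puddle process D(w, i) = w / i,
# where w ~ Uniform(0, 1) is drawn once per path.
w = rng.uniform(0, 1, size=3)       # one outcome w per path
i = np.arange(1, 11)                # time indices 1, ..., 10
paths = w[:, None] / i[None, :]     # D(w, i) = w / i

print(paths.round(3))               # every row decays to 0
```

Every fixed-ω path is the deterministic sequence ω/i, which converges to zero, matching the claim of convergence with probability one.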

Alternative idea
Instead of fixing ω and checking deterministic convergence:
1. Measure how close X̃(i) and X are for a fixed i using a deterministic quantity
2. Check whether the quantity tends to zero

Convergence in mean square
The mean square of Y − X measures how close X and Y are. If E((X − Y)²) = 0 then X = Y with probability one.
Proof: By Markov's inequality, for any ε > 0
P((Y − X)² > ε) ≤ E((X − Y)²) / ε = 0

Convergence in mean square
X̃ converges to X in mean square if
lim_{i→∞} E((X̃(i) − X)²) = 0

Convergence in probability
Alternative measure: the probability that |Y − X| > ε for small ε.
X̃ converges to X in probability if for any ε > 0
lim_{i→∞} P(|X̃(i) − X| > ε) = 0

Conv. in mean square implies conv. in probability
lim_{i→∞} P(|X̃(i) − X| > ε) = lim_{i→∞} P((X̃(i) − X)² > ε²)
≤ lim_{i→∞} E((X̃(i) − X)²) / ε²   (Markov's inequality)
= 0
Convergence with probability one also implies convergence in probability

Convergence in distribution
The distribution of X̃(i) converges to the distribution of X.
X̃ converges in distribution to X if, for all x at which F_X is continuous,
lim_{i→∞} F_{X̃(i)}(x) = F_X(x)

Convergence in distribution
Convergence in distribution does not imply that X̃(i) and X are close as i → ∞! Convergence in probability does imply convergence in distribution

Binomial tends to Poisson
X̃(i) is binomial with parameters i and p := λ/i. X is a Poisson random variable with parameter λ. X̃(i) converges to X in distribution:
lim_{i→∞} p_{X̃(i)}(x) = lim_{i→∞} (i choose x) p^x (1 − p)^{i−x} = λ^x e^{−λ} / x! = p_X(x)

[Figures: probability mass functions of X̃(40), X̃(80), and X̃(400), and the pmf of the Poisson limit X, for k = 0, ..., 40; the binomial pmf approaches the Poisson pmf as i grows]
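The convergence can also be checked numerically. The sketch below assumes λ = 20 (consistent with the 0-40 range in the figures, but not stated in the transcription) and compares the binomial and Poisson pmfs using scipy:

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 20.0                                # assumed value of lambda
ks = np.arange(41)
for i in [40, 80, 400]:
    pmf_i = binom.pmf(ks, i, lam / i)     # pmf of X(i) ~ Binomial(i, lambda/i)
    gap = np.max(np.abs(pmf_i - poisson.pmf(ks, lam)))
    print(f"i = {i:3d}: max |binomial pmf - Poisson pmf| = {gap:.5f}")
```

The gap shrinks as i grows, as convergence in distribution predicts.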

Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

Moving average
The moving average Ã of a discrete random process X̃ is
Ã(i) := (1/i) ∑_{j=1}^{i} X̃(j)

Weak law of large numbers
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ². The average Ã of X̃ converges in mean square to µ

Proof
E(Ã(i)) = E((1/i) ∑_{j=1}^{i} X̃(j)) = (1/i) ∑_{j=1}^{i} E(X̃(j)) = µ

Proof
Var(Ã(i)) = Var((1/i) ∑_{j=1}^{i} X̃(j)) = (1/i²) ∑_{j=1}^{i} Var(X̃(j))   (by independence of the X̃(j))
= σ²/i

Proof
lim_{i→∞} E((Ã(i) − µ)²) = lim_{i→∞} E((Ã(i) − E(Ã(i)))²) = lim_{i→∞} Var(Ã(i)) = lim_{i→∞} σ²/i = 0

Strong law of large numbers
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ². The average Ã of X̃ converges with probability one to µ
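A short simulation illustrates the law, and also previews the Cauchy example below, where it fails because the mean does not exist. This is a sketch with arbitrary seed and length:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
i = np.arange(1, n + 1)

# Moving average of an iid standard Gaussian sequence: converges to the mean 0.
x = rng.standard_normal(n)
avg = np.cumsum(x) / i                        # A(i) = (1/i) sum_{j<=i} X(j)
print("Gaussian:", avg[[9, 99, 999, 4999]].round(3))

# Moving average of an iid Cauchy sequence: no mean, so no convergence.
c = rng.standard_cauchy(n)
avg_c = np.cumsum(c) / i
print("Cauchy:  ", avg_c[[9, 99, 999, 4999]].round(3))
```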

[Figures: moving average of an iid standard Gaussian sequence vs the mean of the iid sequence, for i up to 50, 500, and 5000]

[Figures: moving average of an iid geometric (p = 0.4) sequence vs the mean of the iid sequence, for i up to 50, 500, and 5000]

[Figures: moving average of an iid Cauchy sequence vs the median of the iid sequence, for i up to 50, 500, and 5000. The Cauchy distribution has no mean, so the law of large numbers does not apply: the moving average never settles down]

Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

Central Limit Theorem
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ². Then
√i (Ã(i) − µ) converges in distribution to a Gaussian random variable with mean 0 and variance σ²
The average Ã(i) is approximately Gaussian with mean µ and variance σ²/i
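The statement is easy to test by simulation. The sketch below uses iid exponential variables with λ = 2 (so µ = 1/2 and σ² = 1/4); the distribution, seed, and sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
i, trials = 1000, 10000

# Realizations of A(i) for iid exponential(lambda = 2): mean 1/2, variance 1/4.
avgs = rng.exponential(scale=0.5, size=(trials, i)).mean(axis=1)

# CLT: sqrt(i) * (A(i) - mu) should be approximately N(0, sigma^2).
z = np.sqrt(i) * (avgs - 0.5)
print(f"mean = {z.mean():.4f} (close to 0)")
print(f"variance = {z.var():.4f} (close to 0.25)")
```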

Height data
Example: Data from a population of 25,000 people. We compare the histogram of the heights with the pdf of a Gaussian random variable fitted to the data

[Figure: histogram of heights (inches, 60-76) from the real data vs the fitted Gaussian pdf]

Sketch of proof
The pdf of a sum of two independent random variables is the convolution of their pdfs:
f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
Repeated convolutions of any pdf with bounded variance result in a Gaussian!

[Figures: repeated convolutions of two different pdfs for i = 1, ..., 5; both sequences rapidly take on a Gaussian shape]
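The convolution picture can be reproduced numerically. As a sketch (the uniform starting pdf and grid size are illustrative assumptions), convolving a uniform pdf on [0, 1] with itself yields the Irwin-Hall pdfs, whose peaks quickly approach those of a Gaussian with the same variance i/12:

```python
import numpy as np

dx = 0.001
unif = np.ones(int(1 / dx))             # uniform pdf on [0, 1] (value 1)
pdf = unif.copy()
for i in range(2, 6):
    pdf = np.convolve(pdf, unif) * dx   # pdf of the sum of i uniform variables
    gauss_peak = 1 / np.sqrt(2 * np.pi * i / 12)   # peak of matching Gaussian
    print(f"i = {i}: pdf peak = {pdf.max():.3f}, "
          f"Gaussian peak = {gauss_peak:.3f}")
```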

[Figures: histograms of the moving average of an iid exponential (λ = 2) sequence for i = 10², 10³, and 10⁴; the histograms concentrate around the mean 1/2 and look Gaussian]

[Figures: histograms of the moving average of an iid geometric (p = 0.4) sequence for i = 10², 10³, and 10⁴; the histograms concentrate around the mean 2.5 and look Gaussian]

[Figures: histograms of the moving average of an iid Cauchy sequence for i = 10², 10³, and 10⁴; the histograms do not concentrate, since the central limit theorem does not apply (the average of iid Cauchy random variables is itself Cauchy)]

Gaussian approximation to the binomial
X is binomial with parameters n and p. Computing the probability that X is in a certain interval requires summing its pmf over the interval; the central limit theorem provides a quick approximation. Writing
X = ∑_{i=1}^{n} B_i,  E(B_i) = p,  Var(B_i) = p(1 − p)
(1/n) X is approximately Gaussian with mean p and variance p(1 − p)/n, so X is approximately Gaussian with mean np and variance np(1 − p)

Gaussian approximation to the binomial
A basketball player makes each shot with probability p = 0.4 (shots are iid). What is the probability that she makes at least 420 shots out of 1000?
Exact answer:
P(X ≥ 420) = ∑_{x=420}^{1000} p_X(x) = ∑_{x=420}^{1000} (1000 choose x) 0.4^x 0.6^{1000−x} = 10.4 × 10⁻²
Approximation:
P(X ≥ 420) ≈ P(√(np(1 − p)) U + np ≥ 420) = P(U ≥ 1.29) = 1 − Φ(1.29) = 9.85 × 10⁻²
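Both numbers are straightforward to reproduce; here is a sketch using scipy:

```python
import numpy as np
from scipy.stats import binom, norm

n, p, k = 1000, 0.4, 420

exact = binom.sf(k - 1, n, p)                  # P(X >= 420), exact tail sum
mu, sigma = n * p, np.sqrt(n * p * (1 - p))    # np = 400, sqrt(np(1-p)) ~ 15.5
approx = norm.sf((k - mu) / sigma)             # P(U >= 1.29) = 1 - Phi(1.29)
print(f"exact = {exact:.4f}")                  # ~ 0.104
print(f"CLT approximation = {approx:.4f}")     # ~ 0.0985
```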

Types of convergence · Law of Large Numbers · Central Limit Theorem · Monte Carlo simulation

Monte Carlo simulation
Simulation is a powerful tool in probability and statistics: many models are too complex to derive closed-form solutions (life is not a homework problem!).
Example: Game of solitaire

Game of solitaire
Aim: Compute the probability that you win at solitaire. If every permutation of the cards has the same probability,
P(Win) = (number of permutations that lead to a win) / (total number of permutations)
Problem: Characterizing the permutations that lead to a win is very difficult without playing out the game, and we can't just check them all because there are 52! ≈ 8 × 10⁶⁷ permutations!
Solution: Sample many permutations and compute the fraction of wins

In the words of Stanislaw Ulam
The first thoughts and attempts I made to practice (the Monte Carlo Method) were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers.

Monte Carlo approximation
Main principle: Use simulation to approximate quantities that are challenging to compute exactly.
To approximate the probability of an event E:
1. Generate n independent samples of the indicator 1_E: I_1, I_2, ..., I_n
2. Compute the average of the n samples
Ã(n) := (1/n) ∑_{i=1}^{n} I_i
By the law of large numbers, Ã(n) converges to P(E) as n → ∞, since E(1_E) = P(E)
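As a toy illustration (the event is an invented example, not from the slides), the sketch below estimates P(E) for E = {the sum of two dice is at least 10}, whose exact value is 6/36 ≈ 0.1667:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# n independent samples of the indicator of E = {sum of two dice >= 10}.
dice = rng.integers(1, 7, size=(n, 2))    # fair six-sided dice
indicators = dice.sum(axis=1) >= 10
print(indicators.mean())                  # close to 1/6 by the LLN
```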

Basketball league
A basketball league has m teams. In a season every pair of teams plays once. Teams are ordered: team 1 is best, team m is worst.
Model: Games are independent and, for 1 ≤ i < j ≤ m,
P(team j beats team i) := 1 / (j − i + 1)

Basketball league
Aim: Compute the probability of the team ranks at the end of the season. The rank of team i is modeled as a random variable R_i. What is the pmf of R_1, R_2, ..., R_m?

m = 3
Game outcomes (winner of each game) and resulting ranks:

1-2  1-3  2-3 | R_1  R_2  R_3 | Probability
 1    1    2  |  1    2    3  | 1/6
 1    1    3  |  1    3    2  | 1/6
 1    3    2  |  1    1    1  | 1/12
 1    3    3  |  2    3    1  | 1/12
 2    1    2  |  2    1    3  | 1/6
 2    1    3  |  1    1    1  | 1/6
 2    3    2  |  3    1    2  | 1/12
 2    3    3  |  3    2    1  | 1/12

m = 3
Probability mass function:

Rank | R_1  | R_2 | R_3
  1  | 7/12 | 1/2 | 5/12
  2  | 1/4  | 1/4 | 1/4
  3  | 1/6  | 1/4 | 1/3

Basketball league
Problem: The number of possible season outcomes is 2^(m(m−1)/2)! For m = 10 this is larger than 10¹³.
Solution: Apply Monte Carlo approximation
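Here is a sketch of the Monte Carlo approximation for this model. The tie-handling rule (tied teams share the best available rank) is inferred from the m = 3 table above:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_rank_pmf(m, n):
    """Estimate the rank pmfs over n simulated seasons; entry [r-1, i-1]
    of the returned array approximates P(R_i = r)."""
    counts = np.zeros((m, m))
    for _ in range(n):
        wins = np.zeros(m, dtype=int)
        for i in range(m):
            for j in range(i + 1, m):
                # team j+1 beats team i+1 with probability 1 / (j - i + 1)
                if rng.random() < 1.0 / (j - i + 1):
                    wins[j] += 1
                else:
                    wins[i] += 1
        # rank = 1 + number of teams with strictly more wins (ties share a rank)
        ranks = np.array([1 + np.sum(wins > w) for w in wins])
        counts[ranks - 1, np.arange(m)] += 1
    return counts / n

print(estimate_rank_pmf(3, 2000).round(3))   # compare with the exact pmf above
```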

m = 3
Ten simulated seasons (game outcomes and resulting ranks):

1-2  1-3  2-3 | R_1  R_2  R_3
 1    3    2  |  1    1    1
 1    1    3  |  1    3    2
 2    1    2  |  2    1    3
 2    3    2  |  3    1    2
 2    1    3  |  1    1    1
 1    1    2  |  1    2    3
 2    1    3  |  1    1    1
 2    3    2  |  3    1    2
 1    1    2  |  1    2    3
 2    3    2  |  3    1    2

m = 3
Estimated pmf (n = 10), true values in parentheses:

Rank | R_1         | R_2        | R_3
  1  | 0.6 (0.583) | 0.7 (0.5)  | 0.3 (0.417)
  2  | 0.1 (0.25)  | 0.2 (0.25) | 0.4 (0.25)
  3  | 0.3 (0.167) | 0.1 (0.25) | 0.3 (0.333)

m = 3
Estimated pmf (n = 2,000), true values in parentheses:

Rank | R_1           | R_2          | R_3
  1  | 0.582 (0.583) | 0.496 (0.5)  | 0.417 (0.417)
  2  | 0.248 (0.25)  | 0.261 (0.25) | 0.244 (0.25)
  3  | 0.171 (0.167) | 0.245 (0.25) | 0.339 (0.333)

[Figure: running time (seconds, log scale) of the exact computation vs the Monte Carlo approximation as the number of teams m grows from 2 to 20]

Error of the Monte Carlo approximation:

m | Average error
3 | 9.28 × 10⁻³
4 | 12.7 × 10⁻³
5 | 7.95 × 10⁻³
6 | 7.12 × 10⁻³
7 | 7.12 × 10⁻³

[Figures: Monte Carlo approximation results for m = 5, m = 20, and m = 100]