Lecture 3 - Expectation, inequalities and laws of large numbers


Jan Bouda, FI MU, April 19, 2009

Part I: Functions of a random variable

Functions of a random variable

Given a random variable X and a function $\Phi: \mathbb{R} \to \mathbb{R}$ we define the transformed random variable $Y = \Phi(X)$. The random variables X and Y are defined on the same sample space; moreover, $\mathrm{Dom}(X) = \mathrm{Dom}(Y)$ and $\mathrm{Im}(Y) = \{\Phi(x) \mid x \in \mathrm{Im}(X)\}$. The probability distribution of Y is given by

$$p_Y(y) = \sum_{x \in \mathrm{Im}(X),\ \Phi(x) = y} p_X(x).$$

In fact, we may define Y by $\Phi(X) = \Phi \circ X$, where $\circ$ is the usual function composition.
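As a concrete illustration (not part of the original slides), the following Python sketch computes the pushforward distribution $p_Y$ for a small finite pmf; the pmf of X and the map $\Phi$ are made-up examples.

```python
from collections import defaultdict

# Example pmf of X on a few integer values (made-up numbers).
p_X = {-2: 0.2, -1: 0.1, 0: 0.3, 1: 0.1, 2: 0.3}

def pushforward(p_X, phi):
    """p_Y(y) = sum of p_X(x) over all x with phi(x) = y."""
    p_Y = defaultdict(float)
    for x, p in p_X.items():
        p_Y[phi(x)] += p
    return dict(p_Y)

# Y = X^2 collapses x and -x onto the same value.
print(pushforward(p_X, lambda x: x * x))   # {4: 0.5, 1: 0.2, 0: 0.3}
```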

Part II: Expectation

Expectation

The probability distribution or the probability distribution function completely characterizes a random variable. Often we need a description that is less complete but much shorter: a single number, or a few numbers. The first such characteristic of a random variable is the expectation, also known as the mean value.

Definition. The expectation of a random variable X is defined as

$$E(X) = \sum_i x_i\, p(x_i),$$

provided the sum is absolutely (!) convergent. In case the sum is convergent, but not absolutely convergent, we say that no finite expectation exists. In case the sum is not convergent, the expectation has no meaning.

Median; Mode

The median of a random variable X is any number x such that $P(X < x) \le 1/2$ and $P(X > x) \le 1/2$. The mode of a random variable X is a number x such that $p(x) = \max_{x' \in \mathrm{Im}(X)} p(x')$.

Moments

Let us suppose we have a random variable X and a random variable $Y = \Phi(X)$ for some function $\Phi$. The expected value of Y is

$$E(Y) = \sum_i \Phi(x_i)\, p_X(x_i).$$

Especially interesting is the power function $\Phi(X) = X^k$: the quantity $E(X^k)$ is known as the k-th moment of X. For k = 1 we get the expectation of X. If X and Y are random variables whose moments of all orders match, i.e. $\forall k:\ E(X^k) = E(Y^k)$, then X and Y have the same distribution (provided the moments determine the distribution, e.g. for bounded random variables). Usually we center the expected value at 0, i.e. we use moments of $\Phi(X) = X - E(X)$. We define the k-th central moment of X as

$$\mu_k = E\big([X - E(X)]^k\big).$$

Variance

Definition. The second central moment is known as the variance of X and is defined as $\mu_2 = E\big([X - E(X)]^2\big)$. Explicitly written,

$$\mu_2 = \sum_i [x_i - E(X)]^2\, p(x_i).$$

The variance is usually denoted $\sigma_X^2$ or $\mathrm{Var}(X)$.

Definition. The square root of $\sigma_X^2$ is known as the standard deviation $\sigma_X = \sqrt{\sigma_X^2}$.

If the variance is small, then X takes values close to E(X) with high probability. If the variance is large, then the distribution is more diffuse.
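To make the definitions concrete, here is a small Python sketch (not from the slides) that computes $E(X)$, the k-th moments, $\mathrm{Var}(X)$ and $\sigma_X$ directly from a finite pmf; the pmf values are an arbitrary example.

```python
# Moments of a finite discrete distribution, computed directly from the pmf.
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}   # made-up example

def moment(pmf, k=1):
    """k-th moment E(X^k) = sum_i x_i^k * p(x_i)."""
    return sum((x ** k) * p for x, p in pmf.items())

mean = moment(pmf, 1)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())   # second central moment
std = var ** 0.5
print(mean, var, std)                                    # 1.6, 0.84, ~0.917
```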

Expectation revisited

Theorem. Let $X_1, X_2, \ldots, X_n$ be random variables defined on the same probability space and let $Y = \Phi(X_1, X_2, \ldots, X_n)$. Then

$$E(Y) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} \Phi(x_1, x_2, \ldots, x_n)\, p(x_1, x_2, \ldots, x_n).$$

Theorem (Linearity of expectation). Let X and Y be random variables. Then

$$E(X + Y) = E(X) + E(Y).$$

Linearity of expectation (proof)

$$\begin{aligned}
E(X + Y) &= \sum_i \sum_j (x_i + y_j)\, p(x_i, y_j) \\
&= \sum_i \sum_j x_i\, p(x_i, y_j) + \sum_i \sum_j y_j\, p(x_i, y_j) \\
&= \sum_i x_i\, p_X(x_i) + \sum_j y_j\, p_Y(y_j) \\
&= E(X) + E(Y).
\end{aligned}$$

Linearity of expectation

The linearity of expectation can be easily generalized to any linear combination of n random variables.

Theorem (Linearity of expectation). Let $X_1, X_2, \ldots, X_n$ be random variables and $a_1, a_2, \ldots, a_n \in \mathbb{R}$ constants. Then

$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i\, E(X_i).$$

Proof is left as a home exercise :-).

Expectation of independent random variables

Theorem. If X and Y are independent random variables, then $E(XY) = E(X)E(Y)$.

Proof.

$$\begin{aligned}
E(XY) &= \sum_i \sum_j x_i y_j\, p(x_i, y_j) = \sum_i \sum_j x_i y_j\, p_X(x_i)\, p_Y(y_j) \\
&= \sum_i x_i\, p_X(x_i) \sum_j y_j\, p_Y(y_j) = E(X)\,E(Y).
\end{aligned}$$

Expectation of independent random variables

The product rule for expectations of independent random variables can be easily generalized to any n-tuple $X_1, X_2, \ldots, X_n$ of mutually independent random variables:

$$E\left(\prod_{i=1}^{n} X_i\right) = \prod_{i=1}^{n} E(X_i).$$

If $\Phi_1, \Phi_2, \ldots, \Phi_n$ are functions, then

$$E\left[\prod_{i=1}^{n} \Phi_i(X_i)\right] = \prod_{i=1}^{n} E[\Phi_i(X_i)].$$
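The following sketch (my own illustration, not from the slides) checks linearity of expectation and the product rule on exact joint pmfs of two four-valued variables: the product rule holds for the independent pair but fails for a dependent one.

```python
import itertools

# Two independent fair four-sided dice: joint pmf is the product of marginals.
xs = [1, 2, 3, 4]
p_joint = {(x, y): 1 / 16 for x, y in itertools.product(xs, xs)}

E = lambda f: sum(f(x, y) * p for (x, y), p in p_joint.items())

print(E(lambda x, y: x + y), E(lambda x, y: x) + E(lambda x, y: y))  # equal: 5.0 5.0
print(E(lambda x, y: x * y), E(lambda x, y: x) * E(lambda x, y: y))  # equal: 6.25 6.25

# A dependent pair (Y = X): linearity still holds, but E(XY) != E(X)E(Y).
p_dep = {(x, x): 1 / 4 for x in xs}
E2 = lambda f: sum(f(x, y) * p for (x, y), p in p_dep.items())
print(E2(lambda x, y: x * y), E2(lambda x, y: x) * E2(lambda x, y: y))  # 7.5 vs 6.25
```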

Variance revisited

Theorem. Let $\sigma_X^2$ be the variance of the random variable X. Then $\sigma_X^2 = E(X^2) - [E(X)]^2$.

Proof.

$$\begin{aligned}
\sigma_X^2 &= E\big([X - E(X)]^2\big) = E\big(X^2 - 2X E(X) + [E(X)]^2\big) \\
&= E(X^2) - E[2X E(X)] + [E(X)]^2 \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 = E(X^2) - [E(X)]^2.
\end{aligned}$$

Covariance

Definition. The quantity

$$E\big([X - E(X)][Y - E(Y)]\big) = \sum_{i,j} p(x_i, y_j)\, [x_i - E(X)]\, [y_j - E(Y)]$$

is called the covariance of X and Y and is denoted $\mathrm{Cov}(X, Y)$.

Theorem. Let X and Y be independent random variables. Then the covariance of X and Y is $\mathrm{Cov}(X, Y) = 0$.

Covariance

Proof.

$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E\big([X - E(X)][Y - E(Y)]\big) \\
&= E[XY - Y E(X) - X E(Y) + E(X)E(Y)] \\
&= E(XY) - E(Y)E(X) - E(X)E(Y) + E(X)E(Y) \\
&= E(X)E(Y) - E(Y)E(X) - E(X)E(Y) + E(X)E(Y) = 0.
\end{aligned}$$

Covariance measures linear (!) dependence between two random variables. E.g. when $X = aY$, $a \ne 0$, using $E(X) = aE(Y)$ we have $\mathrm{Cov}(X, Y) = a\,\mathrm{Var}(Y) = \frac{1}{a}\,\mathrm{Var}(X)$.

Covariance

In general it holds that

$$0 \le \mathrm{Cov}^2(X, Y) \le \mathrm{Var}(X)\,\mathrm{Var}(Y).$$

Definition. We define the correlation coefficient $\rho(X, Y)$ as the normalized covariance, i.e.

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

It holds that $-1 \le \rho(X, Y) \le 1$.

Covariance

It may happen that X is completely dependent on Y and yet the covariance is 0, e.g. for $X = Y^2$ and a suitably chosen Y.
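A small Python illustration (not part of the slides): covariance and correlation computed from a joint pmf, first for a perfectly linear pair X = 2Y and then for X = Y² with Y uniform on {−1, 0, 1}, where X is completely determined by Y yet the covariance vanishes.

```python
def cov_corr(joint):
    """Covariance and correlation from a joint pmf {(x, y): p}."""
    Ex = sum(x * p for (x, y), p in joint.items())
    Ey = sum(y * p for (x, y), p in joint.items())
    cov = sum((x - Ex) * (y - Ey) * p for (x, y), p in joint.items())
    vx = sum((x - Ex) ** 2 * p for (x, y), p in joint.items())
    vy = sum((y - Ey) ** 2 * p for (x, y), p in joint.items())
    rho = cov / (vx * vy) ** 0.5
    return cov, rho

# X = 2*Y for Y uniform on {-1, 0, 1}: perfect linear dependence, rho = 1.
lin = {(2 * y, y): 1 / 3 for y in (-1, 0, 1)}
print(cov_corr(lin))        # (1.333..., 1.0)

# X = Y^2 for the same symmetric Y: complete dependence, yet Cov = 0.
quad = {(y * y, y): 1 / 3 for y in (-1, 0, 1)}
print(cov_corr(quad))       # approximately (0.0, 0.0)
```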

Variance

Theorem. If X and Y are independent random variables, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Proof.

$$\begin{aligned}
\mathrm{Var}(X + Y) &= E\big([(X + Y) - E(X + Y)]^2\big) = E\big([(X + Y) - E(X) - E(Y)]^2\big) \\
&= E\big([(X - E(X)) + (Y - E(Y))]^2\big) \\
&= E\big([X - E(X)]^2 + [Y - E(Y)]^2 + 2[X - E(X)][Y - E(Y)]\big) \\
&= E\big([X - E(X)]^2\big) + E\big([Y - E(Y)]^2\big) + 2E\big([X - E(X)][Y - E(Y)]\big) \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).
\end{aligned}$$

Variance

If X and Y are not independent, we obtain (see the proof on the previous slide)

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

The additivity of variance can be generalized to a set $X_1, X_2, \ldots, X_n$ of mutually independent variables and constants $a_1, a_2, \ldots, a_n \in \mathbb{R}$ as

$$\mathrm{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i).$$

Part III: Conditional Distribution and Expectation

Conditional probability

Using the definition of the conditional probability of two events we can derive the conditional probability distribution of (a pair of) random variables.

Definition. The conditional probability distribution of a random variable Y given a random variable X (their joint distribution being $p_{X,Y}(x, y)$) is

$$p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(Y = y, X = x)}{P(X = x)} = \frac{p_{X,Y}(x, y)}{p_X(x)} \qquad (1)$$

provided $p_X(x) \ne 0$.

Conditional distribution function

Definition. The conditional probability distribution function of a random variable Y given a random variable X (their joint distribution being $p_{X,Y}(x, y)$) is

$$F_{Y|X}(y \mid x) = P(Y \le y \mid X = x) = \frac{P(Y \le y \text{ and } X = x)}{P(X = x)} \qquad (2)$$

for all values with $P(X = x) > 0$. Alternatively, we can derive it from the conditional probability distribution as

$$F_{Y|X}(y \mid x) = \frac{\sum_{t \le y} p(x, t)}{p_X(x)} = \sum_{t \le y} p_{Y|X}(t \mid x).$$

Conditional expectation

We may consider $Y \mid (X = x)$ to be a new random variable that is given by the conditional probability distribution $p_{Y|X}$. Therefore, we can define its mean and moments.

Definition. The conditional expectation of Y given X = x is defined as

$$E(Y \mid X = x) = \sum_y y\, P(Y = y \mid X = x) = \sum_y y\, p_{Y|X}(y \mid x). \qquad (3)$$

Analogously we can define the conditional expectation of a transformed random variable $\Phi(Y)$, in particular the conditional k-th moment of Y, $E(Y^k \mid X = x)$. Of special interest will be the conditional variance

$$\mathrm{Var}(Y \mid X = x) = E(Y^2 \mid X = x) - [E(Y \mid X = x)]^2.$$

Conditional expectation

We can derive the expectation of Y from the conditional expectations. The following equation is known as the theorem of total expectation:

$$E(Y) = \sum_x E(Y \mid X = x)\, p_X(x). \qquad (4)$$

Analogously, the theorem of total moments is

$$E(Y^k) = \sum_x E(Y^k \mid X = x)\, p_X(x). \qquad (5)$$
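As a sanity check (my own example, not from the slides), the sketch below computes E(Y) for a small joint pmf both directly and via the theorem of total expectation; the two values agree.

```python
from collections import defaultdict

# A made-up joint pmf of (X, Y).
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.2, (1, 3): 0.4}

p_X = defaultdict(float)
for (x, y), p in joint.items():
    p_X[x] += p

def E_Y_given(x):
    """E(Y | X = x) = sum_y y * p(x, y) / p_X(x)."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / p_X[x]

direct = sum(y * p for (x, y), p in joint.items())
total = sum(E_Y_given(x) * px for x, px in p_X.items())
print(direct, total)   # both 2.1
```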

Example: Random sums

Let $N, X_1, X_2, \ldots$ be mutually independent random variables. Let us suppose that $X_1, X_2, \ldots$ have an identical probability distribution $p_X(x)$, mean $E(X)$, and variance $\mathrm{Var}(X)$. We also know the values $E(N)$ and $\mathrm{Var}(N)$. Let us consider the random variable defined as the sum $T = X_1 + X_2 + \cdots + X_N$. In what follows we would like to calculate $E(T)$ and $\mathrm{Var}(T)$.

For a fixed value N = n we can easily derive the conditional expectation of T:

$$E(T \mid N = n) = \sum_{i=1}^{n} E(X_i) = n\,E(X). \qquad (6)$$

Using the theorem of total expectation we get

$$E(T) = \sum_n n\,E(X)\, p_N(n) = E(X) \sum_n n\, p_N(n) = E(X)\,E(N). \qquad (7)$$

Example: Random sums

It remains to derive the variance of T. Let us first compute $E(T^2)$. We obtain

$$E(T^2 \mid N = n) = \mathrm{Var}(T \mid N = n) + [E(T \mid N = n)]^2 \qquad (8)$$

and

$$\mathrm{Var}(T \mid N = n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\,\mathrm{Var}(X) \qquad (9)$$

since $(T \mid N = n) = X_1 + X_2 + \cdots + X_n$ and $X_1, \ldots, X_n$ are mutually independent. We substitute (6) and (9) into (8) to get

$$E(T^2 \mid N = n) = n\,\mathrm{Var}(X) + n^2 [E(X)]^2. \qquad (10)$$

Example: Random sums

Using the theorem of total moments we get

$$\begin{aligned}
E(T^2) &= \sum_n \big( n\,\mathrm{Var}(X) + n^2 [E(X)]^2 \big)\, p_N(n) \\
&= \mathrm{Var}(X) \sum_n n\, p_N(n) + [E(X)]^2 \sum_n n^2\, p_N(n) \\
&= \mathrm{Var}(X)\,E(N) + E(N^2)\,[E(X)]^2. \qquad (11)
\end{aligned}$$

Finally, we obtain

$$\begin{aligned}
\mathrm{Var}(T) &= E(T^2) - [E(T)]^2 \\
&= \mathrm{Var}(X)\,E(N) + E(N^2)[E(X)]^2 - [E(X)]^2 [E(N)]^2 \\
&= \mathrm{Var}(X)\,E(N) + [E(X)]^2\,\mathrm{Var}(N). \qquad (12)
\end{aligned}$$
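A Monte Carlo sketch of the random-sum formulas (the distribution choices N ~ Poisson(4) and X_i exponential with mean 2 are my own, not from the slides): the empirical mean and variance of T should approach E(N)E(X) = 8 and Var(X)E(N) + [E(X)]² Var(N) = 32.

```python
import math
import random

random.seed(0)

def poisson(lam):
    """Poisson sample via Knuth's method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

lam, mean_x = 4.0, 2.0          # N ~ Poisson(4), X_i ~ Exp(mean 2): arbitrary choices
trials = 200_000
samples = [sum(random.expovariate(1 / mean_x) for _ in range(poisson(lam)))
           for _ in range(trials)]

m = sum(samples) / trials
v = sum((t - m) ** 2 for t in samples) / trials
# Theory: E(T) = E(N)E(X) = 8,  Var(T) = Var(X)E(N) + [E(X)]^2 Var(N) = 4*4 + 4*4 = 32.
print(m, v)                     # close to 8 and 32
```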

Part IV: Inequalities

Markov inequality

It is important to derive as much information as possible even from a partial description of a random variable. The mean value already gives more information than one might expect, as captured by the Markov inequality.

Theorem (Markov inequality). Let X be a nonnegative random variable with finite mean value E(X). Then for all t > 0 it holds that

$$P(X \ge t) \le \frac{E(X)}{t}.$$

Markov inequality

Proof. Let us define the random variable $Y_t$ (for fixed t) as

$$Y_t = \begin{cases} 0 & \text{if } X < t, \\ t & \text{if } X \ge t. \end{cases}$$

Then $Y_t$ is a discrete random variable with probability distribution $p_{Y_t}(0) = P(X < t)$, $p_{Y_t}(t) = P(X \ge t)$. We have

$$E(Y_t) = t\, P(X \ge t).$$

The observation $X \ge Y_t$ gives

$$E(X) \ge E(Y_t) = t\, P(X \ge t),$$

which is the Markov inequality.
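A quick numerical check of the Markov inequality (my example, not from the slides), comparing the exact tail P(X ≥ t) with the bound E(X)/t for a fair six-sided die.

```python
# Markov bound vs. the exact tail for a fair six-sided die (E(X) = 3.5).
pmf = {x: 1 / 6 for x in range(1, 7)}
EX = sum(x * p for x, p in pmf.items())

for t in (2, 4, 6):
    tail = sum(p for x, p in pmf.items() if x >= t)
    print(t, tail, EX / t)   # P(X >= t) is always <= E(X)/t
```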

Chebyshev inequality

In case we know both the mean value and the variance of a random variable, we can use a much more accurate estimate.

Theorem (Chebyshev inequality). Let X be a random variable with finite variance. Then

$$P\big(|X - E(X)| \ge t\big) \le \frac{\mathrm{Var}(X)}{t^2}, \qquad t > 0,$$

or, alternatively, substituting $X' = X - E(X)$,

$$P(|X'| \ge t) \le \frac{E(X'^2)}{t^2}, \qquad t > 0.$$

We can see that this theorem is in agreement with our interpretation of the variance: if $\sigma$ is small, then X takes values close to E(X) with large probability; if $\sigma$ is large, then outcomes farther from the mean become more likely.

Chebyshev inequality

Proof. We apply the Markov inequality to the nonnegative random variable $[X - E(X)]^2$ and replace t by $t^2$ to get

$$P\big([X - E(X)]^2 \ge t^2\big) \le \frac{E\big([X - E(X)]^2\big)}{t^2} = \frac{\sigma^2}{t^2}.$$

We obtain the Chebyshev inequality using the fact that the events $[(X - E(X))^2 \ge t^2]$ and $[|X - E(X)| \ge t]$ are the same.
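Similarly, a check of the Chebyshev inequality (again my own example): exact tail probabilities of a Binomial(20, 1/2) variable against the bound Var(X)/t².

```python
from math import comb

# X ~ Binomial(20, 0.5): E(X) = 10, Var(X) = 5.  Exact tail vs. Chebyshev bound.
n, p = 20, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mu = n * p
var = n * p * (1 - p)

for t in (2, 3, 4, 5):
    tail = sum(q for k, q in pmf.items() if abs(k - mu) >= t)
    print(t, round(tail, 4), var / t**2)   # the exact tail never exceeds Var/t^2
```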

Kolmogorov inequality

Theorem (Kolmogorov inequality). Let $X_1, X_2, \ldots, X_n$ be independent random variables. We put

$$S_k = X_1 + \cdots + X_k, \qquad (13)$$
$$m_k = E(S_k) = E(X_1) + \cdots + E(X_k), \qquad (14)$$
$$s_k^2 = \mathrm{Var}(S_k) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_k). \qquad (15)$$

For every t > 0 it holds that

$$P\big(|S_1 - m_1| < t s_n \ \wedge\ |S_2 - m_2| < t s_n \ \wedge\ \cdots \ \wedge\ |S_n - m_n| < t s_n\big) \ge 1 - \frac{1}{t^2}. \qquad (16)$$

Kolmogorov inequality

Comparing with the Chebyshev inequality we see that the Kolmogorov inequality is considerably stronger, since the Chebyshev inequality implies only

$$\forall i = 1, \ldots, n: \quad P(|S_i - m_i| < t s_i) \ge 1 - \frac{1}{t^2}.$$

Here we rewrote the Chebyshev inequality as

$$P\big(|X - E(X)| < t\big) \ge 1 - \frac{\mathrm{Var}(X)}{t^2}, \qquad t > 0,$$

and used the substitution $X = S_i$, $t \mapsto s_i t$.

Kolmogorov inequality

Proof. We want to estimate the probability p that at least one of the inequalities in Eq. (16) does not hold (this is the complementary event to Eq. (16)); to verify the statement of the theorem we need to show that $p \le t^{-2}$. Let us define n random variables $Y_k$, $k = 1, \ldots, n$, as

$$Y_k = \begin{cases} 1 & \text{if } |S_k - m_k| \ge t s_n \text{ and } |S_v - m_v| < t s_n \text{ for } v = 1, 2, \ldots, k-1, \\ 0 & \text{otherwise.} \end{cases} \qquad (17)$$

In words, $Y_k$ equals 1 at those sample points at which the k-th inequality of (16) is the first to be violated. At any sample point at most one of the $Y_v$ is one, so the sum $Y_1 + Y_2 + \cdots + Y_n$ is either 0 or 1. Moreover, it is 1 if and only if at least one of the inequalities in (16) is violated.

Kolmogorov inequality

Proof (cont.). Therefore,

$$p = P(Y_1 + Y_2 + \cdots + Y_n = 1). \qquad (18)$$

We know that $\sum_k Y_k \le 1$, and multiplying both sides of this inequality by $(S_n - m_n)^2$ gives

$$\sum_k Y_k (S_n - m_n)^2 \le (S_n - m_n)^2. \qquad (19)$$

Taking the expectation of each side we get

$$\sum_{k=1}^{n} E\big[Y_k (S_n - m_n)^2\big] \le s_n^2. \qquad (20)$$

Kolmogorov inequality

Proof (cont.). We introduce the substitution

$$U_k = (S_n - m_n) - (S_k - m_k) = \sum_{v=k+1}^{n} [X_v - E(X_v)]$$

to evaluate the individual summands on the left-hand side of Eq. (20) as

$$E\big[Y_k (S_n - m_n)^2\big] = E\big[Y_k (U_k + S_k - m_k)^2\big] = E\big[Y_k (S_k - m_k)^2\big] + 2E\big[Y_k U_k (S_k - m_k)\big] + E\big(Y_k U_k^2\big). \qquad (21)$$

Now, since $U_k$ depends only on $X_{k+1}, \ldots, X_n$ while $Y_k$ and $S_k$ depend only on $X_1, \ldots, X_k$, the variable $U_k$ is independent of $Y_k (S_k - m_k)$. Therefore,

$$E\big[Y_k U_k (S_k - m_k)\big] = E\big[Y_k (S_k - m_k)\big]\, E(U_k) = 0,$$

since $E(U_k) = 0$.

Kolmogorov inequality

Proof (cont.). Thus from Eq. (21), observing that $E(Y_k U_k^2) \ge 0$, we get

$$E\big[Y_k (S_n - m_n)^2\big] \ge E\big[Y_k (S_k - m_k)^2\big]. \qquad (22)$$

We know that $Y_k \ne 0$ only if $|S_k - m_k| \ge t s_n$, so that

$$Y_k (S_k - m_k)^2 \ge t^2 s_n^2\, Y_k,$$

and therefore

$$E\big[Y_k (S_k - m_k)^2\big] \ge t^2 s_n^2\, E(Y_k). \qquad (23)$$

Kolmogorov inequality

Proof (cont.). Combining (20), (22) and (23) we get

$$s_n^2 \ge \sum_{k=1}^{n} E\big[Y_k (S_n - m_n)^2\big] \ge \sum_{k=1}^{n} E\big[Y_k (S_k - m_k)^2\big] \ge \sum_{k=1}^{n} t^2 s_n^2\, E(Y_k) = t^2 s_n^2\, E(Y_1 + \cdots + Y_n), \qquad (24)$$

and from (18), using $P(Y_1 + \cdots + Y_n = 1) = E(Y_1 + \cdots + Y_n)$, we finally obtain

$$p\, t^2 \le 1. \qquad (25)$$
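A Monte Carlo illustration of the Kolmogorov inequality (my own setup, not from the slides): for n = 50 independent ±1 steps the probability that some partial sum deviates by at least t·s_n is estimated and compared with the bound 1/t² on the complementary event in (16).

```python
import random

random.seed(1)
n, t, trials = 50, 2.0, 20_000
s_n = n ** 0.5                      # Var(S_n) = n for +/-1 steps, so s_n = sqrt(n)

violations = 0
for _ in range(trials):
    s, hit = 0.0, False
    for _ in range(n):
        s += random.choice((-1.0, 1.0))
        if abs(s) >= t * s_n:       # m_k = 0 here, since E(X_v) = 0
            hit = True
            break
    violations += hit

# Kolmogorov: P(max_k |S_k - m_k| >= t * s_n) <= 1/t^2 = 0.25
print(violations / trials)          # well below 0.25
```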

Part V: Laws of Large Numbers

(Weak) Law of Large Numbers

Theorem ((Weak) Law of Large Numbers). Let $X_1, X_2, \ldots$ be a sequence of mutually independent random variables with a common probability distribution. If the expectation $\mu = E(X_k)$ exists, then for every $\epsilon > 0$

$$\lim_{n \to \infty} P\left( \left| \frac{X_1 + \cdots + X_n}{n} - \mu \right| > \epsilon \right) = 0.$$

In words, the probability that the average $S_n/n$ differs from the expectation by more than an arbitrarily small $\epsilon$ goes to 0.

Proof. Without loss of generality we can assume that $\mu = E(X_k) = 0$; otherwise we simply replace $X_k$ by $X_k - \mu$. This induces only a change of notation.

(Weak) Law of Large Numbers

Proof (cont.). In the special case when $\mathrm{Var}(X_k)$ exists, the law of large numbers is a direct consequence of the Chebyshev inequality; we substitute $X = X_1 + \cdots + X_n = S_n$ to get

$$P(|S_n - n\mu| \ge t) \le \frac{n\,\mathrm{Var}(X_k)}{t^2}. \qquad (26)$$

We substitute $t = \epsilon n$ and observe that as $n \to \infty$ the right-hand side tends to 0, which gives the result. However, in case $\mathrm{Var}(X_k)$ exists, we can apply the more accurate central limit theorem. The proof without the assumption that $\mathrm{Var}(X_k)$ exists follows.

(Weak) Law of Large Numbers

Proof (cont.). Let $\delta$ be a positive constant to be determined later. For each $k = 1, \ldots, n$ we define a pair of random variables

$$U_k = X_k,\quad V_k = 0 \qquad \text{if } |X_k| \le \delta n, \qquad (27)$$
$$U_k = 0,\quad V_k = X_k \qquad \text{if } |X_k| > \delta n. \qquad (28)$$

By this definition,

$$X_k = U_k + V_k. \qquad (29)$$

(Weak) Law of Large Numbers

Proof (cont.). To prove the theorem it suffices to show that both

$$\lim_{n \to \infty} P\big(|U_1 + \cdots + U_n| > \tfrac{1}{2}\epsilon n\big) = 0 \qquad (30)$$

and

$$\lim_{n \to \infty} P\big(|V_1 + \cdots + V_n| > \tfrac{1}{2}\epsilon n\big) = 0 \qquad (31)$$

hold, because $|X_1 + \cdots + X_n| \le |U_1 + \cdots + U_n| + |V_1 + \cdots + V_n|$. Let us denote all possible values of $X_k$ by $x_1, x_2, \ldots$ and the corresponding probabilities by $p(x_i)$. We put

$$a = E(|X_k|) = \sum_i |x_i|\, p(x_i). \qquad (32)$$

(Weak) Law of Large Numbers

Proof (cont.). The variable $|U_1|$ is bounded by $\delta n$ and by $|X_1|$, and therefore $U_1^2 \le |X_1|\,\delta n$. Taking expectations on both sides gives

$$E(U_1^2) \le a \delta n. \qquad (33)$$

The variables $U_1, \ldots, U_n$ are mutually independent and have the same probability distribution. Therefore,

$$E\big[(U_1 + \cdots + U_n)^2\big] - \big[E(U_1 + \cdots + U_n)\big]^2 = \mathrm{Var}(U_1 + \cdots + U_n) = n\,\mathrm{Var}(U_1) \le n\,E(U_1^2) \le a \delta n^2. \qquad (34)$$

(Weak) Law of Large Numbers

Proof (cont.). On the other hand, $\lim_{n \to \infty} E(U_1) = E(X_1) = 0$, so for sufficiently large n we have

$$\big[E(U_1 + \cdots + U_n)\big]^2 = n^2 [E(U_1)]^2 \le n^2 a \delta, \qquad (35)$$

and hence for sufficiently large n we get from Eq. (34) that

$$E\big[(U_1 + \cdots + U_n)^2\big] \le 2 a \delta n^2. \qquad (36)$$

Using the Chebyshev inequality we obtain

$$P\big(|U_1 + \cdots + U_n| > \tfrac{1}{2}\epsilon n\big) \le \frac{8 a \delta}{\epsilon^2}. \qquad (37)$$

By choosing $\delta$ sufficiently small we can make the right-hand side arbitrarily small, which gives (30).

(Weak) Law of Large Numbers

Proof (cont.). In the case of (31) note that

$$P(V_1 + V_2 + \cdots + V_n \ne 0) \le \sum_{i=1}^{n} P(V_i \ne 0) = n\, P(V_1 \ne 0). \qquad (38)$$

For arbitrary $\delta > 0$ we have

$$P(V_1 \ne 0) = P(|X_1| > \delta n) = \sum_{|x_i| > \delta n} p(x_i) \le \frac{1}{\delta n} \sum_{|x_i| > \delta n} |x_i|\, p(x_i). \qquad (39)$$

The last sum is a tail of the convergent series $\sum_i |x_i|\, p(x_i)$ and therefore tends to 0 as $n \to \infty$; consequently $n\, P(V_1 \ne 0) \le \frac{1}{\delta} \sum_{|x_i| > \delta n} |x_i|\, p(x_i)$ also tends to 0, and so does the left-hand side of (38). This statement is even stronger than (31), and it completes the proof.
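A simulation sketch of the weak law (die rolls with µ = 3.5 and ε = 0.1 are my own choices): the fraction of sample averages that deviate from µ by more than ε shrinks as n grows.

```python
import random

random.seed(2)
mu, eps, trials = 3.5, 0.1, 1000

for n in (10, 100, 1000, 5000):
    bad = sum(abs(sum(random.randint(1, 6) for _ in range(n)) / n - mu) > eps
              for _ in range(trials))
    print(n, bad / trials)   # the fraction of "bad" averages decreases towards 0
```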

Central Limit Theorem

In case the variance exists, instead of the law of large numbers we can apply the more exact central limit theorem.

Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of mutually independent, identically distributed random variables with a finite mean $E(X_i) = \mu$ and a finite variance $\mathrm{Var}(X_i) = \sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then for every fixed $\beta$ it holds that

$$\lim_{n \to \infty} P\left( \frac{S_n - n\mu}{\sigma \sqrt{n}} < \beta \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta} e^{-\frac{1}{2} y^2}\, dy. \qquad (40)$$

In case the mean does not exist, neither the central limit theorem nor the law of large numbers applies. Nevertheless, we may still be interested in a number of such cases; see e.g. (Feller, p. 246).
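A simulation sketch of the central limit theorem (the choice of die rolls and n = 200 is mine): the empirical distribution function of the standardized sum is compared with the standard normal CDF at a few points β.

```python
import math
import random

random.seed(3)
n, trials = 200, 20_000
mu, sigma = 3.5, math.sqrt(35 / 12)          # mean and std of one fair die roll

def Phi(b):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(b / math.sqrt(2)))

zs = [(sum(random.randint(1, 6) for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
      for _ in range(trials)]

for beta in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(z < beta for z in zs) / trials
    print(beta, round(empirical, 3), round(Phi(beta), 3))   # the two columns should be close
```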

Law of Large Numbers, Central Limit Theorem and Variables with Different Distributions

Let us make a small comment on the generality of the law of large numbers and the central limit theorem; namely, we relax the requirement that the random variables $X_1, X_2, \ldots$ are identically distributed. On the following slides let us denote

$$s_n^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2. \qquad (41)$$

If the $X_k$ are uniformly bounded, i.e. there exists a constant A such that $\forall k: |X_k| < A$, then the law of large numbers holds if and only if

$$\lim_{n \to \infty} \frac{s_n}{n} = 0. \qquad (42)$$

In general, condition (42) is sufficient (but not necessary) for the law of large numbers; in this case our proof applies.

Law of Large Numbers, Central Limit Theorem and Variables with Different Distributions

Theorem (Lindeberg theorem). The central limit theorem holds for a sequence of independent random variables $X_1, X_2, \ldots$ with finite means $\mu_1, \mu_2, \ldots$ and finite variances if and only if for every $\epsilon > 0$ the random variables $U_k$ defined by

$$U_k = \begin{cases} X_k - \mu_k & \text{if } |X_k - \mu_k| \le \epsilon s_n, \\ 0 & \text{if } |X_k - \mu_k| > \epsilon s_n \end{cases} \qquad (43)$$

satisfy

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n} E(U_k^2) = 1. \qquad (44)$$

This theorem implies, e.g., that every sequence of uniformly bounded (independent) random variables with $s_n \to \infty$ obeys the central limit theorem.

Strong Law of Large Numbers

The (weak) law of large numbers implies that large values of $|S_n - m_n|/n$ occur infrequently. In many practical situations we require the stronger statement that $|S_n - m_n|/n$ remains small for all sufficiently large n.

Definition (Strong Law of Large Numbers). We say that the sequence $X_1, X_2, \ldots$ obeys the strong law of large numbers if to every pair $\epsilon > 0$, $\delta > 0$ there exists an $n \in \mathbb{N}$ such that

$$P\left( \forall r: \ \left|\frac{S_n - m_n}{n}\right| < \epsilon \ \wedge\ \left|\frac{S_{n+1} - m_{n+1}}{n+1}\right| < \epsilon \ \wedge\ \cdots \ \wedge\ \left|\frac{S_{n+r} - m_{n+r}}{n+r}\right| < \epsilon \right) \ge 1 - \delta. \qquad (45)$$

It remains to determine the conditions under which the strong law of large numbers holds.

Strong Law of Large Numbers

Theorem (Kolmogorov criterion). Let $X_1, X_2, \ldots$ be a sequence of mutually independent random variables with corresponding variances $\sigma_1^2, \sigma_2^2, \ldots$. Then the convergence of the series

$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} \qquad (46)$$

is a sufficient condition for the strong law of large numbers to apply.

Strong Law of Large Numbers

Proof. Let $A_v$ be the event that for at least one n with $2^{v-1} < n \le 2^v$ the inequality

$$\left|\frac{S_n - m_n}{n}\right| < \epsilon \qquad (47)$$

does not hold. It suffices to prove that for all sufficiently large v and all r it holds that

$$P(A_v) + P(A_{v+1}) + \cdots + P(A_{v+r}) < \delta, \qquad (48)$$

i.e. that the series $\sum_v P(A_v)$ converges. The event $A_v$ implies that for some n in the range $2^{v-1} < n \le 2^v$

$$|S_n - m_n| \ge \epsilon n \ge \epsilon 2^{v-1}. \qquad (49)$$

Strong Law of Large Numbers

Proof (cont.). Using $s_n^2 \le s_{2^v}^2$ and the Kolmogorov inequality with $t = \epsilon 2^{v-1}/s_{2^v}$ we get

$$P(A_v) \le P\left( \max_{n \le 2^v} |S_n - m_n| \ge \epsilon 2^{v-1} \right) \le 4 \epsilon^{-2} s_{2^v}^2\, 2^{-2v}. \qquad (50)$$

Hence (observe that $\sum_{k=1}^{2^v} \sigma_k^2 = s_{2^v}^2$)

$$\sum_{v=1}^{\infty} P(A_v) \le 4\epsilon^{-2} \sum_{v=1}^{\infty} s_{2^v}^2\, 2^{-2v} = 4\epsilon^{-2} \sum_{k=1}^{\infty} \sigma_k^2 \sum_{v:\, 2^v \ge k} 2^{-2v} \le 4\epsilon^{-2} \sum_{k=1}^{\infty} \sigma_k^2 \cdot \frac{2}{k^2} = 8\epsilon^{-2} \sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2}. \qquad (51)$$

Strong Law of Large Numbers

Before formulating our final criterion for the strong law of large numbers, we introduce two auxiliary lemmas that we will use in the proof of the main theorem.

Lemma (First Borel-Cantelli lemma). Let $\{A_k\}$ be a sequence of events defined on the same sample space and let $a_k = P(A_k)$. If $\sum_k a_k$ converges, then to every $\epsilon > 0$ it is possible to find a (sufficiently large) integer n such that for all integers r

$$P(A_{n+1} \cup A_{n+2} \cup \cdots \cup A_{n+r}) \le \epsilon. \qquad (52)$$

Strong Law of Large Numbers

Proof. First we determine n so that $a_{n+1} + a_{n+2} + \cdots \le \epsilon$. This is possible since $\sum_k a_k$ converges. The lemma follows since

$$P(A_{n+1} \cup A_{n+2} \cup \cdots \cup A_{n+r}) \le a_{n+1} + a_{n+2} + \cdots + a_{n+r} \le \epsilon. \qquad (53)$$

Strong Law of Large Numbers

Lemma. For any $v \in \mathbb{N}^+$ it holds that

$$v \sum_{k=v}^{\infty} \frac{1}{k^2} \le 2. \qquad (54)$$

Strong Law of Large Numbers

Proof. Let us consider the series $s_n = \sum_{k=1}^{n} \frac{1}{k(k+1)}$. Using $\frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1}$ we get

$$s_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = 1 - \frac{1}{n+1}.$$

We calculate the limit to get

$$\sum_{k=1}^{\infty} \frac{1}{k(k+1)} = \sum_{k=2}^{\infty} \frac{1}{k(k-1)} = \lim_{n \to \infty} s_n = \lim_{n \to \infty} \left( 1 - \frac{1}{n+1} \right) = 1.$$

Strong Law of Large Numbers

Proof (cont.). We proceed to obtain

$$\sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \sum_{k=2}^{\infty} \frac{1}{k^2} \le 1 + \sum_{k=2}^{\infty} \frac{1}{k(k-1)} = 2,$$

and, analogously,

$$2 \sum_{k=2}^{\infty} \frac{1}{k^2} \le 2 \sum_{k=2}^{\infty} \frac{1}{k(k-1)} = 2.$$

This settles the cases v = 1 and v = 2.

Strong Law of Large Numbers

Proof (cont.). Finally, we get the result for v > 2:

$$\begin{aligned}
v \sum_{k=v}^{\infty} \frac{1}{k^2} &\le v \sum_{k=v}^{\infty} \frac{1}{k(k-1)} = v \left( \sum_{k=2}^{\infty} \frac{1}{k(k-1)} - \sum_{k=2}^{v-1} \frac{1}{k(k-1)} \right) \\
&= v \left( 1 - \sum_{k=1}^{v-2} \frac{1}{k(k+1)} \right) = v \left( 1 - \left( 1 - \frac{1}{v-1} \right) \right) = \frac{v}{v-1} \le 2. \qquad (55)
\end{aligned}$$

Strong Law of Large Numbers

Theorem. Let $X_1, X_2, \ldots$ be a sequence of mutually independent random variables with a common probability distribution $p(x_i)$ whose mean $\mu = E(X_i)$ exists. Then the strong law of large numbers applies to this sequence.

Proof. Let us introduce two new sequences of random variables defined by

$$U_k = X_k,\quad V_k = 0 \qquad \text{if } |X_k| < k, \qquad (56)$$
$$U_k = 0,\quad V_k = X_k \qquad \text{if } |X_k| \ge k. \qquad (57)$$

The $U_k$ are mutually independent, and we will show that they satisfy the Kolmogorov criterion.

Strong Law of Large Numbers

Proof (cont.). For $\sigma_k^2 = \mathrm{Var}(U_k)$ we get

$$\sigma_k^2 \le E(U_k^2) = \sum_{|x_i| < k} x_i^2\, p(x_i). \qquad (58)$$

Let us put, for abbreviation,

$$a_v = \sum_{v-1 \le |x_i| < v} |x_i|\, p(x_i). \qquad (59)$$

Strong Law of Large Numbers

Proof (cont.). Then the series $\sum_v a_v$ converges since $E(X_k)$ exists. Moreover, from (58)

$$\sigma_k^2 \le a_1 + 2a_2 + 3a_3 + \cdots + k a_k, \qquad (60)$$

observing that

$$\sum_{v-1 \le |x_i| < v} x_i^2\, p(x_i) \le v \sum_{v-1 \le |x_i| < v} |x_i|\, p(x_i) = v\, a_v.$$

Strong Law of Large Numbers

Proof (cont.). Thus

$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} \le \sum_{k=1}^{\infty} \frac{1}{k^2} \sum_{v=1}^{k} v\, a_v = \sum_{v=1}^{\infty} v\, a_v \sum_{k=v}^{\infty} \frac{1}{k^2} \overset{(*)}{\le} 2 \sum_{v=1}^{\infty} a_v < \infty. \qquad (61)$$

To see $(*)$, recall from the previous lemma that for any $v = 1, 2, \ldots$ we have $\sum_{k=v}^{\infty} \frac{1}{k^2} \le \frac{2}{v}$. Therefore, the Kolmogorov criterion holds for $\{U_k\}$.

Strong Law of Large Numbers

Proof (cont.). Now,

$$E(U_k) = \mu_k = \sum_{|x_i| < k} x_i\, p(x_i), \qquad (62)$$

so $\lim_{k \to \infty} \mu_k = \mu$ and hence $\lim_{n \to \infty} (\mu_1 + \mu_2 + \cdots + \mu_n)/n = \mu$. From the strong law of large numbers for $\{U_k\}$ we obtain that with probability $1 - \delta$ or better

$$\forall n > N: \quad \left| n^{-1} \sum_{k=1}^{n} U_k - \mu \right| < \epsilon, \qquad (63)$$

provided N is sufficiently large. It remains to prove that the same statement holds when we replace $U_k$ by $X_k$. It suffices to show that we can choose N sufficiently large so that, with probability close to unity, the event $U_k = X_k$ occurs for all k > N.

Strong Law of Large Numbers

Proof (cont.). This holds if, with probability arbitrarily close to one, only finitely many variables $V_k$ are different from zero, because in this case there exists $k_0$ such that $\forall k > k_0: V_k = 0$ with probability 1. By the first Borel-Cantelli lemma this is the case when the series $\sum_k P(V_k \ne 0)$ converges. It remains to verify this convergence. We have

$$P(V_n \ne 0) = \sum_{|x_i| \ge n} p(x_i) \le \frac{a_{n+1}}{n} + \frac{a_{n+2}}{n+1} + \cdots \qquad (64)$$

and hence

$$\sum_{n=1}^{\infty} P(V_n \ne 0) \le \sum_{n=1}^{\infty} \sum_{v=n}^{\infty} \frac{a_{v+1}}{v} = \sum_{v=1}^{\infty} \frac{a_{v+1}}{v} \cdot v = \sum_{v=1}^{\infty} a_{v+1} < \infty, \qquad (65)$$

since E(X) exists. This completes our proof.