Random variables and transform methods


Chapter 1 Random variables and transform methods

1.1 Discrete random variables

Suppose X is a random variable whose range is {0, 1, 2, ...} and set p_k = P(X = k) for k = 0, 1, 2, ..., so that its mean and variance are given by

E[X] = \sum_{k=0}^{∞} k p_k,   (1.1)

Var[X] = E[(X − E[X])^2] = E[X^2] − (E[X])^2 = \sum_{k=0}^{∞} k^2 p_k − \Big( \sum_{k=0}^{∞} k p_k \Big)^2.   (1.2)

Here are some examples that are of interest:

Binomial distribution, denoted by Bin(n, p), which is the distribution of the number of successes in n Bernoulli trials (coin flips) with success probability p. Then

P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k},   k ∈ {0, 1, ..., n},  p ∈ [0, 1],   (1.3)

and E[X] = np, Var[X] = np(1 − p).

Poisson distribution, denoted by Pois(λ), defined as

P(X = k) = e^{−λ} \frac{λ^k}{k!},   k ∈ {0, 1, ...},  λ > 0,   (1.4)

and E[X] = λ, Var[X] = λ.

Geometric distribution, denoted by Geo(p), defined as

P(X = k) = (1 − p) p^k,   k ∈ {0, 1, ...},  p ∈ [0, 1],   (1.5)

and E[X] = p/(1 − p), Var[X] = p/(1 − p)^2.
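As a quick sanity check of these moment formulas, here is a minimal Python sketch (assuming numpy is available) that compares Monte Carlo estimates with the closed forms; the parameter values and sample size are arbitrary, and the geometric sampler is shifted so that it matches the convention P(X = k) = (1 − p)p^k of (1.5).

import numpy as np

rng = np.random.default_rng(1)
m = 200_000                         # number of samples per distribution (arbitrary)
n, p, lam, q = 10, 0.3, 2.5, 0.6    # Bin(n, p), Pois(lam), Geo(q)

x = rng.binomial(n, p, m)
print(x.mean(), n * p, x.var(), n * p * (1 - p))        # E[X] = np, Var[X] = np(1 - p)

x = rng.poisson(lam, m)
print(x.mean(), lam, x.var(), lam)                      # E[X] = Var[X] = lambda

# Geo(q) as in (1.5): numpy's geometric sampler counts trials until the first
# success, so draw with success probability 1 - q and subtract 1 to count failures.
x = rng.geometric(1 - q, m) - 1
print(x.mean(), q / (1 - q), x.var(), q / (1 - q) ** 2)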

The geometric distribution possesses the memoryless property. To see this, first note that

P(X ≥ k) = \sum_{j=k}^{∞} (1 − p) p^j = p^k.   (1.6)

Therefore

P(X ≥ k + n | X ≥ n) = \frac{P(X ≥ k + n)}{P(X ≥ n)} = \frac{p^{k+n}}{p^n} = p^k = P(X ≥ k),   (1.7)

which does not depend on n.

Lemma 1.1.1. If X is non-negative integer valued then

E[X] = \sum_{k=0}^{∞} P(X > k).   (1.8)

Proof.

\sum_{k=0}^{∞} P(X > k) = \sum_{k=0}^{∞} \sum_{j=k+1}^{∞} P(X = j) = \sum_{j=1}^{∞} \Big( \sum_{k=0}^{j−1} 1 \Big) P(X = j) = \sum_{j=0}^{∞} j P(X = j) = E[X].   (1.9)

Example 1.1.2. Let X ~ Geo(p). Then,

E[X] = \sum_{k=0}^{∞} k P(X = k) = \sum_{k=0}^{∞} k (1 − p) p^k = p(1 − p) \sum_{k=1}^{∞} k p^{k−1} = p(1 − p) \frac{d}{dp} \sum_{k=0}^{∞} p^k = p(1 − p) \frac{d}{dp} \frac{1}{1 − p} = \frac{p}{1 − p}.   (1.10)

Exercise 1.1.3. Let X ~ Geo(p) and show that E[X] = p/(1 − p) by using (1.6) and (1.8).

The probability generating function (pgf) of a random variable X with range {0, 1, 2, ...} and probability distribution p_k = P(X = k) for k = 0, 1, 2, ... is defined by

g_X(z) = \sum_{k=0}^{∞} p_k z^k.   (1.11)

Notice that g_X(z) = E[z^X], the expectation of the random variable z^X. The series in (1.11) is guaranteed to converge if |z| ≤ 1, because

|g_X(z)| ≤ \sum_{k=0}^{∞} p_k |z|^k ≤ \sum_{k=0}^{∞} p_k = 1.   (1.12)

Exercise 1.1.4. Let X ~ Geo(p) and show that

g_X(z) = \frac{1 − p}{1 − pz}.   (1.13)

Exercise 1.1.5. Let X ~ Pois(λ) and show that

g_X(z) = e^{λ(z−1)}.   (1.14)

Exercise 1.1.6. Let X ~ Bin(n, p) and show that

g_X(z) = (1 − p + pz)^n.   (1.15)

A pgf uniquely defines the probability distribution, via the relation

p_k = \frac{1}{k!} \frac{d^k}{dz^k} g_X(z) \Big|_{z=0}.   (1.16)

One of the great advantages of using pgfs is that the moments are easy to determine. For example,

E[X] = \frac{d}{dz} g_X(z) \Big|_{z=1}.   (1.17)

More generally,

E[X(X − 1) \cdots (X − k + 1)] = \frac{d^k}{dz^k} g_X(z) \Big|_{z=1}.   (1.18)

Example 1.1.7. For the geometric distribution, we get

E[X] = \frac{d}{dz} \frac{1 − p}{1 − pz} \Big|_{z=1} = \frac{p(1 − p)}{(1 − pz)^2} \Big|_{z=1} = \frac{p}{1 − p}.   (1.19)

In a similar way, we get

E[X^2] = \frac{p(1 + p)}{(1 − p)^2}.   (1.20)

Example 1.1.8. For the Poisson distribution, we get

E[X] = \frac{d}{dz} e^{λ(z−1)} \Big|_{z=1} = λ.   (1.21)

In a similar way, we get

E[X^2] = λ^2 + λ.   (1.22)

Exercise 1.1.9. Prove that (1.16) and (1.18) are true, and use (1.18) to derive (1.20) and (1.22).
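As a quick check of (1.16)–(1.18), the following symbolic sketch (assuming the sympy library is available) differentiates the geometric and Poisson pgfs and recovers the moments of Examples 1.1.7 and 1.1.8.

import sympy as sp

z, p, lam = sp.symbols('z p lambda', positive=True)

g_geo = (1 - p) / (1 - p * z)       # pgf of Geo(p), see (1.13)
g_pois = sp.exp(lam * (z - 1))      # pgf of Pois(lambda), see (1.14)

def factorial_moment(g, k):
    # E[X(X-1)...(X-k+1)] = k-th derivative of the pgf at z = 1, as in (1.18)
    return sp.simplify(sp.diff(g, z, k).subs(z, 1))

for g in (g_geo, g_pois):
    EX = factorial_moment(g, 1)                      # E[X]
    EX2 = sp.simplify(factorial_moment(g, 2) + EX)   # E[X^2] = E[X(X-1)] + E[X]
    print(EX, EX2)

# Expected output (up to algebraic rewriting):
#   p/(1 - p)   p*(p + 1)/(1 - p)**2
#   lambda      lambda**2 + lambda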

1.2 Continuous random variables

Suppose X is a random variable whose range is R, with distribution function F(t) = P(X ≤ t) and density

f(u) = \frac{P(X ∈ [u, u + du])}{du} = \frac{d}{du} F(u).   (1.23)

The mean and variance of X, when X ≥ 0, are given by

E[X] = \int_0^{∞} u f(u) du,   (1.24)

Var[X] = \int_0^{∞} u^2 f(u) du − \Big( \int_0^{∞} u f(u) du \Big)^2.   (1.25)

Here are some continuous distributions that are of interest:

Uniform distribution, denoted by Uniform(a, b), for which

f(u) = \frac{1}{b − a} for a < u < b, and f(u) = 0 elsewhere,   (1.26)

with mean \frac{b + a}{2} and variance \frac{1}{12}(b − a)^2.

Normal distribution, denoted by N(µ, σ^2), for which

f(u) = \frac{1}{\sqrt{2πσ^2}} e^{−\frac{1}{2}(\frac{u − µ}{σ})^2},   −∞ < u < ∞,   (1.27)

with mean µ and variance σ^2.

Gamma distribution, denoted by Gamma(α, β), for which

f(u) = \frac{β^α}{Γ(α)} u^{α−1} e^{−βu},   u > 0,   (1.28)

with

Γ(α) = \int_0^{∞} y^{α−1} e^{−y} dy   (1.29)

the gamma function. So E[X] = \frac{α}{β} and Var[X] = \frac{α}{β^2}.

Lemma 1.2.1. If X is non-negative with bounded mean, we have

E[X] = \int_{t=0}^{∞} t f(t) dt = \int_{t=0}^{∞} (1 − F(t)) dt.   (1.30)

Proof.

\int_{t=0}^{∞} t f(t) dt = \int_{t=0}^{∞} f(t) \Big[ \int_{u=0}^{t} du \Big] dt = \int_{u=0}^{∞} \int_{t=u}^{∞} f(t) dt du = \int_{u=0}^{∞} (1 − F(u)) du.   (1.31)
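Lemma 1.2.1 can be checked numerically for a concrete case. The sketch below (assuming numpy and scipy are available) takes X ~ Gamma(3, 2), for which E[X] = α/β = 1.5, and integrates the tail 1 − F(t) on a truncated grid; the truncation point and step size are arbitrary.

import numpy as np
from scipy.stats import gamma

alpha, beta = 3.0, 2.0                        # Gamma(alpha, beta) in the rate form of (1.28)
t = np.linspace(0.0, 50.0, 500_001)           # truncate the integral at t = 50
tail = gamma.sf(t, alpha, scale=1.0 / beta)   # 1 - F(t)

lhs = np.sum(tail[:-1]) * (t[1] - t[0])       # Riemann sum for the right-hand side of (1.30)
print(lhs, alpha / beta)                      # both approximately 1.5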

The moment generating function (mgf) of a continuous random variable X is given by

m_X(t) = E[e^{tX}] = \int_{−∞}^{∞} e^{tu} f(u) du.   (1.32)

For non-negative continuous random variables it is common practice to use the Laplace–Stieltjes transform (lst), instead of the mgf, which is defined by

ϕ_X(s) = E[e^{−sX}] = \int_0^{∞} e^{−su} f(u) du,   Re s ≥ 0.   (1.33)

Notice that ϕ_X(s) = m_X(−s). The integral in (1.33) is guaranteed to converge for Re s ≥ 0 because

|ϕ_X(s)| ≤ \int_0^{∞} |e^{−su}| f(u) du ≤ \int_0^{∞} f(u) du = 1.   (1.34)

Example 1.2.2. For the uniform distribution on [a, b], we get

m_X(t) = \int_a^b \frac{e^{tu}}{b − a} du = \frac{e^{tb} − e^{ta}}{tb − ta}.   (1.35)

In some cases you encounter a distribution function that has jumps, for example a continuous random variable X on [0, ∞) with a jump in zero, meaning P(X = 0) > 0. In that case, we get

ϕ_X(s) = \int_{0^-}^{∞} e^{−su} dP(X < u) = P(X = 0) + \int_{0^+}^{∞} e^{−su} dP(X < u).   (1.36)

As for the pgf, one of the advantages of using an mgf or lst is that the moments are easy to determine. For example,

E[X] = \frac{d}{dt} m_X(t) \Big|_{t=0} = −\frac{d}{ds} ϕ_X(s) \Big|_{s=0},   (1.37)

and more generally,

E[X^k] = \frac{d^k}{dt^k} m_X(t) \Big|_{t=0} = (−1)^k \frac{d^k}{ds^k} ϕ_X(s) \Big|_{s=0}.   (1.38)

Here is another important property:

Property 1.2.3. Let X be a random variable with mgf m_X(t). Let a and b be real constants, and define the random variable Y as Y = aX + b. Then

m_Y(t) = e^{bt} m_X(at).   (1.39)

Example 1.2.4. Let X ~ N(0, 1). Then,

m_X(t) = e^{\frac{1}{2} t^2}.   (1.40)

To see this, we start from

m_X(t) = \int_{u=−∞}^{∞} f_X(u) e^{tu} du = \int_{u=−∞}^{∞} \frac{1}{\sqrt{2π}} e^{−u^2/2} e^{tu} du = e^{\frac{1}{2} t^2} \int_{u=−∞}^{∞} \frac{1}{\sqrt{2π}} e^{−(u−t)^2/2} du,   (1.41)

where we have used that e^{−u^2/2} e^{ut} = e^{−u^2/2 + ut} = e^{−(u−t)^2/2} e^{t^2/2}. Now for a normal random variable X ~ N(µ, σ^2) we have

f_X(u) = \frac{1}{σ\sqrt{2π}} e^{−\frac{1}{2}(\frac{u − µ}{σ})^2}  and  \int_{u=−∞}^{∞} f_X(u) du = 1.   (1.42)

Therefore, the integral on the right-hand side of (1.41) equals 1, and we get (1.40).

Exercise 1.2.5. Let X ~ N(µ, σ^2). Show that

m_X(t) = e^{µt + σ^2 t^2/2}.   (1.43)

Solution. By definition,

m_X(t) = \int_{x=−∞}^{∞} e^{xt} \frac{1}{σ\sqrt{2π}} e^{−\frac{1}{2}(\frac{x − µ}{σ})^2} dx.   (1.44)

Make the change of variables z = (x − µ)/σ. Then x = σz + µ and dx = σ dz. This gives

m_X(t) = \int_{z=−∞}^{∞} e^{(σz + µ)t} \frac{1}{σ\sqrt{2π}} e^{−z^2/2} σ dz = e^{µt} \int_{z=−∞}^{∞} \frac{1}{\sqrt{2π}} e^{−z^2/2 + σzt} dz = e^{µt} e^{σ^2 t^2/2} \int_{z=−∞}^{∞} \frac{1}{\sqrt{2π}} e^{−(z − σt)^2/2} dz = e^{µt + σ^2 t^2/2}.   (1.45)
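Formula (1.43) can also be checked by simulation: estimate E[e^{tX}] from normal samples and compare with e^{µt + σ^2 t^2/2}. The sketch below (assuming numpy) does this for a few moderate values of t; for large |t| the Monte Carlo estimator of an mgf has very large variance.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, m = 1.0, 2.0, 1_000_000
x = rng.normal(mu, sigma, m)

for t in (-0.5, 0.25, 0.5):
    mc = np.mean(np.exp(t * x))                      # Monte Carlo estimate of E[e^{tX}]
    exact = np.exp(mu * t + 0.5 * sigma**2 * t**2)   # closed form (1.43)
    print(t, mc, exact)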

Exercise 1.2.6. Let X ~ Gamma(α, β). Show that

m_X(t) = \Big( \frac{β}{β − t} \Big)^α.   (1.46)

Solution. We get

m_X(t) = \int_{u=0}^{∞} \frac{β^α}{Γ(α)} u^{α−1} e^{−βu} e^{tu} du = \frac{1}{Γ(α)} \int_{u=0}^{∞} u^{α−1} β^α e^{−u(β − t)} du.   (1.47)

Make the change of variables x = u(β − t). Then u = x/(β − t) and du = dx/(β − t). We then find that

m_X(t) = \frac{1}{Γ(α)} \int_{x=0}^{∞} \frac{x^{α−1}}{(β − t)^{α−1}} β^α e^{−x} \frac{dx}{β − t} = \Big( \frac{β}{β − t} \Big)^α \frac{1}{Γ(α)} \int_{x=0}^{∞} x^{α−1} e^{−x} dx = \Big( \frac{β}{β − t} \Big)^α \frac{Γ(α)}{Γ(α)} = \Big( \frac{β}{β − t} \Big)^α.   (1.48)

1.3 Exponential distribution

The exponential distribution, denoted by Exp(λ), is defined as

F(t) = 1 − e^{−λt} for t ≥ 0, and F(t) = 0 for t < 0,   (1.49)

and

f(t) = λ e^{−λt} for t > 0, and f(t) = 0 for t < 0,   (1.50)

so that E[X] = 1/λ and Var[X] = 1/λ^2. Also,

ϕ_X(s) = E[e^{−sX}] = \int_0^{∞} e^{−su} λ e^{−λu} du = \frac{λ}{λ + s},   (1.51)

and hence

E[X] = −\frac{d}{ds} \frac{λ}{λ + s} \Big|_{s=0} = \frac{1}{λ},   (1.52)

E[X^2] = \frac{d^2}{ds^2} \frac{λ}{λ + s} \Big|_{s=0} = \frac{2}{λ^2},   (1.53)

E[X^k] = (−1)^k \frac{d^k}{ds^k} \frac{λ}{λ + s} \Big|_{s=0} = \frac{k!}{λ^k}.   (1.54)

There are several alternative ways of deriving (1.54). For instance, using integration by parts,

E[X^k] = \int_0^{∞} t^k λ e^{−λt} dt = \Big[ −t^k e^{−λt} \Big]_0^{∞} + \int_0^{∞} k t^{k−1} e^{−λt} dt = \frac{k}{λ} \int_0^{∞} t^{k−1} λ e^{−λt} dt,   (1.55)

so that E[X^k] = \frac{k}{λ} E[X^{k−1}]. Iterating then yields

E[X^k] = \frac{k}{λ} E[X^{k−1}] = \frac{k}{λ} \frac{k − 1}{λ} E[X^{k−2}] = \cdots = \frac{k!}{λ^k} E[X^0] = \frac{k!}{λ^k}.   (1.56)

Or,

E[X^k] = \int_0^{∞} t^k λ e^{−λt} dt = (−1)^k λ \int_0^{∞} \frac{d^k}{dλ^k} \big( e^{−λt} \big) dt = (−1)^k λ \frac{d^k}{dλ^k} \int_0^{∞} e^{−λt} dt = (−1)^k λ \frac{d^k}{dλ^k} \frac{1}{λ} = \frac{k!}{λ^k}.   (1.57)

The exponential distribution is somewhat of a special distribution. For instance, it is the only continuous distribution that possesses the memoryless property. That is, when X ~ Exp(λ),

P(X > t + u | X > u) = \frac{P(X > t + u, X > u)}{P(X > u)} = \frac{P(X > t + u)}{P(X > u)} = \frac{e^{−λ(t+u)}}{e^{−λu}} = e^{−λt} = P(X > t),   t, u ≥ 0.   (1.58)

The hazard rate of a cdf F(·) with density f(·) on [0, ∞) is defined as r(t) = \frac{f(t)}{1 − F(t)}, and has the interpretation of an intensity or rate, because, if X has cdf F(·),

P(X ∈ (t, t + dt) | X > t) = \frac{P(X ∈ (t, t + dt))}{P(X > t)}   (1.59)

= \frac{f(t) dt}{1 − F(t)} = r(t) dt.   (1.60)

For instance, if F(·) is the distribution of the time it takes for the next claim to arrive, r(·) is the intensity with which this happens. Now, if X ~ Exp(λ), then

r(t) = \frac{λ e^{−λt}}{e^{−λt}} = λ.   (1.61)

The constant hazard rate matches perfectly with the memoryless property.
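Formula (1.54) and the constant hazard rate (1.61) are easy to confirm symbolically. A small sketch (assuming sympy) integrates t^k λ e^{−λt} directly and compares with k!/λ^k, and then evaluates f(t)/(1 − F(t)).

import sympy as sp

t, s, lam = sp.symbols('t s lambda', positive=True)

f = lam * sp.exp(-lam * t)               # Exp(lambda) density, see (1.50)

for k in range(1, 5):
    moment = sp.integrate(t**k * f, (t, 0, sp.oo))
    print(k, sp.simplify(moment - sp.factorial(k) / lam**k))   # prints 0 for each k

F = sp.integrate(f.subs(t, s), (s, 0, t))   # F(t) = 1 - exp(-lambda t)
print(sp.simplify(f / (1 - F)))             # constant hazard rate: prints lambda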

Here are some further properties of exponential random variables.

Property 1.3.1. Let X_i ~ Exp(λ_i), i = 1, 2, with X_1 and X_2 independent. Then,

min(X_1, X_2) ~ Exp(λ_1 + λ_2).   (1.62)

Proof.

P(min(X_1, X_2) < x) = P(X_1 < x or X_2 < x) = P(X_1 < x) + P(X_2 < x) − P(X_1 < x) P(X_2 < x) = 1 − e^{−λ_1 x} + 1 − e^{−λ_2 x} − (1 − e^{−λ_1 x})(1 − e^{−λ_2 x}) = 1 − e^{−(λ_1 + λ_2) x}.   (1.63)

Alternatively,

P(min(X_1, X_2) > x) = P(X_1 > x and X_2 > x) = e^{−λ_1 x} e^{−λ_2 x} = e^{−(λ_1 + λ_2) x}.   (1.64)

Property 1.3.2. Let X_i ~ Exp(λ_i), i = 1, 2, with X_1 and X_2 independent. Then,

P(X_1 < X_2) = \frac{λ_1}{λ_1 + λ_2}.   (1.65)

Proof.

P(X_1 < X_2) = \int_0^{∞} \int_0^{y} λ_1 e^{−λ_1 x} λ_2 e^{−λ_2 y} dx dy = \int_0^{∞} (1 − e^{−λ_1 y}) λ_2 e^{−λ_2 y} dy = \int_0^{∞} λ_2 e^{−λ_2 y} dy − \int_0^{∞} λ_2 e^{−(λ_1 + λ_2) y} dy = \Big[ −e^{−λ_2 y} + \frac{λ_2}{λ_1 + λ_2} e^{−(λ_1 + λ_2) y} \Big]_0^{∞} = 1 − \frac{λ_2}{λ_1 + λ_2} = \frac{λ_1}{λ_1 + λ_2}.   (1.66)
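Properties 1.3.1 and 1.3.2 are also easy to check by simulation. The following sketch (assuming numpy) compares the empirical mean of min(X_1, X_2) with 1/(λ_1 + λ_2) and the empirical frequency of the event {X_1 < X_2} with λ_1/(λ_1 + λ_2); the rates are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, m = 1.5, 0.5, 1_000_000

x1 = rng.exponential(1 / lam1, m)    # numpy parametrizes Exp(lambda) by its mean 1/lambda
x2 = rng.exponential(1 / lam2, m)

print(np.minimum(x1, x2).mean(), 1 / (lam1 + lam2))   # Property 1.3.1: mean of Exp(lam1 + lam2)
print(np.mean(x1 < x2), lam1 / (lam1 + lam2))         # Property 1.3.2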

Chapter 2 Sums of random variables

In Chapter 1 we have acquainted ourselves with using transforms for describing the distributions of random variables. One of the main advantages of using transform methods is that it becomes relatively easy to describe the distributions of sums of random variables. Our key example will be the total claim amount, denoted by S_n. That is,

S_n = X_1 + X_2 + \cdots + X_n,   (2.1)

where X_i denotes the payment on policy i (claim i). In this chapter, these claims X_1, ..., X_n are assumed to be independent random variables.

2.1 Convolutions and transforms

For the mean of S_n, we always get that

E[S_n] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n].   (2.2)

For the variance, we have similarly that

Var[S_n] = Var[X_1 + X_2 + \cdots + X_n] = Var[X_1] + Var[X_2] + \cdots + Var[X_n],   (2.3)

but this is only true if the random variables X_1, ..., X_n are independent. If not, we get the expression

Var[S_n] = \sum_{i=1}^{n} \sum_{j=1}^{n} Cov[X_i, X_j],   (2.4)

where we note that Cov[X_i, X_i] = Var[X_i]. Hence, for the first two moments of S_n we have nice expressions available, and we now turn to the distribution of S_n. Let F_X(s) = P(X ≤ s) denote the cdf of a random variable X. By convolution we mean the mathematical operation that calculates the cdf of X_1 + X_2, given the individual cdfs of the two independent continuous random variables X_1 and X_2:

F_{X_1} * F_{X_2}(s) := F_{X_1 + X_2}(s) = P(X_1 + X_2 ≤ s) = \int_{−∞}^{∞} F_{X_2}(s − x) dF_{X_1}(x).   (2.5)

If X_1 and X_2 are discrete random variables, we get

F_{X_1} * F_{X_2}(s) = P(X_1 + X_2 ≤ s) = \sum_{x} F_{X_2}(s − x) P(X_1 = x),   (2.6)

or, perhaps easier, when X_1 and X_2 are nonnegative,

P(X_1 + X_2 = k) = \sum_{j=0}^{k} P(X_1 = j) P(X_2 = k − j).   (2.7)

Performing the convolution operation can, in many cases, become rather cumbersome. Finding the mgf of the sum of independent random variables, however, is straightforward. If X_1 and X_2 are independent, then

m_{X_1 + X_2}(t) = E[e^{t(X_1 + X_2)}] = E[e^{tX_1}] E[e^{tX_2}] = m_{X_1}(t) m_{X_2}(t).   (2.8)

So determining the convolution of the cdfs simply corresponds to a multiplication of the mgfs. More generally, for the mgf of the total claim amount S_n, we find that

m_{S_n}(t) = m_{X_1 + X_2 + \cdots + X_n}(t) = E[e^{t(X_1 + X_2 + \cdots + X_n)}] = \prod_{i=1}^{n} m_{X_i}(t),   (2.9)

and the same properties hold for pgfs (in case of discrete random variables) and lsts (in case of nonnegative continuous random variables).

Example 2.1.1. Let X_1 ~ Pois(λ_1), X_2 ~ Pois(λ_2), and assume that X_1 and X_2 are independent. Then

g_{X_1 + X_2}(z) = E[z^{X_1 + X_2}] = E[z^{X_1}] E[z^{X_2}] = e^{λ_1(z−1)} e^{λ_2(z−1)} = e^{(λ_1 + λ_2)(z−1)}.   (2.10)

Example 2.1.2. Remember that we can interpret a binomial random variable as the sum of n independent Bernoulli trials. That is, if S_n ~ Bin(n, p), then S_n = X_1 + X_2 + \cdots + X_n where X_i has range {0, 1} with P(X_i = 1) = p = 1 − P(X_i = 0). Thus

g_{X_i}(z) = 1 − p + pz   (2.11)

and hence

g_{S_n}(z) = \prod_{i=1}^{n} g_{X_i}(z) = (1 − p + pz)^n.   (2.12)
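The product rule (2.9) and its pgf analogue in Example 2.1.2 can be illustrated numerically: estimate E[z^{S_n}] from simulated Bernoulli sums and compare with (1 − p + pz)^n. Below is a minimal sketch (assuming numpy); the values of n, p and z are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
n, p, m = 8, 0.35, 500_000

bern = rng.random((m, n)) < p          # m rows of n independent Bernoulli(p) trials
s = bern.sum(axis=1)                   # S_n = X_1 + ... + X_n ~ Bin(n, p)

for z in (0.2, 0.5, 0.9):
    empirical = np.mean(z ** s)        # Monte Carlo estimate of g_{S_n}(z) = E[z^{S_n}]
    exact = (1 - p + p * z) ** n       # product of the Bernoulli pgfs, see (2.12)
    print(z, empirical, exact)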

Once you have determined the transform of a random variable, there is always the problem of inverting the transform and thus recognizing the underlying distribution. This is often hard, but for Example 2.1.1 we can see that

e^{(λ_1 + λ_2)(z−1)} = e^{−(λ_1 + λ_2)} \sum_{k=0}^{∞} \frac{(λ_1 + λ_2)^k z^k}{k!},   (2.13)

and hence

P(X_1 + X_2 = k) = e^{−(λ_1 + λ_2)} \frac{(λ_1 + λ_2)^k}{k!},   (2.14)

so that we can conclude that X_1 + X_2 ~ Pois(λ_1 + λ_2). Indeed, this is the known property that the sum of two independent Poisson random variables is again Poisson distributed. For most distributions, however, such a nice property does not exist.

Exercise 2.1.3. Show that (2.14) is true without using transforms, but using the convolution formula (2.7).

Example 2.1.4. Let X_i ~ Exp(λ_i), for i = 1, ..., n, and assume that X_1, X_2, ..., X_n are independent. Then

ϕ_{X_1 + X_2 + \cdots + X_n}(s) = E[e^{−s(X_1 + X_2 + \cdots + X_n)}] = \prod_{i=1}^{n} ϕ_{X_i}(s) = \prod_{i=1}^{n} \frac{λ_i}{λ_i + s}.   (2.15)

Example 2.1.5. Let X_i ~ Exp(λ), for i = 1, ..., n, and assume that X_1, X_2, ..., X_n are independent. Then

ϕ_{X_1 + X_2 + \cdots + X_n}(s) = \Big( \frac{λ}{λ + s} \Big)^n.   (2.16)

Because S_n = X_1 + X_2 + \cdots + X_n is then a Gamma(n, λ) random variable, see (1.46), we know that

f_{S_n}(u) = \frac{λ^n u^{n−1}}{(n − 1)!} e^{−λu},   u ≥ 0.   (2.17)

The distribution with density (2.17) is called the Erlang(n, λ) distribution.

Exercise 2.1.6. Let X_i ~ Geo(p_i), for i = 1, 2, and assume that X_1, X_2 are independent. Show, by either using a direct method or pgfs, that

P(X_1 + X_2 = k) = \frac{(1 − p_1)(1 − p_2)}{p_2 − p_1} \big[ p_2^{k+1} − p_1^{k+1} \big],   k = 0, 1, ....   (2.18)

Theorem 2.1.7 (Feller's convergence theorem). Let F^{(n)}(t) denote a sequence of distribution functions of nonnegative random variables with densities f^{(n)}(t). If

ϕ_n(s) = \int_0^{∞} e^{−su} f^{(n)}(u) du → ϕ(s),   as n → ∞,   (2.19)

then ϕ(s) is the lst of the density f(u), with f^{(n)}(u) → f(u).

Example 2.1.8. Let

f^{(n)}(u) = \frac{(nλ)^n u^{n−1}}{(n − 1)!} e^{−nλu},   u > 0,   (2.20)

be the Erlang(n, nλ) density. Remember that this corresponds to the density of S_n = X_1 + \cdots + X_n with X_i ~ Exp(nλ). Then,

ϕ_{S_n}(s) = ϕ_n(s) = \Big( \frac{nλ}{nλ + s} \Big)^n = \Big( 1 + \frac{s}{nλ} \Big)^{−n} → e^{−s/λ},   as n → ∞.   (2.21)

Now e^{−s/λ} is the lst that belongs to the degenerate distribution

P(X < x) = 0 for x < 1/λ, and 1 for x > 1/λ.   (2.22)

To understand the result in (2.21), note that E[S_n] = n \frac{1}{nλ} = \frac{1}{λ} and

Var[S_n] = n Var[X_1] = n \Big( \frac{1}{nλ} \Big)^2 = \frac{1}{nλ^2} → 0,   as n → ∞.   (2.23)

From this, we can conclude that a positive degenerate random variable, a constant c say, can be approximated by an Erlang(n, n/c) random variable with n large.

There is also a counterpart of Theorem 2.1.7 for discrete random variables. Let X^{(n)} ~ Bin(n, λ/n), so that

g_{X^{(n)}}(z) = \Big( 1 − \frac{λ}{n} + \frac{λ}{n} z \Big)^n.   (2.24)

Now, since

\Big( 1 − \frac{λ}{n} + \frac{λ}{n} z \Big)^n → e^{λ(z−1)},   as n → ∞,   (2.25)

we conclude that Bin(n, λ/n) converges to Pois(λ), and hence that

Bin(n, λ/n) ≈ Pois(λ),   for n large.   (2.26)

Exercise 2.1.9. Let X_1 ~ N(µ_1, σ_1^2) and X_2 ~ N(µ_2, σ_2^2) be independent random variables. Show that X_1 + X_2 ~ N(µ_1 + µ_2, σ_1^2 + σ_2^2).

Exercise 2.1.10. Let X ~ N(µ, σ^2) and let a and b be real numbers. Show that aX + b ~ N(aµ + b, (aσ)^2).

Exercise 2.1.11. Let X ~ N(µ, σ^2). Show that Z = (X − µ)/σ ~ N(0, 1).

Exercise 2.1.12. Let Z ~ N(0, 1). Show that X = σZ + µ ~ N(µ, σ^2).
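Both limit statements can be made concrete numerically. The sketch below (assuming numpy and scipy) tracks, for increasing n, the sample variance of Erlang(n, nλ) draws (which should vanish, cf. (2.23)) and the total variation distance between the Bin(n, λ/n) and Pois(λ) probability mass functions; λ = 2 is an arbitrary choice.

import numpy as np
from scipy.stats import binom, poisson

rng = np.random.default_rng(5)
lam = 2.0

for n in (5, 50, 500):
    # Erlang(n, n*lam) = sum of n i.i.d. Exp(n*lam): mean 1/lam, variance 1/(n*lam^2)
    s = rng.gamma(shape=n, scale=1.0 / (n * lam), size=100_000)
    # Total variation distance between Bin(n, lam/n) and Pois(lam) on {0, ..., n}
    # (the Poisson mass above n is negligible here)
    k = np.arange(0, n + 1)
    tv = 0.5 * np.sum(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(n, s.mean(), s.var(), tv)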

2.2 Central limit theorem

Theorem 2.2.1 (Markov inequality). Let X be a nonnegative random variable with E[X] < ∞ and let a be a positive number. Then

P(X ≥ a) ≤ \frac{E[X]}{a}.   (2.27)

Proof. The bound in (2.27) follows from

a P(X ≥ a) ≤ E[X 1_{\{X ≥ a\}}] ≤ E[X].   (2.28)

Theorem 2.2.2 (Chebychev inequality). Let X have Var[X] = σ^2. Then

P(|X − E[X]| ≥ a) ≤ \frac{σ^2}{a^2}.   (2.29)

Proof. Since |X − E[X]| is non-negative, it follows that |X − E[X]| ≥ a if and only if (X − E[X])^2 ≥ a^2. From the Markov inequality, it follows that

P(|X − E[X]| ≥ a) = P((X − E[X])^2 ≥ a^2) ≤ \frac{E[(X − E[X])^2]}{a^2} = \frac{Var[X]}{a^2}.   (2.30)

Some equivalent formulations of the Chebychev inequality are

P(|X − µ| < kσ) ≥ 1 − \frac{1}{k^2},   P(|X − µ| < a) ≥ 1 − \frac{σ^2}{a^2},   P(|X − µ| ≥ kσ) ≤ \frac{1}{k^2}.   (2.31)

When you repeat an experiment many times, you expect the sample mean to be close to the actual mean. For example, when you throw a coin many times, you would expect that the fraction of times heads appears would be close to the probability of obtaining heads, which is 50% if you use a fair coin. This intuition is formalized in the following theorem.

Theorem 2.2.3 (Weak law of large numbers). Let X_1, X_2, ... be an i.i.d. sequence of nonnegative random variables with E[X_1] < ∞ and Var[X_1] = σ^2 < ∞. Then, for any ε > 0,

\lim_{n→∞} P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} X_i − E[X_1] \Big| > ε \Big) = 0,   (2.32)

so that \frac{1}{n} \sum_{i=1}^{n} X_i converges in probability to E[X_1].

Proof. Note that E[\frac{1}{n} \sum_{i=1}^{n} X_i] = E[X_1] and

Var\Big[ \frac{1}{n} \sum_{i=1}^{n} X_i \Big] = \Big( \frac{1}{n} \Big)^2 Var\Big[ \sum_{i=1}^{n} X_i \Big] = \Big( \frac{1}{n} \Big)^2 \sum_{i=1}^{n} Var[X_i] = \frac{σ^2}{n}.   (2.33)

Choose any ε > 0. By the Chebychev inequality, we find

P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} X_i − E[X_1] \Big| > ε \Big) ≤ \frac{σ^2}{nε^2} → 0   (2.34)

as n → ∞. This proves the theorem.

The law of large numbers is often called the first fundamental theorem of probability. The following theorem, known as the central limit theorem (CLT), is sometimes called the second fundamental theorem of probability. It states that for a large number of repetitions, the sample mean is approximately normally distributed. It therefore describes the behavior of the sample mean around the true mean of the distribution.
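The weak law (and the Chebychev bound used in its proof) is easy to visualize numerically. The sketch below (assuming numpy) estimates how often the sample mean of n Exp(1) variables deviates from 1 by more than ε, together with the bound σ^2/(nε^2) from (2.34).

import numpy as np

rng = np.random.default_rng(6)
eps, reps = 0.1, 20_000                        # deviation threshold and number of repetitions

for n in (10, 100, 1000):
    x = rng.exponential(1.0, (reps, n))        # Exp(1): mean 1, variance 1
    sample_means = x.mean(axis=1)
    freq = np.mean(np.abs(sample_means - 1.0) > eps)
    bound = min(1.0 / (n * eps**2), 1.0)       # sigma^2 / (n eps^2), capped at 1
    print(n, freq, bound)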

Theorem 2.2.4 (CLT). Let X_1, X_2, X_3, ... be an i.i.d. sequence of random variables, each having expected value µ and finite variance σ^2 > 0. Let S_n := \sum_{i=1}^{n} X_i and let \bar{X}_n := S_n/n be the sample mean. Then the random variables Z_n, defined by

Z_n := \frac{S_n − nµ}{σ\sqrt{n}} = \sqrt{n} \frac{\bar{X}_n − µ}{σ},   (2.35)

converge in distribution to a standard normal distribution, that is,

\lim_{n→∞} P(Z_n ≤ z) = Φ(z).   (2.36)

Here, Φ(z) is the standard normal cdf,

Φ(z) = \int_{x=−∞}^{z} \frac{1}{\sqrt{2π}} e^{−x^2/2} dx.   (2.37)

Proof. We will sketch the proof of the CLT, under the stronger assumption that the common mgf of the X_i − µ, say m(t) := m_{X_1 − µ}(t), is finite in a neighborhood of 0. Note that m(0) = 1, m'(0) = 0 and m''(0) = σ^2. Using a Taylor series expansion for m, we find

m(t) = m(0) + m'(0) t + \frac{1}{2} m''(0) t^2 + \frac{1}{6} m'''(0) t^3 + \cdots.   (2.38)

The moment generating function of Z_n is given by

m_{Z_n}(t) = m_{\frac{1}{\sqrt{nσ^2}} \sum_{i=1}^{n} (X_i − µ)}(t) = \Big( m\Big( \frac{t}{\sqrt{nσ^2}} \Big) \Big)^n,   (2.39)

since the moment generating function of a sum of independent random variables is just the product of the individual mgfs. Now using the Taylor series expansion, we may write

\Big( m\Big( \frac{t}{\sqrt{nσ^2}} \Big) \Big)^n = \Big( 1 + \frac{t^2}{2n} + \frac{m'''(0) t^3}{6 σ^3 n^{3/2}} + \cdots \Big)^n,   (2.40)

and therefore

m_{Z_n}(t) = \Big( 1 + \frac{t^2}{2n} + \cdots \Big)^n → e^{t^2/2},   as n → ∞.   (2.41)

The latter is the moment generating function of the standard normal distribution. This shows that Z_n converges in distribution to the standard normal distribution.
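The convergence in (2.36) can be illustrated by simulation: standardize sums of i.i.d. Exp(1) variables as in (2.35) and compare empirical probabilities with Φ(z). The sketch below (assuming numpy) evaluates Φ via the error function.

import numpy as np
from math import erf, sqrt

def Phi(z):
    # standard normal cdf (2.37) expressed through the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(7)
n, reps = 200, 100_000
mu = sigma = 1.0                       # Exp(1) has mean 1 and standard deviation 1

x = rng.exponential(1.0, (reps, n))
z_n = (x.sum(axis=1) - n * mu) / (sigma * sqrt(n))   # the standardized sums (2.35)

for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, np.mean(z_n <= z), Phi(z))              # empirical cdf vs Phi(z), cf. (2.36)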

Example 2.2.5. Consider an insurance company with n = 100 policy holders. Each policy holder will file a claim with probability p = 1/2. Use the CLT to estimate the probability that the total number of claims lies between 40 and 60. The total number of claims S can be described by S = X_1 + \cdots + X_{100}, where the X_i are i.i.d. random variables with P(X_i = 1) = 1/2 and P(X_i = 0) = 1/2. Note that E[S] = np = 50 and Var[S] = np(1 − p) = 25, and thus

Z = \frac{S − 50}{\sqrt{25}} ≈ N(0, 1).   (2.42)

Then,

P(40 ≤ S ≤ 60) = P\Big( \frac{40 − 50}{\sqrt{25}} ≤ Z ≤ \frac{60 − 50}{\sqrt{25}} \Big) = P(−2 ≤ Z ≤ 2) ≈ Φ(2) − Φ(−2) ≈ 0.9545.   (2.43)

This is a rather good approximation, since by observing that S ~ Bin(100, 1/2), the true value can be calculated:

P(40 ≤ S ≤ 60) = \sum_{k=40}^{60} \binom{100}{k} \Big( \frac{1}{2} \Big)^k \Big( \frac{1}{2} \Big)^{100−k} ≈ 0.9648.   (2.44)

Exercise 2.2.6. Claims X_i, i = 1, ..., 25, are independent with mean 4 and standard deviation 2. Estimate, using the CLT, the probability that the 25 claims together exceed a given amount.

Exercise 2.2.7. Let X_i, i = 1, ..., 20, be independent claims, where each claim is uniformly distributed over (0, 1). Estimate the probability P(\sum_{i=1}^{20} X_i > 7), using the CLT.

Exercise 2.2.8. An insurance company has 10,000 automobile policyholders. The expected yearly claim per policyholder is 240 with a standard deviation of 800. Approximate, using the CLT, the probability that the total yearly claim exceeds 2.7 million.
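The two numbers in Example 2.2.5 can be reproduced directly: the exact binomial sum (2.44) with math.comb and the normal approximation (2.43) via the error function. A minimal sketch:

from math import comb, erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

exact = sum(comb(100, k) for k in range(40, 61)) / 2**100   # (2.44), S ~ Bin(100, 1/2)
approx = Phi(2.0) - Phi(-2.0)                               # (2.43), P(-2 <= Z <= 2)

print(exact, approx)   # approximately 0.9648 and 0.9545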


Chapter 3 Total claim amount

As in Chapter 2 we shall again be interested in the total claim amount, except this time we interpret the total claim amount as the sum of all claims that were filed in the interval (0, t]. If we denote the number of claims in (0, t] by N(t), we thus need to study

S_{N(t)} = X_1 + X_2 + \cdots + X_{N(t)},   (3.1)

where X_i denotes claim i. We shall again assume that the claims X_1, ..., X_n are independent random variables. We further assume that the claims arrive at times σ_1, σ_2, ..., and we denote the interarrival times of claims by T_1 = σ_1 and T_i = σ_i − σ_{i−1} for i ≥ 2.

3.1 Stochastic sums of random variables

Let us first consider the random variable

S_N = X_1 + X_2 + \cdots + X_N,   (3.2)

with N a discrete random variable with pgf g_N(z), and X_1, X_2, ... i.i.d. random variables with mgf m_X(t), independent of N. Think of X_1, X_2, ... as the claim sizes, N as the (uncertain) number of claims, and S_N then as the total claim amount. There exists a nice expression for the mgf of S_N:

m_{S_N}(t) = E[e^{t S_N}] = \sum_{n=0}^{∞} P(N = n) E[e^{t(X_1 + X_2 + \cdots + X_n)}] = \sum_{n=0}^{∞} P(N = n) \big( E[e^{t X_1}] \big)^n = \sum_{n=0}^{∞} P(N = n) (m_X(t))^n = g_N(m_X(t)).   (3.3)

To see the latter step, recall the definition of a pgf:

g_N(z) = \sum_{n=0}^{∞} P(N = n) z^n.   (3.4)

Hence, despite the fact that the random variable S_N might have a rather complex distribution, its mgf follows easily from the mgf of X_1 and the pgf of N. With (3.3) it is now straightforward to derive expressions for the moments of S_N:

Property 3.1.1. Let S_N be defined as in (3.2). Then,

E[S_N] = E[X_1] E[N],   (3.5)

Var[S_N] = Var[N] (E[X_1])^2 + E[N] Var[X_1].   (3.6)

Exercise 3.1.2. Let X_i ~ Exp(λ) and N ~ Geo(p). Determine the mgf of S_N.

Exercise 3.1.3. An insurance company has a number of policy holders, and each policy holder i independently files a claim with a given probability; a filed claim has size X_i ~ Gamma(2, 2). Determine the mean and mgf of the total claim size.

Exercise 3.1.4. An insurance company has n policy holders with an insurance against storm damage. After a storm, depending on the location of the house, a policy holder will have damage or not. Policy holder i will file a claim after a storm with probability p_i, of size Y_i, and the houses of the policy holders are so far apart that the occurrence of claims and the claim sizes Y_i can be considered independent among all policy holders. Assume that p_1 = p_2 = \cdots = p_n and that Y_1, Y_2, ... are i.i.d. Determine the mean and mgf of the total claim size after a storm.

Exercise 3.1.5. An insurance company has n policy holders with an insurance against storm damage. The houses of the policy holders are in the same area, so that after a storm, either nobody or everybody has damage. After a storm, with probability p all policy holders file a claim, where policy holder i files a claim of size Y_i. Assume that Y_1, Y_2, ... are i.i.d. Determine the mean and mgf of the total claim size after a storm.

Exercise 3.1.6. A touring car is involved in a severe accident, caused by the driver, and all passengers can file a claim to one and the same insurance company. There are 5 passengers, and each passenger, independently of the other passengers, will choose to file a claim with probability 9/10. Passenger i will file a claim of size X_i ~ Exp(1/10), and all claim sizes are assumed to be i.i.d. Show that the mgf of the total claim size is given by

m_{S_N}(t) = \Big( \frac{1}{10} + \frac{9}{10} \cdot \frac{1/10}{1/10 − t} \Big)^{5}.   (3.7)

Exercise 3.1.7. From the expression for the mgf in (3.7), we obtain that E[S_N] = m'_{S_N}(0) = 45 and Var[S_N] = m''_{S_N}(0) − (m'_{S_N}(0))^2 = 495. Verify these results using (3.5) and (3.6).

Exercise 3.1.8. A touring car is involved in a severe accident, caused by the driver, and all passengers can file a claim to one and the same insurance company. There are 5 passengers, and each passenger, independently of the other passengers, will choose to file a claim with probability 9/10. All claim sizes are equally large and drawn from an Exp(1/10) distribution. Determine the mean and the variance of the total claim size.
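Property 3.1.1 can be checked by simulation, for instance in the setting of Exercise 3.1.2 with N ~ Geo(p) and X_i ~ Exp(λ). The sketch below (assuming numpy) compares the empirical mean and variance of S_N with (3.5) and (3.6); the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(8)
p, lam, m = 0.7, 2.0, 200_000

# N ~ Geo(p) as in (1.5): shift numpy's sampler to count failures before a success
N = rng.geometric(1 - p, m) - 1
# S_N = sum of N i.i.d. Exp(lambda) claims; a Gamma(N, 1/lambda) draw has exactly
# that distribution, and S_N = 0 whenever N = 0
S = np.where(N > 0, rng.gamma(np.maximum(N, 1), 1.0 / lam), 0.0)

EN, VarN = p / (1 - p), p / (1 - p) ** 2
EX, VarX = 1 / lam, 1 / lam ** 2
print(S.mean(), EX * EN)                    # (3.5)
print(S.var(), VarN * EX**2 + EN * VarX)    # (3.6)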

3.2 Poisson process

A stochastic process {N(t), t ≥ 0} is called a counting process when N(t) represents the number of events that occurred in the time interval (0, t]. All counting processes have to satisfy the following properties:

(i) N(0) = 0.
(ii) N(t) is an integer, for all t.
(iii) N(s) ≤ N(t) if s < t.
(iv) For s < t, the quantity N(t) − N(s) represents the number of events in (s, t].

For example, the number of births in The Netherlands forms a counting process, while the number of inhabitants of The Netherlands clearly does not. Here are certain properties that would make a counting process special:

Independent increments. This means that N(b) − N(a) and N(d) − N(c) are independent if the intervals (a, b] and (c, d] do not overlap.

Stationary increments. This means that N(t) − N(s) and N(t + u) − N(s + u) are identically distributed, for s, t, u > 0 and s < t.

The process that counts the numbers of arriving customers at a supermarket has roughly independent increments, but probably no stationary increments (think of rush hours). The Poisson process is a counting process with independent and stationary increments, and moreover, with a third property that can be described in multiple ways. Here is our first definition of a Poisson process:

Definition 3.2.1. The counting process {N(t), t ≥ 0} is a Poisson process with rate λ if N(0) = 0, the process has stationary and independent increments, and

P(N(t) = n) = e^{−λt} \frac{(λt)^n}{n!},   n = 0, 1, ...,   t ≥ 0.   (3.8)

Hence, N(t) is for all t > 0 Poisson distributed with mean λt, i.e. the expected number of events in (0, t] is λt.

Definition 3.2.2. The counting process {N(t), t ≥ 0} is a Poisson process with rate λ if N(0) = 0, the process has stationary and independent increments, and

P(N(h) = 1) = λh + o(h),   h → 0,   (3.9)

P(N(h) ≥ 2) = o(h),   h → 0.   (3.10)

We note that a function f(·) is o(h) if \lim_{h→0} f(h)/h = 0. Since we now have two definitions of a Poisson process, we should be able to prove that these definitions are equivalent.

Let us first prove that (3.8) implies (3.9) and (3.10):

P(N(h) = 1) = e^{−λh} \frac{(λh)^1}{1!} = (1 − λh + o(h)) λh = λh + o(h),   h → 0.   (3.11)

P(N(h) ≥ 2) = 1 − P(N(h) = 0) − P(N(h) = 1) = 1 − e^{−λh} − e^{−λh} \frac{(λh)^1}{1!} = 1 − (1 − λh + o(h)) − (λh + o(h)) = o(h),   h → 0.   (3.12)

To see that (3.9) and (3.10) imply (3.8), we give the following sketch of a proof: divide (0, t] into n smaller intervals of length h = t/n, and think of n as a large number. Then,

P(N(t) = j) ≈ \binom{n}{j} \Big( \frac{λt}{n} + o(t/n) \Big)^j \Big( 1 − \frac{λt}{n} + o(t/n) \Big)^{n−j} → e^{−λt} \frac{(λt)^j}{j!},   (3.13)

where we recall that the binomial distribution can be approximated by a Poisson distribution, see (2.26). From the above sketch, it becomes clear that Poisson processes might be good descriptions of the number of claims that are filed at an insurance company. That is, for these settings, there is typically a large number n of policy holders, who all, more or less independently from each other, file a claim with probability p. Moreover, it seems reasonable that p is roughly proportional to the length of the interval, which gives p = λt/n for a fixed t. From this reasoning, we would then again arrive at

P(N(t) = j) = \binom{n}{j} \Big( \frac{λt}{n} \Big)^j \Big( 1 − \frac{λt}{n} \Big)^{n−j} ≈ e^{−λt} \frac{(λt)^j}{j!}.   (3.14)

Let us now investigate some further properties of the Poisson process. Let T_1, T_2, ... denote the times in between consecutive events of a Poisson process. These times are also called interarrival times, and in case of a claim arrival process they represent the times between two consecutive claims. It is clear that

P(T_1 > t) = P(N(t) = 0) = e^{−λt},   (3.15)

and so T_1 ~ Exp(λ). Due to the independent and stationary increments, we know that T_2 is independent of T_1, and again Exp(λ) distributed. In fact, all T_i, i = 1, 2, ..., are independent and Exp(λ) distributed. Apparently, under the assumptions of independence and stationarity, the counting process starts anew at any point t. In other words, the counting process is memoryless, and we could have expected that indeed T_i ~ Exp(λ). This brings us to the third definition of a Poisson process:

Definition 3.2.3. The counting process {N(t), t ≥ 0} is a Poisson process with rate λ if N(0) = 0, the process has stationary and independent increments, and the times between consecutive events are independent and Exp(λ) distributed.
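Definition 3.2.3 gives a direct way to simulate a Poisson process: generate Exp(λ) interarrival times and count how many arrival epochs fall in (0, t]. The sketch below (assuming numpy) checks that the resulting counts have mean and variance close to λt, in line with Definition 3.2.1.

import numpy as np

rng = np.random.default_rng(9)
lam, t, reps = 3.0, 5.0, 50_000

counts = np.empty(reps, dtype=int)
for r in range(reps):
    # Draw comfortably more Exp(lambda) interarrival times than will be needed,
    # form the arrival epochs V_1, V_2, ..., and count those that land in (0, t].
    gaps = rng.exponential(1.0 / lam, size=int(4 * lam * t) + 50)
    arrivals = np.cumsum(gaps)
    counts[r] = np.searchsorted(arrivals, t, side='right')

print(counts.mean(), counts.var(), lam * t)   # both moments should be close to lam * t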

We can define the time V_n of the nth event as

V_n = T_1 + T_2 + \cdots + T_n,   (3.16)

which is a sum of independent Exp(λ) random variables. Such sums were treated in Chapter 2. Since we know that the mgf of V_n is then given by

m_{V_n}(t) = \Big( \frac{λ}{λ − t} \Big)^n,   (3.17)

we can immediately conclude that V_n ~ Gamma(n, λ) with pdf

f(u) = \frac{λ^n}{(n − 1)!} u^{n−1} e^{−λu},   u ≥ 0.   (3.18)

This distribution is also called the Erlang distribution. The fact that V_n ~ Gamma(n, λ) can also be obtained directly from the definition of the Poisson process, by observing that

V_n ≤ t  if and only if  N(t) ≥ n.   (3.19)

Hence,

P(V_n ≤ t) = P(N(t) ≥ n) = \sum_{j=n}^{∞} e^{−λt} \frac{(λt)^j}{j!} = 1 − \sum_{j=0}^{n−1} e^{−λt} \frac{(λt)^j}{j!}.   (3.20)

Differentiation of (3.20) then yields (3.18). Further intuition follows from

P(V_n ∈ (u, u + h)) = P(N(u) = n − 1, nth event in (u, u + h))   (3.21)

= e^{−λu} \frac{(λu)^{n−1}}{(n − 1)!} [λh + o(h)],   (3.22)

and so

f(u) = \frac{d}{du} P(V_n ≤ u) = \lim_{h→0} \frac{P(V_n ∈ (u, u + h))}{h} = \frac{λ^n}{(n − 1)!} u^{n−1} e^{−λu},   u ≥ 0.   (3.23)

Theorem 3.2.4. Let {N_1(t), t ≥ 0} and {N_2(t), t ≥ 0} be two independent Poisson processes with rates λ_1 and λ_2. Then {N_1(t) + N_2(t), t ≥ 0} is a Poisson process with rate λ_1 + λ_2.

Theorem 3.2.5. If a Poisson process {N(t), t ≥ 0} with rate λ is randomly split into two subprocesses with probabilities p and 1 − p, then the resulting processes are independent Poisson processes with rates pλ and (1 − p)λ. This result allows a straightforward generalization to a split into more than two subprocesses.

3.3 Compound Poisson process

A stochastic process {S(t), t ≥ 0} is said to be a compound Poisson process if it can be represented as

S(t) = S_{N(t)} = X_1 + X_2 + \cdots + X_{N(t)},   (3.24)

where {N(t), t ≥ 0} is a Poisson process with rate λ, and X, X_1, X_2, ... are independent and identically distributed random variables that are also independent of {N(t), t ≥ 0}. Notice that when X ≡ 1, we have S(t) = N(t). If N(t) counts the number of claims in the time interval [0, t], and X_i is the size of claim i, then S(t) is the total claim size up to time t. From (3.5) and (3.6) we get

E[S_{N(t)}] = λt E[X],   (3.25)

Var[S_{N(t)}] = λt E[X^2].   (3.26)

Moreover, since g_{N(t)}(z) = e^{λt(z−1)}, it follows from (3.3) that

m_{S_{N(t)}}(u) = e^{λt(m_X(u)−1)}.   (3.27)

We call in this section the parameter of the mgf u instead of t, because t is already used as time parameter.

Theorem 3.3.1. Let S^{(1)}(t), S^{(2)}(t), ..., S^{(n)}(t) be independent compound Poisson processes with Poisson rates λ^{(1)}, λ^{(2)}, ..., λ^{(n)} and generic claim sizes X^{(1)}, X^{(2)}, ..., X^{(n)}. Then S(t) = S^{(1)}(t) + S^{(2)}(t) + \cdots + S^{(n)}(t) is again a compound Poisson process with rate λ = λ^{(1)} + λ^{(2)} + \cdots + λ^{(n)} and generic claim size

X = X^{(i)} with probability λ^{(i)}/λ, for i = 1, ..., n.   (3.28)

Proof.

m_{S(t)}(u) = \prod_{i=1}^{n} m_{S^{(i)}(t)}(u) = \prod_{i=1}^{n} \exp\big( λ^{(i)} t (m_{X^{(i)}}(u) − 1) \big) = \exp\Big( \sum_{i=1}^{n} λ^{(i)} t (m_{X^{(i)}}(u) − 1) \Big) = \exp\Big( λt \Big( \sum_{i=1}^{n} \frac{λ^{(i)}}{λ} m_{X^{(i)}}(u) − 1 \Big) \Big).   (3.29)

The proof is concluded by recognizing \sum_{i=1}^{n} \frac{λ^{(i)}}{λ} m_{X^{(i)}}(u) as the mgf of X.
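The moment formulas (3.25) and (3.26) can be checked by simulating a compound Poisson process directly: draw N(t) ~ Pois(λt) and add that many i.i.d. claims. The sketch below (assuming numpy) uses Exp(µ)-distributed claim sizes, for which E[X] = 1/µ and E[X^2] = 2/µ^2; all parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(10)
lam, mu, t, reps = 2.0, 0.5, 4.0, 200_000     # Poisson rate, claim-size rate, time horizon

N = rng.poisson(lam * t, reps)                # N(t) ~ Pois(lam * t)
# Sum of N i.i.d. Exp(mu) claims: a Gamma(N, 1/mu) draw, and 0 when N = 0
S = np.where(N > 0, rng.gamma(np.maximum(N, 1), 1.0 / mu), 0.0)

EX, EX2 = 1 / mu, 2 / mu ** 2
print(S.mean(), lam * t * EX)     # (3.25)
print(S.var(), lam * t * EX2)     # (3.26)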

Theorem 3.3.1 shows that the sum of compound Poisson processes is again a compound Poisson process. This was proved via transform methods. A more direct proof can be obtained using the properties of the Poisson process. Take two independent compound Poisson processes S^{(1)}(t), S^{(2)}(t) with Poisson rates λ^{(1)}, λ^{(2)} and generic claim sizes X^{(1)}, X^{(2)}. Since the sum of two Poisson processes is again a Poisson process, events in the new process S(t) = S^{(1)}(t) + S^{(2)}(t) occur according to a Poisson process with rate λ^{(1)} + λ^{(2)} (Theorem 3.2.4), and each event, independently, is from the first compound Poisson process with probability λ^{(1)}/(λ^{(1)} + λ^{(2)}) (see Property 1.3.2), yielding the generic claim size

X = X^{(1)} with probability λ^{(1)}/(λ^{(1)} + λ^{(2)}), and X = X^{(2)} with probability λ^{(2)}/(λ^{(1)} + λ^{(2)}).   (3.30)