General information
August 18th, 2014

You find all important information on the course webpage:
TMA4265: Stochastic Processes
https://wiki.math.ntnu.no/tma4265/2014h/start
Please check this website regularly!

Lecturers:
- Lecture: Andrea Riebler, room 1242, andrea.riebler@math.ntnu.no
- Exercises: Petter Arnesen, room 1026, Petter.Arnesen@math.ntnu.no

Time and place:
- Lecture: Monday 10:15-12:00, R9; Wednesday 10:15-12:00, R9
- Exercises: Monday 9:15-10:00, S1

Course outline
Reference book: Ross, S. M., 2014, Introduction to Probability Models, 11th edition, Academic Press.
Rough course content:
- Review of probability (ch. 1-3)
- Markov chains (ch. 4)
- Poisson processes (ch. 5)
- Continuous-time Markov chains (ch. 6)
- Queueing theory (ch. 8)
- Brownian motion (ch. 10)
- (Procedures for simulation of stochastic processes (ch. 11))

Exercises and project
There will be weekly exercises and one obligatory project.
Exercise sheets: Exercise problems will be of mixed type: theoretical questions to be solved with paper and pencil, complemented with tasks that require programming. Programming can be done in either Matlab or R. In the lectures, R will be used.
Exercise classes: Monday 9:15-10:00 in S1, given by Petter. Strongly recommended! Questions on the current exercise sheet will be answered, and new exercises will be presented on the blackboard.
Exercises and project
Exam: The exam will be on 06.12.2014 at 09:00.
Project: The project is obligatory and has to be done in groups of two. You will have two weeks, of which one week is lecture-free. A time plan will be available soon. The project counts 20% of the final mark.
Examination aids: C
- Calculator HP30S, CITIZEN SR-270X, CITIZEN SR-270X College, or Casio fx-82ES PLUS with empty memory.
- Tabeller og formler i statistikk, Tapir forlag.
- K. Rottmann: Matematisk formelsamling.
- One yellow, stamped A5 sheet with your own handwritten formulas and notes.
Former exams and solutions can be found on:
https://wiki.math.ntnu.no/tma4265/2014h/exam

Quality assurance: Reference group
At least three students, representing the major study programmes in the course, should participate. The reference group should:
- have an ongoing dialogue with other students throughout the semester,
- meet three times during the semester with the lecturer,
- evaluate the course and write a report, which will be published at the end.
More information: https://innsida.ntnu.no/wiki/-/wiki/english/reference+groups+-+quality+assurance+of+education

Statistical software R
R is available for free download from The Comprehensive R Archive Network (http://cran.r-project.org) for Windows, Linux and Mac. RStudio (http://www.rstudio.org) is an integrated development environment (script editor, console, object overview and plotting window in one place). A nice introduction to R is the book P. Dalgaard: Introductory Statistics with R, 2nd edition, Springer, which is also freely available to NTNU students as an ebook.
Probability and random variables - Review (Chapter 1 & 2)

Introduction
One fundamental term in stochastics is the probability, such as the probability P(E) of the occurrence of an event E:
- P(E) = 1: E certainly occurs
- P(E) = 0: E certainly does NOT occur
- P(E) = p ∈ (0, 1): E occurs with probability p

Sample space and events
Suppose we perform an experiment whose outcome is not predictable in advance, but whose set of possible outcomes is known. This set is called the sample space S. An event E ⊆ S occurs when the outcome of the experiment lies in E.
Some notation:
- E ∪ F: union of E and F
- EF or E ∩ F: intersection of E and F
- ∅: null event. If E ∩ F = ∅, then E and F are mutually exclusive.
- E^c: complement of E.

Example: Rolling a die with six sides
Sample space: S = {1, 2, 3, 4, 5, 6}
Probabilities: P(1) = P(2) = ... = P(6) = 1/6
E = even number = {2, 4, 6} ⊆ S, with P(E) = P(2) + P(4) + P(6) = 1/2
F = divisible by 3 = {3, 6} ⊆ S, with P(F) = P(3) + P(6) = 1/3
E and F are independent, as P(E ∩ F) = P(6) = 1/6 = P(E) · P(F).
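The independence in the die example can also be checked empirically. A minimal R sketch (the seed and the number of rolls are arbitrary choices, not part of the lecture):

```r
set.seed(1)
rolls <- sample(1:6, size = 1e5, replace = TRUE)
evenE <- rolls %% 2 == 0        # event E: even number
div3F <- rolls %% 3 == 0        # event F: divisible by 3
# relative frequencies approximate the probabilities
p_E  <- mean(evenE)             # ~ 1/2
p_F  <- mean(div3F)             # ~ 1/3
p_EF <- mean(evenE & div3F)     # ~ 1/6, i.e. p_E * p_F under independence
```

The estimated P(E ∩ F) should land close to P(E) · P(F) = 1/6.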
Probabilities defined on events
For each event E ⊆ S one can define a number P(E) satisfying:
(i) 0 ≤ P(E) ≤ 1
(ii) P(S) = 1
(iii) for E_1, E_2, ... that are mutually exclusive,
P(∪_{n=1}^∞ E_n) = Σ_{n=1}^∞ P(E_n).
P(E) is called the probability of event E.

Probability of the union of two events
We have
P(E) + P(F) = P(E ∪ F) + P(E ∩ F),
or equivalently
P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
If E and F are mutually exclusive, then P(E ∪ F) = P(E) + P(F) (see Example 1.3). This can also be generalized to more than two events.

Sylvester-Poincaré formula
For arbitrary n ∈ N and events E_1, E_2, ..., E_n ⊆ S:
P(E_1 ∪ E_2 ∪ ... ∪ E_n) = Σ_i P(E_i) − Σ_{i<j} P(E_i E_j) + Σ_{i<j<k} P(E_i E_j E_k) ∓ ... + (−1)^{n+1} P(E_1 E_2 ... E_n)
In particular, for E, F, G ⊆ S:
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)
[see illustration on the blackboard]

Conditional probabilities
Suppose we toss two fair dice and the first die shows a four. Given this information, what is the probability that the sum of the two dice equals six?
The conditional probability P(E | F) is defined as the probability that event E occurs given that F has occurred. We can calculate it as
P(E | F) = P(EF) / P(F), provided P(F) > 0.
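The two-dice question above can be answered by enumerating the 36 equally likely outcomes; a short R sketch:

```r
# all 36 equally likely outcomes of two fair dice
d <- expand.grid(die1 = 1:6, die2 = 1:6)
F_cond <- d$die1 == 4                  # conditioning event F: first die is a four
E_sum6 <- d$die1 + d$die2 == 6         # event E: the sum equals six
# P(E | F) = P(EF) / P(F), here computed as a ratio of counts
p_cond <- sum(E_sum6 & F_cond) / sum(F_cond)
p_cond                                 # 1/6: only (4, 2) gives sum six
```

Only one of the six outcomes with first die four gives a sum of six, so P(E | F) = 1/6.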
Multiplication theorem
For arbitrary events E_1, E_2, ..., E_n with P(E_1 ∩ E_2 ∩ ... ∩ E_{n−1}) > 0, we have
P(E_1 E_2 ... E_n) = P(E_1) · P(E_2 | E_1) · P(E_3 | E_1 E_2) ⋯ P(E_n | E_1 ... E_{n−1}),
where the right-hand side can obviously be factorized in any other possible order. In particular, it follows that
P(E_1 E_2) = P(E_1) · P(E_2 | E_1)
P(E_1 E_2 E_3) = P(E_1) · P(E_2 | E_1) · P(E_3 | E_1 E_2)

Independent events
Two events E and F are said to be independent if
P(EF) = P(E) · P(F).
That means P(E | F) = P(E): the knowledge that F has occurred does not affect the probability that E occurs. [See blackboard, Example 1.9]
Comment: It can be shown that then also E and F^c, E^c and F, and E^c and F^c are independent.

Law of total probability
Suppose F_1, F_2, ..., F_n are mutually exclusive such that ∪_{i=1}^n F_i = S. Then for each E ⊆ S we have
P(E) = Σ_{i=1}^n P(E ∩ F_i) = Σ_{i=1}^n P(F_i) · P(E | F_i).
Especially,
P(E) = P(F) · P(E | F) + P(F^c) · P(E | F^c),
because F and F^c build a mutually exclusive decomposition of S.

Bayes' formula
Bayes' formula relies on the asymmetry of the definition of conditional probabilities:
P(E | F) = P(EF) / P(F), so that P(EF) = P(E | F) · P(F)
P(F | E) = P(EF) / P(E), so that P(EF) = P(F | E) · P(E)
Combining both,
P(F | E) = P(E | F) P(F) / P(E) = P(E | F) P(F) / (P(E | F) P(F) + P(E | F^c) P(F^c)).
In general,
P(F_j | E) = P(E | F_j) P(F_j) / Σ_{i=1}^n P(E | F_i) P(F_i).
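Bayes' formula can be made concrete with a diagnostic-test calculation in the spirit of the example on the next slide. The numbers below (prevalence, sensitivity, false-positive rate) are hypothetical, chosen only for illustration:

```r
# hypothetical numbers; F = "has disease", E = "test is positive"
p_F          <- 0.01   # assumed prevalence P(F)
p_E_given_F  <- 0.95   # assumed sensitivity P(E | F)
p_E_given_Fc <- 0.05   # assumed false-positive rate P(E | F^c)
# law of total probability: P(E) = P(E|F)P(F) + P(E|F^c)P(F^c)
p_E <- p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)
# Bayes' formula: P(F | E)
p_F_given_E <- p_E_given_F * p_F / p_E
p_F_given_E   # about 0.16: a positive test is far from conclusive for a rare disease
```

Despite the accurate test, most positives come from the large healthy group, so P(F | E) stays low.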
Example: Diagnostic tests
[See blackboard]

Random variables
We shall not always be interested in an experiment itself, but rather in some consequence of its random outcome.
Example: Rolling two dice. Let X := sum of the two dice.
Such consequences, when real-valued, may be thought of as functions which map S into the real line R, and are called random variables.

Random variables: Notation
We shall always use:
- upper-case letters, such as X, Y and Z, to represent generic random variables,
- lower-case letters, such as x, y and z, to represent possible numerical values of these variables.

Discrete random variables
A random variable X is discrete if it can take only a finite or countable number of values. We define the probability mass function p(a) of X by
p(a) = P(X = a).
The following properties have to be fulfilled: Σ_i p(x_i) = 1 and p(x_i) ≥ 0.
The cumulative distribution function F can be expressed in terms of p(a) by
F(a) = P(X ≤ a) = Σ_{i: x_i ≤ a} p(x_i).
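For the two-dice example, the pmf and CDF of X = sum of the dice can be tabulated directly; a small R sketch:

```r
# pmf of X = sum of two fair dice, by enumerating the 36 outcomes
d <- expand.grid(1:6, 1:6)
x_vals <- 2:12
pmf <- sapply(x_vals, function(a) mean(rowSums(d) == a))
# F(a) = sum of p(x_i) over x_i <= a, so the CDF is the cumulative sum
cdf <- cumsum(pmf)
rbind(x_vals, pmf, cdf)   # pmf peaks at 7 with p(7) = 1/6; cdf ends at 1
```

Note how the CDF is a step function that jumps by p(x_i) at each support point x_i.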
Properties of the cumulative distribution function (CDF)
- F(x) is monotone increasing (for a discrete random variable, a step function).
- F(x) is piecewise constant with jumps at the points x_i where p(x_i) > 0.
- lim_{x→∞} F(x) = 1.
- lim_{x→−∞} F(x) = 0.

Example: Throwing a fair coin four times
Let X be the number of heads we observe. [see blackboard]

Discrete distributions
There are different common distributions used to model different random processes. The simplest example is the Bernoulli distribution. A Bernoulli random variable can only take the values 0 and 1:
p(1) = P(X = 1) = p, p(0) = P(X = 0) = 1 − p,
where p ∈ [0, 1] is the parameter of the distribution. The CDF is given by
F(x) = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1.

The binomial random variable
Suppose a sequence of n independent Bernoulli trials is performed. If X represents the number of successes that occur in the n trials, then X is said to be binomially distributed with parameters n ∈ N and p ∈ [0, 1] (symbolically X ~ B(n, p)), with probability mass function
p(i) = (n choose i) p^i (1 − p)^{n−i}, i = 0, 1, ..., n,
where (n choose i) = n! / ((n − i)! i!).
In R: dbinom, pbinom, rbinom.
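The binomial pmf formula can be checked against R's built-in dbinom; a quick sketch with arbitrary parameter values:

```r
n <- 10; p <- 0.3; i <- 0:n
# pmf from the formula: (n choose i) p^i (1 - p)^(n - i)
pmf_formula <- choose(n, i) * p^i * (1 - p)^(n - i)
# built-in pmf
pmf_builtin <- dbinom(i, size = n, prob = p)
max(abs(pmf_formula - pmf_builtin))   # agrees up to floating-point error
```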
Illustration: Binomial distribution
[Figure: probability mass functions and CDFs of B(n, p) for (n, p) = (10, 0.5), (10, 0.3), (100, 0.5) and (100, 0.3).]

The geometric random variable
An experiment with probability p ∈ [0, 1] of being a success is repeated independently until a success occurs. Let X denote the number of trials required until the first success; then X is said to be a geometric random variable with parameter p (symbolically X ~ G(p)). The probability mass function is given by
p(n) = P(X = n) = (1 − p)^{n−1} p, n = 1, 2, ...
The CDF is given by
F(n) = P(X ≤ n) = 1 − P(X > n) = 1 − (1 − p)^n.
In R: dgeom, pgeom, rgeom.

The Poisson random variable
A random variable X taking on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter λ (symbolically X ~ P(λ)) if, for some λ > 0,
p(i) = P(X = i) = (λ^i / i!) e^{−λ}, i = 0, 1, ...
The parameter λ reflects the rate or intensity with which the events in the underlying time interval occur.

Illustration: Poisson distribution
[Figure: probability mass functions and CDFs for λ = 1 and λ = 3.]
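One pitfall when using R here: dgeom/pgeom parametrize the geometric distribution by the number of failures before the first success, not the trial number n used on this slide. A sketch reconciling the two conventions:

```r
p <- 0.25; n <- 1:15
# slide convention: X = trial number of the first success
pmf_formula <- (1 - p)^(n - 1) * p
cdf_formula <- 1 - (1 - p)^n
# R convention: dgeom counts failures before the first success,
# so trial number n corresponds to n - 1 failures
pmf_R <- dgeom(n - 1, prob = p)
cdf_R <- pgeom(n - 1, prob = p)
```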
Approximation of the binomial distribution
The binomial distribution B(n, p) with parameters n and p converges to the Poisson distribution with parameter λ if n → ∞ and p → 0 in such a way that λ = np remains constant. [proof see blackboard]

Illustration (n = 10)
[Figure: binomial vs. Poisson probability mass functions for (n, p) = (10, 0.8), (10, 0.5), (10, 0.3) and (10, 0.1).]

Illustration (n = 100)
[Figure: binomial vs. Poisson probability mass functions for (n, p) = (100, 0.8), (100, 0.5), (100, 0.3) and (100, 0.1).]

Definition of continuous random variables
Idea: A random variable X is called continuous if, for arbitrary values a < b from the support of X, every intermediate value in the interval [a, b] is possible.
Problem: How to compute P(a ≤ X ≤ b) if there are uncountably many points in the interval [a, b]?
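The quality of the Poisson approximation can be measured numerically by the largest pmf discrepancy; a small sketch with λ = 2 held fixed:

```r
lambda <- 2
i <- 0:10
# maximal pointwise pmf error of the Poisson approximation to B(n, lambda/n)
approx_err <- function(n) {
  max(abs(dbinom(i, size = n, prob = lambda / n) - dpois(i, lambda)))
}
c(approx_err(10), approx_err(100), approx_err(1000))  # shrinks as n grows
```

The error decreases roughly like 1/n, matching the limit statement above.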
Continuous distributions
A random variable X whose set of possible values is uncountable is called a continuous random variable. More precisely, X is called continuous if there exists a function f(x) ≥ 0 such that the cumulative distribution function F can be written as
F(a) = P(X ≤ a) = ∫_{−∞}^a f(x) dx.
Some consequences:
- P(X ∈ B) = ∫_B f(x) dx
- ∫_{−∞}^∞ f(x) dx = 1
- P(X = a) = ∫_a^a f(x) dx = 0

The CDF F(x) of continuous random variables
Properties:
- F(a) is a nondecreasing function of a, i.e. if x < y then F(x) ≤ F(y)
- lim_{a→−∞} F(a) = 0, lim_{a→∞} F(a) = 1
- d/da F(a) = f(a)
- P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
- P(X > a) = 1 − F(a)

Normalizing constant
A normalizing constant c is a multiplicative term in f(x) which does not depend on x. The remaining term is called the core:
f(x) = c · g(x), where g(x) is the core.
We often write f(x) ∝ g(x).

The uniform random variable
A random variable X is a uniform random variable on the interval [α, β] if
f(x) = 1/(β − α) for α < x < β, and 0 otherwise.
The CDF is given by
F(x) = 0 for x ≤ α; (x − α)/(β − α) for α < x < β; 1 for x ≥ β.
In R: dunif(x, min=a, max=b), punif, qunif, runif.
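The defining relation F(a) = ∫ f(x) dx can be checked numerically; a sketch for the uniform density on [2, 6] (the interval is an arbitrary choice), using R's integrate for quadrature:

```r
a <- 2; b <- 6
f <- function(x) ifelse(x > a & x < b, 1 / (b - a), 0)   # uniform density
# the density integrates to 1 over its support
total <- integrate(f, lower = a, upper = b)$value
# F(3) by integration should match the closed-form CDF (3 - 2)/(6 - 2) = 0.25
F3 <- integrate(f, lower = a, upper = 3)$value
c(total, F3, punif(3, min = a, max = b))
```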
Illustration of the uniform distribution
[Figure: density and CDF of the uniform distribution for α = 2 and β = 6.]

Exponential random variables
A random variable X is an exponential random variable with parameter λ > 0 (symbolically X ~ E(λ)) if
f(x) = λ exp(−λx) for x ≥ 0, and 0 otherwise.
The CDF is given by
F(x) = 1 − exp(−λx) for x ≥ 0, and 0 for x < 0.
In R: dexp(x, rate=lambda), pexp, qexp, rexp.

Illustration of the exponential distribution
[Figure: densities and CDFs for λ = 0.9, 0.5 and 0.3.]

Property of the exponential distribution
The exponential distribution is closely related to the Poisson distribution: the number of events in a unit interval is Poisson distributed with parameter λ if the times between subsequent events are independent and exponentially distributed with rate λ.
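The exponential-Poisson connection can be illustrated by simulation: generate exponential interarrival times, count how many events fall in [0, 1], and compare the counts to a Poisson distribution. A sketch (seed, rate and replication count are arbitrary choices):

```r
set.seed(2)
lambda <- 3
counts <- replicate(20000, {
  # 30 Exp(lambda) gaps have expected total 10, so they easily cover [0, 1]
  arrivals <- cumsum(rexp(30, rate = lambda))
  sum(arrivals <= 1)          # number of events in the unit interval
})
mean(counts)                  # ~ lambda, the Poisson mean
mean(counts == 2)             # ~ dpois(2, lambda)
```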
Gamma random variables
A random variable X is a gamma random variable with shape parameter α > 0 and rate parameter β > 0 (symbolically X ~ G(α, β)) if its density is given by
f(x) = (β^α / Γ(α)) x^{α−1} exp(−βx) for x ≥ 0, and 0 otherwise.
Here, Γ(α) denotes the gamma function
Γ(α) = ∫_0^∞ x^{α−1} exp(−x) dx,
where Γ(x + 1) = x! for x ∈ N_0 and Γ(x + 1) = x Γ(x).
In R: dgamma(x, shape = α, rate = β), pgamma(...), rgamma(...).

Illustration of the gamma distribution
[Figure: densities and CDFs for (α, β) = (2.0, 3), (1.2, 3), (2.0, 6) and (1.2, 6).]

Properties of the gamma distribution
- For α = 1, the gamma distribution is equal to the exponential distribution with parameter λ = β.
- For α = d/2 with d ∈ N and β = 1/2, the gamma distribution is equal to the χ²-distribution with d degrees of freedom.

Normal random variables
We say that X is a normal random variable with parameters µ ∈ R and σ² ∈ R+ (symbolically X ~ N(µ, σ²)) if its density is
f(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²)) for x ∈ R.
The density function is a bell-shaped curve, symmetric around µ. For µ = 0 and σ² = 1, the distribution is called the standard normal distribution.
If X ~ N(µ, σ²), then Y = αX + β ~ N(αµ + β, (ασ)²).
In R: dnorm(x, mean, sd), pnorm(...), rnorm(...).
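Both gamma special cases can be verified directly with R's density functions; a quick sketch (the grid and parameter values are arbitrary):

```r
x <- seq(0.1, 8, by = 0.1)
# alpha = 1: gamma reduces to the exponential with rate beta
d_gamma_exp <- dgamma(x, shape = 1, rate = 0.7)
d_exp       <- dexp(x, rate = 0.7)
# alpha = d/2, beta = 1/2: gamma reduces to chi-squared with d df (here d = 3)
d_gamma_chi <- dgamma(x, shape = 3/2, rate = 1/2)
d_chisq     <- dchisq(x, df = 3)
```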
Illustration of the normal distribution
[Figure: densities and CDFs for (µ, σ) = (0, 1), (2, 1) and (0, 2).]

Expectation of a random variable
To describe a probability distribution one distinguishes between
- location parameters (expected value, median, mode), and
- scale parameters (variance, standard deviation).
The expected value E(X) of a discrete random variable X is defined as
E(X) = Σ_{x: p(x)>0} x p(x).
In the continuous case we have
E(X) = ∫_{−∞}^∞ x f(x) dx.
[see blackboard for the Poisson case]

Examples
Discrete distributions:
- Binomial: E(X) = np
- Geometric: E(X) = 1/p
- Poisson: E(X) = λ
Continuous distributions:
- Uniform: E(X) = (α + β)/2
- Exponential: E(X) = 1/λ
- Normal: E(X) = µ

Transformations of random variables
We are often interested in transformations of random variables, where a transformation is any function of the random variables. Examples:
- Changing units, e.g. changing temperature scales from Celsius to Fahrenheit.
- Changing scales, e.g. from the original scale to the log scale.
- Computing ratios: X/Y.
- Computing non-linear arithmetics.
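The Poisson expectation E(X) = λ (derived on the blackboard) can be confirmed numerically from the definition E(X) = Σ x p(x); a sketch truncating the sum at 100, beyond which the pmf is negligible:

```r
lambda <- 3
x <- 0:100                        # the tail beyond 100 is numerically negligible
EX <- sum(x * dpois(x, lambda))   # definition of the expected value
EX                                # equals lambda up to floating-point error
```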
Expectation of a function of a random variable
Let X be a discrete random variable and g a real-valued function; then
E[g(X)] = Σ_{x: p(x)>0} g(x) p(x).
Let X be a continuous random variable and g a real-valued function; then
E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx.

Possible linear transformations
- adding a constant: X + b,
- multiplying by a constant: aX,
- both: aX + b.
Example: [°F] = (9/5) [°C] + 32
[Figure: scatter plots of X, X + 32 and (9/5)X + 32 against the index.]

Linear transformation of the mean
If a and b are constants, then E(aX + b) = a E(X) + b.
Proof (continuous case). Set g(X) = aX + b:
E(aX + b) = ∫ (ax + b) f(x) dx = a ∫ x f(x) dx + b ∫ f(x) dx = a E(X) + b,
since ∫ x f(x) dx = E(X) and ∫ f(x) dx = 1.

Variance of a random variable
The variance of a random variable X, denoted by Var(X), is defined as
Var(X) = E[(X − E[X])²].
For simpler calculation you can use
Var(X) = E[X²] − E[X]².
Proof.
Var(X) = E[(X − E[X])²] = E[X² − 2X E[X] + E[X]²]
= E[X²] − E[2X E[X]] + E[E[X]²]
= E[X²] − 2 E[X] E[X] + E[X]² = E[X²] − E[X]².
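Both identities on this slide can be checked exactly for a fair die, where the pmf is known; a short sketch:

```r
x <- 1:6; p <- rep(1/6, 6)        # fair die: values and pmf
EX   <- sum(x * p)                # E(X) = 3.5
EX2  <- sum(x^2 * p)              # E(X^2) via E[g(X)] = sum g(x) p(x)
VarX <- EX2 - EX^2                # shortcut formula; 35/12
# same result from the definition E[(X - E X)^2]:
VarDef <- sum((x - EX)^2 * p)
# linear transformation of the mean: E(2X + 3) = 2 E(X) + 3
E_lin <- sum((2 * x + 3) * p)
```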
Property of the variance
Var(aX + b) = a² Var(X) for all a, b ∈ R.
Variance is unchanged by addition/subtraction of a constant!

Jointly distributed random variables
The joint cumulative probability distribution function of two continuous random variables X and Y is given by
F(x, y) = P(X ≤ x, Y ≤ y), −∞ < x, y < ∞.
Alternatively, one can use the joint probability density function f(x, y) to get the joint cumulative probability distribution function as
F(x, y) = ∫_{v=−∞}^y ∫_{u=−∞}^x f(u, v) du dv.
Thus,
∂²/(∂x ∂y) F(x, y) = f(x, y).

Jointly distributed random variables (II)
The marginal density function of X (or Y) can be obtained by
f_X(x) = ∫_{−∞}^{+∞} f(x, y) dy or f_Y(y) = ∫_{−∞}^{+∞} f(x, y) dx.
Further,
E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy.
For example,
E[aX + bY] = a E(X) + b E(Y),
or more generally
E[Σ_{i=1}^n a_i X_i] = Σ_{i=1}^n a_i E[X_i].

Example: Binomial distribution
The expected value of a binomially distributed random variable X must be E[X] = np, as X can be represented as a sum of n independent Bernoulli-distributed random variables X_i ~ B(p), with i = 1, ..., n. For each X_i we have E[X_i] = p, so that
E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i] = np.
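The binomial-expectation argument can be cross-checked against the pmf; a one-line verification in R with arbitrary n and p:

```r
n <- 10; p <- 0.3
# expectation computed directly from the binomial pmf ...
EX <- sum(0:n * dbinom(0:n, size = n, prob = p))
EX        # ... equals n * p, as the sum-of-Bernoullis argument predicts
```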
Independent random variables
The random variables X and Y are said to be independent if, for all a, b,
P(X ≤ a, Y ≤ b) = P(X ≤ a) · P(Y ≤ b).
In terms of the joint distribution function F of X and Y:
F(a, b) = F_X(a) F_Y(b) for all a, b.
When X and Y are discrete we have p(x, y) = p_X(x) p_Y(y), or f(x, y) = f_X(x) f_Y(y) in the continuous case.
If X and Y are independent, then for any functions h and g,
E[g(X) h(Y)] = E[g(X)] · E[h(Y)].
Further,
Var(X + Y) = Var(X) + Var(Y).

Covariance
A measure for the linear stochastic dependency of two random variables X and Y is the covariance
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X] E[Y]]
= E[XY] − E[X] E[Y].
If X and Y are independent, it follows that Cov(X, Y) = 0. A positive value of Cov(X, Y) is an indication that Y tends to increase as X does, whereas a negative value indicates that Y tends to decrease as X increases.

Properties of the covariance
- Cov(X, X) = Var(X)
- Cov(X, Y) = Cov(Y, X)
- Cov(a + bX, c + dY) = b d Cov(X, Y)
- Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
Thus
Cov(Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j).
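The covariance properties hold exactly for the sample covariance as well, since it is bilinear in the same way; a sketch using simulated data (seed and sample size arbitrary):

```r
set.seed(3)
x <- rnorm(500); y <- rnorm(500)
# Cov(a + bX, c + dY) = b * d * Cov(X, Y): constants drop out, factors multiply
c1 <- cov(2 + 3 * x, 5 - 4 * y)     # equals 3 * (-4) * cov(x, y)
# Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z), here with Z = X
c2 <- cov(x, y + x)                 # equals cov(x, y) + var(x)
```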
Variance of the sum of random variables
Var(Σ_{i=1}^n X_i) = Cov(Σ_{i=1}^n X_i, Σ_{j=1}^n X_j)
= Σ_{i=1}^n Σ_{j=1}^n Cov(X_i, X_j)
= Σ_{i=1}^n Cov(X_i, X_i) + Σ_{j≠i} Cov(X_i, X_j)
= Σ_{i=1}^n Var(X_i) + 2 Σ_{j<i} Cov(X_i, X_j).
If X_i, i = 1, ..., n, are independent, then
Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i).

Convolution
Let X and Y be independent random variables with probability mass functions p_X(x) and p_Y(y). Let Z = X + Y; then the probability mass function p_Z(z) is called the convolution of p_X(x) and p_Y(y):
P(X + Y = z) = Σ_x P(X = x, X + Y = z)
= Σ_x P(X = x, Y = z − x)
= Σ_x P(X = x) P(Y = z − x)   (by independence)
= Σ_x p_X(x) p_Y(z − x),
which is called the convolution of p_X and p_Y.

Limit theorems
(Proofs see blackboard)
Markov's inequality: If X is a random variable that takes only nonnegative values, then for any value of a > 0,
P(X ≥ a) ≤ E(X)/a.   (1)
Chebyshev's inequality: If X is a random variable with mean µ and variance σ², then for any value of k > 0,
P(|X − µ| ≥ k) ≤ σ²/k².   (2)
With (1) and (2), bounds on probabilities can be computed knowing only the mean, or the mean and variance, of the probability distribution.
Strong law of large numbers: Let X_1, X_2, ... be a sequence of independent random variables having a common distribution, and let E[X_i] = µ. Then, with probability 1,
(X_1 + X_2 + ... + X_n)/n → µ as n → ∞.
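The convolution sum can be computed directly for two fair dice, recovering the familiar distribution of the sum; a short sketch:

```r
p_die <- rep(1/6, 6)   # pmf of one die on the values 1..6
z_vals <- 2:12
# p_Z(z) = sum over x of p_X(x) * p_Y(z - x), restricted to valid die values
p_sum <- sapply(z_vals, function(z) {
  x <- 1:6
  valid <- (z - x) >= 1 & (z - x) <= 6
  sum(p_die[x[valid]] * p_die[z - x[valid]])
})
rbind(z_vals, p_sum)   # triangular shape, peak p_Z(7) = 1/6
```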
Illustration
Arithmetic mean of 5000 standard normally distributed random variables:

par(cex.lab=1.4, cex.axis=1.3, cex.main=1.4, lwd=2)
x <- rnorm(5000)
plot(cumsum(x)/(1:5000), type="l", xlab="n",
     ylab="Arithmetic mean", ylim=c(-1,1))
abline(h=0, lty=2, col="red")

[Figure: the running arithmetic mean settles towards 0 as n grows to 5000, illustrating the strong law of large numbers.]

Standardization
A random variable X is called standardized if E(X) = µ = 0 and Var(X) = σ² = 1. Every random variable X with finite expected value E(X) = µ and finite variance Var(X) = σ² can be standardized using a linear transformation. Define
X̃ = (X − µ)/σ.
Then, obviously, E(X̃) = (E(X) − µ)/σ = 0 and Var(X̃) = Var(X)/σ² = 1.

Central limit theorem
The central limit theorem says that the arithmetic mean of arbitrary independent and identically distributed random variables, appropriately standardized, converges in distribution to the standard normal distribution. Formally: let X_1, X_2, ... be a sequence of independent, identically distributed random variables, each with mean µ and variance σ². Then the distribution of
(X_1 + X_2 + ... + X_n − nµ) / (σ √n)
tends to the standard normal as n → ∞. This holds for any distribution of the X_i's. Informally, X_1 + ... + X_n is approximately N(nµ, nσ²) for large n.
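The central limit theorem can be illustrated by simulation even for a markedly non-normal distribution such as the exponential; a sketch (seed, rate and replication count are arbitrary choices):

```r
set.seed(7)
n <- 1000                        # summands per replication
lambda <- 2                      # Exp(2): mean mu = 1/2, sd sigma = 1/2
mu <- 1 / lambda; sigma <- 1 / lambda
# standardized sums (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n))
z <- replicate(5000, (sum(rexp(n, rate = lambda)) - n * mu) / (sigma * sqrt(n)))
c(mean(z), sd(z))                # close to 0 and 1, as the CLT predicts
```

A histogram of z (hist(z, freq = FALSE) with dnorm overlaid) shows the bell shape directly.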
Stochastic processes
A stochastic process {X(t), t ∈ T} is a collection of random variables, i.e. for each t ∈ T, X(t) is a random variable. The index t is often interpreted as time, so that X(t) is referred to as the state of the process at time t.
- When T is a countable set, we have a discrete-time process.
- When T is an interval of the real line, we have a continuous-time process.
The state space is defined as the set of all possible values that X(t) can assume. We consider processes with discrete random variables X(t). Different dependence relations among the random variables X(t) might be assumed.
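A simple symmetric random walk is perhaps the smallest concrete example of a discrete-time process with a discrete state space; a sketch (seed and length are arbitrary):

```r
set.seed(1)
# discrete time index T = {0, 1, ..., 100}; state space: the integers
steps <- sample(c(-1, 1), size = 100, replace = TRUE)  # i.i.d. +/-1 steps
X <- c(0, cumsum(steps))                               # X[k + 1] = state at time k
plot(0:100, X, type = "s", xlab = "t", ylab = "X(t)")  # one sample path
```

Here the dependence structure is particularly simple: X(t + 1) depends on the past only through X(t), a property studied in detail in the Markov chain chapters.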