A primer on basic probability and Markov chains


David Aristoff

January 26, 2018

Contents

1 Basic probability
   1.1 Informal ideas and random variables
   1.2 Probability spaces
   1.3 Independence and conditional probability
   1.4 Memoryless random variables
   1.5 Normal random variables
   1.6 Expectation
   1.7 Mean and variance
   1.8 Moment generating functions
   1.9 Markov and Chebyshev inequalities
   1.10 Law of large numbers
   1.11 Central limit theorem
   1.12 Modes of convergence
   1.13 Borel-Cantelli lemma

2 Markov chains
   2.1 Informal ideas
   2.2 The Markov property
   2.3 Hitting times and the strong Markov property
   2.4 Irreducibility and aperiodicity
   2.5 The Perron-Frobenius theorem
   2.6 Mean first passage times
   2.7 Stationary distributions and convergence
   2.8 Ergodic theorem
   2.9 Left and right actions
   2.10 Infinite state space

1 Basic probability

This chapter is a crash course in basic probability. It attempts to present the main ideas of elementary probability without resorting to measure-theoretic technicalities or proofs. In particular, we will consider only measurable sets and functions. For this reason the presentation is somewhat incomplete, but this will not matter for the applications we need in later chapters. For a more detailed presentation at about the same level, see [1]. For a higher-level measure-theoretic treatment, see [2].

1.1 Informal ideas and random variables

A random variable is, informally, a real number X whose value is random. We think of the value of X as depending on some outcome ω. For instance, consider rolling two dice and let X be the sum of the rolls. Then if the outcome is ω = (1, 3), we have X = 4. The letter P stands for the probability of some subset of outcomes, or event. In this case P(X = 4) = 3/36, since there are 3 ways for the sum of the rolls to be 4, namely when the outcome is ω = (1, 3), (2, 2) or (3, 1), and 36 total outcomes. Here, the event {X = 4} corresponds to the collection {(1, 3), (2, 2), (3, 1)} of outcomes. Often we use commas to indicate two or more events occurring simultaneously. For instance, P(X odd, X ≥ 4) = 16/36 = 4/9 stands for the probability that X is odd and X ≥ 4. Much of the information about a random variable X is contained in its cumulative distribution function:

Definition 1.1. The cumulative distribution function (CDF) of X is F_X(x) = P(X ≤ x).

The CDF of X gives the probability that the random variable is less than or equal to a given number. When F_X is differentiable, its derivative gives the probability that the random variable is near a given number.

Definition 1.2. If it exists, p_X := F_X′ is the probability density function (PDF) of X.

Note that if X has a PDF p_X, then

P(a < X ≤ b) = F_X(b) − F_X(a) = ∫_a^b p_X(x) dx.

For a simple example where p_X doesn't exist, consider X that always takes the value 0.
Then F_X(x) = 0 for x < 0 and F_X(x) = 1 for x ≥ 0, so F_X isn't differentiable. If X only takes integer or discrete values, we also define the following.

Definition 1.3. Suppose X can only take nonnegative integer values. Then its probability mass function (PMF) is f_X(k) = P(X = k).

We'll usually consider random variables with PDFs or PMFs.

Exercise 1.4. Let X be the sum of two die rolls. Find the PMF and CDF of X.

The simplest random variable with discrete values is the Bernoulli random variable:

Definition 1.5. X is a Bernoulli-p random variable if its PMF is

f_X(k) = p if k = 1, 1 − p if k = 0, and 0 otherwise,

where p ∈ (0, 1).

Note that a Bernoulli-1/2 random variable corresponds to a (fair) coin toss. A Bernoulli-p random variable corresponds to a biased coin toss, where the probability of (say) heads is p. The simplest random variable with continuous values is the uniform random variable:

Definition 1.6. X is a uniform random variable in [0, 1] if its CDF is

F_X(x) = 0 for x < 0, x for x ∈ [0, 1), and 1 for x ≥ 1.

Sometimes X is also called a uniform random number in [0, 1]. X corresponds to sampling a point at random in [0, 1]. Imagine [0, 1] as a dartboard: the probability of throwing a dart that lands on the board to the left of x ∈ [0, 1] is just x.

Exercise 1.7. Find the PDF of a uniform random variable.

We use the word distribution to refer to either the CDF, PDF, or PMF of a random variable. Since the distribution of a random variable contains most of the relevant information, we will often introduce X simply by giving its distribution. However, a distribution does not completely determine a random variable, as we see in the next section. In particular, the distribution of a random variable does not give information about how it is related to other random variables.

Exercise 1.8. Show that F_X is nondecreasing and lim_{x→−∞} F_X(x) = 0, lim_{x→∞} F_X(x) = 1. Show that if X has PDF p_X, then ∫_{−∞}^{∞} p_X(x) dx = 1.

The CDF F_X is also right-continuous. A proof of that requires some set-theoretic technicalities, so we skip it.

1.2 Probability spaces

Recall that events are subsets of outcomes to which probabilities are assigned. Thus, events are simply certain subsets of Ω. (In general, not all subsets of Ω are events, since not all such subsets can be assigned probabilities, but we will ignore this complication. In the context of measure theory, this means we assume everything is measurable.) Formally, P is a function that assigns a number in [0, 1] to each event, with the property that P(Ω) = 1 and:

Definition 1.9 (Countable additivity). If A_1, A_2, ... are disjoint events, then

P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k).

It is easy to check that countable additivity implies countable subadditivity:

P(∪_{k=1}^∞ A_k) ≤ Σ_{k=1}^∞ P(A_k), for any events A_k.

Perhaps the simplest probability measures are the uniform ones, which assign the same probabilities to all individual outcomes or events of the same size:

Example 1.10 (Uniform measure on a finite set). Suppose Ω is a finite set. If P(A) = |A| / |Ω| for each A ⊂ Ω, where |S| denotes the number of elements in a set S, then P is called the uniform probability measure on Ω.

The uniform measure can be defined on certain infinite outcome spaces Ω. In this case the condition P(Ω) = 1 forces each single outcome ω ∈ Ω to have zero probability:

Example 1.11 (Uniform measure on the unit interval). If Ω = [0, 1] and P([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1, then P is called the uniform probability measure on Ω.

Formally, random variables are just real-valued functions of outcomes, that is, functions X : Ω → R.

Definition 1.12. The CDF F_X(x) of X is formally defined by F_X(x) = P(X^{−1}((−∞, x])).

It is standard to write P(X ≤ x) instead of P(X^{−1}((−∞, x])). Similarly, we write P(X = x) instead of P(X^{−1}({x})), and for A ⊂ R we write P(X ∈ A) instead of P(X^{−1}(A)).

Exercise 1.13. Take the two die roll example from above: let Ω = {1, ..., 6}², and let P be uniform on Ω. Define X : Ω → R by X(ω) = ω_1 + ω_2. Compute its probability mass function using the formula f_X(k) = P(X^{−1}({k})), k ∈ N.
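The die-roll PMF above can be computed by brute force over the 36 outcomes. A minimal sketch in Python (the helper name pmf is ours), using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Outcome space for two die rolls; the uniform measure gives P({w}) = 1/36.
omega = list(product(range(1, 7), repeat=2))

def pmf(k):
    # f_X(k) = P(X^{-1}({k})): count the outcomes w with w_1 + w_2 = k.
    return Fraction(sum(1 for w in omega if w[0] + w[1] == k), len(omega))
```

For instance, pmf(4) returns 3/36 = 1/12, matching the computation of P(X = 4) in Section 1.1, and the values pmf(2), ..., pmf(12) sum to 1.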

Random variables are not completely determined by their CDFs. For instance, let P be uniform on Ω = [0, 1] and define X : Ω → R by X(ω) = ω. Let P′ be uniform on Ω′ = [1, 2] and define X′ : Ω′ → R by X′(ω) = ω − 1. Then X and X′ have the same CDF (what is it?). Even though X and X′ are technically not the same, we will usually consider them as equivalent.

The following example is also instructive. Suppose X is defined as above, and define Y : Ω → R by Y(ω) = 1 − ω. Then Y has the same distribution as X (check this!). Define also Z : Ω → R by Z(ω) = 0 if ω < 1/2 and Z(ω) = 1 if ω ≥ 1/2. Then

P(X ∈ [0, 1/2), Z = 0) = 1/2, while P(Y ∈ [0, 1/2), Z = 0) = 0.

(By convention, the comma separating the events stands for and.) Thus, though X, Y have the same distribution, they are related to Z in different ways. To describe the relation between X, Z or Y, Z, we need a joint distribution. For instance, the joint CDF of X and Z is F_{X,Z}(x, z) = P(X ≤ x, Z ≤ z). On the other hand, if two random variables are independent (defined below), then knowing their individual distributions is enough to know how they behave together.

1.3 Independence and conditional probability

Let A, B be events.

Definition 1.14. A, B are independent if P(A ∩ B) = P(A)P(B).

For example, consider two die rolls. Let A be the event that the first roll is odd and B the event that the second roll is 4. It is easy to see that P(A)P(B) = (1/2)(1/6) = 1/12. On the other hand, there are 3 outcomes in which the first roll is odd and the second roll is 4. Thus, P(A ∩ B) = 3/36 = 1/12. Intuitively, A, B are independent if they have nothing to do with each other. In this example, this is automatic because A and B depend on distinct die rolls. Note that A, B being independent is different from A, B being disjoint: if A ∩ B = ∅ then P(A ∩ B) = 0, while P(A), P(B) may be nonzero.

Definition 1.15. The conditional probability of A given B is

P(A | B) = P(A ∩ B)/P(B). (1)

Let Q be the probability defined by Q(A) = P(A ∩ B)/P(B). It corresponds to the original probability P with the added assumption that we know B occurred.
Thus, P(A | B) is the probability that A occurs, given that B occurred. Consider two die rolls. Let B be the event that the first roll is odd, and A the event that the rolls add to 5. Consider first P(A | B). Assuming B occurred, we know that the first roll is either 1, 3 or 5, and each has the same probability 1/3. If the first roll is 1 or 3, the sum of the rolls is 5 only when the second roll is 4 or 2, respectively, each of which has probability 1/6. If the first roll is 5 there is no way the sum is 5. Thus,

P(A | B) = (1/3)(1/6) + (1/3)(1/6) = 1/9.

On the other hand, consider P(A ∩ B). There are only two outcomes, namely (1, 4) and (3, 2), for which the first roll is odd and the rolls add to 5. Thus, P(A ∩ B) = 2/36 = 1/18. Since P(B) = 1/2, we have verified equation (1). Note that if P(B) > 0 and A, B are independent, then P(A | B) = P(A): given that B occurred, we learn nothing about the probability of A.

Exercise 1.16. Consider two die rolls. Let A be the event that the first roll is 3, B the event that the second roll is 4, and C the event that the rolls add to 7. Show that each pair of A, B, C is independent but P(A ∩ B ∩ C) ≠ P(A)P(B)P(C).

Definition 1.17. Random variables X_1, X_2, ... are independent if

P(∩_{j∈J} {X_j ∈ I_j}) = Π_{j∈J} P(X_j ∈ I_j)

for all finite subsets J ⊂ N and intervals I_j ⊂ R.

For example, consider an infinite die roll in which the outcome space is Ω = {1, ..., 6}^N, the sequences ω = (ω_1, ω_2, ...) of numbers ω_i ∈ {1, ..., 6}. Let X_i be 1 if the ith roll is a 6, and 0 otherwise. That is, X_i(ω) = 1_{ω_i = 6}, where 1_A = 1 if A is true and 0 otherwise. Then X_1, X_2, ... are independent. Note that Exercise 1.16 shows something like Definition 1.17 is needed for independence: pairwise independence of the X_i's is not enough.

Exercise 1.18. Use Definition 1.17 to write a definition of when a sequence A_1, A_2, ... of events is independent.

The following is an immediate consequence of the definition of conditional probability.

Theorem 1.19. Let A_k, k = 1, 2, ... be disjoint events with P(∪_{k=1}^∞ A_k) = 1. Then for any event A,

P(A) = Σ_{k=1}^∞ P(A | A_k) P(A_k).

The theorem above is sometimes called the rule of total probability. Despite its simplicity, this rule (and variants thereof) is arguably among the most useful results in probability.

1.4 Memoryless random variables

Definition 1.20. An exponential-λ random variable has CDF F_X(x) = 1 − e^{−λx} for x ≥ 0. Here λ > 0 is called the rate.

X can be constructed by taking P uniform on Ω = (0, 1) and defining X(ω) = −λ^{−1} log(1 − ω) (check this!).
Actually, this provides a way of sampling X: pick a uniform random number ω in (0, 1), and then compute X(ω).
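This inverse-CDF recipe is easy to try numerically. A sketch (the seed and function name are ours): draw many uniforms, map each through X(ω) = −λ^{−1} log(1 − ω), and compare the sample mean with E(X) = 1/λ.

```python
import math
import random

def sample_exponential(lam, rng):
    # X(w) = -log(1 - w)/lam with w uniform in (0, 1):
    # the inverse of the CDF F_X(x) = 1 - e^{-lam x}.
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(0)
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # should be close to E(X) = 1/lam = 0.5
```

The same idea works for any CDF with an explicit inverse; this is usually called inverse transform sampling.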

Definition 1.21. A geometric-p random variable has PMF f_X(k) = (1 − p)^{k−1} p.

A geometric random variable can be constructed by taking the X_i's to be independent Bernoulli-p random variables (Definition 1.5), and setting X = min{i : X_i = 1}. Intuitively, this corresponds to taking infinitely many biased coin tosses, say with probability of heads p, and then taking the first toss that is heads. If the first heads is on the kth toss, then the first k − 1 tosses were tails and the kth was heads. Then by independence, P(X = k) = (1 − p)^{k−1} p.

Exercise 1.22. Show that if X is exponential or geometric, then

P(X > t + s | X > t) = P(X > s),

where t, s > 0 if X is exponential, and t, s ∈ N if X is geometric.

Because of the property in Exercise 1.22, geometric and exponential random variables are used to model random arrivals. Intuitively, the amount of time you have waited so far for an arrival has nothing to do with the additional amount of time you'll have to wait. For this reason these random variables are called memoryless.

1.5 Normal random variables

Definition 1.23. A standard normal random variable has PDF

p_X(x) = (1/√(2π)) e^{−x²/2}.

This is the so-called bell curve.

Definition 1.24. Let Y_1, Y_2, ... be independent Bernoulli-1/2 random variables and X_i = 2Y_i − 1. Then S_n = X_1 + ··· + X_n is called the simple random walk.

Thinking of n as time, S_n starts at 0 and at each time step moves 1 unit right or left with equal probability. Let F_n be the CDF of n^{−1/2} S_n. It turns out that F_n converges to

F(x) = ∫_{−∞}^x (1/√(2π)) e^{−t²/2} dt,

the CDF of a standard normal random variable. Thus, a standard normal random variable is simply the (appropriately rescaled) position of a simple random walk after a long time.

Definition 1.25. A sequence X_n, n = 1, 2, ... of random variables is iid if it is independent and each X_i has the same distribution (meaning the same CDF, PMF or PDF).

The word iid is an acronym for independent and identically distributed. The sequence X_1, X_2, ... defining the random walk is iid.
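The simple random walk is easy to simulate. A sketch (the seed and names are ours): since the steps are iid with mean 0 and variance 1, S_n has mean 0 and variance n, which a simulation can confirm.

```python
import random

rng = random.Random(1)

def random_walk(n):
    # S_n = X_1 + ... + X_n with X_i = +1 or -1, each with probability 1/2.
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(n))

walks = [random_walk(100) for _ in range(20_000)]
mean = sum(walks) / len(walks)                  # E(S_n) = 0
var = sum(s * s for s in walks) / len(walks)    # Var(S_n) = n = 100
```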
It turns out that the standard normal distribution can be obtained in a similar way from any reasonable iid sequence. This is the central limit theorem, which will be proved below.

1.6 Expectation

The expected value or expectation of X, denoted E(X), is defined as follows:

Definition 1.26. For a random variable X with PDF p_X,

E(X) = ∫_{−∞}^{∞} x p_X(x) dx. (2)

For a random variable X with PMF f_X,

E(X) = Σ_{k≥0} k f_X(k). (3)

When X does not have a PDF or a PMF, it takes more work to define E(X). We will skip this because it involves measure and integration theory outside the scope of this section. However, we note that if X is nonnegative, then E(X) = ∫_0^∞ (1 − F_X(x)) dx.

The expected value of X corresponds to the average value it takes. That is, equation (3) says the expected value of X is the sum of the values it takes, each multiplied by the probability of that value. Equation (2) says something analogous for continuous-valued random variables. Expectation is linear, that is, E(X + Y) = E(X) + E(Y). If g : R → R, then g ∘ X is a random variable, usually written g(X), and its expectation is given by (2) or (3) under appropriate conditions.

The connection between expectation and probability is as follows. Let 1_{X∈A} be the function which is 1 if X ∈ A and 0 otherwise, called the indicator function of the event X ∈ A. Then P(X ∈ A) = E(1_{X∈A}).

Exercise 1.27. Consider two die rolls and let X be the sum of the rolls. Find E(X).

Exercise 1.28. Find E(X) when X is geometric-p. Repeat for X exponential-λ.

The following is a useful consequence of independence.

Theorem 1.29. If X, Y are independent, then

E(f(X)g(Y)) = E(f(X)) E(g(Y)) (4)

whenever all the expectations exist.

We skip the proof, which requires some measure and integration theory that is outside the scope of this section.

Exercise 1.30. Check that equation (4) holds when X, Y are independent and f(x) = 1 if x ∈ A and 0 otherwise, and g(x) = 1 if x ∈ B and 0 otherwise (here we write f = 1_A and g = 1_B).

1.7 Mean and variance

Definition 1.31. The mean of X is its expected value E(X).

The mean of X is often written µ. Not all random variables have finite mean. For example, the Cauchy distribution, which has PDF p_X(x) = (1/π) · 1/(1 + x²), has E(|X|) = ∞.

Definition 1.32. The variance of X is Var(X) = E((X − µ)²), with µ the mean of X.

The variance of X is often written σ². The standard deviation is its square root, σ. The variance is a measure of how much X tends to differ from its mean. If X has a PDF p_X, then its variance is

σ² = ∫_{−∞}^{∞} (x − µ)² p_X(x) dx.

Exercise 1.33. Let X_1, X_2 be independent random variables with means µ_1, µ_2 and variances σ_1², σ_2². Show that the mean of X_1 + X_2 is µ_1 + µ_2 and the variance of X_1 + X_2 is σ_1² + σ_2². If c > 0 is constant, show the variance of cX_1 is c²σ_1².

The sample variance of x_1, ..., x_n ∈ R is defined by

σ̂² = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)², where x̄ = (1/n) Σ_{i=1}^n x_i.

Sample variance can be thought of as an empirical quantity, while the variance of a random variable is a theoretical quantity. The next exercise shows the relationship between the two types of variance.

Exercise 1.34. Let X_1, ..., X_n be iid random variables with variance σ². Show that

σ̂_n² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)², where X̄ = (1/n) Σ_{i=1}^n X_i,

has E(σ̂_n²) = σ².

The normalization 1/(n − 1) is really needed in Exercise 1.34; if 1/n is used instead, there is a slight bias in the estimate of σ², meaning its expected value is incorrect.

Exercise 1.35. Find the mean and variance of an exponential-λ random variable.

1.8 Moment generating functions

Let X be a random variable.

Definition 1.36. For k ∈ N, the kth moment of X is E(X^k).

The first moment of X is its expected value or mean. The second moment is related to its variance via Var(X) = E(X²) − (E(X))². Usually people think of the third moment as measuring the skewness of X (i.e., asymmetry about its mean) and the fourth moment as measuring the heaviness of its tail, meaning its propensity to take large positive or negative values (called kurtosis).
In general, the moments together define the moment generating function, which in turn can be used to characterize a random variable.

Definition 1.37. The moment generating function (MGF) of X is φ_X(s) = E(exp(sX)).

Note that φ_X(s) may be infinite for some s, so the domain of φ_X is simply those s for which the expectation is finite. Why is it called the MGF? Note that, by Taylor expansion,

φ_X(s) = E(1 + sX + s²X²/2 + ···),

and so under appropriate conditions (for being able to differentiate inside the integral), φ_X′(0) = E(X). Similar calculations show that, for k ∈ N and φ_X^{(k)} the kth derivative of φ_X,

φ_X^{(k)}(0) = E(X^k).

Note that the MGF of X can be understood as a Laplace transform of X. Under appropriate conditions the Laplace transform is invertible, in which case the MGF of X can be used to identify the distribution of X.

Theorem 1.38. If two random variables have the same MGF, then they have the same distribution. More generally, if their MGFs agree in a neighborhood of 0, then they have the same distribution.

Exercise 1.39. Show that if X_n, n = 1, 2, ... are iid with common MGF φ, then the MGF of S_n = X_1 + ··· + X_n is φ_{S_n}(s) = φ(s)^n.

Exercise 1.40. Find the MGF of an exponential-λ random variable.

Exercise 1.41. Show that the MGF of a standard normal random variable is φ(s) = e^{s²/2}.

1.9 Markov and Chebyshev inequalities

Theorem 1.42 (Markov inequality). Let X be a nonnegative random variable. Then

P(X ≥ a) ≤ E(X)/a for each a > 0.

Assuming X has a PDF p_X, Markov's inequality follows from

E(X) = ∫_0^∞ x p_X(x) dx ≥ ∫_a^∞ x p_X(x) dx ≥ ∫_a^∞ a p_X(x) dx = a P(X ≥ a).

Corollary 1.43 (Chebyshev inequality). Let X have mean µ and variance σ². Then

P(|X − µ| ≥ kσ) ≤ 1/k².

The corollary is proved by applying the Markov inequality to (X − µ)² with a = k²σ².
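As a concrete check of Markov's inequality, take X exponential-λ (Section 1.4), for which P(X ≥ a) = e^{−λa} exactly and E(X) = 1/λ. The bound then reads e^{−λa} ≤ 1/(λa), which holds since e^x ≥ x. A sketch:

```python
import math

lam = 2.0
mean = 1.0 / lam                      # E(X) for an exponential-lam random variable
for a in [0.1, 0.5, 1.0, 5.0]:
    tail = math.exp(-lam * a)         # P(X >= a), exact for the exponential
    assert tail <= mean / a           # Markov: P(X >= a) <= E(X)/a
```

For small a the bound is vacuous (larger than 1), while for large a the true tail is far smaller than the bound; Markov's inequality trades sharpness for generality.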

1.10 Law of large numbers

The law of large numbers says that the average of n iid random variables converges, in some sense, to their common mean µ as n → ∞.

Theorem 1.44 (Weak law of large numbers). Let X_n, n = 1, 2, ... be iid with finite mean µ and variance σ² < ∞. Set X̄_n = n^{−1}(X_1 + ··· + X_n). Then for each ε > 0,

lim_{n→∞} P(|X̄_n − µ| ≥ ε) = 0. (5)

By Exercise 1.33, the variance of X̄_n is n^{−1}σ². So by Chebyshev's inequality,

P(|X̄_n − µ| ≥ ε) ≤ σ²/(nε²).

The type of convergence in (5) is called convergence in probability. We will examine this and other modes of convergence below. The strong law of large numbers is as follows.

Theorem 1.45 (Strong law of large numbers). Let X_n, n = 1, 2, ... be iid random variables with finite mean µ. Set X̄_n = n^{−1}(X_1 + ··· + X_n). Then

P(lim_{n→∞} X̄_n = µ) = 1.

The type of convergence in Theorem 1.45 is stronger than that of Theorem 1.44. We will compare these and other modes of convergence in Section 1.12 below. We outline an alternative proof of a weak law of large numbers in the next exercise.

Exercise 1.46. Let X_n, n = 1, 2, ... be iid with E(X_1) = µ and common MGF φ whose domain of definition contains a neighborhood of 0. Define X̄_n = n^{−1}(X_1 + ··· + X_n). Show that the MGF of X̄_n is φ_{X̄_n}(s) = φ(s/n)^n, and that φ_{X̄_n}(s) → e^{sµ} as n → ∞. (See Exercise 1.39.) Conclude that the distribution of X̄_n converges to the distribution of the constant random variable X ≡ µ.

1.11 Central limit theorem

Recall that if X_n, n = 1, 2, ... are iid with mean µ, then by the law of large numbers n^{−1}(X_1 + ··· + X_n) tends to µ. Assume WLOG µ = 0. Then n^{−1}(X_1 + ··· + X_n) tends to 0, but if we scale instead by n^{−1/2} we get a nontrivial distribution, the bell curve.

Theorem 1.47 (Central limit theorem). Let X_n, n = 1, 2, ... be iid with E(X_1) = 0 and σ² = E(X_1²) < ∞. Let

Z_n = (X_1 + ··· + X_n)/(σ√n).

Then the distribution of Z_n converges to a standard normal distribution.

To see why the theorem is true, assume the MGF φ of X_1 is defined in a neighborhood of 0 (this assumption is not needed but makes the proof easy).
Since E(X_1) = 0 and E(X_1²) = σ², by Taylor expansion,

φ(s) = 1 + s²σ²/2 + O(s³).

Let φ_n be the MGF of Z_n. By independence of the X_i's (see Exercise 1.46),

φ_n(s) = φ(s/(σ√n))^n = (1 + s²/(2n) + O(n^{−3/2}))^n → e^{s²/2} as n → ∞.

The last expression is the MGF of a standard normal random variable (see Exercise 1.41).

1.12 Modes of convergence

Here we briefly discuss some of the modes of convergence mentioned above.

Definition 1.48. X_n converges to X in distribution, written X_n →d X, if the CDF of X_n converges pointwise to the CDF of X on the set where the CDF of X is continuous.

The following alternative characterization of convergence in distribution is often useful: X_n →d X if and only if E(f(X_n)) → E(f(X)) for all bounded continuous f : R → R.

Definition 1.49. X_n converges to X in probability, written X_n →p X, if for each ε > 0,

lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Theorem 1.50. Convergence in probability implies convergence in distribution.

We sketch a proof. Let f be bounded and continuous. Given ε > 0,

|f(X_n) − f(X)| ≤ ε + M · 1_{|f(X_n) − f(X)| ≥ ε},

where M = 2 sup |f|. Thus,

E(|f(X_n) − f(X)|) ≤ ε + M P(|f(X_n) − f(X)| ≥ ε) → ε as n → ∞,

since X_n →p X implies f(X_n) →p f(X) (Exercise 1.51). Thus, E(f(X_n)) → E(f(X)).

Exercise 1.51. Show that X_n →p X implies f(X_n) →p f(X) for any continuous f.

Definition 1.52. X_n converges to X a.s., written X_n →a.s. X, if P(lim_{n→∞} X_n = X) = 1.

The a.s. stands for almost sure.

Theorem 1.53. Almost sure convergence implies convergence in probability.

This follows from Fatou's lemma from measure theory; we omit the proof. We have seen that almost sure convergence implies convergence in probability, which in turn implies convergence in distribution. We now give examples to show that the reverse implications do not hold.

First, consider the uniform measure P on Ω = [0, 1], and define X_n(ω) = ω for n even and X_n(ω) = 1 − ω for n odd. Let X be a uniform random variable in [0, 1]. Then all the X_n's have the same distribution, the distribution of X (see Definition 1.6). However, for

n odd,

P(|X_n − X| < ε) = P(1/2 − ε/2 < X < 1/2 + ε/2) = ε.

Thus, X_n does not converge to X in probability.

Next, let X_n be independent Bernoulli-1/n, that is, P(X_n = 1) = 1/n and P(X_n = 0) = 1 − 1/n. Then X_n converges in distribution and in probability to 0, but

P(lim sup_{n→∞} X_n = 1) = 1

by the Borel-Cantelli lemma, proved below. Thus, X_n does not converge almost surely to 0. (See the discussion below Theorem 1.54.)

1.13 Borel-Cantelli lemma

Theorem 1.54. Let E_n, n = 1, 2, ... be independent events with Σ_{n=1}^∞ P(E_n) = ∞. Then

P(∩_{n=1}^∞ ∪_{k≥n} E_k) = 1.

Before proving the theorem, consider its application to the example at the end of the last section. Let E_n = {X_n = 1} be the event that X_n = 1. Then the hypotheses of Theorem 1.54 are satisfied, so P(∩_{n=1}^∞ ∪_{k≥n} {X_k = 1}) = 1. Observe that ∪_{k≥n} {X_k = 1} = {sup_{k≥n} X_k = 1} and thus

∩_{n=1}^∞ ∪_{k≥n} {X_k = 1} = {sup_{k≥n} X_k = 1 for all n} = {lim sup_{n→∞} X_n = 1}.

Since this event has probability 1, X_n does not converge to 0 almost surely.

Now we turn to a proof of the theorem. By De Morgan's laws, it suffices to show that

P(∪_{n=1}^∞ ∩_{k≥n} E_k^c) = 0,

where E_k^c = Ω \ E_k denotes the complement of E_k. We have

P(∩_{k≥n} E_k^c) = Π_{k≥n} P(E_k^c) = Π_{k≥n} (1 − P(E_k)) ≤ Π_{k≥n} exp(−P(E_k)) = exp(−Σ_{k≥n} P(E_k)) = 0,

where we used independence and 1 − x ≤ e^{−x}. Countable subadditivity gives the result.
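The Bernoulli-1/n example can also be explored numerically. Since Σ_{n≤N} 1/n ≈ log N diverges, ones keep arriving forever, though more and more rarely; a sketch (the seed is ours):

```python
import random

rng = random.Random(7)
N = 100_000
# X_n = 1 with probability 1/n. The expected number of ones up to N is the
# harmonic sum 1 + 1/2 + ... + 1/N ~ log N, which diverges as N grows.
ones = [n for n in range(1, N + 1) if rng.random() < 1 / n]
count = len(ones)   # roughly log(100000), i.e. on the order of 10
```

Any finite run can only suggest the behavior; the Borel-Cantelli lemma is what guarantees that the ones never stop arriving.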

2 Markov chains

This chapter is a crash course on the basics of Markov chains. For simplicity of presentation we mostly assume a finite state space. We will describe how results generalize to countably infinite or uncountable state spaces only when needed for later use. For a more detailed presentation see [4].

2.1 Informal ideas

Informally, a Markov chain is a random walk in which the next step of the walk depends only on the current position. For instance, let X_i, i = 1, 2, ... be iid random variables with P(X_i = 1) = P(X_i = −1) = 1/2. Then S_n = X_1 + ··· + X_n is a random walk where at each step we go right or left with equal probability. This is called the simple random walk. Markov chains generalize this example by allowing the steps to have any distribution which depends only on the current position.

2.2 The Markov property

Throughout, J is a finite set and X_n is a sequence of random variables with values in J. Here J is called the state space and elements of J are called states.

Definition 2.1. X_n is a Markov chain if for each n ≥ 0 and i_0, ..., i_{n−1}, i, j ∈ J,

P(X_{n+1} = j | X_0 = i_0, ..., X_{n−1} = i_{n−1}, X_n = i) = P(X_{n+1} = j | X_n = i). (6)

Equation (6) is called the Markov property. We think of the subscript n as time or a time step, and of the sequence X_n as the time evolution of a random walk. Due to the Markov property, this evolution is completely described by the matrices P_n(i, j) = P(X_{n+1} = j | X_n = i). We are mostly interested in cases where these matrices do not depend on n, in which case X_n is called time homogeneous. We will assume our Markov chains are time homogeneous unless otherwise specified.

Definition 2.2. The transition matrix P of a time homogeneous Markov chain is

P_ij = P(X_{n+1} = j | X_n = i).

It is easily checked that

P(X_n = j_n, ..., X_1 = j_1 | X_0 = j_0) = P_{j_0 j_1} ··· P_{j_{n−1} j_n}.

Moreover, probabilities of X_n after n steps can be obtained from the nth power of P:

Exercise 2.3.
Use the definition of conditional probability and induction to show that

P(X_n = j | X_0 = i) = (P^n)_{ij},

where (P^n)_{ij} stands for the ij entry of P^n.
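The identity in Exercise 2.3 can be sanity-checked directly. A sketch with a small hypothetical two-state chain (the matrix is ours, not from the text), comparing (P²)_{ij} with the sum over intermediate states:

```python
from fractions import Fraction

F = Fraction
# A hypothetical two-state transition matrix; rows are nonnegative and sum to 1.
P = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

P2 = matmul(P, P)
# P(X_2 = j | X_0 = i) = sum_k P_ik P_kj = (P^2)_ij: condition on the state at time 1.
assert P2[0][1] == P[0][0] * P[0][1] + P[0][1] * P[1][1]
assert all(sum(row) == 1 for row in P2)   # P^2 is again a transition matrix
```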

For example, consider a random walk on three states J = {1, 2, 3}. Suppose

P = [ 0    1/2  1/2
      1/3  0    2/3
      1    0    0   ]. (7)

Starting at 1, we go to either 2 or 3 with the same probability. Starting at 2, we go to 3 twice as often as to 1. And starting at 3, we always go to 1. In this Markov chain we never stay at the same state for two consecutive time steps, but in general this is allowed (in which case the diagonal of P is nonzero). In general, the only assumptions on P are that it is square, its entries are nonnegative, and its rows sum to 1. That is, any such matrix is the transition matrix of a Markov chain, and conversely.

Definition 2.4. We write P_i(X_n = j) ≡ P(X_n = j | X_0 = i) and, for any PMF ρ on J,

P_ρ(X_n = j) = Σ_{i∈J} P_i(X_n = j) ρ(i).

More generally, P_i or P_ρ indicate that X_n started at X_0 = i or at the distribution ρ.

The subscript next to P corresponds to an initial distribution of X_n. Thus, P_i(X_n = j) is the probability that X_n = j when we start at X_0 = i. In light of the law of total probability (Theorem 1.19), P_ρ(X_n = j) corresponds to the probability that X_n = j when X_0 is a random variable with PMF ρ.

2.3 Hitting times and the strong Markov property

A stopping time for a Markov chain (X_n) is a random variable τ with values in N such that the event {τ = k} is known at time k. This means that, based on knowing the values of X_n for 0 ≤ n ≤ k, we can decide whether or not τ = k. Unfortunately, a more precise description of stopping times is not possible without introducing some measure theory. However, we can introduce an important class of stopping times with no further effort.

Definition 2.5. A hitting time of X_n has the form

τ_A = inf{n ≥ 0 : X_n ∈ A}

for some nonempty A ⊂ J. Here, τ_A is called the hitting time of A.

A hitting time is a stopping time, and the most important type of stopping time for us. It turns out that Markov chains (in discrete time) have the Markov property at stopping times.
This is called the strong Markov property:

Theorem 2.6 (Strong Markov property). Let X_n be a Markov chain and τ a stopping time. Then for every n ≥ 1 and i, j_1, ..., j_n ∈ J,

P(X_{τ+n} = j_n, ..., X_{τ+1} = j_1 | X_τ = i, {X_k = i_k, 0 ≤ k < τ}) = P_i(X_n = j_n, ..., X_1 = j_1).

The strong Markov property says that the Markov chain starts afresh after each stopping time. It can be proved from the (ordinary) Markov property.

Exercise 2.7. Prove the strong Markov property in the case where τ is a hitting time. (Hint: use Theorem 1.19.)

We end this section by giving an example of a stopping time that will be important later on. Consider two Markov chains (X_n) and (Y_n) and let

τ = inf{n ≥ 0 : X_n = Y_n}

be the first time the chains intersect each other. Note that τ is the hitting time, for the pair (X_n, Y_n), of the diagonal set A = {(i, i) : i ∈ J}. This τ is called the coupling time of the Markov chains, and is useful for proving convergence theorems, as we will see below.

2.4 Irreducibility and aperiodicity

A Markov chain on a finite state space J is called irreducible if any state can be reached from any other state. We write i → j if and only if (P^n)_{ij} > 0 for some n. Thus, i → j if and only if j can be reached from i after some number of steps, with positive probability.

Definition 2.8. X_n is irreducible if i → j and j → i for each i, j ∈ J. If X_n is not irreducible, it is called reducible.

Exercise 2.9. Show that the transition matrix from (7) defines an irreducible Markov chain. Give an example of a transition matrix defining a reducible Markov chain.

Definition 2.10. The period of X_n is the integer

k = min_{i∈J} gcd{n > 0 : P_i(X_n = i) > 0},

where gcd stands for greatest common divisor. X_n is aperiodic if its period is k = 1.

Intuitively, an irreducible Markov chain is aperiodic if, starting at (any state) i, it can return to i at irregular times. As an example, consider a Markov chain on state space J = {1, 2} with transition probabilities P_12 = P_21 = 1 and P_11 = P_22 = 0. Then the chain is periodic with period 2.

Exercise 2.11. Show that the matrix P from (7) defines an aperiodic Markov chain.

2.5 The Perron-Frobenius theorem

Let X_n be a Markov chain with transition matrix P. The Perron-Frobenius theorem provides useful spectral information about P.

Theorem 2.12 (Perron-Frobenius 1).
Let X_n be irreducible and aperiodic. Then P has a simple eigenvalue λ = 1, and all other eigenvalues are strictly smaller in magnitude. Moreover, the (normalized) left eigenvector corresponding to λ = 1 defines a PMF on J.
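The exercise below can be approached numerically: since all other eigenvalues are strictly smaller in magnitude, repeatedly applying P on the left (power iteration) converges to the normalized left eigenvector for λ = 1. A sketch for the matrix P in (7) (the helper names are ours):

```python
# Transition matrix (7) on states {1, 2, 3}, 0-indexed here.
P = [[0.0, 0.5, 0.5],
     [1 / 3, 0.0, 2 / 3],
     [1.0, 0.0, 0.0]]

def left_mult(pi, P):
    # (pi P)_j = sum_i pi_i P_ij: one step of the distribution of the chain.
    return [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):
    pi = left_mult(pi, P)   # power iteration: converges since |lambda_2| < 1

# pi now (approximately) satisfies pi P = pi and sums to 1.
residual = max(abs(a - b) for a, b in zip(left_mult(pi, P), pi))
```

The starting distribution does not matter here; any initial PMF converges to the same left eigenvector.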

Exercise 2.13. Verify the assumptions and conclusion of Theorem 2.12 for P defined in (7).

It turns out that if X_n has period k, then P has eigenvalues e^{2πim/k} for m = 1, ..., k. And if J has exactly k subsets (necessarily disjoint) on which X_n is irreducible, then P has eigenvalue 1 with geometric multiplicity k. In particular, the transition matrix of any Markov chain (on a finite state space) has the eigenvalue 1 and a corresponding left eigenvector which defines a PMF on J.

Example 2.14. Give an example of a periodic and a non-irreducible Markov chain. Find the left eigenvectors and eigenvalues to verify the statements above.

It is often useful to have the following version of Theorem 2.12 for restricted transition matrices, that is, matrices obtained from P by deleting the rows and columns with indices outside some set B.

Theorem 2.15 (Perron-Frobenius 2). Let P be the transition matrix of X_n, let B ⊂ J be a nonempty proper subset of J, and let Q be the restriction of P to B. Let X̃_n be obtained from X_n by conditioning X_n to always remain in B. If X̃_n is irreducible and aperiodic, then Q has a real simple eigenvalue λ < 1 such that all other eigenvalues are strictly smaller in magnitude, and λ has a left eigenvector that defines a PMF on B.

The proofs of the Perron-Frobenius theorems are somewhat technical. For proofs we refer the reader to [5]. Recall that P^n describes the behavior of X_n after n steps. Later on, we will use this together with the spectral information from the Perron-Frobenius theorems to study long-time convergence of X_n.

2.6 Mean first passage times

Recall τ_A is the hitting time for X_n of a set A ⊂ J.

Definition 2.16. The mean first passage time of X_n to A is the expected value E(τ_A).

Mean first passage times satisfy the following linear system:

Theorem 2.17. Let X_n have transition matrix P. Then u_i = E_i(τ_A) satisfies

u_i = 1 + Σ_{j∉A} P_ij u_j for i ∉ A, and u_i = 0 for i ∈ A. (8)

Clearly u_i = 0 if i ∈ A.
If i ∉ A, then by the Markov property and the law of total probability,

    u_i = E_i(τ_A) = Σ_{j∈J} E_i(τ_A | X_1 = j) P_i(X_1 = j)
        = Σ_{j∈J} (1 + E_j(τ_A)) P_i(X_1 = j)
        = 1 + Σ_{j∈J} u_j P_ij.
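The linear system (8) is straightforward to solve numerically by restricting to J \ A and solving (I − Q)v = 1, the matrix form of (8). A minimal sketch, again using a hypothetical 3-state chain (not the P of (7)) with target set A = {2} in 0-based indexing:

```python
import numpy as np

# Hypothetical transition matrix on J = {0, 1, 2}; not the P of (7).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
A = [2]                                      # target set
rest = [i for i in range(3) if i not in A]   # J \ A

# Restrict P to J \ A and solve (I - Q) v = 1 for v_i = E_i(tau_A).
Q = P[np.ix_(rest, rest)]
v = np.linalg.solve(np.eye(len(rest)) - Q, np.ones(len(rest)))

# Assemble u on all of J, with u_i = 0 for i in A.
u = np.zeros(3)
u[rest] = v
print(u)   # for this chain, u = [5, 5, 0]
```

One can check equation (8) directly: for example u_0 = 1 + 0.5·u_0 + 0.3·u_1 = 1 + 2.5 + 1.5 = 5.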

Let v, Q, I, and 1 denote the restrictions of u, P, the identity matrix, and the all-ones column vector, respectively, to J \ A. By Theorem 2.15, if X_n is irreducible, then I − Q does not have 0 as an eigenvalue, hence is invertible. In this case, the last display becomes

    v = (I − Q)^{-1} 1,

which gives an explicit linear equation for the mean first passage times.

Exercise. Find E_i(τ_A) for i = 1, 2, 3 when P is as in (7) and A = {3}.

2.7 Stationary distributions and convergence

Let P be the transition matrix of a Markov chain X_n on a finite state space J.

Definition. A vector π is a stationary distribution of P (or of X_n) if πP = π. Here πP is matrix multiplication, thinking of π as a row vector. Entrywise, this reads

    Σ_{i∈J} π_i P_ij = π_j.

Another way to write this is P_π(X_1 = j) = π_j. That is, if we start at π at time 0, we are still at π at time 1. More generally, if X_0 has PMF π, then X_n has PMF π for every n. So if we start at π we remain at π forever; this is the reason for the word stationary.

The remarks below the Perron-Frobenius theorem (Theorem 2.12) show that every Markov chain on a finite state space has a stationary distribution. The stationary distribution is unique if the Markov chain is irreducible.

Theorem. An irreducible Markov chain has a unique stationary distribution.

It turns out that the stationary probability π_i can be understood as the average residence time in i during a loop from j to j, in the following sense:

Theorem. Let X_n be an irreducible Markov chain with stationary distribution π. Then for each i ∈ J,

    π_i = E_j(Σ_{n=0}^{τ_j − 1} 1_{X_n = i}) / E_j(τ_j),        (9)

where j ∈ J is arbitrary and τ_j = inf{n > 0 : X_n = j} is the hitting time of j.

The theorem can be understood as follows. The denominator of the RHS of (9) is simply the normalization required to obtain a probability vector, so consider the numerator. It counts the (average) number of times we are at i in a loop from j to j, excluding the last step of the loop.
Multiplying by P represents evolving X_n one step. This corresponds to counting the (average) number of times we are at i in a loop from j to j, excluding the first step of the loop. But at the first and last steps of the loop we are at j, so we get the same thing!
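Formula (9) can be checked by direct simulation: run many loops from j back to j, count visits to each state excluding each loop's last step, and compare with π computed from the left eigenvector. A sketch with an assumed small chain (not the P of (7)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical irreducible, aperiodic chain on J = {0, 1, 2}; not the P of (7).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Exact stationary distribution from the left eigenvector at lambda = 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()

# Monte Carlo estimate of the RHS of (9): simulate loops from j to j,
# counting visits to each state but excluding the final return to j.
j, n_loops = 0, 10000
visits = np.zeros(3)
total_len = 0
for _ in range(n_loops):
    state = j
    while True:
        visits[state] += 1          # count X_0, ..., X_{tau_j - 1}
        total_len += 1              # total_len accumulates tau_j per loop
        state = rng.choice(3, p=P[state])
        if state == j:              # returned to j: loop ends
            break
pi_hat = visits / total_len
print(pi, pi_hat)
```

For this chain the exact answer is π = (12/49, 23/49, 14/49), and the loop estimate π̂ agrees to about two decimal places.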

Exercise. Prove the preceding theorem. Then use it to show that π_i = 1/E_i(τ_i).

If the distribution of X_n converges, it must converge to a stationary distribution:

Theorem. Suppose X_n is a Markov chain with initial distribution ρ such that

    π_j = lim_{n→∞} P_ρ(X_n = j)        (10)

exists for each j ∈ J. Then π is a stationary distribution of X_n.

Note that P_ρ(X_n = j) = (ρP^n)_j, and so (10) shows that lim_{n→∞} ρP^n = π. From this it is easy to see that πP = π. This shows that if a Markov chain converges to some distribution as in (10), then that distribution must be stationary. As we have seen above, every Markov chain (on a finite state space) has a stationary distribution; however, not all Markov chains converge to this distribution.

Exercise. Give an example of a periodic Markov chain for which (10) does not hold. Give an example of a non-irreducible Markov chain for which (10) holds with two different π's, depending on the choice of ρ.

The following theorem gives conditions under which a Markov chain converges to a unique stationary distribution.

Theorem 2.25. Let X_n be aperiodic and irreducible, and let π be its stationary distribution. Then for any initial distribution ρ and any j ∈ J,

    π_j = lim_{n→∞} P_ρ(X_n = j).

The theorem states that X_n converges in distribution to π. One proof comes from Perron-Frobenius (Theorem 2.12). Let r < 1 be an upper bound for the magnitude of the second-largest eigenvalue. From the Perron-Frobenius theorem, it can be shown that

    P^n = 1π + O(r^n),

where 1 is the all-ones column vector and π is a row vector. Then

    P_ρ(X_n = j) = (ρP^n)_j = (ρ1π + O(r^n))_j = (π + O(r^n))_j → π_j    as n → ∞.

Another proof of Theorem 2.25 comes from the following coupling argument. Let X_n and Y_n be two independent copies of the same Markov chain. Let X_0 have PMF ρ and Y_0 have PMF π. Define τ = inf{n ≥ 0 : X_n = Y_n}. It can be checked that aperiodicity and irreducibility imply P(τ < ∞) = 1. But since π is stationary and Y_0 has distribution π, we know Y_n has distribution π for all n, including n = τ.
But this means X_τ has distribution π, and hence X_n has distribution π for n ≥ τ! (Note that this argument also shows uniqueness of π.)

Exercise. Prove Theorem 2.25 by filling in the details of the above proof sketch.
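The geometric convergence P^n = 1π + O(r^n) is easy to observe numerically. The following sketch (with a stand-in chain, not the P of (7)) compares the rows of P^25 with π:

```python
import numpy as np

# Hypothetical aperiodic, irreducible chain; not the P of (7).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

# Stationary distribution via the left eigenvector at lambda = 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()

# The second-largest eigenvalue magnitude r < 1 controls the rate:
# each row of P^n approaches pi like O(r^n).
r = sorted(np.abs(eigvals))[-2]

# After 25 steps the rows of P^n are numerically indistinguishable from pi.
Pn = np.linalg.matrix_power(P, 25)
err = np.max(np.abs(Pn - np.outer(np.ones(3), pi)))
print(r, err)
```

Since P^n converges to the rank-one matrix 1π, the starting distribution ρ is forgotten: ρP^n → π for every PMF ρ.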

2.8 Ergodic theorem

The ergodic theorem, Theorem 2.28 below, describes a different type of convergence of X_n. While the theorems above involve the convergence of the distribution of X_n to π, the ergodic theorem describes the convergence of time averages of single paths of X_n, i.e. trajectories of X_n corresponding to a single outcome ω. This is a very different type of convergence from the ones discussed above; for example, convergence of X_n to π in distribution cannot be understood in terms of single paths.

Let f : J → R be an arbitrary function on the state space and write f_i := f(i).

Proposition 2.27. Let X_n be irreducible and π its stationary distribution. Then

    Σ_{i∈J} f_i π_i = E_j(Σ_{n=0}^{τ_j − 1} f(X_n)) / E_j(τ_j).

The proposition can be proved in the same way as (9). Now we are ready for the ergodic theorem.

Theorem 2.28. Let X_n be irreducible with stationary distribution π. Then

    (1/n) Σ_{m=0}^{n−1} f(X_m) → Σ_{i∈J} π_i f_i    almost surely as n → ∞.

It is common to understand the ergodic theorem as saying that the time average over a path equals the spatial average with respect to the stationary distribution π.

We only sketch a proof. Let σ_k be the kth time that X_n visits j, and define

    S_k = Σ_{n=σ_k + 1}^{σ_{k+1}} f(X_n),    T_k = σ_{k+1} − σ_k.

By the strong Markov property, the S_k are iid, and so are the T_k. We may approximate

    (1/n) Σ_{m=0}^{n−1} f(X_m)    by    (S_1 + ... + S_k) / (T_1 + ... + T_k)

for k large. By the law of large numbers,

    (S_1 + ... + S_k) / (T_1 + ... + T_k) → E(S_1)/E(T_1)    almost surely.

But by Proposition 2.27, E(S_1)/E(T_1) = Σ_{i∈J} π_i f_i.

Exercise. Fill in the details of the above sketch to prove Theorem 2.28.
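The ergodic theorem can be illustrated by comparing a single-path time average with the space average Σ_i π_i f_i. A simulation sketch with an assumed chain and test function f (both hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical irreducible chain (not the P of (7)) and a test function f.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
f = np.array([1.0, -2.0, 5.0])

# Space average sum_i pi_i f_i, with pi from the left eigenvector.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
pi /= pi.sum()
space_avg = float(pi @ f)

# Time average (1/n) sum_{m < n} f(X_m) along a single path.
n = 100_000
state, total = 0, 0.0
for _ in range(n):
    total += f[state]
    state = rng.choice(3, p=P[state])
time_avg = total / n
print(space_avg, time_avg)   # the two averages agree closely
```

Note that the agreement holds for a single trajectory; no averaging over independent runs is needed, which is exactly the content of the theorem.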

2.9 Left and right actions

Let P be the transition matrix of a Markov chain X_n on a finite state space J, let µ be a PMF on J, and let f : J → R be a function. It is often convenient to view the left and right matrix multiplications µP and Pf as transformations on measures and functions, respectively, in the following sense:

    (µP^n)_j = jth entry of the PMF µP^n := P_µ(X_n = j),
    (P^n f)(j) = value of the function P^n f at j := E_j(f(X_n)).

Thus, µP^n is a PMF on J, corresponding to taking n steps of the Markov chain started at µ, and P^n f is a function on J, corresponding to evaluating f after n steps. The left and right actions can be combined as follows:

    µP^n f = E_µ(f(X_n)).

2.10 Infinite state space

The case where J is countably infinite is easy to handle, so we consider it first. A Markov chain X_n on state space J is still defined exactly as in Definition 2.1. For the convergence theorems above to hold, we need the additional assumption that for some state j, the expected return time to j is finite, that is, E_j[τ_j] < ∞ with τ_j = inf{n > 0 : X_n = j}. This is called positive recurrence. Then X_n has the stationary distribution π defined by the formula in (9) (and the proof is the same). In this case, irreducibility implies j is reached in finite time, and the coupling argument below Theorem 2.25 gives convergence of X_n to π in distribution. Moreover, the arguments in the proof of the ergodic theorem go through with little modification.

Now consider the case where J is uncountably infinite.

Definition. X_n is a Markov chain on an uncountable state space J if

    P(X_{n+1} ∈ A | X_0 = i_0, ..., X_{n−1} = i_{n−1}, X_n = i) = P(X_{n+1} ∈ A | X_n = i)        (11)

for each A ⊆ J.

We will usually assume J is R^n or a subset of R^n. In this case, the transition matrix P must be replaced with a transition kernel, which we will assume has a density:

    P(X_{n+1} ∈ A | X_n = x) = ∫_A p(x, y) dy.

Think of p as a matrix with entries indexed by the reals. (What properties should p have?) For later reference we record this notation.
Definition. X_n has transition density p if p : J × J → R satisfies

    P(X_{n+1} ∈ A | X_n = x) = ∫_A p(x, y) dy

for all x ∈ J and A ⊆ J.
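The definition can be illustrated with a concrete kernel. The sketch below uses a Laplace step density p(x, y) = (1/2) e^{−|y−x|}, chosen purely for illustration; it checks numerically that p(x, ·) integrates to 1 in y for each fixed x, and samples one step of the corresponding chain.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical transition density on J = R, chosen for illustration only:
# a Laplace step kernel, p(x, y) = 0.5 * exp(-|y - x|).
def p(x, y):
    return 0.5 * np.exp(-np.abs(y - x))

# For each fixed x, y -> p(x, y) must be a probability density:
# nonnegative and integrating to 1 over J.  Check by trapezoidal quadrature.
y = np.linspace(-30.0, 30.0, 60001)
dy = y[1] - y[0]
for x in (-2.0, 0.0, 1.5):
    vals = p(x, y)
    mass = float(np.sum((vals[1:] + vals[:-1]) / 2) * dy)
    assert abs(mass - 1.0) < 1e-4

# One step of the chain from X_n = x samples y from p(x, .);
# for this kernel that is x plus Laplace(0, 1) noise.
x = 0.7
steps = x + rng.laplace(0.0, 1.0, size=100_000)
print(steps.mean(), steps.var())   # close to x, and to 2 (Laplace variance)
```

Replacing the Laplace kernel by a Gaussian gives the chain of the exercise below.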

Using the transition density, all the basics above are easily translated to the current setup: essentially, sums are replaced with integrals. One crucial difference is that the probability that X_n equals any single point is zero. This affects how we can prove convergence to equilibrium, as we discuss below.

Exercise. Let ξ_i, i = 1, 2, ..., be iid standard normal random variables, and let X_0 = 0 and X_n = ξ_1 + ... + ξ_n for n ≥ 1. Find the transition density of X_n.

Definition. Let X_n have transition density p. If π : J → R satisfies

    π(y) = ∫_J π(x) p(x, y) dx    for every y ∈ J,

we say π is a stationary density for X_n.

In uncountable state space, positive recurrence does not make sense, since in general we never hit single points. In fact, if X_n has a transition density, then P_x(X_n = y) = 0 for all x, y whenever n ≥ 1! But the arguments above can still be modified under some extra assumptions. Typically, one assumes there is some set A ⊆ J that can be treated like a point; such an A is called a small set. A typical assumption (called a Doeblin condition) is that there are a probability measure µ on A and a constant c > 0 such that (i) X_n reaches A in finite expected time from every point, and (ii) for each i ∈ A and B ⊆ A, P_i(X_1 ∈ B) ≥ cµ(B). Intuitively, after reaching A, in the next step we are distributed according to µ on A with probability at least c. Being distributed according to µ on A can now be treated as a point j, and all the above arguments based on looking at loops from j to j still hold.

References

[1] R. Durrett, The essentials of probability, Duxbury Press.
[2] R. Durrett, Probability: theory and examples, Duxbury Press.
[3] C. Geyer, Practical Markov chain Monte Carlo.
[4] J.R. Norris, Markov chains, Cambridge Series in Statistical and Probabilistic Mathematics.
[5] E. Seneta, Non-negative matrices and Markov chains, Springer Series in Statistics.
[6] R.J. Baxter, Exactly solved models in statistical mechanics, Academic Press.


More information

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents

MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES. Contents MARKOV CHAINS: STATIONARY DISTRIBUTIONS AND FUNCTIONS ON STATE SPACES JAMES READY Abstract. In this paper, we rst introduce the concepts of Markov Chains and their stationary distributions. We then discuss

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES Contents 1. Continuous random variables 2. Examples 3. Expected values 4. Joint distributions

More information

We introduce methods that are useful in:

We introduce methods that are useful in: Instructor: Shengyu Zhang Content Derived Distributions Covariance and Correlation Conditional Expectation and Variance Revisited Transforms Sum of a Random Number of Independent Random Variables more

More information

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions 18.175: Lecture 8 Weak laws and moment-generating/characteristic functions Scott Sheffield MIT 18.175 Lecture 8 1 Outline Moment generating functions Weak law of large numbers: Markov/Chebyshev approach

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Positive and null recurrent-branching Process

Positive and null recurrent-branching Process December 15, 2011 In last discussion we studied the transience and recurrence of Markov chains There are 2 other closely related issues about Markov chains that we address Is there an invariant distribution?

More information

Stochastic Processes

Stochastic Processes qmc082.tex. Version of 30 September 2010. Lecture Notes on Quantum Mechanics No. 8 R. B. Griffiths References: Stochastic Processes CQT = R. B. Griffiths, Consistent Quantum Theory (Cambridge, 2002) DeGroot

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 34 To start out the course, we need to know something about statistics and This is only an introduction; for a fuller understanding, you would

More information

M378K In-Class Assignment #1

M378K In-Class Assignment #1 The following problems are a review of M6K. M7K In-Class Assignment # Problem.. Complete the definition of mutual exclusivity of events below: Events A, B Ω are said to be mutually exclusive if A B =.

More information

Chapter 3: Random Variables 1

Chapter 3: Random Variables 1 Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures 36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1

More information

Recap of Basic Probability Theory

Recap of Basic Probability Theory 02407 Stochastic Processes? Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk

More information

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27 Probability Review Yutian Li Stanford University January 18, 2018 Yutian Li (Stanford University) Probability Review January 18, 2018 1 / 27 Outline 1 Elements of probability 2 Random variables 3 Multiple

More information

µ n 1 (v )z n P (v, )

µ n 1 (v )z n P (v, ) Plan More Examples (Countable-state case). Questions 1. Extended Examples 2. Ideas and Results Next Time: General-state Markov Chains Homework 4 typo Unless otherwise noted, let X be an irreducible, aperiodic

More information

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution.

Lecture 7. µ(x)f(x). When µ is a probability measure, we say µ is a stationary distribution. Lecture 7 1 Stationary measures of a Markov chain We now study the long time behavior of a Markov Chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

Math 6810 (Probability) Fall Lecture notes

Math 6810 (Probability) Fall Lecture notes Math 6810 (Probability) Fall 2012 Lecture notes Pieter Allaart University of North Texas September 23, 2012 2 Text: Introduction to Stochastic Calculus with Applications, by Fima C. Klebaner (3rd edition),

More information

Module 3. Function of a Random Variable and its distribution

Module 3. Function of a Random Variable and its distribution Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given

More information

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

Midterm 2 Review. CS70 Summer Lecture 6D. David Dinh 28 July UC Berkeley

Midterm 2 Review. CS70 Summer Lecture 6D. David Dinh 28 July UC Berkeley Midterm 2 Review CS70 Summer 2016 - Lecture 6D David Dinh 28 July 2016 UC Berkeley Midterm 2: Format 8 questions, 190 points, 110 minutes (same as MT1). Two pages (one double-sided sheet) of handwritten

More information

MATH Notebook 5 Fall 2018/2019

MATH Notebook 5 Fall 2018/2019 MATH442601 2 Notebook 5 Fall 2018/2019 prepared by Professor Jenny Baglivo c Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved. 5 MATH442601 2 Notebook 5 3 5.1 Sequences of IID Random Variables.............................

More information

2.1 Elementary probability; random sampling

2.1 Elementary probability; random sampling Chapter 2 Probability Theory Chapter 2 outlines the probability theory necessary to understand this text. It is meant as a refresher for students who need review and as a reference for concepts and theorems

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Disjointness and Additivity

Disjointness and Additivity Midterm 2: Format Midterm 2 Review CS70 Summer 2016 - Lecture 6D David Dinh 28 July 2016 UC Berkeley 8 questions, 190 points, 110 minutes (same as MT1). Two pages (one double-sided sheet) of handwritten

More information

STA 711: Probability & Measure Theory Robert L. Wolpert

STA 711: Probability & Measure Theory Robert L. Wolpert STA 711: Probability & Measure Theory Robert L. Wolpert 6 Independence 6.1 Independent Events A collection of events {A i } F in a probability space (Ω,F,P) is called independent if P[ i I A i ] = P[A

More information

Recap of Basic Probability Theory

Recap of Basic Probability Theory 02407 Stochastic Processes Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Lecture 13 (Part 2): Deviation from mean: Markov s inequality, variance and its properties, Chebyshev s inequality

Lecture 13 (Part 2): Deviation from mean: Markov s inequality, variance and its properties, Chebyshev s inequality Lecture 13 (Part 2): Deviation from mean: Markov s inequality, variance and its properties, Chebyshev s inequality Discrete Structures II (Summer 2018) Rutgers University Instructor: Abhishek Bhrushundi

More information

Ergodic Properties of Markov Processes

Ergodic Properties of Markov Processes Ergodic Properties of Markov Processes March 9, 2006 Martin Hairer Lecture given at The University of Warwick in Spring 2006 1 Introduction Markov processes describe the time-evolution of random systems

More information