Markov Chains and Computer Science


1 A Not So Short Introduction. Jean-Marc Vincent, Laboratoire LIG, projet Inria-Mescal, Université Joseph Fourier. Spring.

2 Outline
1 Markov Chain (History, Approaches)
2 Formalisation
3 Long run behavior
4 Cache modeling
5 Synthesis

3 History (Andreï Markov). An example of statistical investigation in the text of "Eugene Onegin" illustrating coupling of "tests" in chains (1913). In Proceedings of the Academy of Sciences of St. Petersburg, VI.

4 Graphs and Paths
Random walks. Path in a graph: X_n is the n-th visited node; a path is i_0, i_1, ..., i_n; the normalized weight of arc (i, j) is p_{i,j}.
Concatenation (product): P(i_0, i_1, ..., i_n) = p_{i_0,i_1} p_{i_1,i_2} ... p_{i_{n-1},i_n}.
Disjoint union (sum): P(i_0 -> i_n) = Σ_{i_1,...,i_{n-1}} p_{i_0,i_1} p_{i_1,i_2} ... p_{i_{n-1},i_n}.
Automaton view: states/transitions, randomized (language).
[Figure: weighted graph on nodes a, ..., i illustrating a random walk.]
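
To make the two composition rules concrete, here is a small numerical sketch in Python (assuming numpy); the 3-node weighted graph is an invented example, not the graph in the figure.

    import numpy as np

    # Hypothetical 3-node weighted graph; row i is normalized so that
    # P[i, j] is the probability p_{i,j} of following arc (i, j).
    P = np.array([[0.0, 0.5, 0.5],
                  [1.0, 0.0, 0.0],
                  [0.3, 0.7, 0.0]])

    def path_probability(P, path):
        # Concatenation rule: product of the arc weights along the path.
        prob = 1.0
        for i, j in zip(path, path[1:]):
            prob *= P[i, j]
        return prob

    print(path_probability(P, [0, 1, 0, 2]))
    # Disjoint-union rule: summing over all intermediate nodes of a
    # length-n path is exactly the (i, j) entry of the matrix power P^n.
    print(np.linalg.matrix_power(P, 3)[0, 2])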

5 Dynamical Systems
Evolution operator. Initial value: X_0. Recurrence equation: X_{n+1} = Φ(X_n, ξ_{n+1}), with innovation ξ_{n+1} at step n + 1.
Finite set of innovations {φ_1, φ_2, ..., φ_K}: a random function, chosen with a given probability (Diaconis-Freedman 99, randomized iterated systems).
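
A minimal sketch of such a randomized iterated system, with invented functions φ_k and weights:

    import random

    # The innovation picks one of K functions phi_k with given weights;
    # the functions and weights below are illustrative assumptions.
    phis = [lambda x: x + 1, lambda x: x - 1, lambda x: 0]
    weights = [0.4, 0.4, 0.2]

    def step(x):
        phi = random.choices(phis, weights=weights)[0]  # random function
        return phi(x)

    x = 0                        # initial value X_0
    trajectory = [x]
    for _ in range(20):
        x = step(x)              # X_{n+1} = Phi(X_n, xi_{n+1})
        trajectory.append(x)
    print(trajectory)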

6 Measure Approach
Ehrenfest's urn (1907): distribution of K particles between two urns. Initial state X_0 = 0; the state is the number of particles in urn 0. Dynamics: uniform choice of a particle, which jumps to the other side.
π_n(i) = P(X_n = i | X_0 = 0) = π_{n-1}(i-1) (K - i + 1)/K + π_{n-1}(i+1) (i + 1)/K
Paul Ehrenfest (1880-1933).
π_n = π_{n-1} P: iterated product of matrices.
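
The recurrence π_n = π_{n-1} P can be iterated directly; a small sketch for the Ehrenfest urn (K = 10 is an arbitrary choice):

    import numpy as np

    # Ehrenfest urn with K particles: build the (K+1)x(K+1) transition
    # matrix and iterate pi_n = pi_{n-1} P from X_0 = 0.
    K = 10
    P = np.zeros((K + 1, K + 1))
    for i in range(K + 1):
        if i < K:
            P[i, i + 1] = (K - i) / K   # a particle jumps into urn 0
        if i > 0:
            P[i, i - 1] = i / K         # a particle leaves urn 0

    pi = np.zeros(K + 1)
    pi[0] = 1.0                          # X_0 = 0
    for n in range(200):
        pi = pi @ P
    # After an even number of steps the mass sits on even states
    # (the chain has period 2), close to a binomial(K, 1/2) profile.
    print(pi)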

7 Algorithmic Interpretation

    int minimum(int T[], int K) {
        int min = INT_MAX, cpt = 0;
        for (int k = 0; k < K; k++) {
            if (T[k] < min) {
                min = T[k];
                process(min);   /* external action being counted */
                cpt++;
            }
        }
        return cpt;
    }

Number of processings of the minimum: worst case K, best case 1; on average?
State: X_n = rank of the n-th processed value.
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i), with
P(X_{n+1} = j | X_n = i) = 1/(i-1) if j < i, and 0 otherwise.
All the information needed for step n + 1 is contained in the state at step n. With τ = min{n : X_n = 1}, the chain has correlation of length 1.
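
A quick Monte-Carlo cross-check of the average case in Python (the expected number of updates for a uniform random permutation is the harmonic number H_K, a standard fact about records):

    import random

    def count_min_updates(T):
        best, cpt = float("inf"), 0
        for v in T:
            if v < best:
                best, cpt = v, cpt + 1
        return cpt

    K, runs = 100, 10000
    mean = sum(count_min_updates(random.sample(range(K), K))
               for _ in range(runs)) / runs
    harmonic = sum(1 / k for k in range(1, K + 1))
    print(mean, harmonic)   # the two numbers should be close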

8 Outline
1 Markov Chain
2 Formalisation (States and transitions, Applications)
3 Long run behavior
4 Cache modeling
5 Synthesis

9 Formal definition
Let {X_n}_{n∈ℕ} be a random sequence of variables in a discrete state space X.
{X_n}_{n∈ℕ} is a Markov chain with initial law π(0) iff X_0 ~ π(0) and, for all n ∈ ℕ and all (j, i, i_{n-1}, ..., i_0) ∈ X^{n+2},
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i).
{X_n}_{n∈ℕ} is a homogeneous Markov chain iff, for all n ∈ ℕ and all (j, i) ∈ X^2,
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i) =: p_{i,j}
(invariance in time of the transition probabilities).


11 Algebraic representation
P = ((p_{i,j})) is the transition matrix of the chain; P is a stochastic matrix: p_{i,j} ≥ 0 and Σ_j p_{i,j} = 1.
Linear recurrence equation: with π_n(i) = P(X_n = i), π_n = π_{n-1} P.
Chapman-Kolmogorov equation (homogeneous case): P^n = ((p^{(n)}_{i,j})) with p^{(n)}_{i,j} = P(X_n = j | X_0 = i), and P^{n+m} = P^n P^m:
P(X_{n+m} = j | X_0 = i) = Σ_k P(X_{n+m} = j | X_m = k) P(X_m = k | X_0 = i) = Σ_k P(X_n = j | X_0 = k) P(X_m = k | X_0 = i).
Interpretation: decomposition of the set of paths of length n + m from i to j.
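
A numerical sanity check of the Chapman-Kolmogorov equation P^{n+m} = P^n P^m on an arbitrary 3-state stochastic matrix:

    import numpy as np

    P = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    n, m = 3, 5
    lhs = np.linalg.matrix_power(P, n + m)                        # P^(n+m)
    rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)
    print(np.allclose(lhs, rhs))                                  # True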

12 Problems
Finite horizon:
- estimation of π(n)
- estimation of stopping times, e.g. τ_A = inf{n ≥ 0 : X_n ∈ A}
Infinite horizon:
- convergence properties
- estimation of the asymptotics
- estimation of the speed of convergence


14 Applications in computer science
Applications in most scientific domains; in computer science:
Markov chains as an algorithmic tool:
- numerical methods (Monte-Carlo methods)
- randomized algorithms (e.g. TCP, searching, PageRank, ...)
- machine learning (hidden Markov chains)
Markov chains as a modeling tool:
- performance evaluation (quantification and dimensioning)
- stochastic control
- program verification


16 Nicholas Metropolis (1915-1999)
Metropolis contributed several original ideas to mathematics and physics. Perhaps the most widely known is the Monte Carlo method. Also, in 1953 Metropolis co-authored the first paper on a technique that was central to the method now known as simulated annealing. He also developed an algorithm (the Metropolis algorithm or Metropolis-Hastings algorithm) for generating samples from the Boltzmann distribution, later generalized by W. K. Hastings.
Simulated annealing: convergence to a global minimum by a stochastic gradient scheme,
X_{n+1} = X_n - grad Φ(X_n) + ε_n (random perturbation), with ε_n -> 0 as n -> ∞.

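
As an illustration of the sampling idea, here is a minimal Metropolis sketch targeting a Boltzmann-like distribution proportional to exp(-Φ(x)/T) on the integers; the function Φ, the symmetric proposal and the temperature are invented for the example:

    import math, random

    def phi(x):
        return (x - 3) ** 2 / 10.0       # toy energy, minimized at x = 3

    def metropolis(steps=100000, T=1.0):
        x, samples = 0, []
        for _ in range(steps):
            y = x + random.choice((-1, 1))              # symmetric proposal
            # Accept with probability min(1, exp(-(phi(y) - phi(x)) / T)).
            if random.random() < math.exp(-(phi(y) - phi(x)) / T):
                x = y
            samples.append(x)
        return samples

    s = metropolis()
    print(sum(s) / len(s))   # concentrates near the minimizer x = 3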

27 Modeling and Analysis of Computer Systems
Complex system: input of the system -> System -> system output, embedded in an Environment.
Basic model assumptions:
- System: automaton (discrete state space); discrete or continuous time.
- Environment: non-deterministic; time homogeneous; stochastically regular.
Problem: understand typical states:
- steady-state estimation
- ergodic simulation
- state-space exploring techniques


31 Outline
1 Markov Chain
2 Formalisation
3 Long run behavior (Convergence, Solving, Simulation)
4 Cache modeling
5 Synthesis

32 States classification
Graph analysis.
Irreducible classes:
- Strongly connected components: i and j are in the same component if there exist a path from i to j and a path from j to i with positive probability.
- Leaves of the condensation (the acyclic graph of strongly connected components) are the irreducible classes.
- States in irreducible classes are called recurrent; the other states are called transient.
Periodicity: an irreducible class is aperiodic if the gcd of the lengths of all its cycles is 1.
A Markov chain is irreducible if there is only one class: each state is reachable from any other state by a path of positive probability.
[Figure: a transition graph with transient classes, an absorbing state, irreducible classes and a periodic class.]

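
A sketch of this classification in Python, assuming the networkx library is available: compute the strongly connected components, build the condensation, and keep its leaves as the recurrent classes.

    import networkx as nx

    # Transition graph of a small chain with one transient state (0) and
    # an absorbing irreducible class {1, 2}; edges carry positive probability.
    G = nx.DiGraph([(0, 0), (0, 1), (1, 2), (2, 1)])

    C = nx.condensation(G)              # DAG of strongly connected components
    recurrent = set()
    for c in C.nodes:
        if C.out_degree(c) == 0:        # leaves of the condensation
            recurrent |= set(C.nodes[c]["members"])
    transient = set(G.nodes) - recurrent
    print("recurrent:", recurrent, "transient:", transient)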

35 States classification: matrix form
[Figure: after reordering the states by class (transient classes, irreducible classes, periodic class, absorbing state), the transition matrix is block-triangular; each irreducible class gives a diagonal block, and an absorbing state gives a 1 on the diagonal.]

36 Automaton Flip-flop: ON-OFF system
Two-state model: communication line, processor activity, ...
Trajectory: X_n is the state of the automaton at time n.
Parameters:
- proportion of transitions: p (from state 1), q (from state 2)
- mean sojourn time in state 1: 1/p
- mean sojourn time in state 2: 1/q
Transient distribution: π_n(1) = P(X_n = 1), π_n(2) = P(X_n = 2).
Problem: estimation of π_n: state prediction, resource utilization.
[Figure: two-state automaton with transition probabilities p and q, self-loops 1-p and 1-q, and a sample trajectory.]


40 Mathematical model
Transition probabilities:

    P = [ 1-p    p  ]
        [  q   1-q  ]

P(X_{n+1} = 1 | X_n = 1) = 1-p;  P(X_{n+1} = 2 | X_n = 1) = p;
P(X_{n+1} = 1 | X_n = 2) = q;  P(X_{n+1} = 2 | X_n = 2) = 1-q.
Linear iterations:
π_{n+1}(1) = π_n(1)(1-p) + π_n(2) q;
π_{n+1}(2) = π_n(1) p + π_n(2)(1-q);
that is, π_{n+1} = π_n P.
Spectrum of P (eigenvalues): Sp = {1, 1-p-q}.
System resolution:
- |1-p-q| < 1 (non-pathological case):
  π_n(1) = q/(p+q) + (π_0(1) - q/(p+q)) (1-p-q)^n;
  π_n(2) = p/(p+q) + (π_0(2) - p/(p+q)) (1-p-q)^n.
- 1-p-q = 1 (p = q = 0): reducible behavior.
- 1-p-q = -1 (p = q = 1): periodic behavior.

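
A small numerical check of the closed form against the iteration π_{n+1} = π_n P (the values of p and q are arbitrary):

    import numpy as np

    p, q = 0.3, 0.1
    P = np.array([[1 - p, p],
                  [q, 1 - q]])
    pi = np.array([1.0, 0.0])           # start in state 1
    for n in range(20):
        pi = pi @ P
    closed = q / (p + q) + (1.0 - q / (p + q)) * (1 - p - q) ** 20
    print(pi[0], closed)                # both give pi_20(1)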

45 Recurrent behavior
Numerical example [figure: transient distributions π_n converging rapidly to the steady state].
Steady-state behavior:
π_∞(1) = q/(p+q);  π_∞(2) = p/(p+q).
π_∞ is the unique probability vector solution of π_∞ = π_∞ P.
If π_0 = π_∞ then π_n = π_∞ for all n: stationary behavior.
Rapid convergence (exponential rate).


47 Convergence in law
Let {X_n}_{n∈ℕ} be a homogeneous, irreducible and aperiodic Markov chain taking values in a discrete state space X. Then:
- the following limits exist (and do not depend on i): lim_{n->+∞} P(X_n = j | X_0 = i) = π_j;
- π is the unique probability vector invariant under P: πP = π;
- the convergence is rapid (geometric): there are C > 0 and 0 < α < 1 such that |P(X_n = j | X_0 = i) - π_j| ≤ C α^n.
In other words, X_n converges in law to X_∞ with law π; π is the steady-state probability associated with the chain.
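
Numerically, π is the left eigenvector of P for eigenvalue 1, and the rows of P^n converge geometrically to π; a sketch on an arbitrary 3-state matrix:

    import numpy as np

    P = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))   # eigenvalue closest to 1
    pi = np.real(vecs[:, k])
    pi /= pi.sum()                      # normalize to a probability vector

    for n in (1, 5, 10, 20):
        err = np.abs(np.linalg.matrix_power(P, n) - pi).max()
        print(n, err)                   # decays like alpha^n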

48 Equilibrium equation
Interpretation: probability to enter j = probability to exit j (balance equation):
Σ_{i≠j} π_i p_{i,j} = Σ_{k≠j} π_j p_{j,k} = π_j Σ_{k≠j} p_{j,k} = π_j (1 - p_{j,j}).
π is by definition the steady state; if π_0 = π the process is stationary (π_n = π).
[Figure: probability flows entering state j from states i1, ..., i4 and leaving j towards states k1, ..., k3, with self-loop p_{j,j}.]

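
In matrix form the balance equations read π(P - I) = 0 with Σ_i π_i = 1; a standard linear-solve sketch, replacing one redundant equation by the normalization:

    import numpy as np

    P = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
    N = P.shape[0]
    A = P.T - np.eye(N)                 # (P^t - I) pi^t = 0
    A[-1, :] = 1.0                      # normalization row: sum(pi) = 1
    rhs = np.zeros(N)
    rhs[-1] = 1.0
    pi = np.linalg.solve(A, rhs)
    print(pi, pi @ P)                   # pi and pi P coincide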

51 Proof 1: finite state space, algebraic approach
Positive matrix P > 0: contraction of max_i p^{(n)}_{i,j} - min_i p^{(n)}_{i,j}.
Perron-Frobenius: if P is positive and stochastic, then the spectral radius ρ = 1 is an eigenvalue with multiplicity 1, the corresponding eigenvector is positive, and the other eigenvalues have modulus < 1.
Case P ≥ 0: if the chain is aperiodic and irreducible, there is a k such that P^k > 0; apply the above result.

52 Proof 1: details (P > 0)
Let x be a vector and y = Px; write ω = min_{i,j} p_{i,j}, x_max = max_i x_i and x_min = min_i x_i, so that y_i = Σ_j p_{i,j} x_j.
Property of the centroid: (1-ω) x_min + ω x_max ≤ y_i ≤ (1-ω) x_max + ω x_min, hence 0 ≤ y_max - y_min ≤ (1-2ω)(x_max - x_min).
Therefore P^n x -> s(x) (1, 1, ..., 1)^t, and P^n converges to a matrix whose rows are all identical.

53 Proof 2: return time
Let τ_i^+ = inf{n ≥ 1 : X_n = i | X_0 = i}; then π(i) = 1/E[τ_i^+] defines an invariant probability (Kac's lemma).
Proof:
1 E[τ_i^+] < +∞;
2 study on a regeneration interval (strong Markov property);
3 uniqueness by harmonic functions.

54 Proof 3: coupling
Let {X_n}_{n∈ℕ} be a homogeneous, aperiodic and irreducible Markov chain with initial law π(0) and steady-state probability π. Let {X̃_n} be another Markov chain with initial law π̃(0) and the same transition matrix as {X_n}, with {X_n} and {X̃_n} independent.
- Z_n = (X_n, X̃_n) is a homogeneous Markov chain;
- if {X_n} is aperiodic and irreducible, so is Z_n.
Let τ be the hitting time of the diagonal; τ < ∞ P-a.s. Then
|P(X_n = i) - P(X̃_n = i)| ≤ 2 P(τ > n), and in particular |P(X_n = i) - π(i)| ≤ 2 P(τ > n).
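
A simulation sketch of the coupling argument on the two-state flip-flop: two independent copies started in different states, recording the hitting time τ of the diagonal.

    import random

    p, q = 0.3, 0.1                     # arbitrary flip-flop parameters

    def step(x):
        if x == 1:
            return 2 if random.random() < p else 1
        return 1 if random.random() < q else 2

    taus = []
    for _ in range(10000):
        x, y, n = 1, 2, 0               # two independent copies
        while x != y:                   # run until they meet
            x, y, n = step(x), step(y), n + 1
        taus.append(n)
    print(sum(taus) / len(taus))        # mean coupling time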

55 Ergodic Theorem
Let {X_n}_{n∈ℕ} be a homogeneous, aperiodic and irreducible Markov chain on X with steady-state probability π. Then:
- for every function f satisfying E_π|f| < +∞,
  (1/N) Σ_{n=1}^N f(X_n) -> E_π f  P-a.s.
  (generalization of the strong law of large numbers);
- if E_π f = 0, then there exists σ such that
  (1/(σ √N)) Σ_{n=1}^N f(X_n) -> N(0, 1) in law
  (generalization of the central limit theorem).

56 Fundamental question
Given a function f (cost, reward, performance, ...), estimate E_π f and give the quality of this estimation.
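
A naive sketch of such an estimation for the flip-flop chain with f the indicator of state 1: an ergodic average plus a CLT-style error bar (optimistic here, since successive samples are correlated).

    import random

    p, q = 0.3, 0.1

    def step(x):
        if x == 1:
            return 2 if random.random() < p else 1
        return 1 if random.random() < q else 2

    x, warmup, N = 1, 1000, 100000
    for _ in range(warmup):             # discard the warm-up period
        x = step(x)
    samples = []
    for _ in range(N):
        x = step(x)
        samples.append(1.0 if x == 1 else 0.0)
    mean = sum(samples) / N
    var = sum((s - mean) ** 2 for s in samples) / (N - 1)
    # Estimate, exact value q/(p+q), and naive 95% half-width.
    print(mean, q / (p + q), 1.96 * (var / N) ** 0.5)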

57 Solving methods
Solving π = πP:
- analytical/approximation methods;
- formal methods (N ≈ 50): Maple, Sage, ...;
- direct numerical methods (N ≈ 1000): Mathematica, Scilab, ...;
- iterative methods with preconditioning (N ≈ 100,000): Marca, ...;
- adapted methods (structured Markov chains) (N ≈ 1,000,000): PEPS, ...;
- Monte-Carlo simulation (N ≈ 10^7).
Postprocessing of the stationary distribution: computation of rewards (expected stationary functions): utilization, response time, ...

58 Ergodic Sampling (1)
Ergodic sampling algorithm. Representation: transition function X_{n+1} = Φ(X_n, e_{n+1}).

    x <- x_0            {choice of the initial state at time 0}
    n <- 0
    repeat
        n <- n + 1
        e <- Random_event()
        x <- Φ(x, e)    {computation of the next state X_{n+1}}
        store x
    until some empirical criterion
    return the trajectory

Problem: the stopping criterion.
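
A runnable version of this loop, where the transition function Φ, the toy 5-state chain and the fixed-length stopping rule are placeholders for the generic Φ and the empirical criterion:

    import random

    def phi(x, e):
        # Toy 5-state chain: advance with probability 1/2, else stay.
        return (x + 1) % 5 if e < 0.5 else x

    x = 0                               # initial state at time 0
    trajectory = [x]
    for n in range(1, 10000):           # fixed length as stopping rule
        e = random.random()             # random event
        x = phi(x, e)                   # next state X_{n+1}
        trajectory.append(x)            # store x
    print(len(trajectory))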

59 Ergodic Sampling (2)
Start-up: convergence to stationary behavior, lim_{n->+∞} P(X_n = x) = π_x.
Warm-up period: avoid the dependence on the initial state.
Estimation error: |P(X_n = x) - π_x| ≤ C λ_2^n, where λ_2 is the second greatest eigenvalue (in modulus) of the transition matrix:
- bounds on C and λ_2 (spectral gap);
- cut-off phenomena.
In practice λ_2 and C are not reachable (the complexity is equivalent to the computation of π); some results are known (birth and death processes).

60 Ergodic Sampling (3)
Estimation quality. Ergodic theorem:
lim_{n->+∞} (1/n) Σ_{i=1}^n f(X_i) = E_π f.
Length of the sampling: error control (CLT).
Complexity: complexity of the evaluation of the transition function Φ(x, ·); related to the stabilization period plus the estimation time.

61 Ergodic Sampling (4)
[Figure: a typical trajectory over time, split into a warm-up period followed by an estimation period.]

62 Replication Method
[Figure: several independent trajectories run over the same replication period.]
Sample of independent states.
Drawback: length of the replication period (dependence on the initial state).

63 Regeneration Method
[Figure: a typical trajectory split, after a start-up period, into regeneration intervals R1, R2, ..., R.]
Sample of independent trajectories.
Drawback: length of the regeneration period (choice of the regenerative state).
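
A sketch of the replication method for the flip-flop chain: R independent trajectories, each with its own warm-up, keeping only the final states as an i.i.d. sample.

    import random

    p, q = 0.3, 0.1

    def step(x):
        if x == 1:
            return 2 if random.random() < p else 1
        return 1 if random.random() < q else 2

    R, warmup = 5000, 500
    finals = []
    for _ in range(R):                  # R independent replications
        x = 1
        for _ in range(warmup):
            x = step(x)
        finals.append(x)                # keep only the final state
    print(finals.count(1) / R, q / (p + q))   # estimate vs exact pi(1)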

64 Outline
1 Markov Chain
2 Formalisation
3 Long run behavior
4 Cache modeling
5 Synthesis

65 Cache modelling
Virtual memory: paging in an OS [figure: CPU board with CPU, MMU, memory, bus, paging disk and disk controller]. Huge number of pages, small memory capacity. Examples:
- cache hierarchy (processor)
- data caches (databases)
- proxy-web (internet)
- routing tables (networking)
- ...
State of the system: the positions of the pages.
Move-to-front strategy (least recently used, LRU): the requested page moves to the front. Example: state (P3, P7, P2, P6, P5, P1, P8, P4) becomes (P5, P3, P7, P2, P6, P1, P8, P4) after a request for P5.
Move-ahead strategy (ranking algorithm): the requested page moves one position forward. Example: state (P3, P7, P2, P6, P5, P1, P8, P4) becomes (P3, P7, P2, P5, P6, P1, P8, P4) after a request for P5.
Problem: performance is the mean response time (memory access << disk access); choose the strategy that achieves the best long-term performance.

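
A simulation sketch comparing the two policies under the i.i.d. request model of the next slide; the parameters (N = 8, a = 0.3, a cache of 3 pages) are illustrative.

    import random

    def simulate(policy, N=8, a=0.3, steps=100000, cache_size=3):
        b = (1 - a) / (N - 1)           # all other pages equally likely
        order = list(range(N))          # current page ordering
        misses = 0
        for _ in range(steps):
            # Page 0 is requested with probability a, the others w.p. b.
            r = 0 if random.random() < a else random.randint(1, N - 1)
            pos = order.index(r)
            if pos >= cache_size:
                misses += 1             # page served from disk
            if policy == "move-to-front":
                order.insert(0, order.pop(pos))
            elif policy == "move-ahead" and pos > 0:
                order[pos - 1], order[pos] = order[pos], order[pos - 1]
        return misses / steps

    for policy in ("move-to-front", "move-ahead"):
        print(policy, simulate(policy))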

69 Modelling
State of the system: N = number of pages; a state is a permutation of {1, ..., N}; the size of the state space is N!, numerically intractable.
Example, a Linux system: page size 4 KB, memory size 1 GB, swap disk size 1 GB, hence N = 2 GB / 4 KB = 524288 pages and a state space of size 524288! (exercise: compute the order of magnitude).
Flow modelling: requests are random, have the same probability distribution, and are stochastically independent: {R_n}_{n∈ℕ} is a sequence of i.i.d. requests.
State space reduction: let P_A be the most frequent page and let all other pages have the same frequency:
a = P(R_n = P_A), b = P(R_n = P_i), a > b, a + (N-1)b = 1.
{X_n}_{n∈ℕ} = position of page P_A at time n. State space = {1, ..., N} (size reduction); {X_n} is a Markov chain (state-dependent policy).


73 Move-to-front analysis
Markov chain on the position of P_A: from state i, jump to 1 with probability a (P_A is requested), stay with probability (i-1)b (a page ahead of P_A is requested), move to i+1 with probability (N-i)b (a page behind P_A is requested).
Transition matrix:

    [ a   (N-1)b     0        ...       0    ]
    [ a     b      (N-2)b     ...       0    ]
    [ a     0       2b      (N-3)b     ...   ]
    [ ...                     ...            ]
    [ a     0        0        ...    (N-1)b  ]

Example: N = 8, a = 0.3 and b = 0.1; the steady state π can be computed numerically (see the sketch after the move-ahead analysis).


76 Move-ahead analysis
Markov chain on the position of P_A: from state i, move to i-1 with probability a (P_A is requested), move to i+1 with probability b (the page just behind P_A is requested), stay with probability (N-2)b otherwise (with boundary adjustments in states 1 and N).
Transition matrix:

    [ a+(N-2)b     b        0       ...      0     ]
    [     a     (N-2)b      b       ...      0     ]
    [     0        a     (N-2)b      b      ...    ]
    [    ...                        ...            ]
    [     0        0       ...       a    (N-1)b   ]

Example: N = 8, a = 0.3 and b = 0.1; the steady state π can be computed numerically (see the sketch below).

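
A sketch that builds both transition matrices for the position of P_A and computes their stationary vectors together with the second-largest eigenvalue modulus, which governs the convergence speeds quoted on the next slide:

    import numpy as np

    N, a = 8, 0.3
    b = (1 - a) / (N - 1)                # here b = 0.1, as on the slides

    # Move-to-front: jump to 1 w.p. a, stay w.p. (i-1)b, back one w.p. (N-i)b
    # (states 1..N stored as indices 0..N-1).
    MF = np.zeros((N, N))
    for i in range(N):
        MF[i, 0] += a
        MF[i, i] += i * b
        if i < N - 1:
            MF[i, i + 1] = (N - 1 - i) * b

    # Move-ahead: up one w.p. a, overtaken w.p. b, otherwise stay.
    MA = np.zeros((N, N))
    for i in range(N):
        if i > 0:
            MA[i, i - 1] = a
        else:
            MA[0, 0] += a
        if i < N - 1:
            MA[i, i + 1] = b
            MA[i, i] += (N - 2) * b
        else:
            MA[i, i] += (N - 1) * b

    def stationary(P):
        vals, vecs = np.linalg.eig(P.T)
        k = np.argmin(np.abs(vals - 1.0))
        pi = np.real(vecs[:, k])
        return pi / pi.sum()

    for name, P in (("move-to-front", MF), ("move-ahead", MA)):
        lam2 = sorted(np.abs(np.linalg.eigvals(P)))[-2]
        print(name, np.round(stationary(P), 3), "lambda_2 ~", round(lam2, 2))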

79 Performances
Steady state:
- Move-to-front (MF):
  π(i) = π(1) (N-1)(N-2)...(N-i+1) b^{i-1} / [(a + (N-2)b)(a + (N-3)b)...(a + (N-i)b)].
- Move-ahead (MA), a birth-and-death chain:
  π(i) = (b/a)^{i-1} (1 - b/a) / (1 - (b/a)^N).
[Figure: cache-miss probability as a function of memory size for both strategies.]
Best strategy: move-ahead.
Comments:
- self-ordering protocol: pages sorted by decreasing probability;
- convergence speed to the steady state: move-to-front ~ 0.7^n, move-ahead ~ 0.92^n;
- tradeoff between stabilization and long-term performance;
- depends on the input flow of requests.


83 Outline
1 Markov Chain
2 Formalisation
3 Long run behavior
4 Cache modeling
5 Synthesis

84 Synthesis: Modelling and Performance
Methodology:
1 Identify the states of the system
2 Estimate the transition parameters, build the Markov chain (verify its properties)
3 Specify performances as a function of the steady state
4 Compute the steady-state distribution and the steady-state performance
5 Analyse performances as a function of the input parameters
Classical methods to compute the steady state:
1 Analytical formulae: structure of the Markov chain (closed form)
2 Formal computation (N < 50)
3 Direct numerical computation (classical linear-algebra kernels) (N < 1000)
4 Iterative numerical computation (classical linear-algebra kernels) (N < 100,000)
5 Model-adapted numerical computation (N < 1,000,000)
6 Simulation of random trajectories (sampling)


86 Bibliography
Books:
- O. Häggström. Finite Markov Chains and Algorithmic Applications. Cambridge University Press, 2002.
- P. Brémaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues. Springer-Verlag, 1999.
- D. A. Levin, Y. Peres, and E. L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.
Websites:
- Virtual Laboratories in Probability and Statistics
- The MacTutor History of Mathematics archive (photos)


More information

CDA6530: Performance Models of Computers and Networks. Chapter 3: Review of Practical Stochastic Processes

CDA6530: Performance Models of Computers and Networks. Chapter 3: Review of Practical Stochastic Processes CDA6530: Performance Models of Computers and Networks Chapter 3: Review of Practical Stochastic Processes Definition Stochastic process X = {X(t), t2 T} is a collection of random variables (rvs); one rv

More information

Chapter 2: Markov Chains and Queues in Discrete Time

Chapter 2: Markov Chains and Queues in Discrete Time Chapter 2: Markov Chains and Queues in Discrete Time L. Breuer University of Kent 1 Definition Let X n with n N 0 denote random variables on a discrete space E. The sequence X = (X n : n N 0 ) is called

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs

More information

Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly

Markov Chains. Sarah Filippi Department of Statistics  TA: Luke Kelly Markov Chains Sarah Filippi Department of Statistics http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013 14. Schedule 09:30-10:30 Lecture:

More information

Population Games and Evolutionary Dynamics

Population Games and Evolutionary Dynamics Population Games and Evolutionary Dynamics William H. Sandholm The MIT Press Cambridge, Massachusetts London, England in Brief Series Foreword Preface xvii xix 1 Introduction 1 1 Population Games 2 Population

More information

Stochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS

Stochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS Stochastic Processes Theory for Applications Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv Swgg&sfzoMj ybr zmjfr%cforj owf fmdy xix Acknowledgements xxi 1 Introduction and review

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Centre for Research on Inner City Health St Michael s Hospital Toronto On leave from Department of Mathematics University of Manitoba Julien Arino@umanitoba.ca

More information

Non-homogeneous random walks on a semi-infinite strip

Non-homogeneous random walks on a semi-infinite strip Non-homogeneous random walks on a semi-infinite strip Chak Hei Lo Joint work with Andrew R. Wade World Congress in Probability and Statistics 11th July, 2016 Outline Motivation: Lamperti s problem Our

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2019 Last Time: Monte Carlo Methods If we want to approximate expectations of random functions, E[g(x)] = g(x)p(x) or E[g(x)]

More information

Markov processes and queueing networks

Markov processes and queueing networks Inria September 22, 2015 Outline Poisson processes Markov jump processes Some queueing networks The Poisson distribution (Siméon-Denis Poisson, 1781-1840) { } e λ λ n n! As prevalent as Gaussian distribution

More information

1 Random walks: an introduction

1 Random walks: an introduction Random Walks: WEEK Random walks: an introduction. Simple random walks on Z.. Definitions Let (ξ n, n ) be i.i.d. (independent and identically distributed) random variables such that P(ξ n = +) = p and

More information

1.2. Markov Chains. Before we define Markov process, we must define stochastic processes.

1.2. Markov Chains. Before we define Markov process, we must define stochastic processes. 1. LECTURE 1: APRIL 3, 2012 1.1. Motivating Remarks: Differential Equations. In the deterministic world, a standard tool used for modeling the evolution of a system is a differential equation. Such an

More information

Math Homework 5 Solutions

Math Homework 5 Solutions Math 45 - Homework 5 Solutions. Exercise.3., textbook. The stochastic matrix for the gambler problem has the following form, where the states are ordered as (,, 4, 6, 8, ): P = The corresponding diagram

More information

Lecture 7. We can regard (p(i, j)) as defining a (maybe infinite) matrix P. Then a basic fact is

Lecture 7. We can regard (p(i, j)) as defining a (maybe infinite) matrix P. Then a basic fact is MARKOV CHAINS What I will talk about in class is pretty close to Durrett Chapter 5 sections 1-5. We stick to the countable state case, except where otherwise mentioned. Lecture 7. We can regard (p(i, j))

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Stanford University (Joint work with Persi Diaconis, Arpita Ghosh, Seung-Jean Kim, Sanjay Lall, Pablo Parrilo, Amin Saberi, Jun Sun, Lin

More information

Social network analysis: social learning

Social network analysis: social learning Social network analysis: social learning Donglei Du (ddu@unb.edu) Faculty of Business Administration, University of New Brunswick, NB Canada Fredericton E3B 9Y2 October 20, 2016 Donglei Du (UNB) AlgoTrading

More information

TCOM 501: Networking Theory & Fundamentals. Lecture 6 February 19, 2003 Prof. Yannis A. Korilis

TCOM 501: Networking Theory & Fundamentals. Lecture 6 February 19, 2003 Prof. Yannis A. Korilis TCOM 50: Networking Theory & Fundamentals Lecture 6 February 9, 003 Prof. Yannis A. Korilis 6- Topics Time-Reversal of Markov Chains Reversibility Truncating a Reversible Markov Chain Burke s Theorem Queues

More information

Language Acquisition and Parameters: Part II

Language Acquisition and Parameters: Part II Language Acquisition and Parameters: Part II Matilde Marcolli CS0: Mathematical and Computational Linguistics Winter 205 Transition Matrices in the Markov Chain Model absorbing states correspond to local

More information

CDA5530: Performance Models of Computers and Networks. Chapter 3: Review of Practical

CDA5530: Performance Models of Computers and Networks. Chapter 3: Review of Practical CDA5530: Performance Models of Computers and Networks Chapter 3: Review of Practical Stochastic Processes Definition Stochastic ti process X = {X(t), t T} is a collection of random variables (rvs); one

More information

DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition

DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition DISCRETE STOCHASTIC PROCESSES Draft of 2nd Edition R. G. Gallager January 31, 2011 i ii Preface These notes are a draft of a major rewrite of a text [9] of the same name. The notes and the text are outgrowths

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

The Transition Probability Function P ij (t)

The Transition Probability Function P ij (t) The Transition Probability Function P ij (t) Consider a continuous time Markov chain {X(t), t 0}. We are interested in the probability that in t time units the process will be in state j, given that it

More information

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018 Last Time: Monte Carlo Methods If we want to approximate expectations of random functions, E[g(x)] = g(x)p(x) or E[g(x)]

More information

Zdzis law Brzeźniak and Tomasz Zastawniak

Zdzis law Brzeźniak and Tomasz Zastawniak Basic Stochastic Processes by Zdzis law Brzeźniak and Tomasz Zastawniak Springer-Verlag, London 1999 Corrections in the 2nd printing Version: 21 May 2005 Page and line numbers refer to the 2nd printing

More information

On asymptotic behavior of a finite Markov chain

On asymptotic behavior of a finite Markov chain 1 On asymptotic behavior of a finite Markov chain Alina Nicolae Department of Mathematical Analysis Probability. University Transilvania of Braşov. Romania. Keywords: convergence, weak ergodicity, strong

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 2. Countable Markov Chains I started Chapter 2 which talks about Markov chains with a countably infinite number of states. I did my favorite example which is on

More information

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321 Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

CONTENTS. Preface List of Symbols and Notation

CONTENTS. Preface List of Symbols and Notation CONTENTS Preface List of Symbols and Notation xi xv 1 Introduction and Review 1 1.1 Deterministic and Stochastic Models 1 1.2 What is a Stochastic Process? 5 1.3 Monte Carlo Simulation 10 1.4 Conditional

More information