Stochastic Models in Computer Science A Tutorial Dr. Snehanshu Saha Department of Computer Science PESIT BSC, Bengaluru WCI 2015 - August 10 to August 13
1. Introduction
2. Random Variable
3. Introduction to Probability
4. Probability Distributions - The Motivation
5. Probability Distributions - Explained
6. Applications / Properties
7. Random Graphs
8. Inequalities
9. Queuing Theory
10. Markov Chains
Probability Models in Computer Science
Applications: Machine Learning, Randomized Algorithms, Computer Graphics, Medical Image Analysis, Big Data Analytics, Speech Recognition Systems, Wireless Communication, Communication Networks
Randomness
What is a Random Variable?
Axioms of Probability
Consider a discrete random variable, and let x_1, x_2, x_3, ..., x_n be the set of possible outcomes.
1. P(x_i) \ge 0, i = 1, 2, ..., n
2. \sum_i P(x_i) = 1
Coin Flipping - example
A fair coin flipped twice. Sample space: {HH, HT, TH, TT}. Let K = number of heads occurring (simulated in the sketch below).
K    P(K)
0    1/4
1    1/2
2    1/4
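A short simulation sketch, not from the slides: flipping two fair coins many times recovers the probabilities in the table above.

```python
# Minimal sketch: estimate P(K) for K = number of heads in two fair coin
# flips, and compare with the exact values 1/4, 1/2, 1/4.
import random
from collections import Counter

random.seed(0)
trials = 100_000
counts = Counter(sum(random.randint(0, 1) for _ in range(2)) for _ in range(trials))

for k in (0, 1, 2):
    print(f"P(K={k}) ~ {counts[k] / trials:.3f}")
```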
Birth of Statistics - The Battle of Plataea
5th century BC, Plataea - a city held by the Persians, defeated by a Greek alliance. The wall of Plataea and its height.
Applications of several Distributions
What is a distribution? A distribution is a trend, a dimension, a fitting of data, or simply a probability.
Applications of distributions:
Apartment Service Management Problem
The Last-Mile Postman (Poisson)
Errors in Astronomical Observations (Gaussian)
Gaussian
What do we observe about the Gaussian? It is symmetric about the mean: the mean forms the axis of symmetry, and the maximum value is attained at the mean.
Gaussian Distribution: given a random variable X with mean µ and standard deviation σ, the standardized variable is
z = (X − µ) / σ
Binomial Distribution
Length of the sequence of events: FIXED. Each event has exactly TWO possible outcomes (success or failure). Find the probability of obtaining a sequence with k successes and (n − k) failures.
Binomial Distribution: P(K = k) = \binom{n}{k} θ^k (1 − θ)^{n−k}, k = 0, 1, 2, ..., n
Exercise: explain the coin-flipping problem in light of the binomial distribution (see the sketch below).
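A minimal sketch evaluating the binomial PMF with math.comb; the helper name binomial_pmf is an assumption. With n = 2 and θ = 0.5 it reproduces the coin-flipping table above.

```python
# Binomial PMF: P(K = k) = C(n, k) * theta^k * (1 - theta)^(n - k)
from math import comb

def binomial_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k successes in n trials with success prob theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Two fair coin flips: matches the table 1/4, 1/2, 1/4
print([binomial_pmf(k, 2, 0.5) for k in range(3)])  # [0.25, 0.5, 0.25]
```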
Cumulative Distribution Function
Properties of the CDF (continuous random variables): F_X(x) = P(X \le x), −∞ < x < ∞
1. 0 \le F_X(x) \le 1
2. F_X(x_1) \le F_X(x_2) if x_1 < x_2
3. \lim_{x \to ∞} F_X(x) = 1
4. P(a < X \le b) = F_X(b) − F_X(a)
Consequences: P(X > a) = 1 − F_X(a); P(X < b) = F_X(b)
The CDF F_X(x) is the area in the histogram up to x.
Probability Mass Function
Properties of the PMF (discrete random variables):
1. 0 \le P(K) \le 1
2. \sum_{i=1}^{n} P(K_i) = 1
3. F_X(x_i) − F_X(x_{i−1}) = P(X \le x_i) − P(X \le x_{i−1}) = P(X = x_i) = p_X(x_i)
A histogram is the graph of the probability mass function; the total area under the graph is 1.
Probability Density Function
Continuous random variables and the PDF, illustrated with class-conditional densities:
a) φ(x | w_i) = A_i e^{−|x − a_i| / b_i}, with \int_{−∞}^{+∞} p(x | w_i) dx = 1
b) d(x) = p(x | w_1) − p(x | w_2)
Geometric Distribution
Consider a sequence of independent trials, each of which is a success with probability p, 0 < p < 1, or a failure with probability 1 − p. If X represents the trial number of the first success, then X is said to be a geometric random variable with parameter p.
Example: let N = number of packets transmitted until the first success (simulated in the sketch below).
P(N = n) = q^{n−1}(1 − q), n = 1, 2, 3, ...
E(N) = \sum_{n=1}^{∞} n q^{n−1}(1 − q) = 1 / (1 − q)
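A sketch of the packet example: simulate transmissions that each fail with probability q, record the trial number N of the first success, and compare the empirical mean with E(N) = 1/(1 − q). The value q = 0.3 is an assumption for illustration.

```python
import random

random.seed(1)
q = 0.3  # per-transmission failure probability (assumed for the demo)

def first_success_trial() -> int:
    """Trial number of the first successful transmission."""
    n = 1
    while random.random() < q:  # failure: retransmit
        n += 1
    return n

samples = [first_success_trial() for _ in range(100_000)]
print(sum(samples) / len(samples), "vs E(N) =", 1 / (1 - q))
```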
Cauchy Distribution
A weird distribution (example: DAX, German stock exchange).
PDF: f_X(x) = b / (π(b^2 + x^2)); symmetric about zero.
Mean: E(X) = \int_{−∞}^{+∞} x f_X(x) dx = \int_{−∞}^{+∞} \frac{bx}{π(b^2 + x^2)} dx is undefined.
Variance: \int_{−∞}^{+∞} \frac{b x^2}{π(b^2 + x^2)} dx is infinite.
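An illustrative sketch of what the undefined mean looks like in practice: the running sample mean of Cauchy draws never settles down. The standard-Cauchy sampler tan(π(U − 1/2)) and b = 1 are assumptions for the demo.

```python
# The running mean of Cauchy samples keeps jumping, unlike a distribution
# with a finite mean, where it would converge by the law of large numbers.
import math
import random

random.seed(2)
total = 0.0
for n in range(1, 100_001):
    total += math.tan(math.pi * (random.random() - 0.5))  # one standard Cauchy draw
    if n % 20_000 == 0:
        print(f"n={n:6d}  running mean = {total / n:+.3f}")
```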
Exponential Distribution
Properties: f_X(x) = λe^{−λx}, x \ge 0
1. E(X) = µ = \int_0^{+∞} λx e^{−λx} dx = 1/λ, where λ is the rate; as λ \to ∞, E(X) \to 0
2. Var(X) = E(X^2) − (E(X))^2
3. The expected time to arrive is inversely proportional to the arrival rate
A stochastic equivalent of the laws of motion.
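A quick numeric check, a sketch rather than part of the slides: sampling Exponential(λ) with random.expovariate shows the sample mean approaching 1/λ, and shrinking as the rate grows.

```python
import random

random.seed(3)
for lam in (0.5, 1.0, 4.0):  # illustrative rates
    mean = sum(random.expovariate(lam) for _ in range(100_000)) / 100_000
    print(f"lambda={lam}: sample mean={mean:.3f}, 1/lambda={1/lam:.3f}")
```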
Poisson Distribution
Properties: a sequence of independent trials of a random experiment, whose sample space has two outcomes (success and failure), is called a Poisson sequence of trials if the probability of success is not constant but varies from one trial to another.
f_X(k) = e^{−λ} λ^k / k!, where λ is the only parameter of the Poisson distribution.
Limiting case of the binomial distribution: f_X(k) = \binom{n}{k} p^k q^{n−k} (binomial). Set p = λ/n, λ > 0, so p varies from one trial to another. As n \to ∞ and p \to 0,
P(k successes) = \binom{n}{k} p^k (1 − p)^{n−k} \to e^{−λ} λ^k / k!
Poisson Distribution contd.
Note: events where the probability of success is small and the number of trials is large, such that np = λ is of moderate magnitude, can be modeled by P(K = k) \approx e^{−λ} λ^k / k! (compared numerically below).
Poisson is the limiting case of the binomial distribution.
Reproductive property of the Poisson distribution.
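A numeric sketch of the limiting case: with n large, p small, and np = λ held fixed (the values n = 1000 and λ = 3 are illustrative), the binomial and Poisson PMFs nearly coincide.

```python
from math import comb, exp, factorial

lam, n = 3.0, 1000
p = lam / n  # np = lambda, so p is small

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)      # exact binomial PMF
    poisson = exp(-lam) * lam**k / factorial(k)        # Poisson approximation
    print(f"k={k}: binomial={binom:.5f}  poisson={poisson:.5f}")
```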
Memoryless Property
P(X > s + t | X > s) = P(X > t), where X is the amount of waiting time from a given point, say 0.
The conditional probability that the call arrives after time s + t, given that you have already waited s, equals the probability that the call arrives after time t. It does not depend on the fact that you have already waited for time s. No memory of s: memoryless.
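An empirical sketch of memorylessness for the exponential distribution: P(X > s + t | X > s) should match P(X > t). The values of λ, s, and t are assumptions for the demo.

```python
import random

random.seed(4)
lam, s, t = 1.0, 2.0, 1.0  # illustrative rate and waiting times
draws = [random.expovariate(lam) for _ in range(500_000)]

past_s = [x for x in draws if x > s]                       # already waited s
lhs = sum(x > s + t for x in past_s) / len(past_s)         # P(X > s+t | X > s)
rhs = sum(x > t for x in draws) / len(draws)               # P(X > t)
print(f"P(X>s+t | X>s) ~ {lhs:.3f}   P(X>t) ~ {rhs:.3f}")  # both near e^-1
```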
Applications / Properties
Jensen's Inequality: for any convex function g(X) and any random variable X, E[g(X)] \ge g(E[X]).
Convex function: for 0 \le α \le 1 and any x_0 < x_1, g(αx_0 + (1 − α)x_1) \le α g(x_0) + (1 − α) g(x_1) means g is convex.
Can we apply this property repeatedly? g(\sum_{i=0}^{n} α_i x_i) \le \sum_{i=0}^{n} α_i g(x_i), as long as the weights satisfy \sum_i α_i = 1 and x_i < x_{i+1}, i = 0, 1, ...
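A small numeric sketch, not from the slides: Jensen's inequality checked by simulation for the convex function g(x) = x^2, using an arbitrary Gaussian test distribution as an assumption.

```python
import random

random.seed(5)
xs = [random.gauss(1.0, 2.0) for _ in range(100_000)]  # assumed test distribution

e_g = sum(x * x for x in xs) / len(xs)   # E[g(X)] = E[X^2]
g_e = (sum(xs) / len(xs)) ** 2           # g(E[X]) = (E[X])^2
print(f"E[X^2] = {e_g:.3f} >= (E[X])^2 = {g_e:.3f}")  # ~5 >= ~1
```

The gap between the two sides is exactly Var(X), which is one way to see why the inequality is strict for non-degenerate X.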
Why Random Graphs?
Telephone networks, the Internet, social networks, power grids, WSNs, MANETs.
Random Graph
Undirected graph: G(V, E), where V is the set of vertices and E is the set of edges.
Path: a sequence of distinct edges {(i_0, i_1), (i_1, i_2), ..., (i_{k−1}, i_k), (i_k, j)} is called a path from vertex i_0 to vertex j.
Preliminaries
Clique: a clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge. Such subgraphs in social networks denote a set of people who all know each other.
Maximum clique: P(∃ clique of size K in G(n, p)) \le \binom{n}{K} p^{\binom{K}{2}}
Independent set. Vertex cover: the minimum number of vertices needed to cover all edges.
Random Graphs
Description: let G = (V, E), where V = {1, 2, ..., n} and E = {(i, X(i))}_{i=1}^{n}. The X(i) are independent random variables with P{X(i) = j} = P_j, \sum_{j=1}^{n} P_j = 1, where (i, X(i)) is an edge that starts from vertex i, i = 1, 2, ..., n. That is, from each vertex, another vertex is randomly chosen and joined by an edge, according to the probabilities P_j.
Probability space: define a new probability space G(n, p) with Ω = Ω_E. The sample space is the set of strings of length \binom{n}{2}, x : E \to {0, 1}, indicating whether the ith possible edge belongs to the edge set.
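A minimal sketch of sampling from the G(n, p) model just defined: each of the \binom{n}{2} possible edges is included independently with probability p. The function name gnp and the parameters n = 10, p = 0.3 are illustrative assumptions.

```python
import itertools
import random

random.seed(6)

def gnp(n: int, p: float) -> list[tuple[int, int]]:
    """Return the edge list of one draw from G(n, p)."""
    return [e for e in itertools.combinations(range(n), 2) if random.random() < p]

edges = gnp(10, 0.3)
print(len(edges), "edges out of", 10 * 9 // 2, "possible:", edges[:5], "...")
```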
Random Graphs - Problems
Map Coloring Problem, Influence Modeling Problem, Bio-informatics Application, Party Problem, Community Clustering Problem.
Ramsey Number
With a reasonably large number of vertices n, you will always find a complete graph on r vertices or, failing that, an independent set of r vertices.
The Ramsey number R(r) denotes how big the graph needs to be in order to guarantee a clique or an independent set of r vertices.
R(1) = 1; R(2) = 2; R(3) = 6, i.e. with 6 vertices you will always find a complete graph on 3 vertices OR an independent set of 3 vertices.
What is R(r) = K? How small can the Ramsey number K be in R(r) = K?
Bounds of Ramsey: 2^{r/2} < R(r) \le 2^{2r−3}
Below the lower bound, i.e. on fewer than 2^{r/2} vertices, there exist graphs with neither an r-clique nor an r-independent set.
Inequalities
Markov's Inequality: gives an upper bound on the fraction of a distribution that lies above a particular value. What is the probability that the value of a R.V. X is far from its expectation?
Definition: if X is a non-negative R.V., then for any c > 0, P{X \ge c} \le E[X] / c.
Example: the average height of a kid is 4 ft. What is the probability of finding a kid whose height is greater than 6 ft? How about greater than 7 ft? (A worked answer follows.)
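A worked answer, assuming only that height is non-negative: Markov's inequality gives P(height \ge 6) \le 4/6 \approx 0.67 and P(height \ge 7) \le 4/7 \approx 0.57. The bounds are valid for any distribution of heights with mean 4 ft, which is also why they are loose.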
Application of Markov's Inequality in Delay Estimation
[Table: Avg. Delay per Job Request] [Figure: Average Delay Comparison]
Application of Markov's Inequality in Delay Estimation contd.
[Table: Total Delay per Job Request] [Figure: Total Delay Comparison]
Inequalities contd.
Chebyshev's Inequality: let X be a R.V. (not necessarily non-negative); then for c > 0, P(|X − E[X]| \ge c) \le Var(X) / c^2.
Boole's Inequality: P(\bigcup_{i=1}^{n} A_i) \le \sum_{i=1}^{n} P(A_i), where {A_i}_{i=1}^{n} is a set of events.
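A simulation sketch comparing Chebyshev's bound with the actual tail probability; the standard normal is an assumed test distribution. The bound holds for every distribution with the same variance, so it is necessarily loose here.

```python
import random

random.seed(7)
xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]  # E[X]=0, Var(X)=1

for c in (1.0, 2.0, 3.0):
    empirical = sum(abs(x) >= c for x in xs) / len(xs)  # P(|X - E[X]| >= c)
    print(f"c={c}: P(|X|>=c) ~ {empirical:.4f}  <=  bound {1.0 / c**2:.4f}")
```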
Inequalities contd.
Chernoff Bound: let φ(t) = E[e^{tX}] for a R.V. X; then for any c > 0:
P{X \ge c} \le e^{−tc} φ(t), t > 0
P{X \le c} \le e^{−tc} φ(t), t < 0
Jensen's Inequality of Expectations: if f is a convex function, E[f(X)] \ge f(E[X]), provided the expectation exists.
Queuing Theory - An Introduction
λ: arrival rate; µ: departure rate; p_n(t): state transition probabilities, i.e. the probability that at time t there are n customers in the system.
p_1(t + Δt) = λΔt p_0(t) + p_1(t) − λΔt p_1(t) − µΔt p_1(t) + µΔt p_2(t) + o(Δt)
Queuing Theory contd.
Based on the memoryless property, as Δt \to 0:
p_0(t + Δt) = (1 − λΔt) p_0(t) + µΔt p_1(t) + o(Δt)
p_n(t + Δt) = λΔt p_{n−1}(t) + (1 − (λ + µ)Δt) p_n(t) + µΔt p_{n+1}(t) + o(Δt)
Key terminologies
Memoryless property: no need to remember when the last customer arrived. Push the limit Δt \to 0, so that there is practically no difference between t and t + Δt; arrivals/departures can't be tracked between time intervals. Memoryless.
Steady-state probability: p_n = \lim_{t \to ∞} P{X(t) = n}, n = 0, 1, 2, ..., the limiting or long-run probability that there will be exactly n customers in the system. If p_0 = 0.3, then, in the long run, the system is empty of customers 30% of the time.
Key terminologies contd.
Service discipline (example: FCFS): arrivals one at a time, departures one at a time; customers are serviced in the order they arrived.
Traffic intensity: exponential inter-arrival times with mean 1/λ and exponential service times with mean 1/µ give ρ = λ/µ, with ρ < 1.
Rate equality: in state 0, the rate at which the process leaves is λp_0 and the rate at which the process arrives is µp_1. Balance equation (job flow): λp_0 = µp_1.
Key terminologies contd.
Limiting behavior of the system: the steady-state probabilities can be computed from
1. 0 = −λp_0 + µp_1
2. 0 = λp_{n−1} − (λ + µ)p_n + µp_{n+1}
3. p_n also satisfies \sum_{n=0}^{∞} p_n = 1
Solutions to the system (computed in the sketch below):
P(n jobs in the system): p_n = (1 − ρ)ρ^n, n = 0, 1, 2, ...
Expected # of jobs in the system: L_s = ρ / (1 − ρ)
Expected # of jobs in the queue: L_q = L_s − ρ
Expected waiting time in the system: W_s = L_s / λ
Expected waiting time in the queue: W_q = L_q / λ
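A small M/M/1 calculator sketch based on the formulas above; the function name mm1_metrics and the values λ = 3, µ = 5 are illustrative assumptions.

```python
def mm1_metrics(lam: float, mu: float) -> dict[str, float]:
    """Steady-state M/M/1 metrics from the arrival rate lam and service rate mu."""
    rho = lam / mu
    assert rho < 1, "steady state requires rho = lam/mu < 1"
    Ls = rho / (1 - rho)   # expected jobs in the system
    Lq = Ls - rho          # expected jobs in the queue
    return {"rho": rho, "Ls": Ls, "Lq": Lq, "Ws": Ls / lam, "Wq": Lq / lam}

print(mm1_metrics(lam=3.0, mu=5.0))
# p_n = (1 - rho) * rho**n, e.g. p_0 = 0.4: the system is empty 40% of the time
```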
Types of Queues
M/M/1 Queue - M: memoryless arrival process (Poisson); M: memoryless service time (exponential); 1: one server.
M/G/1 Queue - M: memoryless arrival process (Poisson); G: general service time; 1: one server.
Markov Chains - An Introduction
What is a Markov chain? A mathematical model of a random phenomenon evolving with time such that the past affects the future only through the present.
Example: P{X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0} = P_{i,j} for all states i_0, i_1, ..., i_{n−1}, i, j and for all n \ge 0.
For a Markov chain, the conditional distribution of any future state X_{n+1}, given the past states X_0, X_1, ..., X_{n−1} and the present state X_n, is independent of the past states and depends only on the present state X_n.
Examples
Mouse in a cage; bank account; simple random walk (drunkard's walk); simple random walk (drunkard's walk) in a city; actuarial chains.
Chapman-Kolmogorov Equations
Define the n-step transition probabilities P^n_{ij} to be the probability that a process in state i will be in state j after n additional transitions. That is:
P^n_{ij} = P{X_{n+k} = j | X_k = i}, n \ge 0, i, j \ge 0, with P^1_{ij} = P_{ij}
Chapman-Kolmogorov equations: P^{n+m}_{ij} = \sum_{k=0}^{∞} P^n_{ik} P^m_{kj} for all n, m \ge 0 and all i, j.
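A toy verification sketch of the Chapman-Kolmogorov equations on an assumed 2-state transition matrix P: the (n + m)-step transition matrix must equal the product of the n-step and m-step matrices.

```python
def matmul(A, B):
    """Plain matrix product, standard library only."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matpow(P, n):
    """n-step transition matrix P^n for a 2-state chain."""
    R = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(n):
        R = matmul(R, P)
    return R

P = [[0.9, 0.1], [0.5, 0.5]]  # illustrative transition matrix
lhs = matpow(P, 5)                        # P^(2+3)
rhs = matmul(matpow(P, 2), matpow(P, 3))  # P^2 . P^3
print(lhs)
print(rhs)  # matches lhs up to floating-point error
```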
Thank you
THANK YOU!
Acknowledgment: Sara Punagin, M.Tech student of PESIT Bangalore South Campus, for help in preparing the slides.