The Theory behind PageRank
Mauro Sozio, Telecom ParisTech
May 21, 2014
A Crash Course on Discrete Probability
Events and Probability
Consider a stochastic process (e.g., roll a die, pick a card from a deck).
Each possible outcome is a simple event. The sample space Ω is the set of all possible simple events. An event is a set of simple events (a subset of the sample space). With each simple event E we associate a real number 0 ≤ P(E) ≤ 1, which is the probability that event E happens.
A Crash Course on Discrete Probability
Probability Space
Definition. A probability space has three components:
a sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space;
a family F of sets representing the allowable events, where each set in F is a subset of the sample space Ω;
a probability function P : F → R, satisfying the definition on the next slide.
A Crash Course on Discrete Probability
Probability Function
Definition. A probability function is any function P : F → R that satisfies the following conditions:
for any event E, 0 ≤ P(E) ≤ 1;
P(Ω) = 1;
for any finite or countably infinite sequence of pairwise mutually disjoint events E_1, E_2, E_3, ...
P(∪_{i≥1} E_i) = Σ_{i≥1} P(E_i).  (1)
A Crash Course on Discrete Probability
The Union Bound
Theorem.
P(∪_{i=1}^n E_i) ≤ Σ_{i=1}^n P(E_i).  (2)
Example: roll a die:
let E_1 = "result is odd", E_2 = "result is 2".
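The bound can be checked by direct enumeration. The following sketch (plain Python; the helper `prob` is illustrative) computes both sides for the die example above. Here E_1 and E_2 are disjoint, so the bound holds with equality:

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
omega = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

e1 = frozenset({1, 3, 5})   # E_1: result is odd
e2 = frozenset({2})         # E_2: result is 2

lhs = prob(e1 | e2)          # P(E_1 ∪ E_2)
rhs = prob(e1) + prob(e2)    # P(E_1) + P(E_2)
assert lhs <= rhs            # the union bound, Eq. (2)
print(lhs, rhs)
```

With overlapping events (e.g. "result is odd" and "result is ≤ 3") the inequality becomes strict.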
A Crash Course on Discrete Probability
Independent Events
Definition. Two events E_1 and E_2 are independent if and only if
P(E_1 ∩ E_2) = P(E_1) · P(E_2).  (3)
A Crash Course on Discrete Probability
Conditional Probability: Example
What is the probability that a random student at Telecom ParisTech was born in Paris?
E_1 = the event "born in Paris".
E_2 = the event "student at Telecom ParisTech".
The conditional probability that a student at Telecom ParisTech was born in Paris is written P(E_1 | E_2).
A Crash Course on Discrete Probability
Conditional Probability: Definition
Definition. The conditional probability that event E_1 occurs given that event E_2 occurs is
P(E_1 | E_2) = P(E_1 ∩ E_2) / P(E_2).  (4)
The conditional probability is only well-defined if P(E_2) > 0.
By conditioning on E_2 we restrict the sample space to the set E_2.
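Equation (4) can be evaluated by brute-force enumeration of a finite sample space. A minimal sketch (the event names are illustrative): for two rolls of a fair die, it computes P(sum = 4 | first roll is odd).

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling a fair die twice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(e1, e2):
    """P(E1 | E2) = P(E1 ∩ E2) / P(E2); requires P(E2) > 0."""
    return prob(lambda w: e1(w) and e2(w)) / prob(e2)

def sum_is_4(w):
    return w[0] + w[1] == 4

def first_odd(w):
    return w[0] % 2 == 1

print(cond(sum_is_4, first_odd))   # (1,3) and (3,1) out of 18 outcomes -> 1/9
```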
A Crash Course on Discrete Probability
Random Variable
Definition. A random variable X on a sample space Ω is a function on Ω; that is, X : Ω → R.
A discrete random variable is a random variable that takes on only a finite number of values.
A Crash Course on Discrete Probability
Examples
In practice, a random variable is some random quantity that we are interested in:
Roll a die, X = result. E.g., X = 6.
Pick a card, X = 1 if the card is an ace, 0 otherwise.
Roll a die twice. X_1 = result of the first roll, X_2 = result of the second roll. What is P(X_1 + X_2 = 2)?
Stochastic Processes
Definition. A stochastic process in discrete time n ∈ N is a sequence of random variables X_0, X_1, X_2, ..., denoted by X = {X_n}.
We refer to the value X_n as the state of the process at time n, with X_0 denoting the initial state. The set of possible values that each random variable can take is denoted by S. Here, we shall assume that S is finite and S ⊆ N.
Markov Chains
Definition. A stochastic process {X_n} is called a Markov chain if for any n ≥ 0 and any values j_0, j_1, ..., i, j ∈ S,
P(X_{n+1} = i | X_n = j, X_{n-1} = j_{n-1}, ..., X_0 = j_0) = P(X_{n+1} = i | X_n = j),
which we denote by P_ij.
This can be stated as: the future is independent of the past given the present state. In other words, the probability of moving to the next state does not depend on what happened in the past. Note that in general P_ij ≠ P_ji.
Markov Chains
One-step Transition Matrix
P_ij denotes the probability that the chain, whenever in state j, moves next into state i. The square matrix P = (P_ij), i, j ∈ S, is called the one-step transition matrix. Note that for each j ∈ S we have:
Σ_{i∈S} P_ij = 1.  (5)
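A small Python/NumPy sketch of this convention (P_ij is the probability of moving from j to i, so every column of P sums to 1); the 3-state chain is a made-up example:

```python
import numpy as np

# A toy 3-state chain in the column convention of the slides:
# P[i, j] = probability of moving from state j to state i,
# so every COLUMN of P must sum to 1.
P = np.array([
    [0.0, 0.5, 0.3],
    [1.0, 0.0, 0.7],
    [0.0, 0.5, 0.0],
])

assert np.allclose(P.sum(axis=0), 1.0)   # Eq. (5): sum_i P_ij = 1 for each j

# One step of the chain: if pi0 is the distribution of X_0 (a column vector),
# then P @ pi0 is the distribution of X_1.
pi0 = np.array([1.0, 0.0, 0.0])          # start deterministically in state 0
pi1 = P @ pi0
print(pi1)                                # the column of P for state 0
```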
Markov Chains
n-step Transition Matrix
The n-step transition matrix is P^(n), n ≥ 1, where
P^n_ij = P(X_n = i | X_0 = j) = P(X_{m+n} = i | X_m = j) for all m,  (6)
denotes the probability that n steps later the Markov chain will be in state i given that at step m it is in state j.
Theorem. P^(n) = P^n = P · P · ... · P, n ≥ 1.
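The theorem can be checked numerically: under the column convention, P^(n) is the n-th matrix power of P. A sketch with the same made-up 3-state chain as before:

```python
import numpy as np

P = np.array([
    [0.0, 0.5, 0.3],
    [1.0, 0.0, 0.7],
    [0.0, 0.5, 0.0],
])

# n-step transition matrix as the n-th matrix power of P (here n = 3).
P3 = np.linalg.matrix_power(P, 3)

# Sanity checks: P^(3) is again column-stochastic, and it agrees with
# applying P three times to any initial distribution.
assert np.allclose(P3.sum(axis=0), 1.0)
pi0 = np.array([1.0, 0.0, 0.0])
assert np.allclose(P3 @ pi0, P @ (P @ (P @ pi0)))
print(P3[:, 0])   # distribution of X_3 given X_0 = state 0
```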
Markov Chains
Irreducibility
Definition. A Markov chain is called irreducible* iff for any i, j ∈ S, there is n ≥ 1 such that:
P^n_ij > 0.  (7)
(* The definition is different when S is not finite.)
In other words, the chain is able to move from any state i to any state j (in one or more steps). As a result, if a Markov chain is irreducible then for every i there must be n such that P^n_ii > 0.
Markov Chains
Aperiodicity
A state i has period k if any return to i occurs at a step that is a multiple of k, i.e., at step k · l for some l > 0. Formally,
k = gcd{n : P(X_n = i | X_0 = i) > 0},  (8)
where gcd denotes the greatest common divisor. If k = 1 then state i is said to be aperiodic.
Definition. A Markov chain is called aperiodic if every state is aperiodic.
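The period of a state can be computed from Equation (8) by examining the diagonal entries of the matrix powers up to some horizon. A rough sketch (the helper `period` and the two toy chains are illustrative, not part of the slides):

```python
from math import gcd

import numpy as np

def period(P, i, max_n=50):
    """Period of state i: gcd of all n with P^n_ii > 0 (checked up to max_n)."""
    k = 0
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = P @ Pn                 # Pn is now P^n
        if Pn[i, i] > 1e-12:
            k = gcd(k, n)
    return k

# A deterministic 2-cycle: the chain alternates between the two states,
# so every return to a state takes an even number of steps -> period 2.
cycle = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(period(cycle, 0))   # 2

# Adding a self-loop makes a return possible at step 1 as well -> period 1.
lazy = np.array([[0.5, 1.0],
                 [0.5, 0.0]])
print(period(lazy, 0))    # 1
```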
Markov Chains
Stationary Distribution
Definition. A probability distribution π over the states of the Markov chain (Σ_{j∈S} π_j = 1) is called a stationary distribution if
π = Pπ,  (9)
where π is viewed as a column vector, consistently with the column convention of Equation (5).
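Since π is a fixed point of P, it can be approximated by power iteration: start from an arbitrary distribution and apply P repeatedly. A sketch with a made-up irreducible, aperiodic 3-state chain (convergence for such chains is the subject of the main theorem below):

```python
import numpy as np

P = np.array([
    [0.0, 0.5, 0.3],
    [1.0, 0.0, 0.7],
    [0.0, 0.5, 0.0],
])

# Power iteration: repeatedly apply P to an arbitrary starting distribution.
pi = np.full(3, 1 / 3)
for _ in range(200):
    pi = P @ pi

# pi now (numerically) satisfies the fixed-point equation of Eq. (9)
# and is still a probability distribution.
assert np.allclose(P @ pi, pi)
assert np.isclose(pi.sum(), 1.0)
print(pi)
```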
Markov Chains
Main Theorem
Theorem. If a Markov chain is irreducible and aperiodic (in this case the Markov chain is called ergodic), then a stationary distribution π exists and is unique. Moreover, the Markov chain converges to its stationary distribution, that is,
π_j = lim_{n→∞} P(X_n = j) = lim_{n→∞} P(X_n = j | X_0 = i), for all i, j ∈ S.  (10)
Note that Equation (10) holds regardless of the initial state i of the Markov chain.
Markov Chains
Markov Chains and the Random Surfer
Consider the Markov chain obtained from the web graph, with no random jumps.
Is it irreducible? Is it aperiodic? What about spider traps?
Other questions:
What if we add random jumps?
Can we compute the probability distribution that an ergodic Markov chain converges to? How?
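One way these questions resolve: adding random jumps mixes the web-graph chain with the uniform distribution, which makes the chain irreducible and aperiodic (hence ergodic), so a unique stationary distribution exists and power iteration converges to it; that distribution is the PageRank vector. A sketch under these assumptions (the damping factor 0.85 and the 3-page graph are illustrative):

```python
import numpy as np

def pagerank(P, alpha=0.85, n_iter=200):
    """Power iteration on the 'random surfer with jumps' chain: with
    probability alpha follow a link (column-stochastic P), with
    probability 1 - alpha jump to a uniformly random page.  The mixture
    is irreducible and aperiodic, so the iteration converges to its
    unique stationary distribution."""
    n = len(P)
    G = alpha * P + (1 - alpha) / n * np.ones((n, n))
    pi = np.full(n, 1 / n)
    for _ in range(n_iter):
        pi = G @ pi
    return pi

# Toy web graph of 3 pages (columns are out-link distributions):
# page 0 links to 1; page 1 links to 0 and 2; page 2 links to 0.
P = np.array([
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])
pi = pagerank(P)
print(pi, pi.sum())
```

Dangling pages (no out-links) would give all-zero columns and need separate handling, e.g. replacing those columns with the uniform distribution.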