Markov Chains. Camille Crumpton, Sisi Xiong, Tyler McDaniel

Size: px

Start display at page:

Download "Markov Chains. Camille Crumpton, Sisi Xiong, Tyler McDaniel"

Aldous Bond
6 years ago
Views:

1 Markov Chains Camille Crumpton, Sisi Xiong, Tyler McDaniel

2 Questions What is the matrix A given the following graph when calculating PageRank according to the simple definition of PageRank? What is the first name of the man credited for Markov Chains? 2 What is one application that you have used personally of Markov Chains?

3 Tyler McDaniel From Asheville, NC CS PhD student; work on high-perf dense linear algebra libraries at ICL Prior work with graph algorithms in the financial sector Also worked in microcontrollers/iot

4 Sisi Xiong Changde, Hunan, China PhD student majoring in Electrical Engineering. Research interests: processing large datasets using probabilistic approaches: Bloom filters

5 Camille Crumpton Hometown: Knoxville/Maryville, TN

6 Camille Crumpton Computer Science graduate student Harpist Endurance athlete

7 Outline Overview of Markov Chains History of Markov Chains PageRank Most Famous Use of Markov Chains Algorithm Implementation Experiments Text Algorithm Algorithm Implementation Applications Fun uses! Experiments More applications Population processes Adaptive cruise control Open Problem Discussion

Overview of Markov Chains What is a Markov Chain? A discrete-time stochastic process that satisfies the Markov property Then what is the Markov property?

8 Overview of Markov Chains What is a Markov Chain? A discrete-time stochastic process that satisfies the Markov property Then what is the Markov property? If one can make predictions about the future of the process with only having the knowledge of the present state (no knowledge of past states) Sometimes called memoryless property

Overview of Markov Chains Often represented as a graph with probabilities as weights In the example to the right, each edge weight represents the

9 Overview of Markov Chains Often represented as a graph with probabilities as weights In the example to the right, each edge weight represents the probability of the Markov process changing from one state to another Can think of a Markov Chain as a stochastic (probability-driven) finite state machine

10 Overview of Markov Chains

11 Overview of Markov Chains Can also be represented as a transition matrix

12 Overview of Markov Chains: Simple Example Let s use a Markov Chain to make a simple prediction of the weather: Raining today Sunny today 40% rain tomorrow 60% sunny tomorrow 20% rain tomorrow 80% sunny tomorrow

13 Overview of Markov Chains: Simple Example

14 Overview of Markov Chains: More Complexity

15 History behind Markov Chains

16 Andrey Andreyevich Markov Born: June 1856 Death: July 1922 (66 years old) Russian mathematician Alma mater: St. Petersburg University Asked to stay at St. Petersburg University as a researcher upon graduation Studied stochastic processes

17 Markov Chain Beginnings: Probability + Poetry Andrey Markov spent hours perusing through Alexander Pushkin s novel in verse Eugene Onegin Discovered patterns in certain letters following other letters Created an analysis of vowel and consonant patterns the precursor to Markov Chains

18 Markov Chain Beginnings: Probability + Poetry If the current letter I m reading is a vowel, what is the probability that the next letter I m reading is a vowel? A consonant? What about three letters later? Ten letters later?

19 Markov Chain Beginnings: Probability + Poetry "My uncle's shown his good intentions By falling desperately ill; His worth is proved; of all inventions Where will you find one better still? He's an example, I'm averring; But, God, what boredom -- there, unstirring, By day, by night, thus to be bid To sit beside an invalid! Low cunning must assist devotion To one who is but half-alive: You puff his pillow and contrive Amusement while you mix his potion; You sign, and think with furrowed brow -- 'Why can't the devil take you now?' "

20 Markov Chain Beginnings: Probability + Poetry Andrey Markov created a new branch of probability with his findings, now known as Markov chains Extended probability beyond coin flipping & dice rolling (independent events) to chains of linked events (what happens next depends on current state of the system)

21 Markov Chain Beginnings: Probability + Poetry

22 Markov Chains: Two Algorithms/Applications PageRank Application we use it everyday! Algorithm Implementation Experiments Markov Chain Text Algorithm (a.k.a. Markov Chain Algorithm) Algorithm Implementation Applications Experiment

23 PageRank: Introduction Inputs A bunch of webpages (vertices), each of which has links to other webpages(edges). C C Directed graph! A B D E C D E

PageRank: Introduction Purpose Rank all webpages in terms of importance. How to define importance Analogy: citation, papers with more citations are more important.

24 PageRank: Introduction Purpose Rank all webpages in terms of importance. How to define importance Analogy: citation, papers with more citations are more important. Option: count how many backlinks a webpage has. Caveat: if a page has a backlink from an important page, it also should be somewhat important. Weighted directed graph!

25 PageRank: Introduction Original Paper Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford Info Lab. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.* *

26 PageRank: Definition Intuitively A webpage has high rank if the sum of the ranks of its backlinks is large. Simple definition u: a webpage R(u): rank of u F(u): the set of webpages that u points to. B(u): the set of webpages that point to u. N(u): the number of links from u (forward links). c: a factor for normalization, i.e., a constant. R u = c v B u R(v) N v

27 PageRank: Definition R u = c v B u R(v) N v Construct a n by n matrix A: A u,v = 1/N v 0, if there is an edge from v to u, otherwise R = [R(1), R(2),, R(n)] R = car R is an eigenvalue of A with eigenvalue c.

28 PageRank: Example A u,v = 1/N v 0, if there is an edge from v to u, otherwise 1 3 R 0 = 1 R 3 R 1 = 1 2 R 0 R 2 = 1 2 R R 1 R 3 = 1 2 R R R 0 R 1 R 2 R 3 = / /2 1/ /2 1 0 R 0 R 1 R 2 R 3

29 PageRank: Rank sink issue Loop (u, v) accumulates rank, but never distributes rank (no forward links). u v w Modified PageRank E(u) is some vector that is considered as a source of rank R u = c v B u R (v) + ce(u) N V

30 PageRank: Random Surfer Model A random surfer keeps clicking on successive links randomly. If entering a rank sink, the surfer gets bored and jumps to a random page. R u = c v B u R (v) N V + 1 c N, c = 0.85 u v x w

31 PageRank: Markov chain R 0 R 1 R 2 R 3 = / /2 1/ /2 1 0 R 0 R 1 R 2 R 3 The sum of values in each column in A is 1. A is a Markov matrix! All probability is non-negative, it s guaranteed there is a steady-state vector. PageRank always has a steady state.

32 PageRank: Convergence properties Experiments on a database based on 322 million links 52 iterations. Scale very well.

33 PageRank: Implementation R(0) = [1/N, 1/N,, 1/N] e = delta = 0 while true R i+1 = 0.85AR i N delta = R i+1 R i 1 if delta < e break R i = R i+1 return R i

34 PageRank: Experiment configuration Datasets Test graphs from homework 2. Several graphs from Metrics Convergence speed, in terms of iterations. PageRank results: sparse vs dense graphs. PageRank results based on real graphs.

35 PageRank: Experiment results E V Observation: 1. Results of all graphs converge in less than 20 iterations. 2. No obvious pattern between convergence speed and graph size Iterations to convergence of all test graphs from hw2

36 PageRank: Experiment results PageRank results comparison between dense and sparse graphs Observation: PageRank of results of dense graphs are more spread out.

37 PageRank: Experiment results Why there is a jump there? Suppose if a webpage (vertex) has no backlinks (no webpage has links to it), the only rank source comes from random jumps, which is a relatively small probability. Hypothesis: If a vertex has no backlinks, its page rank is the smallest. Experiment results: matching! Graph Random txt Random txt Number of webpages which have smallest PageRank Number of webpages which have no backlinks

38 PageRank: Experiment results Number of vertices which have no backlinks = 4734 Number of vertices which have no backlinks = 4590 Wikipedia administer vote graph V = 7,115, E = 103,689 iterations = 17 Citation graph from the e-print arxiv V = 27,770, E = 352,807 iterations = 17

39 Next Algorithm/Application! Markov Chain Text

40 Markov Chain Text (Intro) Process input text, analyzing letter/word (token) probabilities Create graph (transition matrix) to represent those probabilities Generate similar text using graph Order: how many prior tokens to use when generating next token Example: 1.33 Input: Foo bar foo bar bar Foo Bar \n.33.33

41 Markov Chain Text (Algorithm) Simple (bare bones!) pseudocode for order 1: PROCESS: Create dictionary For line in input: For token_1, token_2 in line: Add to dictionary -> {token_1, token_2} GENERATE: Choose first word from dictionary -> token_1 For each desired word in output: Lookup token_2 in dictionary using token_1 token_2 -> token_1

42 Markov Chain Text (Implementation) Store probabilities for each word based on order; use transition matrix rather than dictionary. Gather metadata and use parsing rules to format output correctly (punctuation, titles before names, etc.) May be complemented by other methods n-gram methods for partial words, fat-finger grace techniques

43 Markov Chain Text (Applications) One method used in autocomplete software Also used in some bots

44 Applications Autocomplete on Phones Alternative caption: Although the Markov-chain text model is still rudimentary, it recently gave me Massachusetts Institute of America. Although I have to admit it sounds prestigious.

45 Applications Autocomplete on Phones Most text messaging applications use a Markov Chain model to predict what word you wish to type next, based on the last word.

46 Applications Subreddit generation SubredditSimulator/comme nts/3g9ioz/what_is_rsubre dditsimulator/ Fully-automated subreddit that generates random submissions and comments using Markov chains

47 Applications Subreddit generation Explanation of the Markov Chain on Reddit: Basically, you feed in a bunch of sentences, and even though it has no understanding of the meaning of the text, it picks up on patterns like "word A is often followed by word B". Then when you want to generate a new sentence, it "walks" its way through from the start of a sentence to the end of one, picking sequences of words that it knows are valid based on that initial analysis. So generally short sequences of words in the generated sentences will make sense, but often not the whole thing.

48 Markov Chain Text (Experiment) Writing a sonnet using Markov chain generated using Billy Shakespeare s extant sonnets: C++ (Processed:.05 seconds, generated 100 lines:.91 seconds) My days are In thy wrong, that beauty tempting her pretty wrongs that thereby beauty's form in this. Black lines of that I better state out of thine in their wills count the lesson true, the other mine, thou live twice in lease find true that wild music sadly? sweets dost deceive.

49 Markov Chain Text (Experiment) Writing a sonnet using Markov chain generated using Billy Shakespeare s extant sonnets: Python (Processed:.51 seconds, generated 100 lines:.71 seconds) Hence, thou wouldst thou thy might To bear greater wrong, than my invention quite Dulling my friend O for thy soul's imaginary sight Presents thy dial's shady stealth mayst in their virtue answer Muse, Make answer, this fair were born And dost beguile the conquest of heaven it hath masked not false to ruminate That on the world, unbless some other write to thy monument When that thou age black ink my transgression bow

50 Markov Chain Text (Experimental Results) Generation time (seconds) function of word count, C++ and Python Processing time (seconds) function of word count, C++ and Python C++ Python C++ Python Results from Haswell i7, OS X El Capitan

51 Markov Chain Text (Some Conclusions) Implementations were simple, sequential programs modified from existing public github repos. C++ version processed text much more quickly, as expected. Python version outperformed in generating text; why? Implementation detail; script stores processed text into local SQL dictionary. C++ version operates on vectors of strings

52 Markov Chains: More Applications Since the next state in a Markov chain is simply a function of the last state and a random variable, we can easily see applications for Markov chains Queue lengths in call centers Stresses on materials Waiting time in production facilities Inventories in supply chains Water levels in dams Stock prices and more

53 Population Process: Introduction Markov process State: number of individuals in a population Changes: addition or removal of individuals Applications Ecology Telecommunications Queueing theory Chemical kinetics An example: Birth-death process

54 Population process: Birth-death process A special case of continuous-time Markov process Two types of changes Birth Death λ 0 λ 1 λ 2 λ 3 λ μ 1 μ 2 μ 3 μ 4 μ 5 Birth rate: λ i Death rate: μ i

55 Population process: Birth-death process Pure birth process λ 0 λ 1 λ 2 λ 3 λ Poisson process λ λ λ λ λ

56 Population Population dynamics: Birth-death process P n t + h = P n t 1 λ n h μ n h + P n 1 t λ n 1 h + P n+1 t μ n+1 h + o(h) P n t + h h P n t = λ n + μ n P n t + λ n 1 P n 1 t + μ n+1 P n+1 t Transition rate matrix: λ 0 λ 0 μ 1 (λ 1 + μ 1 ) λ 1 μ 2 (λ 2 + μ 2 ) λ 3 μ 3 (λ 3 + μ 3 ) λ 4 μ 4 (λ 4 + μ 5 ) n+1 n n-1 t t+h time

57 Population dynamics: Birth-death process Applications Predict extinction time Chemistry: radioactive transformations: birth process New atom decay: death process Queueing theory M/M/1 model M/M/k model

58 Applications: Adaptive Cruise Control Some cruise-control on vehicles use Markov chains to calculate speed Operating environment is stochastic Road grade on route can also be seen as stochastic

59 Applications: Adaptive Cruise Control Vehicle speed and following distance can be optimized for best-onaverage fuel economy and optimized for travel time More in Optimization and Optimal Control in Automotive Systems

60 Open Problem

61 Open Problem: Cutoffs Stationary distribution: vector that represents the proportion of time a given Markov chain occupies each state, given enough time. Total variation distance: distance between two probability distributions is equal to the max distance assigned to any event by the distributions Mixing time: number of steps required for a chain for the distance from stationary to be small

62 Open Problem: Cutoffs Build a Markov chain via a random walk over the symmetric group on n symbols The resulting chain s stationary distribution will be uniform; this can be interpreted as a fully mixed deck. Define cutoff as the existence of a sequence c(n) such that as n approaches infinity, the distance to uniformity approaches zero. If we use a random-to-random shuffle, does such a cutoff exist?

63 Open Problem: Cutoffs

64 References Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford Info Lab. snap.stanford.edu Hayes, B. (2013). First Links in the Markov Chain. American Scientist, 101(2), 92. Harald Waschl, Ilya Kolmanovsky, Maarten Steinbuch, Luigi del Re. Optimization and Optimal Control in Automative Systems. ov%20chains&pg=pa142#v=onepage&q=how%20does%20cruise%20control%20work%20markov%20chains&f=false

65 Discussion

66 Questions Revisited What is the matrix A given the following graph when calculating PageRank according to the simple definition of PageRank? What is the first name of the man credited for Markov Chains? 2 What is one application that you have used personally of Markov Chains?

CONTENTS. Preface List of Symbols and Notation

CONTENTS. Preface List of Symbols and Notation CONTENTS Preface List of Symbols and Notation xi xv 1 Introduction and Review 1 1.1 Deterministic and Stochastic Models 1 1.2 What is a Stochastic Process? 5 1.3 Monte Carlo Simulation 10 1.4 Conditional