Markov Chains. Camille Crumpton, Sisi Xiong, Tyler McDaniel


1 Markov Chains Camille Crumpton, Sisi Xiong, Tyler McDaniel

2 Questions What is the matrix A, given the following graph, when calculating PageRank according to the simple definition of PageRank? What is the first name of the man credited with Markov chains? What is one application of Markov chains that you have used personally?

3 Tyler McDaniel From Asheville, NC. CS PhD student; works on high-performance dense linear algebra libraries at ICL. Prior work with graph algorithms in the financial sector. Also worked in microcontrollers/IoT.

4 Sisi Xiong Changde, Hunan, China PhD student majoring in Electrical Engineering. Research interests: processing large datasets using probabilistic approaches: Bloom filters

5 Camille Crumpton Hometown: Knoxville/Maryville, TN

6 Camille Crumpton Computer Science graduate student Harpist Endurance athlete

7 Outline Overview of Markov Chains; History of Markov Chains; PageRank (most famous use of Markov Chains): algorithm, implementation, experiments; Text algorithm: algorithm, implementation, applications (fun uses!), experiments; More applications: population processes, adaptive cruise control; Open problem; Discussion

8 Overview of Markov Chains What is a Markov chain? A discrete-time stochastic process that satisfies the Markov property. Then what is the Markov property? Predictions about the future of the process can be made from the present state alone; knowledge of past states adds nothing. Sometimes called the memoryless property.

9 Overview of Markov Chains Often represented as a graph with probabilities as edge weights. In the example to the right, each edge weight is the probability of the Markov process changing from one state to another. A Markov chain can be thought of as a stochastic (probability-driven) finite state machine.

10 Overview of Markov Chains

11 Overview of Markov Chains Can also be represented as a transition matrix

12 Overview of Markov Chains: Simple Example Let's use a Markov chain to make a simple prediction of the weather. Raining today: 40% rain tomorrow, 60% sunny tomorrow. Sunny today: 20% rain tomorrow, 80% sunny tomorrow.
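
The rain/sun chain on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the original deck; the state order (rain, sun) and the helper names `step` and `forecast` are assumptions made for the example.

```python
# States are ordered (rain, sun); row i holds tomorrow's probabilities given today's state i.
P = [
    [0.4, 0.6],  # raining today: 40% rain, 60% sun tomorrow
    [0.2, 0.8],  # sunny today:   20% rain, 80% sun tomorrow
]

def step(dist, P):
    """Advance a probability distribution over states by one day."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def forecast(dist, P, days):
    """Distribution over states after the given number of days."""
    for _ in range(days):
        dist = step(dist, P)
    return dist

today = [1.0, 0.0]            # it is raining today
print(forecast(today, P, 1))  # tomorrow
print(forecast(today, P, 2))  # the day after tomorrow
```

With these numbers, the two-day forecast starting from rain is 28% rain, and longer forecasts approach 25% rain whatever today's weather is.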

13 Overview of Markov Chains: Simple Example

14 Overview of Markov Chains: More Complexity

15 History behind Markov Chains

16 Andrey Andreyevich Markov Born: June 1856. Died: July 1922 (66 years old). Russian mathematician. Alma mater: St. Petersburg University. Asked to stay at St. Petersburg University as a researcher upon graduation. Studied stochastic processes.

17 Markov Chain Beginnings: Probability + Poetry Andrey Markov spent hours perusing Alexander Pushkin's novel in verse, Eugene Onegin. Discovered patterns in which letters follow other letters. Created an analysis of vowel and consonant patterns: the precursor to Markov chains.

18 Markov Chain Beginnings: Probability + Poetry If the current letter I'm reading is a vowel, what is the probability that the next letter I'm reading is a vowel? A consonant? What about three letters later? Ten letters later?
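
The one-letter-ahead question can be estimated directly from a text by counting adjacent letter pairs. The sketch below is only an illustration: it assumes English text, treats a, e, i, o, u as the vowels and every other letter as a consonant, and the function names are made up for the example (Markov's own analysis was of the Russian original).

```python
from collections import Counter

VOWELS = set("aeiou")

def letter_classes(text):
    """Map each alphabetic character to 'V' (vowel) or 'C' (consonant); skip everything else."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]

def transition_probabilities(text):
    """Estimate P(next class | current class) from adjacent letter pairs."""
    classes = letter_classes(text)
    pair_counts = Counter(zip(classes, classes[1:]))
    probs = {}
    for cur in "VC":
        total = pair_counts[(cur, "V")] + pair_counts[(cur, "C")]
        if total:
            probs[cur] = {nxt: pair_counts[(cur, nxt)] / total for nxt in "VC"}
    return probs

sample = "My uncle's shown his good intentions by falling desperately ill"
print(transition_probabilities(sample))
```

Each row of the result sums to 1, so the output is the two-state transition matrix for the vowel/consonant chain estimated from the sample.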

19 Markov Chain Beginnings: Probability + Poetry "My uncle's shown his good intentions By falling desperately ill; His worth is proved; of all inventions Where will you find one better still? He's an example, I'm averring; But, God, what boredom -- there, unstirring, By day, by night, thus to be bid To sit beside an invalid! Low cunning must assist devotion To one who is but half-alive: You puff his pillow and contrive Amusement while you mix his potion; You sign, and think with furrowed brow -- 'Why can't the devil take you now?' "

20 Markov Chain Beginnings: Probability + Poetry Andrey Markov created a new branch of probability with his findings, now known as Markov chains Extended probability beyond coin flipping & dice rolling (independent events) to chains of linked events (what happens next depends on current state of the system)

21 Markov Chain Beginnings: Probability + Poetry

22 Markov Chains: Two Algorithms/Applications PageRank: an application we use every day! Algorithm, implementation, experiments. Markov Chain Text Algorithm (a.k.a. Markov Chain Algorithm): algorithm, implementation, applications, experiment.

23 PageRank: Introduction Inputs: a bunch of webpages (vertices), each of which has links to other webpages (edges). Directed graph! [Figure: example directed graph on vertices A, B, C, D, E]

24 PageRank: Introduction Purpose: rank all webpages in terms of importance. How to define importance? Analogy: citations; papers with more citations are more important. Option: count how many backlinks a webpage has. Caveat: if a page has a backlink from an important page, it should also be somewhat important. Weighted directed graph!

25 PageRank: Introduction Original paper: Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab. "PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites."

26 PageRank: Definition Intuitively: a webpage has high rank if the sum of the ranks of its backlinks is large. Simple definition: u: a webpage; R(u): rank of u; F(u): the set of webpages that u points to; B(u): the set of webpages that point to u; $N_u$: the number of links from u (forward links); c: a factor for normalization, i.e., a constant. $R(u) = c \sum_{v \in B(u)} \frac{R(v)}{N_v}$

27 PageRank: Definition $R(u) = c \sum_{v \in B(u)} \frac{R(v)}{N_v}$ Construct an n-by-n matrix A: $A_{u,v} = 1/N_v$ if there is an edge from v to u, and $0$ otherwise. With $R = [R(1), R(2), \dots, R(n)]$ we get $R = cAR$, so R is an eigenvector of A (with eigenvalue $1/c$).
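
The matrix construction on this slide can be sketched as follows. This is an illustrative sketch: the function name `build_matrix`, the edge-list representation, and the three-page graph are all hypothetical, and the code assumes no duplicate edges.

```python
def build_matrix(n, edges):
    """Return the n-by-n matrix A with A[u][v] = 1/N_v if v links to u, else 0."""
    out_degree = [0] * n
    for v, u in edges:          # each edge is (source v, target u)
        out_degree[v] += 1
    A = [[0.0] * n for _ in range(n)]
    for v, u in edges:
        A[u][v] = 1.0 / out_degree[v]
    return A

# Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
A = build_matrix(3, edges)
for row in A:
    print(row)
```

Column v of A spreads page v's rank evenly over the pages it links to, so every column of a page with at least one forward link sums to 1.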

28 PageRank: Example $A_{u,v} = 1/N_v$ if there is an edge from v to u, and $0$ otherwise. [Figure: four-vertex example graph (vertices 0 to 3), with the rank equation for each of $R_0, R_1, R_2, R_3$ and the same system written in matrix form $R = AR$]

29 PageRank: Rank sink issue Loop (u, v) accumulates rank, but never distributes rank (no forward links). [Figure: vertices u, v, w, with w linking into the loop between u and v] Modified PageRank: E(u) is some vector that is considered as a source of rank. $R'(u) = c \sum_{v \in B(u)} \frac{R'(v)}{N_v} + c\,E(u)$

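30 PageRank: Random Surfer Model A random surfer keeps clicking on successive links randomly. If entering a rank sink, the surfer gets bored and jumps to a random page. $R'(u) = c \sum_{v \in B(u)} \frac{R'(v)}{N_v} + \frac{1-c}{N}$, with $c = 0.85$ and N the total number of pages. [Figure: example graph on vertices u, v, w, x]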
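
A short check of why the jump term carries the weight 1 - c. Writing R' for the modified rank and N for the total number of pages, and assuming every page has at least one forward link (so that $|F(v)| = N_v \ge 1$), summing the random surfer formula over all pages gives:

```latex
\sum_{u} R'(u)
  = c \sum_{u} \sum_{v \in B(u)} \frac{R'(v)}{N_v} + \sum_{u} \frac{1-c}{N}
  = c \sum_{v} R'(v) \sum_{u \in F(v)} \frac{1}{N_v} + (1-c)
  = c \sum_{v} R'(v) + (1-c).
```

So if the ranks sum to 1 before an update, they still sum to 1 after it, and R' can be read as a probability distribution over pages.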

31 PageRank: Markov chain [Matrix form of the four-vertex example: $(R_0, R_1, R_2, R_3)^T = A\,(R_0, R_1, R_2, R_3)^T$] The values in each column of A sum to 1 and every entry is non-negative, so A is a Markov matrix! A steady-state vector is therefore guaranteed to exist. PageRank always has a steady state.

32 PageRank: Convergence properties Experiments on a database of 322 million links: converged in 52 iterations. Scales very well.

33 PageRank: Implementation
R_0 = [1/N, 1/N, ..., 1/N]
e = convergence threshold
delta = 0
while true:
    R_{i+1} = 0.85 A R_i + 0.15/N
    delta = ||R_{i+1} - R_i||_1
    if delta < e: break
    R_i = R_{i+1}
return R_i
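
The loop on this slide can be sketched as runnable Python. This is a minimal sketch under stated assumptions, not the presenters' implementation: the tolerance `tol=1e-8`, the iteration cap, the edge-list input, and the small test graph are all choices made for the example, and every vertex is assumed to have at least one forward link (otherwise rank leaks out of the system).

```python
def pagerank(n, edges, c=0.85, tol=1e-8, max_iter=1000):
    """Power iteration for R = c*A*R + (1 - c)/n, with A[u][v] = 1/N_v for each edge v -> u."""
    out_degree = [0] * n
    for v, _ in edges:
        out_degree[v] += 1
    rank = [1.0 / n] * n
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_rank = [(1.0 - c) / n] * n            # random-jump share for every page
        for v, u in edges:
            new_rank[u] += c * rank[v] / out_degree[v]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))  # L1 norm of the change
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration

# Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
ranks, iterations = pagerank(3, edges)
print(ranks, iterations)
```

Iterating over the edge list instead of a dense matrix keeps each pass proportional to the number of links, which is what makes the method practical for sparse web graphs.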

34 PageRank: Experiment configuration Datasets: test graphs from homework 2; several real-world graphs. Metrics: convergence speed, in terms of iterations; PageRank results on sparse vs. dense graphs; PageRank results on real graphs.

35 PageRank: Experiment results [Figure: iterations to convergence for all test graphs from hw2, plotted against |E| and |V|] Observations: 1. All graphs converge in fewer than 20 iterations. 2. No obvious pattern between convergence speed and graph size.

36 PageRank: Experiment results [Figure: comparison of PageRank results between dense and sparse graphs] Observation: the PageRank values of dense graphs are more spread out.

37 PageRank: Experiment results Why is there a jump? If a webpage (vertex) has no backlinks (no webpage links to it), its only source of rank is random jumps, which contribute a relatively small probability. Hypothesis: if a vertex has no backlinks, its PageRank is the smallest. Experiment results: matching! [Table: for two random test graphs, the number of webpages with the smallest PageRank and the number of webpages with no backlinks]
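
A short argument for the hypothesis, assuming the random surfer formula $R(u) = c \sum_{v \in B(u)} R(v)/N_v + (1-c)/N$ with $0 < c < 1$ and non-negative ranks (N is the total number of pages):

```latex
B(u) = \emptyset \;\Rightarrow\; R(u) = \frac{1-c}{N},
\qquad
B(w) \neq \emptyset \;\Rightarrow\;
R(w) \ge c\,\frac{R(v)}{N_v} + \frac{1-c}{N} > \frac{1-c}{N}
\quad \text{for any } v \in B(w).
```

The strict inequality holds because every rank is at least $(1-c)/N > 0$. Under these assumptions the pages with the smallest PageRank are exactly the pages with no backlinks, which is consistent with the matching counts reported on the slide.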

38 PageRank: Experiment results Wikipedia adminship vote graph: |V| = 7,115, |E| = 103,689, iterations = 17, number of vertices which have no backlinks = 4734. Citation graph from the e-print arXiv: |V| = 27,770, |E| = 352,807, iterations = 17, number of vertices which have no backlinks = 4590.

39 Next Algorithm/Application! Markov Chain Text

40 Markov Chain Text (Intro) Process input text, analyzing letter/word (token) probabilities. Create a graph (transition matrix) to represent those probabilities. Generate similar text using the graph. Order: how many prior tokens to use when generating the next token. Example input: Foo bar foo bar bar [Figure: transition graph with nodes Foo, Bar and \n; Foo → Bar with probability 1; Bar → Foo, Bar → Bar and Bar → \n with probability .33 each]

41 Markov Chain Text (Algorithm) Simple (bare bones!) pseudocode for order 1:
PROCESS:
    Create dictionary
    For line in input:
        For token_1, token_2 in line:
            Add to dictionary -> {token_1, token_2}
GENERATE:
    Choose first word from dictionary -> token_1
    For each desired word in output:
        Lookup token_2 in dictionary using token_1
        token_2 -> token_1
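
The PROCESS and GENERATE steps can be sketched as runnable Python. This is an illustrative order-1 version, not the code used in the experiments; whitespace tokenization, the function names, and the decision to stop early at a token with no recorded successor are assumptions of the sketch.

```python
import random
from collections import defaultdict

def process(lines):
    """Build an order-1 table mapping each token to the list of tokens seen right after it."""
    table = defaultdict(list)
    for line in lines:
        tokens = line.split()
        for token_1, token_2 in zip(tokens, tokens[1:]):
            table[token_1].append(token_2)   # duplicates keep the observed frequencies
    return dict(table)

def generate(table, length, rng=random):
    """Walk the table for up to `length` tokens, stopping early at a token with no successor."""
    token = rng.choice(sorted(table))
    output = [token]
    while len(output) < length and token in table:
        token = rng.choice(table[token])
        output.append(token)
    return " ".join(output)

table = process(["foo bar foo bar bar"])
print(table)
print(generate(table, 8, random.Random(0)))
```

Storing successors as a list with repeats means a uniform random choice from the list already follows the observed transition frequencies, so no explicit probabilities are needed at this stage.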

42 Markov Chain Text (Implementation) Store probabilities for each word based on order; use a transition matrix rather than a dictionary. Gather metadata and use parsing rules to format output correctly (punctuation, titles before names, etc.). May be complemented by other methods: n-gram methods for partial words, fat-finger grace techniques.
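
Turning pair counts into a transition matrix can be sketched as below. The sketch is illustrative only: the function name, the lowercased tokens, and the explicit end-of-line token `"\n"` appended to the input are assumptions made so the numbers line up with the small Foo/Bar example.

```python
from collections import Counter

def transition_matrix(tokens):
    """Return (vocab, matrix) with matrix[i][j] estimating P(next = vocab[j] | current = vocab[i])."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = Counter(zip(tokens, tokens[1:]))
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for (cur, nxt), k in counts.items():
        matrix[index[cur]][index[nxt]] = float(k)
    for row in matrix:
        total = sum(row)
        if total:                      # tokens never followed by anything keep an all-zero row
            row[:] = [x / total for x in row]
    return vocab, matrix

tokens = "foo bar foo bar bar".split() + ["\n"]   # "\n" marks the end of the line
vocab, matrix = transition_matrix(tokens)
print(vocab)
print(matrix)
```

Here "foo" is always followed by "bar", while "bar" is followed by "foo", "bar" and the end of line once each, giving a probability of one third for each.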

43 Markov Chain Text (Applications) One method used in autocomplete software Also used in some bots

44 Applications Autocomplete on Phones Alternative caption: Although the Markov-chain text model is still rudimentary, it recently gave me Massachusetts Institute of America. Although I have to admit it sounds prestigious.

45 Applications Autocomplete on Phones Most text messaging applications use a Markov Chain model to predict what word you wish to type next, based on the last word.

46 Applications Subreddit generation SubredditSimulator/comments/3g9ioz/what_is_rsubredditsimulator/ Fully automated subreddit that generates random submissions and comments using Markov chains

47 Applications Subreddit generation Explanation of the Markov Chain on Reddit: Basically, you feed in a bunch of sentences, and even though it has no understanding of the meaning of the text, it picks up on patterns like "word A is often followed by word B". Then when you want to generate a new sentence, it "walks" its way through from the start of a sentence to the end of one, picking sequences of words that it knows are valid based on that initial analysis. So generally short sequences of words in the generated sentences will make sense, but often not the whole thing.

48 Markov Chain Text (Experiment) Writing a sonnet with a Markov chain built from Billy Shakespeare's extant sonnets: C++ (processed: 0.05 seconds; generated 100 lines: 0.91 seconds) My days are In thy wrong, that beauty tempting her pretty wrongs that thereby beauty's form in this. Black lines of that I better state out of thine in their wills count the lesson true, the other mine, thou live twice in lease find true that wild music sadly? sweets dost deceive.

49 Markov Chain Text (Experiment) Writing a sonnet with a Markov chain built from Billy Shakespeare's extant sonnets: Python (processed: 0.51 seconds; generated 100 lines: 0.71 seconds) Hence, thou wouldst thou thy might To bear greater wrong, than my invention quite Dulling my friend O for thy soul's imaginary sight Presents thy dial's shady stealth mayst in their virtue answer Muse, Make answer, this fair were born And dost beguile the conquest of heaven it hath masked not false to ruminate That on the world, unbless some other write to thy monument When that thou age black ink my transgression bow

50 Markov Chain Text (Experimental Results) [Figure: generation time (seconds) as a function of word count, C++ vs. Python] [Figure: processing time (seconds) as a function of word count, C++ vs. Python] Results from Haswell i7, OS X El Capitan.

51 Markov Chain Text (Some Conclusions) Implementations were simple, sequential programs modified from existing public GitHub repos. The C++ version processed text much more quickly, as expected. The Python version was faster at generating text; why? An implementation detail: the Python script stores processed text in a local SQL dictionary, while the C++ version operates on vectors of strings.

52 Markov Chains: More Applications Since the next state in a Markov chain is simply a function of the last state and a random variable, we can easily see applications for Markov chains: queue lengths in call centers, stresses on materials, waiting time in production facilities, inventories in supply chains, water levels in dams, stock prices, and more.

53 Population Process: Introduction A Markov process. State: number of individuals in a population. Changes: addition or removal of individuals. Applications: ecology, telecommunications, queueing theory, chemical kinetics. An example: the birth-death process.

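54 Population process: Birth-death process A special case of a continuous-time Markov process. Two types of changes: birth and death. [Figure: state diagram on states 0, 1, 2, ...; state i moves up to i+1 at birth rate $\lambda_i$ and down to i-1 at death rate $\mu_i$] Birth rate: $\lambda_i$. Death rate: $\mu_i$.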

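55 Population process: Birth-death process Pure birth process: only births occur, with state-dependent rates $\lambda_0, \lambda_1, \lambda_2, \dots$ (all $\mu_i = 0$). Poisson process: a pure birth process with the same rate $\lambda$ in every state.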

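56 Population dynamics: Birth-death process
$P_n(t+h) = P_n(t)\,(1 - \lambda_n h - \mu_n h) + P_{n-1}(t)\,\lambda_{n-1} h + P_{n+1}(t)\,\mu_{n+1} h + o(h)$
$\frac{P_n(t+h) - P_n(t)}{h} = -(\lambda_n + \mu_n)\,P_n(t) + \lambda_{n-1}\,P_{n-1}(t) + \mu_{n+1}\,P_{n+1}(t) + \frac{o(h)}{h}$
Transition rate matrix:
$Q = \begin{pmatrix} -\lambda_0 & \lambda_0 & & & \\ \mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & & \\ & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & \\ & & \mu_3 & -(\lambda_3 + \mu_3) & \lambda_3 \\ & & & \ddots & \ddots \end{pmatrix}$
[Figure: sample path moving between levels n-1, n and n+1 over the time interval from t to t+h]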
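
The tridiagonal rate matrix can be built mechanically from the birth and death rates. The sketch below is illustrative: it truncates the chain to finitely many states 0..K (an assumption needed to get a finite matrix), and the function name and sample rates are made up for the example.

```python
def rate_matrix(birth, death):
    """Generator Q for states 0..K: Q[n][n+1] = birth[n], Q[n][n-1] = death[n], rows sum to zero.

    birth[n] and death[n] are the rates out of state n; death[0] and birth[K] are ignored
    so that the chain stays inside 0..K (a truncation assumed for this sketch).
    """
    size = len(birth)
    Q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n + 1 < size:
            Q[n][n + 1] = birth[n]
        if n > 0:
            Q[n][n - 1] = death[n]
        Q[n][n] = -sum(Q[n])       # diagonal is minus the total rate of leaving state n
    return Q

Q = rate_matrix(birth=[1.0, 1.0, 1.0, 1.0], death=[0.0, 2.0, 2.0, 2.0])
for row in Q:
    print(row)
```

Each row sums to zero because the diagonal entry is minus the total rate of leaving that state.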

57 Population dynamics: Birth-death process Applications: predicting extinction time. Chemistry: radioactive transformations (birth process); decay of new atoms (death process). Queueing theory: M/M/1 model, M/M/k model.
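
For the M/M/1 model, the birth-death structure gives a closed-form long-run distribution of the queue length. A sketch, assuming a constant arrival rate $\lambda$ and service rate $\mu$ with $\lambda < \mu$; the symbols $\rho$ and $\pi_n$ (the long-run probability of n customers in the system) are introduced here for illustration:

```latex
\lambda\,\pi_n = \mu\,\pi_{n+1}
\;\Rightarrow\;
\pi_n = \rho^{\,n}\,\pi_0, \quad \rho = \frac{\lambda}{\mu} < 1,
\qquad
\sum_{n \ge 0} \pi_n = 1
\;\Rightarrow\;
\pi_n = (1 - \rho)\,\rho^{\,n}.
```

The first relation balances the rate of moving up from n to n+1 against the rate of moving back down; normalizing the resulting geometric series fixes $\pi_0 = 1 - \rho$.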

58 Applications: Adaptive Cruise Control Some vehicle cruise-control systems use Markov chains to calculate speed. The operating environment is stochastic. Road grade along the route can also be seen as stochastic.

59 Applications: Adaptive Cruise Control Vehicle speed and following distance can be optimized for best-on-average fuel economy and for travel time. More in Optimization and Optimal Control in Automotive Systems.

60 Open Problem

61 Open Problem: Cutoffs Stationary distribution: vector giving the long-run proportion of time a Markov chain spends in each state. Total variation distance: the distance between two probability distributions, equal to the largest difference between the probabilities they assign to any single event. Mixing time: number of steps the chain needs before its distance from the stationary distribution is small.
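
These three definitions can be made concrete on a small chain. The sketch below is illustrative: it uses the rain/sun probabilities from the weather example, computes total variation distance as half the L1 distance (equivalent to the largest-event definition on a finite state space), and uses the threshold `eps=0.25`, a common convention assumed here.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite state space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def step(dist, P):
    """One step of the chain: row vector dist times row-stochastic matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixing_time(P, stationary, eps=0.25, max_steps=10_000):
    """Smallest t at which every starting state is within eps of stationary in TV distance."""
    n = len(P)
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for t in range(max_steps + 1):
        if max(tv_distance(d, stationary) for d in dists) <= eps:
            return t
        dists = [step(d, P) for d in dists]
    return None

P = [[0.4, 0.6], [0.2, 0.8]]          # rain/sun chain: states ordered (rain, sun)
print(mixing_time(P, [0.25, 0.75]))   # stationary distribution is 25% rain, 75% sun
```

For this two-state chain a single step already brings both starting states within 0.25 of the stationary distribution.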

62 Open Problem: Cutoffs Build a Markov chain via a random walk on the symmetric group on n symbols. The resulting chain's stationary distribution is uniform; this can be interpreted as a fully mixed deck. Define cutoff as the existence of a sequence c(n) such that, as n approaches infinity, the distance to uniformity stays near its maximum until just before c(n) steps and approaches zero just after. If we use a random-to-random shuffle, does such a cutoff exist?
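
One move of the random-to-random shuffle removes a uniformly random card and reinserts it at a uniformly random position. A minimal sketch of that walk on a deck of n cards; the function names and the seeded generator are assumptions made for the example.

```python
import random

def random_to_random_step(deck, rng=random):
    """Remove a uniformly random card and reinsert it at a uniformly random position."""
    card = deck.pop(rng.randrange(len(deck)))
    deck.insert(rng.randrange(len(deck) + 1), card)

def shuffle(n, steps, seed=None):
    """Apply `steps` random-to-random moves to the sorted deck 0..n-1."""
    rng = random.Random(seed)
    deck = list(range(n))
    for _ in range(steps):
        random_to_random_step(deck, rng)
    return deck

print(shuffle(n=10, steps=50, seed=1))
```

Each state of the chain is a permutation of the deck, so the state space has n! elements; the cutoff question asks how sharply the distribution of `deck` approaches uniform as `steps` grows.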

63 Open Problem: Cutoffs

64 References
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
snap.stanford.edu
Hayes, B. (2013). First Links in the Markov Chain. American Scientist, 101(2), 92.
Waschl, H., Kolmanovsky, I., Steinbuch, M., & del Re, L. Optimization and Optimal Control in Automotive Systems.

65 Discussion

66 Questions Revisited What is the matrix A, given the following graph, when calculating PageRank according to the simple definition of PageRank? What is the first name of the man credited with Markov chains? What is one application of Markov chains that you have used personally?


More information

From M&Ms to Mathematics, or, How I learned to answer questions and help my kids love math.

From M&Ms to Mathematics, or, How I learned to answer questions and help my kids love math. From M&Ms to Mathematics, or, How I learned to answer questions and help my kids love math. Steven J. Miller, Williams College sjm1@williams.edu, Steven.Miller.MC.96@aya.yale.edu (with Cameron and Kayla

More information

Queueing Theory I Summary! Little s Law! Queueing System Notation! Stationary Analysis of Elementary Queueing Systems " M/M/1 " M/M/m " M/M/1/K "

Queueing Theory I Summary! Little s Law! Queueing System Notation! Stationary Analysis of Elementary Queueing Systems  M/M/1  M/M/m  M/M/1/K Queueing Theory I Summary Little s Law Queueing System Notation Stationary Analysis of Elementary Queueing Systems " M/M/1 " M/M/m " M/M/1/K " Little s Law a(t): the process that counts the number of arrivals
