Midterm 2 Review. CS70 Summer 2016, Lecture 6D. David Dinh. 28 July 2016. UC Berkeley.

Midterm 2: Format
8 questions, 190 points, 110 minutes (same as MT1). Two pages (one double-sided sheet) of handwritten notes. Coverage: we will assume knowledge of all the material from the beginning of the class to yesterday, but we will only explicitly test material seen after MT1. We will give you a formula sheet (see the MT2 logistics post on Piazza). On it: all the distributions we'll expect you to know (with expectation and variance), and the Chernoff bounds.

Probability Basics

Events and Sample Spaces
Probability space: a set of outcomes, denoted $\Omega$. Each outcome $\omega$ in the probability space occurs with some probability:
$\Pr[\omega] \in [0, 1]$, $\quad \sum_{\omega \in \Omega} \Pr[\omega] = 1$.
Uniform probability space: each outcome has the same probability.
An event $E$ is a set of outcomes; the probability that an event happens is $\Pr[E] = \sum_{\omega \in E} \Pr[\omega]$.
Events can be combined using standard set operations.

Disjointness and Additivity
If $A, B$ are disjoint (no intersection): $\Pr[A \cup B] = \Pr[A] + \Pr[B]$. Probabilities of pairwise disjoint events (any two are disjoint) can likewise be summed.
Inclusion-exclusion: $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$.
Union bound: $\Pr[A_1 \cup A_2 \cup \dots \cup A_n] \le \Pr[A_1] + \Pr[A_2] + \dots + \Pr[A_n]$.
Total probability: if $A_1, \dots, A_n$ partition the entire sample space (disjoint, and together they cover all of it), then $\Pr[B] = \Pr[B \cap A_1] + \dots + \Pr[B \cap A_n]$.

What are the probabilities?

Conditional Probability and the Product Rule
Intuitively: if I know $B$ is true, what's the probability that $A$ is true?
Definition: $\Pr[A \mid B] = \dfrac{\Pr[A \cap B]}{\Pr[B]}$.
From the definition: $\Pr[A \cap B] = \Pr[A] \Pr[B \mid A]$.
More generally, the product rule: $\Pr[A_1 \cap \dots \cap A_n] = \Pr[A_1] \Pr[A_2 \mid A_1] \cdots \Pr[A_n \mid A_1 \cap \dots \cap A_{n-1}]$.

Correlation and Independence
Maybe knowing that $A$ is true tells you nothing about $B$. Independence: $\Pr[A \cap B] = \Pr[A] \Pr[B]$. Equivalently: $\Pr[A \mid B] = \Pr[A]$.
Or maybe knowing that one is true tells you that the other is likely to be true, too. Positive correlation: $\Pr[A \cap B] > \Pr[A] \Pr[B]$. Negative correlation: $\Pr[A \cap B] < \Pr[A] \Pr[B]$.

Bayes' Theorem
You know you will get a good grade in CS70 with some probability (prior). You take Midterm 2 and get a good grade (observation). With this new information, figure out the probability that you get a good grade in CS70 (posterior).

Bayes' Theorem II
$\Pr[A \mid B] = \dfrac{\Pr[A] \Pr[B \mid A]}{\Pr[B]}$
Or, if I know for sure that exactly one of $A_1, \dots, A_n$ holds, then:
$\Pr[A_k \mid B] = \dfrac{\Pr[A_k] \Pr[B \mid A_k]}{\sum_j \Pr[A_j] \Pr[B \mid A_j]}$.
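A minimal numerical sketch of the grade example above (all probabilities here are made up for illustration): compute the posterior from a prior and two likelihoods, using total probability for the denominator.

```python
# Hypothetical numbers for the CS70 grade example.
p_good = 0.6                # prior: Pr[good grade]
p_obs_given_good = 0.9      # Pr[good MT2 | good grade]
p_obs_given_bad = 0.3       # Pr[good MT2 | not a good grade]

# Denominator via total probability: Pr[good MT2].
p_obs = p_good * p_obs_given_good + (1 - p_good) * p_obs_given_bad

# Bayes' rule: posterior = prior * likelihood / evidence.
posterior = p_good * p_obs_given_good / p_obs
print(posterior)            # 0.54 / 0.66 ≈ 0.818
```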

Random Variables

Fundamentals
Random variable: a function that assigns a real number $X(\omega)$ to each outcome $\omega$ in a probability space.
Example: I catch 5 Pokemon. How many different kinds of Pokemon do I catch?
Outcome: the exact type of each Pokemon I catch.
Random variable: maps the outcome to a number, e.g. $\{$Psyduck, Typhlosion, Psyduck, Dratini, Typhlosion$\} \mapsto 3$.
Discrete distributions: when there are a finite number of values an R.V. can take: pairs of values and probabilities. The probability of the R.V. taking on a value is the probability of the event consisting of all outcomes that map to that value.

Continuous RVs
The probability space is infinite and maps onto a continuous set of reals.
Distributions are represented with a pdf:
$f_X(t) = \lim_{\delta \to 0} \dfrac{\Pr[X \in [t, t+\delta]]}{\delta}$
...or, equivalently, a cdf:
$F_X(t) = \Pr[X \le t] = \int_{-\infty}^{t} f_X(z)\,dz$
$\Pr[X \in [a, b]] = \int_a^b f_X(t)\,dt = F_X(b) - F_X(a)$
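A quick numeric sanity check of the pdf/cdf relationship (a sketch, using the Exponential(1) distribution as the example): a Riemann sum of the pdf over $[a, b]$ should match $F(b) - F(a)$.

```python
import math

f = lambda t: math.exp(-t)       # Exponential(1) pdf
F = lambda t: 1 - math.exp(-t)   # its cdf

a, b, steps = 0.5, 2.0, 100_000
width = (b - a) / steps
# Midpoint Riemann sum approximating the integral of f over [a, b].
riemann = sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

print(riemann, F(b) - F(a))      # both ≈ 0.4712 = Pr[X in [a, b]]
```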

Independence
Random variables $X, Y$ are independent if the events $Y = a$ and $X = b$ are independent for all $a, b$.
If $X, Y$ are independent, then $f(X), g(Y)$ are independent for all functions $f, g$.

Expectation
Average over a huge (approaching $\infty$) number of trials.
Discrete: $E[X] = \sum_t t \Pr[X = t]$
Continuous: $E[X] = \int_{-\infty}^{\infty} t f_X(t)\,dt$
Functions of random variables:
Discrete: $E[g(X)] = \sum_t g(t) \Pr[X = t]$
Continuous: $E[g(X)] = \int_{-\infty}^{\infty} g(t) f_X(t)\,dt$

Expectation: Properties
Linearity of expectation: $E[\sum_i a_i X_i] = \sum_i a_i E[X_i]$ for any random variables $X_i$! (No independence needed; see the sketch below.)
For independent $X, Y$: $E[XY] = E[X] E[Y]$.
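A simulation sketch of the first point (die rolls, with a seed chosen here for reproducibility): take $Y = X$, maximally dependent, and linearity still holds; only the product rule needs independence.

```python
import random

random.seed(0)
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]   # die rolls X

mean_x = sum(xs) / n
mean_x_plus_y = sum(x + x for x in xs) / n      # Y = X, fully dependent

print(mean_x, mean_x_plus_y)   # ≈ 3.5 and ≈ 7.0: linearity needs no independence
```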

Variance
How spread out is my distribution?
$\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2$
For any $X$: $\mathrm{Var}[aX] = a^2 \mathrm{Var}[X]$
For independent $X, Y$: $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$
The standard deviation is defined as the square root of the variance.

Indicators
If $A$ is an event, let the indicator r.v. be defined as 1 if our outcome is in the event and 0 otherwise.
Expectation? Same as the probability that the event happened: $E[\mathbf{1}_A] = \Pr[A]$.

Common Discrete Distributions
Uniform: choose a random integer in some finite interval.
Bernoulli: 1 w.p. $p$ and 0 w.p. $1-p$.
Binomial: I catch $n$ Pokemon. Each Pokemon is a Lucario with probability $p$. How many Lucarios do I catch? Or: a sum of $n$ independent Bernoulli($p$) indicators!
Geometric: Each Pokemon is a Shiftry with probability $p$. How many Pokemon do I need to catch until I first run into a Shiftry? Memorylessness (see the sketch below).
Poisson: I catch, on average, one Pokemon every minute. How many Pokemon do I catch in an hour?
We'll give the exact distribution functions, expectation, and variance for these distributions to you on the exam... but you should intuitively understand them.
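A simulation sketch of the geometric's memorylessness ($p$, $k$, and the seed are arbitrary choices here): conditioned on the first $k$ catches having no Shiftry, the remaining waiting time is again Geometric($p$).

```python
import random

random.seed(1)
p, k, trials = 0.2, 5, 200_000

def geometric(p):
    """Number of catches up to and including the first success."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

waits = [geometric(p) for _ in range(trials)]
overall = sum(waits) / trials
leftover = [w - k for w in waits if w > k]     # condition on X > k
conditional = sum(leftover) / len(leftover)

print(overall, conditional)   # both ≈ 1/p = 5: the first k failures are "forgotten"
```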

Common Continuous Distributions
Uniform: pick a random real number in some interval.
Exponential: I catch, on average, one Pokemon every minute. When do I catch my first Pokemon? The continuous analog of the geometric.
Normal: the continuous analog of the binomial. Models sums of lots of i.i.d. random variables (CLT).
We'll give the exact pdf, expectation, and variance for these distributions to you on the exam... but you should intuitively understand them.

Tail Bounds and LLN

Confidence Intervals
If $X$ falls in $[a, b]$ with probability $1 - \alpha$, then we say that $[a, b]$ is a $1 - \alpha$ confidence interval for $X$.

Markov
For $X$ non-negative and $a$ positive: $\Pr[X \ge a] \le \dfrac{E[X]}{a}$. Not a very tight bound most of the time!
Or: for a monotone non-decreasing function $f$ that takes non-negative values, and non-negative $X$: $\Pr[X \ge a] \le \dfrac{E[f(X)]}{f(a)}$ for all $a$ s.t. $f(a) > 0$.

Chebyshev
$\Pr[|X - E[X]| \ge a] \le \dfrac{\mathrm{Var}[X]}{a^2}$ for all $a > 0$.
How did we get this? Just apply Markov with $f(x) = x^2$ to the non-negative variable $|X - E[X]|$.
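A concrete comparison (the numbers are chosen here for illustration): bounding $\Pr[X \ge 75]$ for $X \sim \mathrm{Binomial}(100, 1/2)$ with both inequalities, next to an empirical estimate.

```python
import random

n, p, a = 100, 0.5, 75
mean, var = n * p, n * p * (1 - p)             # 50 and 25

markov = mean / a                              # Pr[X >= a] <= E[X]/a
chebyshev = var / (a - mean) ** 2              # via Pr[|X - E[X]| >= a - E[X]]

random.seed(2)
trials = 100_000
hits = sum(sum(random.random() < p for _ in range(n)) >= a
           for _ in range(trials))

print(markov, chebyshev, hits / trials)        # ≈ 0.667, 0.04, ~0.0
```

The true tail here is around $3 \times 10^{-7}$, so both bounds are valid but very loose; Chernoff (next slide) does much better.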

Chernoff
A family of exponential bounds for a sum of mutually independent 0-1 random variables.
General approach to derive these: note that $\Pr[X \ge a] = \Pr[e^{tX} \ge e^{ta}]$ for $t > 0$. Bound using Markov: $\Pr[e^{tX} \ge e^{ta}] \le \dfrac{E[e^{tX}]}{e^{ta}}$. Choose a good $t$.
All the bounds you need are on the equation sheet on the exam.
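A sketch of this recipe on the same Binomial(100, 1/2) example: for a sum of independent Bernoulli($p$)'s, $E[e^{tX}] = (1 - p + p e^t)^n$, and a simple grid search over $t$ stands in for the usual calculus.

```python
import math

n, p, a = 100, 0.5, 75

def chernoff(t):
    # Markov applied to e^{tX}: (1 - p + p*e^t)^n / e^{ta}.
    return (1 - p + p * math.exp(t)) ** n / math.exp(t * a)

best = min(chernoff(k / 1000) for k in range(1, 3000))  # t in (0, 3)
print(best)   # ≈ 2.1e-6, versus 0.667 (Markov) and 0.04 (Chebyshev)
```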

LLN and CLT
LLN: if $X_1, X_2, \dots$ are pairwise independent and identically distributed with mean $\mu$:
$\Pr\left[\left|\frac{\sum_i X_i}{n} - \mu\right| \ge \epsilon\right] \to 0$ as $n \to \infty$.
With many i.i.d. samples we converge not only to the mean, but (after recentering and rescaling) to a normal distribution.
CLT: suppose $X_1, X_2, \dots$ are i.i.d. random variables with expectation $\mu$ and variance $\sigma^2$. Let
$S_n := \dfrac{(\sum_i X_i) - n\mu}{\sigma \sqrt{n}}$.
Then $S_n$ tends towards $\mathcal{N}(0, 1)$ as $n \to \infty$. Or:
$\Pr[S_n \le a] \to \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx$
This is an approximation, not a bound.
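A simulation sketch of the CLT with die rolls ($\mu = 3.5$, $\sigma^2 = 35/12$; $n$ and the seed are choices made here): the standardized sums should match standard normal probabilities like $\Phi(0) = 0.5$ and $\Phi(1) \approx 0.841$.

```python
import random

random.seed(3)
n, trials = 100, 20_000
mu = 3.5
sigma = (35 / 12) ** 0.5          # variance of one die roll is 35/12

def standardized_sum():
    total = sum(random.randint(1, 6) for _ in range(n))
    return (total - n * mu) / (sigma * n ** 0.5)

samples = [standardized_sum() for _ in range(trials)]
print(sum(s <= 0 for s in samples) / trials)   # ≈ 0.5
print(sum(s <= 1 for s in samples) / trials)   # ≈ 0.841
```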

Markov Chains

Definitions
Set of states, transition probabilities, and initial distribution.
Also representable as a transition matrix:
$P = \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0 & 0.4 & 0.6 \\ 0.1 & 0.4 & 0.5 \end{pmatrix}$
Distributions are row vectors. Timesteps correspond to matrix multiplication: $\pi \mapsto \pi P$.
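A minimal sketch with the matrix above: one timestep is one row-vector multiplication, and the distribution after $t$ steps is a row of $P^t$.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],     # rows: current state; columns: next state
              [0.0, 0.4, 0.6],     # each row sums to 1
              [0.1, 0.4, 0.5]])

pi = np.array([1.0, 0.0, 0.0])     # start in state 0 with probability 1
for _ in range(3):
    pi = pi @ P                    # one timestep per multiplication
print(pi)                          # distribution after three steps

print(np.linalg.matrix_power(P, 3)[0])   # same thing: row 0 of P^3
```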

Hitting Time
How long does it take us to get to some state $j$?
Strategy: let $\beta(i)$ be the expected time it takes to get to $j$ from $i$, for each state $i$; $\beta(j) = 0$.
Set up a system of linear equations and solve: $\beta(i) = 1 + \sum_k P_{i,k}\, \beta(k)$ for each $i \ne j$.
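A sketch of that strategy for the example matrix, targeting state $j = 2$: substituting $\beta(2) = 0$ leaves a linear system in $\beta(0), \beta(1)$, solvable with one call to a linear solver.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.4, 0.6],
              [0.1, 0.4, 0.5]])
target = 2
others = [0, 1]                              # all states except the target

# beta(i) = 1 + sum_k P[i,k] * beta(k) with beta(target) = 0
# rearranges to (I - Q) beta = 1, where Q is P restricted to non-target states.
Q = P[np.ix_(others, others)]
beta = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
print(beta)   # expected steps to reach state 2 from states 0 and 1
```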

State Classifications
State $j$ is accessible from $i$: we can get from $i$ to $j$ with nonzero probability. Equivalently: there exists a path from $i$ to $j$.
If $i$ is accessible from $j$ and $j$ is accessible from $i$: $i, j$ communicate.
If, given that we're at some state, we will see that state again with probability 1, the state is recurrent. If there is a nonzero probability that we never see the state again, the state is transient.
Every finite chain has a recurrent state.
A state is periodic if, once we're at the state, we can only return to it at evenly spaced timesteps (multiples of some period $d > 1$).
Ergodic state: aperiodic + recurrent.

Markov Chain Classifications
Irreducible Markov chain: every state communicates with every other state. Equivalently: the graph representation is strongly connected.
Periodic Markov chain: some state is periodic.
Ergodic Markov chain: every state is ergodic. Any finite, irreducible, aperiodic Markov chain is ergodic.

Stationary Distributions
A distribution unchanged by a step of the chain. Intuitively: if I have a lot (approaching infinity) of people on the same MC, the number of people at each state stays constant (even if the individual people move around).
To find the stationary distribution? Solve the balance equations $\pi = \pi P$, together with the normalization $\sum_i \pi_i = 1$ (see the sketch below).
Let $r_{i,j}^t$ be the probability that we first hit $j$ exactly $t$ timesteps after we start at $i$ (if $i = j$, we don't count the zeroth timestep). Then the expected hitting time is $h_{i,j} = \sum_{t \ge 1} t\, r_{i,j}^t$.
Suppose we are given a finite, irreducible, aperiodic Markov chain. Then:
There is a unique stationary distribution $\pi$.
For all $j, i$, the limit $\lim_{t \to \infty} P_{j,i}^t$ exists and is independent of $j$.
$\pi_i = \lim_{t \to \infty} P_{j,i}^t = 1/h_{i,i}$
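A sketch of solving the balance equations for the example matrix: $\pi(P - I) = 0$ is rank-deficient, so one balance equation is swapped for the normalization, and rows of $P^t$ are checked against the result.

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.4, 0.6],
              [0.1, 0.4, 0.5]])
n = P.shape[0]

A = P.T - np.eye(n)            # columns of A encode pi = pi P
A[-1, :] = 1.0                 # replace one equation with sum(pi) = 1
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)                                  # ≈ [6/17, 5/17, 6/17]

# Finite + irreducible + aperiodic: every row of P^t converges to pi.
print(np.linalg.matrix_power(P, 100)[0])
```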

Random Walks on Undirected Graphs
A Markov chain on an undirected graph: at a vertex, pick an incident edge uniformly at random and walk down it.
For connected undirected graphs: the walk is aperiodic if and only if the graph is not bipartite.
Stationary distribution: $\pi_v = d(v)/(2|E|)$ (checked in the sketch below).
Cover time (the expected time it takes to hit all the vertices, starting from the worst possible starting vertex): bounded above by $4|V||E|$.
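A sketch on a small hypothetical graph chosen here (a triangle with a pendant vertex, so connected and not bipartite): the degree formula should match what the walk converges to.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # triangle 0-1-2 plus edge 2-3
n = 4
adj = np.zeros((n, n))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
deg = adj.sum(axis=1)
P = adj / deg[:, None]                     # uniformly random incident edge

print(deg / (2 * len(edges)))              # pi_v = d(v) / (2|E|)
print(np.linalg.matrix_power(P, 200)[0])   # rows of P^t converge to it
```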

Good luck on the midterm!