Midterm 2 Review
CS70 Summer 2016, Lecture 6D
David Dinh, 28 July 2016, UC Berkeley


Midterm 2: Format

8 questions, 190 points, 110 minutes (same as MT1). Two pages (one double-sided sheet) of handwritten notes.
Coverage: we will assume knowledge of all the material from the beginning of the class to yesterday, but we will only explicitly test for material seen after MT1.
We will give you a formula sheet (see the MT2 logistics post on Piazza to see it). On it: all the distributions we'll expect you to know (with expectation and variance), and the Chernoff bounds.

Probability Basics

What are the probabilities?

Probability space: a set of outcomes, denoted Ω. Each outcome ω in the probability space occurs with some probability Pr[ω] ∈ [0, 1], with Σ_{ω ∈ Ω} Pr[ω] = 1.
Uniform probability space: each outcome has the same probability.

Events and Sample Spaces

An event E is a set of outcomes; the probability that an event happens is Pr[E] = Σ_{ω ∈ E} Pr[ω]. Events can be combined using standard set operations.

Disjointness and Additivity

If A, B are disjoint (no intersection): Pr[A ∪ B] = Pr[A] + Pr[B]. Pairwise disjoint events (any two are disjoint) can also be summed.
Inclusion-exclusion: Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B].
Union bound: Pr[A_1 ∪ A_2 ∪ ... ∪ A_n] ≤ Pr[A_1] + Pr[A_2] + ... + Pr[A_n].
Total probability: if A_1, ..., A_n partition the entire sample space (disjoint, and together they cover all of it), then Pr[B] = Pr[B ∩ A_1] + ... + Pr[B ∩ A_n]. These rules are sanity-checked in the code sketch below.
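
Here is a minimal Python sketch (not from the lecture; the two-dice events are our own example) that verifies inclusion-exclusion and the union bound by enumerating a small uniform sample space:

```python
# Sanity check of inclusion-exclusion and the union bound on the uniform
# sample space of two fair dice (36 equally likely outcomes).
from itertools import product

omega = list(product(range(1, 7), repeat=2))
pr = lambda event: sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == 6          # first die shows 6
B = lambda w: w[0] + w[1] == 7   # dice sum to 7
union = lambda w: A(w) or B(w)
inter = lambda w: A(w) and B(w)

# Inclusion-exclusion: Pr[A union B] = Pr[A] + Pr[B] - Pr[A intersect B]
assert abs(pr(union) - (pr(A) + pr(B) - pr(inter))) < 1e-12
# Union bound: Pr[A union B] <= Pr[A] + Pr[B]
assert pr(union) <= pr(A) + pr(B)
print(pr(A), pr(B), pr(inter), pr(union))  # 0.1667 0.1667 0.0278 0.3056
```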

Conditional Probability and the Product Rule

Intuitively: if I know B is true, what's the probability that A is true?
Definition: Pr[A | B] = Pr[A ∩ B] / Pr[B].
From the definition: Pr[A ∩ B] = Pr[A] · Pr[B | A]. Generally: the product rule,

Pr[A_1 ∩ ... ∩ A_n] = Pr[A_1] · Pr[A_2 | A_1] · ... · Pr[A_n | A_1 ∩ ... ∩ A_{n-1}].

Bayes' Theorem

You know you will get a good grade in CS70 with some probability (prior). You take midterm 2 and get a good grade (observation). With this new information, figure out the probability that you get a good grade in CS70 (posterior).

Pr[A | B] = Pr[A] · Pr[B | A] / Pr[B]

Bayes' Theorem II

Or, if I know for sure that exactly one of A_1, ..., A_n holds, then:

Pr[A_k | B] = Pr[A_k] · Pr[B | A_k] / Σ_j Pr[A_j] · Pr[B | A_j].

(A small numerical sketch of the prior-to-posterior update appears after this section.)

Correlation and Independence

Independence: knowing that A is true tells you nothing about B. Formally: Pr[A ∩ B] = Pr[A] · Pr[B]. Equivalently: Pr[A | B] = Pr[A].
Or maybe knowing that one is true tells you that the other is likely to be true, too.
Positive correlation: Pr[A ∩ B] > Pr[A] · Pr[B].
Negative correlation: Pr[A ∩ B] < Pr[A] · Pr[B].
(A code example classifying event pairs this way also follows below.)

Random Variables

Fundamentals

Random variable: a function that assigns a real number X(ω) to each outcome ω in a probability space.
Example: I catch 5 Pokemon. How many different kinds of Pokemon do I catch? Outcome: the exact type of each Pokemon I catch. Random variable: maps the outcome to a number, e.g. {Psyduck, Typhlosion, Psyduck, Dratini, Typhlosion} → 3.
Discrete distributions: when there are a finite number of values an R.V. can take, the distribution is a list of pairs of values and probabilities. The probability of the R.V. taking on a value is the probability that an event that maps onto that value occurs.
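
A minimal sketch of the prior-to-posterior update for the grade example above; all the numbers are hypothetical, chosen only for illustration:

```python
# Bayes' theorem for the slide's grade example (hypothetical numbers).
prior_good = 0.6            # Pr[good grade in CS70]            (assumed)
pr_obs_given_good = 0.9     # Pr[good MT2 | good grade]         (assumed)
pr_obs_given_bad = 0.3      # Pr[good MT2 | not good grade]     (assumed)

# Total probability: Pr[B] = Pr[A] Pr[B|A] + Pr[not A] Pr[B|not A]
pr_obs = prior_good * pr_obs_given_good + (1 - prior_good) * pr_obs_given_bad
# Bayes: Pr[A|B] = Pr[A] Pr[B|A] / Pr[B]
posterior_good = prior_good * pr_obs_given_good / pr_obs
print(posterior_good)  # 0.818... : the observation raises the prior of 0.6
```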

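And a small sketch (our own example, not from the slides) that classifies pairs of dice events by comparing Pr[A ∩ B] with Pr[A] · Pr[B]:

```python
# Classify event pairs as independent / positively / negatively correlated.
from itertools import product

omega = list(product(range(1, 7), repeat=2))
pr = lambda e: sum(1 for w in omega if e(w)) / len(omega)

def relation(A, B):
    joint, prod = pr(lambda w: A(w) and B(w)), pr(A) * pr(B)
    if abs(joint - prod) < 1e-12:
        return "independent"
    return "positively correlated" if joint > prod else "negatively correlated"

first_is_6 = lambda w: w[0] == 6
sum_is_7   = lambda w: w[0] + w[1] == 7   # 1/36 = (1/6)(1/6)
sum_is_12  = lambda w: w[0] + w[1] == 12  # 1/36 > (1/6)(1/36)
print(relation(first_is_6, sum_is_7))     # independent
print(relation(first_is_6, sum_is_12))    # positively correlated
```
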
Continuous RVs

The probability space is infinite and maps onto a continuous set of reals. Distributions are represented with a pdf:

f_X(t) = lim_{δ → 0} Pr[X ∈ [t, t + δ]] / δ

...or, equivalently, a cdf:

F_X(t) = Pr[X ≤ t] = ∫_{-∞}^{t} f_X(z) dz

Pr[X ∈ [a, b]] = ∫_{a}^{b} f_X(t) dt = F_X(b) - F_X(a)

Independence

Random variables X, Y are independent if the events Y = a and X = b are independent for all a, b. If X, Y are independent, then f(X), g(Y) are independent for all f, g.

Expectation

Average over a huge (approaching ∞) number of trials.
Discrete: E[X] = Σ_t t · Pr[X = t]
Continuous: E[X] = ∫_{-∞}^{∞} t f_X(t) dt
Functions of variables:
Discrete: E[g(X)] = Σ_t g(t) · Pr[X = t]
Continuous: E[g(X)] = ∫_{-∞}^{∞} g(t) f_X(t) dt

Expectation: Properties

Linearity of expectation: E[Σ_i a_i X_i] = Σ_i a_i E[X_i] for any random variables X_i!
For independent X, Y: E[XY] = E[X] · E[Y].

Variance

How spread out is my distribution?
Var[X] = E[(X - E[X])²] = E[X²] - E[X]²
For any X: Var[aX] = a² Var[X]
For independent X, Y: Var[X + Y] = Var[X] + Var[Y]
The standard deviation is defined as the square root of the variance. (Both variance rules are checked numerically in a sketch below.)

Indicators

If A is an event, let the indicator r.v. be defined as 1 if our outcome is in the event and 0 otherwise. Its expectation? Same as the probability that the event happened! (See the indicator sketch below.)
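
A minimal sketch of indicators plus linearity of expectation, echoing the Pokemon-kinds example (the parameters k = 3 kinds and n = 5 catches are our own assumptions):

```python
# E[# distinct kinds] via indicators: write the count as a sum of one
# indicator per kind, so E[count] = sum over kinds of Pr[kind appears].
import random

k, n = 3, 5
# Pr[kind i appears among n uniform catches] = 1 - (1 - 1/k)^n, same for
# every kind by symmetry, so linearity gives:
exact = k * (1 - (1 - 1 / k) ** n)

# Monte Carlo check of the same quantity.
random.seed(0)
trials = 100_000
mc = sum(len({random.randrange(k) for _ in range(n)})
         for _ in range(trials)) / trials
print(exact, mc)  # both approximately 2.6049
```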

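And a sketch verifying the two variance rules exactly on fair dice (our own example):

```python
# Var[aX] = a^2 Var[X], and Var[X + Y] = Var[X] + Var[Y] for independent
# X, Y, checked by exact enumeration rather than simulation.
from itertools import product

vals = range(1, 7)
E = lambda f: sum(f(x) for x in vals) / 6
EX = E(lambda x: x)
var_die = E(lambda x: (x - EX) ** 2)        # Var[X] = 35/12

# Var[3X] computed directly vs. 9 * Var[X]
var_3x = E(lambda x: (3 * x - 3 * EX) ** 2)
assert abs(var_3x - 9 * var_die) < 1e-12

# Var[X + Y] for two independent dice, enumerating the joint space
pairs = list(product(vals, vals))
mean_sum = sum(x + y for x, y in pairs) / 36
var_sum = sum((x + y - mean_sum) ** 2 for x, y in pairs) / 36
assert abs(var_sum - 2 * var_die) < 1e-12
print(var_die, var_3x, var_sum)  # 2.9167 26.25 5.8333
```
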
Common Discrete Distributions

Uniform: choose a random integer in some finite interval.
Bernoulli: 1 w.p. p and 0 w.p. 1 - p.
Binomial: I catch n Pokemon. Each Pokemon is a Lucario with probability p. How many Lucarios do I catch? Or: a sum of independent Bernoullis!
Geometric: each Pokemon is a Shiftry with probability p. How many Pokemon do I need to catch until I first run into a Shiftry? Memorylessness.
Poisson: I catch, on average, one Pokemon every minute. How many Pokemon do I catch in an hour?
We'll give the exact distribution functions, expectation, and variance for these distributions to you on the exam... but you should intuitively understand them.

Common Continuous Distributions

Uniform: pick a random real number in some interval.
Exponential: I catch, on average, one Pokemon every minute. When do I catch my first Pokemon? The continuous analog of the geometric.
Normal: the continuous analog of the binomial. Models sums of lots of i.i.d. random variables (CLT).
We'll give the exact pdf, expectation, and variance for these distributions to you on the exam... but you should intuitively understand them.

Tail Bounds and LLN

Markov

For X non-negative and a positive:

Pr[X ≥ a] ≤ E[X] / a

Not a very tight bound most of the time!
Or: for a monotone non-decreasing function f that takes non-negative values, and non-negative X:

Pr[X ≥ a] ≤ E[f(X)] / f(a)

for all a such that f(a) > 0.

Chebyshev

Pr[|X - E[X]| ≥ a] ≤ Var[X] / a²

for all a > 0. How did we get this? Just use Markov with f(x) = x² as our function, applied to |X - E[X]|. (Both bounds are compared against an exact tail in the sketch below.)

Confidence Intervals

If X falls in [a, b] with probability 1 - α, then we say that [a, b] is a 1 - α confidence interval for X. (A Chebyshev-based construction is sketched below.)
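
Here is a minimal sketch comparing the two bounds against an exact binomial tail (the parameters n, p, a are our own choice):

```python
# Markov vs. Chebyshev vs. the exact tail of Binomial(100, 0.5) at a = 75.
from math import comb

n, p, a = 100, 0.5, 75
mean, var = n * p, n * p * (1 - p)          # 50 and 25

exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
markov = mean / a                            # Pr[X >= a] <= E[X]/a
# One-sided tail is at most the two-sided deviation Chebyshev bounds:
chebyshev = var / (a - mean) ** 2            # Pr[|X - 50| >= 25] <= 25/625
print(f"exact={exact:.2e}  markov={markov:.3f}  chebyshev={chebyshev:.3f}")
# exact is about 3e-07, markov = 0.667, chebyshev = 0.040: both bounds hold
# but are very loose here.
```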

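A sketch of a Chebyshev-style confidence interval for a sample mean, under assumed values of σ, n, and the observed mean:

```python
# For the average A of n i.i.d. samples with variance sigma^2 we have
# Var[A] = sigma^2 / n, so Chebyshev gives
#   Pr[|A - mu| >= a] <= sigma^2 / (n a^2) = alpha,
# i.e. a = sigma / sqrt(n * alpha) yields a (1 - alpha) confidence interval.
from math import sqrt

sigma, n, alpha = 1.0, 1000, 0.05   # hypothetical numbers
a = sigma / sqrt(n * alpha)
A = 0.513                           # hypothetical observed sample mean
print(f"[{A - a:.3f}, {A + a:.3f}] is a {1 - alpha:.0%} confidence interval")
# [0.372, 0.654] is a 95% confidence interval
```
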
Chernoff

A family of exponential bounds for a sum X of mutually independent 0-1 random variables. The general approach to derive these: note that, for any t > 0,

Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}].

Bound

Pr[e^{tX} ≥ e^{ta}] ≤ E[e^{tX}] / e^{ta}

using Markov. Then choose a good t. All the bounds you need are on the equation sheet on the exam.

LLN and CLT

If X_1, X_2, ... are pairwise independent and identically distributed with mean µ:

Pr[|(Σ_i X_i)/n - µ| ≥ ϵ] → 0 as n → ∞.

With many i.i.d. samples, the standardized sum converges not only to the mean but to a normal distribution.
CLT: suppose X_1, X_2, ... are i.i.d. random variables with expectation µ and variance σ². Let A_n = Σ_i X_i and

S_n := (A_n - nµ) / (σ √n).

Then S_n tends towards N(0, 1) as n → ∞. Or:

Pr[S_n ≤ α] → (1/√(2π)) ∫_{-∞}^{α} e^{-x²/2} dx.

This is an approximation, not a bound. (A quick numerical illustration appears at the end of this block.)

Markov Chains

Definitions

A Markov chain is given by a set of states, transition probabilities, and an initial distribution. Also representable as a transition matrix, e.g.

      [ 0.9  0.1  0   ]
P =   [ 0    0.4  0.6 ]
      [ 0.1  0.4  0.5 ]

Distributions are row vectors. Timesteps correspond to matrix multiplication: π ← πP.

Hitting Time

How long does it take us to get to some state j? Strategy: let β(i) be the expected time it takes to get to j from i, for each state i, with β(j) = 0. Set up a system of linear equations and solve. (A worked example with the matrix above follows below.)

State Classifications

State j is accessible from i: we can get from i to j with nonzero probability. Equivalently: there exists a path from i to j.
If i is accessible from j and j is accessible from i, then i and j communicate.
If, given that we're at some state, we will see that state again with probability 1, the state is recurrent. If there is a nonzero probability that we never see the state again, the state is transient. Every finite chain has a recurrent state.
A state is periodic if, once we're at the state, we can only return to it at evenly spaced timesteps.
Ergodic state: aperiodic + recurrent.
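
Returning to the CLT slide above: a minimal numerical illustration using Uniform[0, 1] summands (the sample sizes are arbitrary choices):

```python
# The standardized sum S_n of n i.i.d. Uniform[0,1] variables should look
# like N(0, 1); compare an empirical tail against the normal CDF.
import random
from math import sqrt, erf

random.seed(0)
n, trials = 200, 10_000
mu, sigma = 0.5, sqrt(1 / 12)            # mean and sd of Uniform[0,1]

def standardized_sum():
    total = sum(random.random() for _ in range(n))
    return (total - n * mu) / (sigma * sqrt(n))

samples = [standardized_sum() for _ in range(trials)]
phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF
emp = sum(s <= 1.0 for s in samples) / trials
print(f"empirical Pr[S_n <= 1] = {emp:.3f}  vs  Phi(1) = {phi(1.0):.3f}")
# both close to 0.841
```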

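And a sketch of the hitting-time strategy, solving the linear system for the 3-state transition matrix shown above (the target state and 0-indexing are our own choices):

```python
# beta(j) = 0 for the target state; for every other state i,
#   beta(i) = 1 + sum_k P[i][k] * beta(k),
# which is a linear system in the non-target betas.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.4, 0.6],
              [0.1, 0.4, 0.5]])
target = 2                                # hit state 2 (0-indexed)
others = [0, 1]

# (I - P restricted to non-target states) beta = all-ones vector.
A = np.eye(len(others)) - P[np.ix_(others, others)]
beta = np.linalg.solve(A, np.ones(len(others)))
print(dict(zip(others, beta)))            # {0: 11.667, 1: 1.667}
```
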
Markov Chain Classifications

Irreducible Markov chain: every state communicates with every other state. Equivalently: the graph representation is strongly connected.
Periodic Markov chain: some state is periodic.
Ergodic Markov chain: every state is ergodic.
Any finite, irreducible, aperiodic Markov chain is ergodic.

Stationary Distributions

A stationary distribution is one that is unchanged by a step of the chain. Intuitively: if I have a lot (approaching infinity) of people on the same MC, the number of people at each state is constant (even if the individual people may move around).
To find the limiting distribution? Solve the balance equations π = πP, together with Σ_i π_i = 1.
Let r^t_{i,j} be the probability that, starting at i, we first hit j exactly t timesteps later (if i = j, we don't count the zeroth timestep). Then the expected hitting time is h_{i,j} = Σ_{t ≥ 1} t · r^t_{i,j}.
Suppose we are given a finite, irreducible, aperiodic Markov chain. Then:
There is a unique stationary distribution π.
For all j, i, the limit lim_{t → ∞} P^t_{j,i} exists and is independent of j.
π_i = lim_{t → ∞} P^t_{j,i} = 1/h_{i,i}.

Random Walks on Undirected Graphs

A Markov chain on an undirected graph: at a vertex, pick an incident edge with uniform probability and walk down it.
For undirected graphs: the walk is aperiodic if and only if the graph is not bipartite.
Stationary distribution: π_v = d(v)/(2|E|). (Checked in the sketch below.)
Cover time (the expected time it takes to hit all the vertices, starting from the worst vertex possible): bounded above by 4|V||E|.

Good luck on the midterm!
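
A closing sketch (our own small example) checking π_v = d(v)/(2|E|) against the stationary distribution obtained from the balance equations:

```python
# Random walk on a triangle with a pendant vertex (not bipartite, so the
# walk is aperiodic); compare the closed form to the solution of pi = pi P.
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4
deg = np.zeros(n)
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

P = np.zeros((n, n))                        # pick an incident edge uniformly
for u, v in edges:
    P[u, v] = 1 / deg[u]
    P[v, u] = 1 / deg[v]

closed_form = deg / (2 * len(edges))        # pi_v = d(v) / (2|E|)

# Solve pi = pi P with sum(pi) = 1: (P^T - I) pi = 0 plus a normalization row.
A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
b = np.concatenate([np.zeros(n), [1.0]])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(pi, closed_form), closed_form)  # True [0.25 0.25 0.375 0.125]
```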