Lecture 08: Poisson and More. Lisa Yan July 13, 2018


Announcements. PS1: grades out later today, solutions out after class today. PS2 due today. PS3 out today (due next Friday 7/20).

Midterm announcement. Tuesday, July 24, 7-9pm, Hewlett 201. Closed book, closed computer, no calculators. Two 8.5 x 11 pages of notes (front and back) allowed. Covers up to and including Friday 7/20's lecture. Practice midterm (and PS4) to be released mid-next week. If you need alternate accommodations (OAE, academic reasons) and you have not yet heard from me, contact me ASAP.

Goals for today: more discrete distributions, including the Poisson approximation, the Geometric, and the Negative Binomial.

Summary of RVs. Expectation: E[X] = Σ_i x_i p(x_i). E[aX + b] = aE[X] + b (linearity of expectation). E[g(X)] = Σ_i g(x_i) p(x_i). Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])². Var(aX + b) = a²Var(X).
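As a quick numeric sanity check of the identities above, the sketch below uses a small made-up PMF (the values and probabilities are arbitrary, chosen only for illustration) and verifies E[aX + b] = aE[X] + b and Var(aX + b) = a²Var(X) directly from the definitions:

```python
# A small hand-made PMF (values chosen arbitrarily for illustration)
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]

E = sum(x * p for x, p in zip(xs, ps))        # E[X] from the definition
E2 = sum(x**2 * p for x, p in zip(xs, ps))    # E[X^2]
Var = E2 - E**2                               # Var(X) = E[X^2] - (E[X])^2

a, b = 3.0, 5.0
# E[aX + b] and Var(aX + b), computed directly from the transformed values
E_ab = sum((a * x + b) * p for x, p in zip(xs, ps))
Var_ab = sum(((a * x + b) - E_ab)**2 * p for x, p in zip(xs, ps))

assert abs(E_ab - (a * E + b)) < 1e-12     # linearity of expectation
assert abs(Var_ab - a**2 * Var) < 1e-12    # the shift b drops out of variance
```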

Discrete RV distributions, part 1. X ~ Ber(p): X ∈ {0, 1}, where 1 = success w.p. p and 0 = failure w.p. (1 − p). Y ~ Bin(n, p): Y = Σ_{i=1}^{n} X_i where each X_i ~ Ber(p); a Binomial is a sum of n independent Bernoullis. Z ~ Poi(λ): a Poisson is a Binomial in the limit n → ∞, p → 0, where λ = np.

Poisson is Binomial in the limit. Compare the PMFs of Bin(10, 0.3), Bin(100, 0.03), and Poi(3): when n is large, p is small, and λ = np is moderate, the Poisson approximates the Binomial. What counts as moderate? Rules of thumb: n > 20 and p < 0.05, or n > 100 and p < 0.1.

Poisson vs Binomial (in Python):

    from matplotlib import pyplot as plt
    import numpy as np
    from scipy.stats import poisson, binom

    def plot_pmf(lamb, n, p):
        fig = plt.figure()
        ax = plt.gca()
        # reasonable x range with nonzero probabilities
        x = np.arange(poisson.ppf(0.01, lamb), poisson.ppf(0.9999, lamb))
        w = (x[1] - x[0]) / 3
        ax.bar(x, poisson.pmf(x, lamb), width=w, alpha=0.5,
               label='poi({})'.format(lamb))
        ax.bar(x + w, binom.pmf(x, n, p), width=w, alpha=0.5,
               label='bin({},{})'.format(n, str(p)[1:]))
        ax.legend()
        plt.show()

    # "moderate": n > 20, p < 0.05 or n > 100, p < 0.1
    lamb = 500           # Poisson parameter, lambda
    n = 1000             # Binomial parameter n
    p = lamb / float(n)  # Binomial parameter p
    plot_pmf(lamb, n, p)

Poisson is like the pizza world record. Bake a giant Neapolitan pizza (2 km long). Ingredients: many basil leaves, lots of other stuff. Slice the pizza: some slices get more basil, some slices get less. Pick a slice and let X be the # of basil leaves in this slice. Then X ~ Poi(λ), where λ = (total # basil leaves) / (# pizza slices).

Poisson is life? Two ways to think about Poisson: (1) as the # of events that occur in an interval of time, where the interval is split into many tiny Bernoulli trials (1 minute = 60,000 milliseconds, each millisecond a trial with very small success probability); (2) as the # of basil leaves in a particular pizza slice, where there are many slices.

CS = Show me your stepback. Hash tables: strings = basil leaves, buckets = pizza slices. Server crashes in a data center: crashed servers = basil leaves, a server rack = a particular slice of pizza. Facebook login requests (i.e., web server requests): requests = basil leaves, a server receiving a request = a pizza slice.

Defective chips. Computer chips are produced with probability p = 0.1 that a chip is defective (chips are independent). Consider a sample of n = 10 chips. What is P(sample contains ≤ 1 defective chip)? Solution: Define Y = # defective chips in the sample, so P(Y ≤ 1) = P(Y = 0) + P(Y = 1). Using Y ~ Bin(10, 0.1): P(Y ≤ 1) = (10 choose 0)(0.1)⁰(0.9)¹⁰ + (10 choose 1)(0.1)¹(0.9)⁹ ≈ 0.7361. Using Y ~ Poi(λ = (0.1)(10) = 1): P(Y ≤ 1) = e⁻¹·1⁰/0! + e⁻¹·1¹/1! = 2e⁻¹ ≈ 0.7358.
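The two numbers on this slide can be reproduced in a few lines; this is just a sketch checking the exact Binomial answer against its Poisson approximation:

```python
from math import comb, exp

n, p = 10, 0.1
lam = n * p  # λ = np = 1

# Exact Binomial: P(Y <= 1) = C(10,0)(0.1)^0(0.9)^10 + C(10,1)(0.1)^1(0.9)^9
binom_prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))

# Poisson approximation: P(Y <= 1) = e^{-λ}(λ^0/0! + λ^1/1!) = 2e^{-1}
poi_prob = exp(-lam) * (1 + lam)

print(round(binom_prob, 4), round(poi_prob, 4))  # 0.7361 0.7358
```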

The Poisson paradigm. Poisson can still provide a good approximation even when its assumptions are mildly violated. You can apply the Poisson approximation when: (1) successes in trials are not entirely independent, e.g., the # of entries in each bucket of a large hash table; (2) the probability of success in each trial varies slightly, like a small relative change in a very small p, e.g., the average # of requests to a web server per second may fluctuate slightly due to load on the network.

Birthday Poisson problem. What is the probability that of m people, none share the same birthday (regardless of year)? To use the Poisson approximation: (1) Define a trial, with probability of success p: for each pair of people (x, y), x ≠ y, let E_{x,y} = the event that x and y have the same birthday (a trial success), with p = P(E_{x,y}) = 1/365. These trials are not all independent, but the Poisson paradigm applies. (2) Compute the # of such trials: n = (m choose 2) = m(m−1)/2. (3) Compute λ and answer the problem: X ~ Poi(λ) where λ = np = m(m−1)/730, so P(X = 0) = e^{−λ} λ⁰/0! = e^{−m(m−1)/730}. Solving for the smallest integer m with e^{−m(m−1)/730} ≤ 0.5, i.e., −m(m−1)/730 ≤ ln(0.5), gives m ≥ 23, the same answer as before.
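The smallest such m can also be found by direct search; a minimal sketch of step (3):

```python
from math import exp

# Find the smallest m with P(no shared birthday) = e^{-m(m-1)/730} <= 0.5
m = 2
while exp(-m * (m - 1) / 730) > 0.5:
    m += 1
print(m)  # 23
```

This matches the classic (exact, counting-based) birthday-problem answer of 23, which is the point of the slide: the Poisson approximation recovers the same threshold.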

Break. Attendance: tinyurl.com/cs109summer2018

Geometric RV. Consider independent trials of Ber(p) random variables; X is the # of trials until the first success. Geometric RV X ~ Geo(p): P(X = k) = p_X(k) = (1 − p)^{k−1} p, for X ∈ {1, 2, …} = Z⁺. E[X] = 1/p, Var(X) = (1 − p)/p². Examples: flipping a coin (P(heads) = p) until the first heads appears; an urn with N red and M white balls, drawing balls with replacement (p = N/(N+M)) until the first red ball is drawn; generating bits with P(bit = 1) = p until the first 1 is generated.

CDF of a Geometric RV. Let X ~ Geo(p) (# of trials until first success, P(trial success) = p), with PMF P(X = k) = p_X(k) = (1 − p)^{k−1} p. Recall the CDF (cumulative distribution function) F_X(k) = P(X ≤ k). What is the CDF of X? Solution 1: P(X ≤ k) = Σ_{m=1}^{k} P(X = m) = Σ_{m=1}^{k} (1 − p)^{m−1} p = p Σ_{m=0}^{k−1} (1 − p)^m. Calculation reference: the finite geometric series Σ_{i=0}^{n} x^i = (1 − x^{n+1})/(1 − x). So P(X ≤ k) = p · (1 − (1 − p)^{(k−1)+1}) / (1 − (1 − p)) = 1 − (1 − p)^k.

CDF of a Geometric RV, continued. Solution 2: Define C_i = success on the i-th trial. Then P(X ≤ k) = 1 − P(X > k) = 1 − P(C_1^c C_2^c ⋯ C_k^c) = 1 − P(C_1^c)P(C_2^c)⋯P(C_k^c) = 1 − (1 − p)^k.
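As a sanity check on both derivations, the closed-form CDF 1 − (1 − p)^k should agree with summing the PMF directly; a quick sketch with an arbitrary p = 0.3:

```python
p = 0.3  # arbitrary success probability for the check

for k in range(1, 11):
    # CDF by summing the Geometric PMF P(X = m) = (1-p)^{m-1} p
    pmf_sum = sum((1 - p)**(m - 1) * p for m in range(1, k + 1))
    # Closed form from Solutions 1 and 2
    closed = 1 - (1 - p)**k
    assert abs(pmf_sum - closed) < 1e-12
```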

Catching Pokemon. Wild Pokemon are captured by throwing Pokeballs at them. Each ball has probability p = 0.1 of capturing the Pokemon, and each ball is an independent trial. How many do you need to bring with you so that the probability of a successful capture is at least 99%? Solution: Define X = # of Pokeballs needed until capture, so X ~ Geo(p). WTF: the smallest k with F_X(k) = P(X ≤ k) ≥ 0.99. Then 1 − (1 − p)^k ≥ 0.99, so (1 − p)^k ≤ 0.01, so k ln(0.9) ≤ ln(0.01), so k ≥ ln(0.01)/ln(0.9) ≈ 43.7 (the inequality flips because ln(0.9) < 0). Bring 44 Pokeballs.
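The arithmetic in this solution is one line of Python; a minimal sketch:

```python
from math import ceil, log

p = 0.1
# Smallest k with 1 - (1-p)^k >= 0.99, i.e. k >= ln(0.01) / ln(1-p)
k = ceil(log(0.01) / log(1 - p))
print(k)  # 44
```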

Negative Binomial RV. Consider independent trials of Ber(p) random variables; X is the # of trials until the r-th success. Negative Binomial RV X ~ NegBin(r, p): P(X = k) = (k−1 choose r−1) p^r (1 − p)^{k−r}, for X ∈ {r, r+1, r+2, …}. E[X] = r/p, Var(X) = r(1 − p)/p². Examples: # of coin flips until the r-th heads appears; # of strings to hash into a table until bucket 1 has r entries. Note: Geo(p) = NegBin(1, p).

Getting that degree. A conference accepts papers randomly (???). Each paper is accepted independently with probability p = 0.25. A hypothetical grad student needs 3 accepted papers to graduate. What is P(grad student needs exactly 10 submissions)? Solution: Define X = # of submissions to get 3 accepts, so X ~ NegBin(3, 0.25) with P(X = k) = (k−1 choose r−1) p^r (1 − p)^{k−r}. WTF: P(X = 10) = (9 choose 2)(0.25)³(0.75)⁷ ≈ 0.075.
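A sketch of the same computation, evaluating the Negative Binomial PMF directly:

```python
from math import comb

r, p, k = 3, 0.25, 10
# P(X = k) = C(k-1, r-1) p^r (1-p)^{k-r}: the r-th success lands on trial k
prob = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
print(round(prob, 3))  # 0.075
```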

Hypergeometric RV (optional). Consider m red balls and N − m black balls in a hat. Draw n balls without replacement; X is the # of red balls drawn. Hypergeometric RV X ~ HypG(n, N, m): P(X = k) = (m choose k)(N−m choose n−k) / (N choose n), for X ∈ {0, 1, 2, …, m}. E[X] = n(m/N), Var(X) = [nm(N − n)(N − m)] / [N²(N − 1)]. Note: if p = m/N (the probability of drawing red on the 1st draw), then HypG(n, N, m) → Bin(n, m/N) as N → ∞ with m/N held constant.
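The limiting note can be checked numerically: holding m/N fixed and growing N, the Hypergeometric PMF should approach the Binomial PMF. A minimal sketch (the helper names `hypg_pmf` and `binom_pmf` are my own, not from the slides):

```python
from math import comb

def hypg_pmf(k, n, N, m):
    # P(X = k) for HypG(n, N, m): choose k of the m red, n-k of the N-m black
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, k = 5, 0.2, 2
for N in (100, 1000, 100000):
    m = int(p * N)  # keep m/N fixed at p as N grows
    print(N, round(hypg_pmf(k, n, N, m), 4), round(binom_pmf(k, n, p), 4))
```

As N grows, the two probabilities converge: drawing without replacement from a huge hat barely changes the red-ball fraction between draws.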

Defining random variables and PMFs. 1. Read the problem and define your experiment. 2. Determine the constant factors in the experiment and the variable outcome we want. 3. Define your random variable. 4. Pattern-match to an existing probability mass function, or define a new one via counting.

Check your experiment goals. Want To Find (WTF) the # of successes in an experiment: 1. Start and finish the experiment. 2. Review and count occurrences. Bernoulli: flip a coin once; check if heads. Binomial: flip a coin n times; check if k heads. Poisson: wait a unit of time; check if k occurrences. WTF the # of trials until success: 1. Start the experiment. 2. Once success is observed, finish the experiment. Geometric: flip a coin repeatedly; if heads, exit. Negative Binomial: flip a coin repeatedly; if k heads, exit.

More RV practice. Are the below Ber(p), Bin(n, p), Poi(λ), Geo(p), or NegBin(r, p)? 1. # of snapchats you receive in a day: Poi(λ). 2. # of children until the first one with brown eyes: Geo(p). 3. Whether a stock went up or down: Ber(p), aka Bin(1, p). 4. # of probability problems you try until you get 5 correct (if you are randomly correct): NegBin(r, p). 5. # of years in some decade with more than 6 Atlantic hurricanes: Bin(n, p), where p = P(more than 6 hurricanes in a year).

Midterm question: Bitcoin mining. You mine a bitcoin if, for given data D, you find a number N such that Hash(D, N) produces an M-length bitstring that starts with g zeros (SHA-256 hash of the data and your choice of N; M bits, all outcomes equally likely). 1. What is the probability that the first number you try will produce a (random) bit string that starts with g zeros (i.e., you mine a bitcoin)? WTF: P(M-length bitstring starts with g zeros) = (0.5)^g. 2. How many different numbers do you expect to have to try before you mine 5 bitcoins? WTF: the expected # of tries of different N until 5 coins. Observation: P(a particular N mines a bitcoin) = (0.5)^g. Define X = # of trials until 5 successes, so X ~ NegBin(r, p) with r = 5 and p = (0.5)^g, giving E[X] = r/p = 5/(0.5)^g = 5 · 2^g.
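To make the answer concrete, here is a sketch with an example difficulty of g = 10 leading zeros (g = 10 is my illustrative choice, not from the problem):

```python
g = 10          # required leading zeros (example value, chosen for illustration)
p = 0.5 ** g    # P(a uniformly random M-bit string starts with g zeros)
r = 5           # bitcoins wanted

# X ~ NegBin(r, p): expected # of tries until the r-th success
expected_tries = r / p
print(expected_tries)  # 5120.0, i.e. 5 * 2^10
```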