Lecture 16. Lectures 1-15 Review

18.440 Lecture 16: Lectures 1-15 Review. Scott Sheffield, MIT.

Outline: Counting tricks and basic principles of probability. Discrete random variables.

Selected counting tricks. Break choosing one of the items to be counted into a sequence of stages so that one always has the same number of choices to make at each stage. Then the total count becomes a product of the number of choices available at each stage. Overcount by a fixed factor. If you have n elements you wish to divide into r distinct piles of sizes n_1, n_2, ..., n_r, how many ways are there to do that? Answer: the multinomial coefficient (n choose n_1, n_2, ..., n_r) := n!/(n_1! n_2! ... n_r!). How many sequences a_1, ..., a_k of non-negative integers satisfy a_1 + a_2 + ... + a_k = n? Answer: (n+k-1 choose n). Represent each such sequence by k-1 bars and n stars, e.g., as **|*||*** for (2, 1, 0, 3).
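
As a quick sanity check on these two counting formulas (not part of the original slides), here is a small Python sketch; the function names multinomial and stars_and_bars are just illustrative, and the brute-force comparison only covers small n and k.

    from math import factorial, comb
    from itertools import product

    def multinomial(n, sizes):
        # multinomial coefficient n!/(n_1! n_2! ... n_r!); assumes sizes sum to n
        assert sum(sizes) == n
        result = factorial(n)
        for s in sizes:
            result //= factorial(s)
        return result

    def stars_and_bars(n, k):
        # number of sequences of k non-negative integers summing to n
        return comb(n + k - 1, n)

    print(multinomial(6, [2, 2, 2]))  # 90 ways to split 6 items into three piles of 2

    # brute-force check of the stars-and-bars count for small n, k
    n, k = 5, 3
    count = sum(1 for a in product(range(n + 1), repeat=k) if sum(a) == n)
    print(stars_and_bars(n, k), count)  # both 21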

Axioms of probability. Have a set S called the sample space. P(A) ∈ [0, 1] for all (measurable) A ⊂ S. P(S) = 1. Finite additivity: P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅. Countable additivity: P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) if E_i ∩ E_j = ∅ for each pair i ≠ j.

Consequences of axioms. P(A^c) = 1 - P(A). A ⊂ B implies P(A) ≤ P(B). P(A ∪ B) = P(A) + P(B) - P(AB). P(AB) ≤ P(A).

Inclusion-exclusion identity. Observe P(A ∪ B) = P(A) + P(B) - P(AB). Also, P(E ∪ F ∪ G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG). More generally, P(∪_{i=1}^n E_i) = Σ_{i=1}^n P(E_i) - Σ_{i_1 < i_2} P(E_{i_1} E_{i_2}) + ... + (-1)^{r+1} Σ_{i_1 < i_2 < ... < i_r} P(E_{i_1} E_{i_2} ... E_{i_r}) + ... + (-1)^{n+1} P(E_1 E_2 ... E_n). The notation Σ_{i_1 < i_2 < ... < i_r} means a sum over all of the (n choose r) subsets of size r of the set {1, 2, ..., n}.
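
A hedged illustration, not from the lecture: the sketch below checks the inclusion-exclusion identity numerically on a small uniform sample space, comparing the directly computed P(E_1 ∪ ... ∪ E_n) with the alternating sum over index subsets. The events are generated arbitrarily.

    from itertools import combinations
    import random

    random.seed(0)
    S = range(20)                                            # small uniform sample space
    events = [set(random.sample(S, 8)) for _ in range(4)]    # four arbitrary events

    def P(A):
        return len(A) / len(S)

    # direct computation of P(E_1 union ... union E_n)
    direct = P(set.union(*events))

    # alternating inclusion-exclusion sum over nonempty index subsets
    n = len(events)
    alternating = 0.0
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            inter = set.intersection(*(events[i] for i in idx))
            alternating += (-1) ** (r + 1) * P(inter)

    print(direct, alternating)  # the two values should agree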

Famous hat problem. n people toss hats into a bin, randomly shuffle, return one hat to each person. Find the probability nobody gets own hat. Inclusion-exclusion. Let E_i be the event that the ith person gets own hat. What is P(E_{i_1} E_{i_2} ... E_{i_r})? Answer: (n-r)!/n!. There are (n choose r) terms like that in the inclusion-exclusion sum. What is (n choose r) (n-r)!/n!? Answer: 1/r!. So P(∪_{i=1}^n E_i) = 1 - 1/2! + 1/3! - 1/4! + ... ± 1/n!, and 1 - P(∪_{i=1}^n E_i) = 1 - 1 + 1/2! - 1/3! + 1/4! - ... ± 1/n! ≈ 1/e ≈ .36788.
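
For intuition, a short Monte Carlo sketch (my addition, with an arbitrary choice of n and trial count) estimates the probability that a random permutation has no fixed point and compares it with 1/e.

    import math, random

    random.seed(1)

    def nobody_gets_own_hat(n, trials=100_000):
        # Monte Carlo estimate of the probability that a uniformly random
        # permutation of n hats has no fixed point
        count = 0
        for _ in range(trials):
            perm = list(range(n))
            random.shuffle(perm)
            if all(perm[i] != i for i in range(n)):
                count += 1
        return count / trials

    print(nobody_gets_own_hat(10))   # estimate, roughly 0.368
    print(1 / math.e)                # 0.36787944...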

Conditional probability. Definition: P(E | F) = P(EF)/P(F). Call P(E | F) the conditional probability of E given F, or the probability of E conditioned on F. Nice fact: P(E_1 E_2 E_3 ... E_n) = P(E_1) P(E_2 | E_1) P(E_3 | E_1 E_2) ... P(E_n | E_1 ... E_{n-1}). Useful when we think about multi-step experiments. For example, let E_i be the event that the ith person gets own hat in the n-hat shuffle problem.

Dividing probability into two cases. P(E) = P(EF) + P(EF^c) = P(E | F)P(F) + P(E | F^c)P(F^c). In words: want to know the probability of E. There are two scenarios, F and F^c. If I know the probabilities of the two scenarios and the probability of E conditioned on each scenario, I can work out the probability of E.

Bayes theorem. Bayes theorem/law/rule states the following: P(A | B) = P(B | A)P(A)/P(B). Follows from the definition of conditional probability: P(AB) = P(B)P(A | B) = P(A)P(B | A). Tells how to update estimate of probability of A when new evidence restricts your sample space to B. So P(A | B) is the ratio P(B | A)/P(B) times P(A). The ratio P(B | A)/P(B) determines how compelling the new evidence is.
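
To make the update concrete, here is a minimal sketch with made-up numbers (a hypothetical test for a rare condition); only the formulas P(B) = P(B | A)P(A) + P(B | A^c)P(A^c) and P(A | B) = P(B | A)P(A)/P(B) come from the slides.

    # Hypothetical numbers, chosen only to illustrate the update:
    # A = "has condition", B = "test is positive".
    p_A = 0.01            # prior P(A)
    p_B_given_A = 0.95    # P(B | A)
    p_B_given_Ac = 0.05   # P(B | A^c), false-positive rate

    # total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

    # Bayes: posterior P(A | B)
    p_A_given_B = p_B_given_A * p_A / p_B
    print(p_A_given_B)    # about 0.161: the evidence multiplies the prior by P(B|A)/P(B)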

P(· | F) is a probability measure. We can check the probability axioms: 0 ≤ P(E | F) ≤ 1, P(S | F) = 1, and P(∪_i E_i | F) = Σ_i P(E_i | F), if i ranges over a countable set and the E_i are disjoint. The probability measure P(· | F) is related to P(·). To get the former from the latter, we set probabilities of elements outside of F to zero and multiply probabilities of events inside of F by 1/P(F). P(·) is the prior probability measure and P(· | F) is the posterior measure (revised after discovering that F occurs).

Independence. Say E and F are independent if P(EF) = P(E)P(F). Equivalent statement: P(E | F) = P(E). Also equivalent: P(F | E) = P(F).

Independence of multiple events. Say E_1, ..., E_n are independent if for each {i_1, i_2, ..., i_k} ⊂ {1, 2, ..., n} we have P(E_{i_1} E_{i_2} ... E_{i_k}) = P(E_{i_1}) P(E_{i_2}) ... P(E_{i_k}). In other words, the product rule works. Independence implies P(E_1 E_2 E_3 | E_4 E_5 E_6) = P(E_1)P(E_2)P(E_3)P(E_4)P(E_5)P(E_6) / (P(E_4)P(E_5)P(E_6)) = P(E_1 E_2 E_3), and other similar statements. Does pairwise independence imply independence? No. Consider these three events: first coin heads, second coin heads, odd number of heads. Pairwise independent, not independent.
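
The two-coin example can be checked by brute force. The following sketch (not part of the slides) enumerates the four equally likely outcomes and verifies that the three events are pairwise independent while the triple product rule fails.

    from itertools import product
    from fractions import Fraction

    outcomes = list(product("HT", repeat=2))   # two fair coin tosses, equally likely
    P_each = Fraction(1, len(outcomes))

    E1 = {w for w in outcomes if w[0] == "H"}             # first coin heads
    E2 = {w for w in outcomes if w[1] == "H"}             # second coin heads
    E3 = {w for w in outcomes if w.count("H") % 2 == 1}   # odd number of heads

    def prob(A):
        return P_each * len(A)

    # each pair satisfies P(EF) = P(E)P(F)
    print(prob(E1 & E2) == prob(E1) * prob(E2))   # True
    print(prob(E1 & E3) == prob(E1) * prob(E3))   # True
    print(prob(E2 & E3) == prob(E2) * prob(E3))   # True

    # but the triple product rule fails: P(E1 E2 E3) = 0, not 1/8
    print(prob(E1 & E2 & E3), prob(E1) * prob(E2) * prob(E3))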

Outline: Counting tricks and basic principles of probability. Discrete random variables.

Random variables. A random variable X is a function from the state space to the real numbers. Can interpret X as a quantity whose value depends on the outcome of an experiment. Say X is a discrete random variable if (with probability one) it takes one of a countable set of values. For each a in this countable set, write p(a) := P{X = a}. Call p the probability mass function. Write F(a) = P{X ≤ a} = Σ_{x ≤ a} p(x). Call F the cumulative distribution function.

Indicators. Given any event E, can define an indicator random variable, i.e., let X be the random variable equal to 1 on the event E and 0 otherwise. Write this as X = 1_E. The value of 1_E (either 1 or 0) indicates whether the event has occurred. If E_1, E_2, ..., E_k are events, then X = Σ_{i=1}^k 1_{E_i} is the number of these events that occur. Example: in the n-hat shuffle problem, let E_i be the event that the ith person gets own hat. Then Σ_{i=1}^n 1_{E_i} is the total number of people who get their own hats.

Expectation of a discrete random variable. Say X is a discrete random variable if (with probability one) it takes one of a countable set of values. For each a in this countable set, write p(a) := P{X = a}. Call p the probability mass function. The expectation of X, written E[X], is defined by E[X] = Σ_{x: p(x) > 0} x p(x). Represents the weighted average of the possible values X can take, each value being weighted by its probability.

Expectation when state space is countable. If the state space S is countable, we can give a SUM OVER STATE SPACE definition of expectation: E[X] = Σ_{s ∈ S} P{s} X(s). Agrees with the SUM OVER POSSIBLE X VALUES definition: E[X] = Σ_{x: p(x) > 0} x p(x).

Expectation of a function of a random variable. If X is a random variable and g is a function from the real numbers to the real numbers, then g(X) is also a random variable. How can we compute E[g(X)]? Answer: E[g(X)] = Σ_{x: p(x) > 0} g(x) p(x).

Additivity of expectation. If X and Y are distinct random variables, then E[X + Y] = E[X] + E[Y]. In fact, for real constants a and b, we have E[aX + bY] = aE[X] + bE[Y]. This is called the linearity of expectation. Can extend to more variables: E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n].

Defining variance in discrete case. Let X be a random variable with mean µ. The variance of X, denoted Var(X), is defined by Var(X) = E[(X - µ)^2]. Taking g(x) = (x - µ)^2, and recalling that E[g(X)] = Σ_{x: p(x) > 0} g(x) p(x), we find that Var[X] = Σ_{x: p(x) > 0} (x - µ)^2 p(x). Variance is one way to measure the amount a random variable varies from its mean over successive trials. Very important alternate formula: Var[X] = E[X^2] - (E[X])^2.
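
A small numerical sketch (with an arbitrarily chosen pmf, not from the slides) confirms that the defining formula and the alternate formula give the same variance.

    # compute Var(X) both as E[(X - mu)^2] and as E[X^2] - (E[X])^2
    pmf = {0: 0.2, 1: 0.5, 3: 0.3}   # hypothetical p(x)

    mu = sum(x * p for x, p in pmf.items())
    var_direct = sum((x - mu) ** 2 * p for x, p in pmf.items())
    var_moment = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

    print(mu, var_direct, var_moment)  # the two variance numbers agree (1.24)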

Identity. If Y = X + b, where b is constant, then Var[Y] = Var[X]. Also, Var[aX] = a^2 Var[X]. Proof: Var[aX] = E[a^2 X^2] - E[aX]^2 = a^2 E[X^2] - a^2 E[X]^2 = a^2 Var[X].

Standard deviation. Write SD[X] = √(Var[X]). Satisfies the identity SD[aX] = |a| SD[X]. Uses the same units as X itself. If we switch from feet to inches in our height-of-randomly-chosen-person example, then X, E[X], and SD[X] each get multiplied by 12, but Var[X] gets multiplied by 144.

Bernoulli random variables. Toss a fair coin n times. (Tosses are independent.) What is the probability of k heads? Answer: (n choose k)/2^n. What if the coin has probability p of coming up heads? Answer: (n choose k) p^k (1-p)^{n-k}. Writing q = 1 - p, we can write this as (n choose k) p^k q^{n-k}. Can use the binomial theorem to show the probabilities sum to one: 1 = 1^n = (p + q)^n = Σ_{k=0}^n (n choose k) p^k q^{n-k}. The number of heads is a binomial random variable with parameters (n, p).
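
The following sketch, assuming nothing beyond the pmf formula above, computes binomial probabilities and checks that they sum to one; the parameter values are arbitrary.

    from math import comb

    def binomial_pmf(n, p, k):
        # P{X = k} for X binomial with parameters (n, p)
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    n, p = 10, 0.3
    probs = [binomial_pmf(n, p, k) for k in range(n + 1)]
    print(sum(probs))                # 1.0 (up to floating-point error), by the binomial theorem
    print(binomial_pmf(10, 0.5, 4))  # fair-coin case: C(10,4)/2^10, about 0.205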

Decomposition approach to computing expectation. Let X be a binomial random variable with parameters (n, p). Here is one way to compute E[X]. Think of X as representing the number of heads in n tosses of a coin that is heads with probability p. Write X = Σ_{j=1}^n X_j, where X_j is 1 if the jth coin is heads, 0 otherwise. In other words, X_j is the number of heads (zero or one) on the jth toss. Note that E[X_j] = p·1 + (1-p)·0 = p for each j. Conclude by additivity of expectation that E[X] = Σ_{j=1}^n E[X_j] = Σ_{j=1}^n p = np.

Compute variance with decomposition trick. X = Σ_{j=1}^n X_j, so E[X^2] = E[Σ_{i=1}^n Σ_{j=1}^n X_i X_j] = Σ_{i=1}^n Σ_{j=1}^n E[X_i X_j]. E[X_i X_j] is p if i = j, p^2 otherwise. So Σ_{i=1}^n Σ_{j=1}^n E[X_i X_j] has n terms equal to p and (n-1)n terms equal to p^2. Hence E[X^2] = np + (n-1)n p^2 = np + (np)^2 - np^2. Thus Var[X] = E[X^2] - E[X]^2 = np - np^2 = np(1-p) = npq. Can show generally that if X_1, ..., X_n are independent then Var[Σ_{j=1}^n X_j] = Σ_{j=1}^n Var[X_j].
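
As an empirical check (my addition, with arbitrary n, p, and trial count), the sketch below simulates X as a sum of n independent indicators and compares the sample mean and variance with np and npq.

    import random
    random.seed(2)

    n, p, trials = 30, 0.25, 200_000
    samples = []
    for _ in range(trials):
        # X as a sum of n independent 0/1 indicators X_j
        x = sum(1 for _ in range(n) if random.random() < p)
        samples.append(x)

    mean = sum(samples) / trials
    var = sum((x - mean) ** 2 for x in samples) / trials
    print(mean, n * p)              # empirical mean vs np = 7.5
    print(var, n * p * (1 - p))     # empirical variance vs npq = 5.625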

Bernoulli random variable with n large and np = λ. Let λ be some moderate-sized number. Say λ = 2 or λ = 3. Let n be a huge number, say n = 10^6. Suppose I have a coin that comes up heads with probability λ/n and I toss it n times. How many heads do I expect to see? Answer: np = λ. Let k be some moderate-sized number (say k = 4). What is the probability that I see exactly k heads? Binomial formula: (n choose k) p^k (1-p)^{n-k} = [n(n-1)(n-2)...(n-k+1)/k!] p^k (1-p)^{n-k}. This is approximately (λ^k/k!) (1-p)^{n-k} ≈ (λ^k/k!) e^{-λ}. A Poisson random variable X with parameter λ satisfies P{X = k} = (λ^k/k!) e^{-λ} for integer k ≥ 0.
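
To see the approximation numerically, here is a short sketch (not from the slides) comparing the exact binomial probability with the Poisson value for one arbitrary choice of λ, n, and k.

    from math import comb, exp, factorial

    lam, n, k = 3.0, 10**6, 4
    p = lam / n

    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact binomial probability
    poisson = lam ** k / factorial(k) * exp(-lam)      # Poisson approximation
    print(binom, poisson)   # both roughly 0.168: the binomial is close to the Poisson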

Expectation and variance. A Poisson random variable X with parameter λ satisfies P{X = k} = (λ^k/k!) e^{-λ} for integer k ≥ 0. Clever computation tricks yield E[X] = λ and Var[X] = λ. We think of a Poisson random variable as being (roughly) a binomial (n, p) random variable with n very large and p = λ/n. This also suggests E[X] = np = λ and Var[X] = npq ≈ λ.

Poisson point process. A Poisson point process is a random function N(t) called a Poisson process of rate λ. For each t > s ≥ 0, the value N(t) - N(s) describes the number of events occurring in the time interval (s, t) and is Poisson with parameter (t - s)λ. The numbers of events occurring in disjoint intervals are independent random variables. The probability of seeing zero events in the first t time units is e^{-λt}. Let T_k be the time elapsed, since the previous event, until the kth event occurs. Then the T_k are independent random variables, each of which is exponential with parameter λ.
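
A minimal simulation sketch, assuming only the exponential-interarrival description above: it builds N(t) from independent exponential(λ) gaps and checks that the empirical P{N(t) = 0} is close to e^{-λt}. The rate, horizon, and trial count are arbitrary.

    import math, random
    random.seed(3)

    lam, t, trials = 2.0, 1.0, 100_000

    def count_events(lam, t):
        # simulate a rate-lam Poisson process on [0, t] via independent
        # exponential(lam) interarrival times; return N(t)
        time, n = 0.0, 0
        while True:
            time += random.expovariate(lam)
            if time > t:
                return n
            n += 1

    zero = sum(1 for _ in range(trials) if count_events(lam, t) == 0) / trials
    print(zero, math.exp(-lam * t))   # P{N(t) = 0}: empirical vs e^{-lambda t}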

Geometric random variables. Consider an infinite sequence of independent tosses of a coin that comes up heads with probability p. Let X be such that the first heads is on the Xth toss. What is P{X = k}? Answer: P{X = k} = (1-p)^{k-1} p = q^{k-1} p, where q = 1 - p is the tails probability. Say X is a geometric random variable with parameter p. Some cool calculation tricks show that E[X] = 1/p. And Var[X] = q/p^2.
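
A quick simulation sketch (parameters arbitrary, not part of the slides) estimates E[X] and Var[X] for the geometric case and compares them with 1/p and q/p^2.

    import random
    random.seed(4)

    p, trials = 0.3, 200_000
    q = 1 - p

    def first_heads(p):
        # toss until the first heads; return the toss index (1-based)
        k = 1
        while random.random() >= p:
            k += 1
        return k

    xs = [first_heads(p) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / trials
    print(mean, 1 / p)        # about 3.33
    print(var, q / p ** 2)    # about 7.78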

Negative binomial random variables. Consider an infinite sequence of independent tosses of a coin that comes up heads with probability p. Let X be such that the rth heads is on the Xth toss. Then P{X = k} = (k-1 choose r-1) p^{r-1} (1-p)^{k-r} p. Call X a negative binomial random variable with parameters (r, p). So E[X] = r/p. And Var[X] = rq/p^2.
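
Finally, a short sketch (truncating the infinite sum at an arbitrary cutoff, my addition) checks that this pmf sums to one and that its mean and variance match r/p and rq/p^2.

    from math import comb

    r, p = 3, 0.4
    q = 1 - p

    def negative_binomial_pmf(k, r, p):
        # P{X = k} = (k-1 choose r-1) p^r (1-p)^(k-r), for k >= r
        return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

    # truncate the infinite sum far enough out that the tail is negligible
    ks = range(r, 200)
    total = sum(negative_binomial_pmf(k, r, p) for k in ks)
    mean = sum(k * negative_binomial_pmf(k, r, p) for k in ks)
    var = sum(k ** 2 * negative_binomial_pmf(k, r, p) for k in ks) - mean ** 2
    print(total)               # essentially 1
    print(mean, r / p)         # both 7.5
    print(var, r * q / p**2)   # both 11.25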

MIT OpenCourseWare http://ocw.mit.edu 18.440 Probability and Random Variables Spring 2014 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.