Recap of Basic Probability Theory


02407 Stochastic Processes
Recap of Basic Probability Theory
Uffe Høgsbro Thygesen
Informatics and Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Email: uht@imm.dtu.dk

This recap covers:
- Stochastic experiments
- The probability triple (Ω, F, P):
  Ω: the sample space, ω ∈ Ω
  F: the set of events, A ∈ F, A ⊆ Ω
  P: the probability measure, mapping each A ∈ F to P(A) ∈ [0, 1]
- Random variables
- Distribution functions
- Conditioning

Why recap?
Stochastic processes is applied probability. A firm understanding of probability (as taught in e.g. 02405) will get you far, but we need a more solid basis than most students develop in e.g. 02405.

What to recap?
The concepts are most important: what is a stochastic variable, what is conditioning, etc. Also specific models and formulas: that a binomial distribution appears as the sum of Bernoulli variates, etc.

Stochastic experiments
We perform a stochastic experiment. We use ω to denote the outcome. The sample space Ω is the set of all possible outcomes.
(Diagram: the sample space Ω with one outcome ω marked.)

The sample space Ω
Ω can be a very simple set, e.g.
- {H, T} (tossing a coin, a.k.a. a Bernoulli experiment)
- {1, 2, 3, 4, 5, 6} (throwing a die once)
- N (typical for single discrete stochastic variables)
- R^d (typical for multivariate continuous stochastic variables)
or a more complicated set, e.g. the set of all functions R → R^d with some regularity properties. Often we will not need to specify what Ω is.

Events
Events are sets of outcomes, i.e. subsets of Ω. Events correspond to statements about the outcome. For a die thrown once, the event A = {1, 2, 3} corresponds to the statement "the die showed no more than three".
(Diagram: an event A ⊆ Ω containing the outcome ω.)

Probability
A probability is a set measure of an event. If A is an event, then P(A) is the probability that the event A occurs in the stochastic experiment, a number between 0 and 1. (What exactly does this mean? Cf. G&S p. 5 and appendix III.) Regardless of interpretation, we can pose simple conditions for mathematical consistency. An important question: which events are measurable, i.e. have a probability assigned to them?

Logical operators as set operators
We want our usual logical reasoning to work! So: if A and B are legal statements, represented by measurable subsets of Ω, then so are
- "not A", i.e. A^c = Ω \ A
- "A or B", i.e. A ∪ B.
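As a quick numerical illustration (my own addition, not part of the original slides), the die example can be written with Python sets, which makes the parallel between logical operations and set operations explicit; the event names and simulation size are arbitrary choices:

    import random

    omega = {1, 2, 3, 4, 5, 6}      # sample space for one throw of a die
    A = {1, 2, 3}                   # the event "the die showed no more than three"
    B = {2, 4, 6}                   # the event "the die showed an even number"

    not_A = omega - A               # "not A":   A^c = Omega \ A
    A_or_B = A | B                  # "A or B":  A union B
    A_and_B = A & B                 # "A and B": A intersect B
    print(not_A, A_or_B, A_and_B)

    # With equally likely outcomes P(A) = |A| / |Omega|; compare with a simulation.
    outcomes = list(omega)
    n = 100_000
    freq_A = sum(random.choice(outcomes) in A for _ in range(n)) / n
    print(len(A) / len(omega), freq_A)   # both close to 0.5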

Parallels between statements and sets

Set                  Statement
A                    The event A occurred (ω ∈ A)
A^c                  Not A
A ∩ B                A and B
A ∪ B                A or B
(A ∪ B) \ (A ∩ B)    A exclusive-or B

See also table 1.1 in Grimmett & Stirzaker, page 3.

An infinite, but countable, number of statements
For the Bernoulli experiment, we need statements like "at least one experiment shows heads" or "in the long run, every other experiment shows heads". So: if A_i are events for i ∈ N, then so is ∪_{i ∈ N} A_i.

All events considered form a σ-field F
Definition:
1. The empty set is an event, ∅ ∈ F.
2. Given a countable set of events A_1, A_2, ..., their union is also an event, ∪_{i ∈ N} A_i ∈ F.
3. If A is an event, then so is the complementary set A^c.

(Trivial) examples of σ-fields
1. F = {∅, Ω}. This is the deterministic case: all statements are either true (for every ω) or false (for every ω).
2. F = {∅, A, A^c, Ω}. This corresponds to the Bernoulli experiment or tossing a coin: the event A corresponds to heads.
3. F = 2^Ω, the set of all subsets of Ω. When Ω is finite or enumerable, we can actually work with 2^Ω; otherwise not.
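A small sketch (my own, not from the slides) that checks the σ-field definition on a finite Ω, here for the coin-tossing example F = {∅, A, A^c, Ω}; on a finite sample space it suffices to check complements and pairwise unions:

    from itertools import combinations

    omega = frozenset({"H", "T"})
    A = frozenset({"H"})
    F = {frozenset(), A, omega - A, omega}    # candidate: {∅, A, A^c, Ω}

    def is_sigma_field(F, omega):
        """Check the defining properties on a finite sample space."""
        if frozenset() not in F:
            return False
        if any(omega - B not in F for B in F):                    # complements
            return False
        if any(B | C not in F for B, C in combinations(F, 2)):    # pairwise unions
            return False
        return True

    print(is_sigma_field(F, omega))                        # True
    print(is_sigma_field({frozenset(), A, omega}, omega))  # False: A^c is missing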

Define P(A) for all events A
1. P(∅) = 0, P(Ω) = 1.
2. If A_1, A_2, ... are mutually exclusive events (i.e. A_i ∩ A_j = ∅ for i ≠ j), then P(∪_i A_i) = Σ_i P(A_i).
A map P : F → [0, 1] satisfying these conditions is called a probability measure. The triple (Ω, F, P) is called a probability space.

Conditioning
In stochastic processes, we want to know what to expect from the future, conditional on our past observations.
(Venn diagram: events A and B in Ω, with their intersection A ∩ B highlighted.)
P(A | B) = P(A ∩ B) / P(B)

Be careful when conditioning!
If you are not careful about specifying the events involved, you can easily obtain wrong conclusions.
Example: A family has two children. Each child is a boy with probability 1/2, independently of the other. Given that at least one is a boy, what is the probability that they are both boys?
The meaningless answer: P(two boys | at least one boy) = P(other child is a boy) = 1/2.
The right answer: P(two boys | at least one boy) = P(two boys) / P(at least one boy) = (1/4) / (3/4) = 1/3.

Lemma: The law of total probability
Let B_1, ..., B_n be a partition of Ω (i.e. mutually disjoint with ∪_{i=1}^n B_i = Ω). Then
P(A) = Σ_{i=1}^n P(A | B_i) P(B_i).
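The two-children example is easy to verify by brute-force simulation; the following sketch (my own, with an arbitrary sample size) estimates P(two boys | at least one boy) and should come out near 1/3 rather than 1/2:

    import random

    n = 200_000
    both = at_least_one = 0
    for _ in range(n):
        children = [random.choice(["boy", "girl"]) for _ in range(2)]
        if "boy" in children:                 # the event "at least one boy"
            at_least_one += 1
            if children.count("boy") == 2:    # the event "two boys"
                both += 1

    # P(two boys | at least one boy) = P(two boys) / P(at least one boy) = (1/4)/(3/4)
    print(both / at_least_one)                # approximately 0.333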

Independence
Events A and B are called independent if P(A ∩ B) = P(A) P(B). When 0 < P(B) < 1, this is the same as P(A | B) = P(A) = P(A | B^c).
A family {A_i : i ∈ I} of events is called independent if
P(∩_{i ∈ J} A_i) = Π_{i ∈ J} P(A_i)
for any finite subset J of I.

Stochastic variables
Informally: a quantity which is assigned by a stochastic experiment.
Formally: a mapping X : Ω → R.
A technical comment: we want P(X ≤ x) to be well defined, so we require that {ω : X(ω) ≤ x} ∈ F for every x ∈ R.

Examples of stochastic variables
Indicator functions: X(ω) = I_A(ω) = 1 when ω ∈ A, and 0 else.
Bernoulli variables: Ω = {H, T}, X(H) = 1, X(T) = 0.

Distribution functions
F(x) = P(X ≤ x) has the properties:
1. lim_{x → −∞} F(x) = 0 and lim_{x → +∞} F(x) = 1.
2. x < y implies F(x) ≤ F(y).
3. F is right-continuous, i.e. F(x + h) → F(x) as h ↓ 0.
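To make "a mapping X : Ω → R" concrete, here is a minimal sketch (my own; the value p = 0.3 is just an example) of a Bernoulli variable on Ω = {H, T} together with its distribution function F(x) = P(X ≤ x):

    p = 0.3                                   # hypothetical success probability
    prob = {"H": p, "T": 1 - p}               # P({omega}) on Omega = {H, T}
    X = {"H": 1, "T": 0}                      # the mapping X : Omega -> R

    def F(x):
        """Distribution function F(x) = P(X <= x)."""
        return sum(prob[w] for w in prob if X[w] <= x)

    for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
        print(x, F(x))    # 0, 0.7, 0.7, 1.0, 1.0: a right-continuous step function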

Discrete and continuous variables
Discrete variables: Im X is a countable set, so F_X is a step function. Typically Im X ⊆ Z.
Continuous variables: X has a probability density function (pdf) f, i.e.
F(x) = ∫_{−∞}^{x} f(u) du,
so F is differentiable.

A variable which is neither continuous nor discrete
Let X ~ U(0, 1), i.e. uniform on [0, 1], so that F_X(x) = x for 0 ≤ x ≤ 1. Toss a fair coin. If heads, then set Y = X. If tails, then set Y = 0. Then
F_Y(y) = 1/2 + (1/2) y for 0 ≤ y ≤ 1.
We say that Y has an atom at 0.

Mean and variance
The mean of a stochastic variable is
EX = Σ_{i ∈ Z} i P(X = i)
in the discrete case, and
EX = ∫_{−∞}^{+∞} x f(x) dx
in the continuous case. In both cases we assume that the sum/integral converges absolutely. The variance of X is
VX = E(X − EX)^2 = EX^2 − (EX)^2.

The conditional mean
The conditional mean is the mean in the conditional distribution:
E(Y | X = x) = Σ_y y f_{Y|X}(y | x).
It can be seen as a stochastic variable: let ψ(x) = E(Y | X = x); then ψ(X) is the conditional mean of Y given X. We write ψ(X) = E(Y | X), and we have
E(E(Y | X)) = EY.
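The mixed variable Y above can be simulated directly; this sketch (my own) compares the empirical distribution function with F_Y(y) = 1/2 + y/2 and shows the atom at 0 as P(Y = 0) ≈ 1/2:

    import random

    n = 200_000
    ys = []
    for _ in range(n):
        x = random.random()                      # X ~ U(0, 1)
        heads = random.random() < 0.5            # fair coin
        ys.append(x if heads else 0.0)           # Y = X on heads, Y = 0 on tails

    def F_emp(y):
        return sum(v <= y for v in ys) / n

    for y in (0.0, 0.25, 0.5, 1.0):
        print(y, F_emp(y), 0.5 + 0.5 * y)        # empirical vs. theoretical F_Y(y)

    print(sum(v == 0.0 for v in ys) / n)         # the atom: P(Y = 0) is about 1/2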

The conditional variance
The conditional variance V(Y | X) is the variance in the conditional distribution:
V(Y | X = x) = Σ_y (y − ψ(x))^2 f_{Y|X}(y | x).
This can also be written as
V(Y | X) = E(Y^2 | X) − (E(Y | X))^2
and can be manipulated into (try it!)
VY = E V(Y | X) + V E(Y | X),
which partitions the variance of Y.

Random vectors
When a single stochastic experiment defines the value of several stochastic variables. Example: throw a dart; record both the vertical and horizontal distance to the center.
X = (X_1, ..., X_n) : Ω → R^n
Random vectors, too, are characterised by the distribution function F : R^n → [0, 1]:
F(x) = P(X_1 ≤ x_1, ..., X_n ≤ x_n), where x = (x_1, ..., x_n).

Example
In one experiment, we toss two fair coins and assign the results to V and X. In another experiment, we toss one fair coin and assign the result to both Y and Z. V, X, Y and Z are all identically distributed, but (V, X) and (Y, Z) are not identically distributed. E.g. P(V = X = heads) = 1/4 while P(Y = Z = heads) = 1/2.

Distributions derived from the Bernoulli experiment
Start by working out one single Bernoulli experiment. Then consider a finite number of Bernoulli experiments: the binomial distribution. Next, a sequence of Bernoulli experiments: the Bernoulli process. Finally, waiting times in the Bernoulli process: the negative binomial distribution.
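The decomposition VY = E V(Y | X) + V E(Y | X) can be checked numerically; the sketch below is my own, with an arbitrary hypothetical model (X a fair die, and Y binomial with X trials given X), and computes both sides from the same simulated sample:

    import random
    from statistics import mean, pvariance

    # Hypothetical model: X is a fair die, and given X = x, Y ~ B(x, 1/2).
    n = 100_000
    pairs = []
    for _ in range(n):
        x = random.randint(1, 6)                           # X: a fair die
        y = sum(random.random() < 0.5 for _ in range(x))   # Y | X = x ~ B(x, 1/2)
        pairs.append((x, y))

    VY = pvariance([y for _, y in pairs])                  # total variance of Y

    groups = {x: [y for xx, y in pairs if xx == x] for x in range(1, 7)}
    w = {x: len(g) / n for x, g in groups.items()}         # empirical P(X = x)
    psi = {x: mean(g) for x, g in groups.items()}          # psi(x) = E(Y | X = x)

    EV = sum(w[x] * pvariance(groups[x]) for x in groups)  # E V(Y | X)
    Epsi = sum(w[x] * psi[x] for x in groups)
    VE = sum(w[x] * (psi[x] - Epsi) ** 2 for x in groups)  # V E(Y | X)

    print(VY, EV + VE)    # the two sides of the decomposition should match

For the empirical distribution of the simulated sample the identity holds exactly, so the two printed numbers agree up to rounding.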

The Bernoulli experiment
A Bernoulli experiment models e.g. tossing a coin. The sample space is Ω = {H, T}. The events are F = {∅, {H}, {T}, Ω} = 2^Ω = {0, 1}^Ω. The probability P : F → [0, 1] is defined by (!) P({H}) = p. The stochastic variable X : Ω → R with X(H) = 1, X(T) = 0 is Bernoulli distributed with parameter p.

A finite number of Bernoulli variables
We toss a coin n times. The sample space is Ω = {H, T}^n; for the case n = 2, this is {TT, TH, HT, HH}. The events are F = 2^Ω = {0, 1}^Ω. How many events are there? |F| = |2^Ω| = 2^(2^n) (a lot). Introduce A_i for the event "the i-th toss showed heads".
The probability P : F → [0, 1] is defined by P(A_i) = p and by requiring that the events {A_i : i = 1, ..., n} are independent. From this we derive
P({ω}) = p^k (1 − p)^(n−k) if ω has k heads and n − k tails,
and from that the probability of any event.
Define the stochastic variable X as the number of heads,
X = Σ_{i=1}^n 1(A_i).
To find its probability mass function, consider the events {X = x}; P(X = x) is shorthand for P({ω : X(ω) = x}). The event {X = x} has C(n, x) = n!/(x!(n − x)!) elements (the binomial coefficient), and each ω ∈ {X = x} has probability P({ω}) = p^x (1 − p)^(n−x), so
P(X = x) = C(n, x) p^x (1 − p)^(n−x).
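For small n the binomial pmf can be reproduced by brute force, exactly as in the derivation above: enumerate all ω ∈ {H, T}^n, give each the probability p^k (1 − p)^(n−k), and sum over {X = x}. A sketch of my own (n and p are arbitrary):

    from itertools import product
    from math import comb, isclose

    n, p = 4, 0.3                                   # hypothetical parameters

    for x in range(n + 1):
        # Sum P({omega}) over all outcomes with exactly x heads.
        prob = sum(
            p ** w.count("H") * (1 - p) ** w.count("T")
            for w in product("HT", repeat=n)
            if w.count("H") == x
        )
        formula = comb(n, x) * p ** x * (1 - p) ** (n - x)
        print(x, prob, formula, isclose(prob, formula))   # True for every x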

Properties of B(n, p)
Probability mass function: f_X(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x)
Cumulative distribution function: F_X(x) = P(X ≤ x) = Σ_{i=0}^x f_X(i)
Mean value: EX = E Σ_{i=1}^n 1(A_i) = Σ_{i=1}^n P(A_i) = np
Variance: VX = Σ_{i=1}^n V 1(A_i) = np(1 − p), because {A_i : i = 1, ..., n} are (pairwise) independent.

Problem 3.11.8
Let X ~ B(n, p) and Y ~ B(m, p) be independent. Show that Z = X + Y ~ B(n + m, p).
Solution: Consider n + m independent Bernoulli trials, each with success probability p. Set X = Σ_{i=1}^n 1(A_i) and Y = Σ_{i=n+1}^{n+m} 1(A_i). Then X and Y are as in the problem, and Z = Σ_{i=1}^{n+m} 1(A_i) ~ B(n + m, p).

Probabilities in the Bernoulli process
A sequence of Bernoulli experiments. The sample space Ω is the set of functions N → {0, 1}. Introduce events A_i for "the i-th toss showed heads"; strictly, A_i = {ω : ω(i) = 1}. Let F be the smallest σ-field that contains all A_i. Define (!) P : F → [0, 1] by P(A_i) = p and by requiring that {A_i : i ∈ N} are independent.
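Problem 3.11.8 can also be verified numerically (a sketch of my own, not part of the course material): convolving the pmfs of B(n, p) and B(m, p) reproduces the pmf of B(n + m, p).

    from math import comb, isclose

    def binom_pmf(k, n, p):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    n, m, p = 3, 5, 0.4                      # hypothetical parameters

    for z in range(n + m + 1):
        # P(Z = z) = sum_x P(X = x) P(Y = z - x), by independence of X and Y
        conv = sum(binom_pmf(x, n, p) * binom_pmf(z - x, m, p)
                   for x in range(max(0, z - m), min(n, z) + 1))
        print(z, isclose(conv, binom_pmf(z, n + m, p)))   # True for every z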

Waiting times in the Bernoulli process
Let W_r be the waiting time for the r-th success:
W_r = min{i : Σ_{j=1}^i 1(A_j) = r}.
To find the probability mass function of W_r, note that {W_r = k} is the same event as
{Σ_{i=1}^{k−1} 1(A_i) = r − 1} ∩ A_k.
Since the two events involved here are independent, we get
f_{W_r}(k) = P(W_r = k) = C(k − 1, r − 1) p^r (1 − p)^(k−r),
which is the negative binomial distribution.

The geometric distribution
The waiting time W to the first success has
P(W = k) = (1 − p)^(k−1) p
(first k − 1 failures and then one success). The survival function is
G_W(k) = P(W > k) = (1 − p)^k.
(A simulation check of these waiting-time formulas is given after the summary.)

Summary
We need to be precise in our use of the formalism, at least until we have developed intuition. When in doubt, ask: What is the stochastic experiment? What is the probability triple? Which event am I considering?
Venn diagrams are very useful. This holds particularly for conditioning, which is central to stochastic processes.
Indicator functions are powerful tools, once mastered.
You need to know the distributions that can be derived from the Bernoulli process: the binomial, geometric, and negative binomial distributions.
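As promised above, a simulation check of the waiting-time formulas (my own sketch; the parameters are arbitrary): it estimates P(W_r = k) for the negative binomial case and the geometric survival function P(W > k).

    import random
    from math import comb

    p, r, n = 0.3, 2, 100_000

    def waiting_time(r, p):
        """Number of trials until the r-th success in a Bernoulli process."""
        successes, k = 0, 0
        while successes < r:
            k += 1
            if random.random() < p:
                successes += 1
        return k

    samples = [waiting_time(r, p) for _ in range(n)]
    for k in range(r, r + 5):
        emp = sum(w == k for w in samples) / n
        formula = comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)
        print(k, emp, formula)                        # negative binomial pmf

    geo = [waiting_time(1, p) for _ in range(n)]
    for k in range(1, 5):
        print(k, sum(w > k for w in geo) / n, (1 - p) ** k)   # survival (1-p)^k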