Chapter 2 Introduction to Probability

2.1 Probability Model

Probability concerns the chance of observing a certain outcome resulting from an experiment. However, since chance is an abstraction of something not physically measurable, we need a precise mathematical definition. In this chapter, we will use the following definition of a probability model, which consists of three components:

Sample Space: Specifies all possible outcomes from an experiment.
Event: Specifies a particular outcome or combination of outcomes.
Probability Law: Specifies how likely an event is to occur.

A pictorial summary of these concepts is given in Figure 2.1.

Figure 2.1: Illustration of sample space, event and probability law.

Sample Space

Definition 1. A sample space Ω is the set of all possible outcomes from an experiment. We denote by ω an element of Ω.

Examples:
Flip a coin: Ω = {H, T}.
Throw a die: Ω = {1, 2, 3, 4, 5, 6}.
Waiting time for a bus in West Lafayette: Ω = {t : 0 ≤ t ≤ 30 minutes}.

In the last example, we see that a sample space can be continuous.

Counterexamples:
Throw a die: Ω = {1, 2, 3} is not a sample space because it is not exhaustive.
Throw a die: Ω = {1, 1, 2, 3, 4, 5, 6} is not a sample space because its elements are not exclusive.

Therefore, in order to make a valid sample space, we have to make sure Ω contains all possible outcomes and that there is no repetition among the outcomes.

Event

Definition 2. An event F is a subset of the sample space Ω.

Note that an outcome ω is an element of Ω, whereas an event F is a subset contained in Ω, i.e., F ⊆ Ω. Thus, an event can contain a single outcome, but it can also contain many outcomes.

Example: Throw a die. Let Ω = {1, 2, 3, 4, 5, 6}.
F_1 = {even numbers} = {2, 4, 6}.
F_2 = {less than 3} = {1, 2}.

Example: Wait for a bus. Let Ω = {t : 0 ≤ t ≤ 30}.
F_1 = {wait less than 10 minutes} = {t : 0 ≤ t < 10}.
F_2 = {wait less than 5 or more than 20 minutes} = {t : 0 ≤ t < 5} ∪ {t : 20 < t ≤ 30}.
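For finite experiments, the set-theoretic picture above translates directly into code. The following is a minimal sketch (not from the original notes) that models the die-throw sample space and its events as Python sets:

```python
# Sample space for a single die throw.
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
F1 = {w for w in omega if w % 2 == 0}   # {even numbers} = {2, 4, 6}
F2 = {w for w in omega if w < 3}        # {less than 3}  = {1, 2}

# An event must be contained in the sample space ("<=" is the subset test).
assert F1 <= omega and F2 <= omega

# New events via set operations: union, intersection, complement.
print(F1 | F2)      # union:        {1, 2, 4, 6}
print(F1 & F2)      # intersection: {2}
print(omega - F1)   # complement:   {1, 3, 5}
```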

In this example, we see that we can create new events by operating on existing events through set operations, e.g., union, intersection, etc. Formally, we define the collection of all possible events as the event space.

Definition 3. The collection of all possible events is called the event space or the σ-field, denoted by F. An event space satisfies the following two properties:
If F ∈ F, then also F^c ∈ F.
If F_1, F_2, ... ∈ F, then the union ⋃_i F_i ∈ F.

These two properties ensure that the event space is closed under the set operations we need, because any other set operation can be derived from complement and union.

Example. In a coin-flip experiment where Ω = {H, T}, the event space is F = {∅, {H}, {T}, Ω}.

Probability Law

Definition 4. A probability law is a function P : F → [0, 1] that maps an event A to a real number in [0, 1]. The function must satisfy three axioms, known as the axioms of probability:

I. Non-negativity: P[A] ≥ 0, for any A ⊆ Ω.
II. Normalization: P[Ω] = 1.
III. Additivity: For any collection of disjoint sets {A_1, A_2, ...}, it holds that
P[⋃_{i=1}^∞ A_i] = Σ_{i=1}^∞ P[A_i].

The non-negativity axiom ensures that a probability value cannot be negative. The normalization axiom ensures that the probability of observing all possible outcomes is 1. The additivity axiom defines how set operations are translated into probability operations. Allowing a countably infinite number of sets in the axiom ensures that it applies to both discrete and continuous sample spaces.

Finite Additivity. The countable additivity stated in Axiom III involves an infinite number of sets. As a special case we can reduce the infinite collection to a finite one, which states that for any two disjoint sets A and B, we have
P[A ∪ B] = P[A] + P[B].

In words, if A and B are disjoint, then the probability of observing either A or B is the sum of the two individual probabilities. The union of A and B is equivalent to the logical OR. Once this OR operation is defined, all other logical operations can be subsequently defined.
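For a finite sample space with equally likely outcomes, the probability law and its axioms can be checked numerically. Below is a minimal sketch, assuming a fair die (the helper name `prob` is ours, not from the notes):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability law for a fair die: |A| / |Omega| (equally likely outcomes)."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))

A = {2, 4, 6}
B = {1}

assert prob(A) >= 0                # Axiom I: non-negativity
assert prob(omega) == 1            # Axiom II: normalization
assert A & B == set()              # A and B are disjoint, so ...
assert prob(A | B) == prob(A) + prob(B)   # Axiom III (finite additivity)

print(prob(A))  # 1/2
```

Using Fraction rather than float keeps the checks exact, so the asserts test the axioms rather than floating-point rounding.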

The following corollaries are some examples.

Corollary 1. P[A^c] = 1 − P[A].

Proof. Since Ω = A ∪ A^c, by finite additivity we have P[Ω] = P[A ∪ A^c] = P[A] + P[A^c]. By the normalization axiom, P[Ω] = 1. Therefore, P[A^c] = 1 − P[A].

Corollary 2. For any A ⊆ Ω, P[A] ≤ 1.

Proof. We prove by contradiction. Assume P[A] > 1. Consider the complement A^c, where A ∪ A^c = Ω. Since P[A^c] = 1 − P[A], we must have P[A^c] < 0, because by hypothesis P[A] > 1. But P[A^c] < 0 violates the non-negativity axiom. So we must have P[A] ≤ 1.

Corollary 3. P[∅] = 0.

Proof. Since ∅ = Ω^c, by Corollary 1 we have P[∅] = 1 − P[Ω] = 0.

Corollary 4. For any A and B,
P[A ∪ B] = P[A] + P[B] − P[A ∩ B].
Note that this statement is different from Axiom III because A and B are not necessarily disjoint.

Proof. First, observe that A ∪ B can be partitioned into three disjoint subsets as
A ∪ B = (A\B) ∪ (A ∩ B) ∪ (B\A).
Since A\B = A ∩ B^c and B\A = B ∩ A^c, by finite additivity we have
P[A ∪ B] = P[A\B] + P[A ∩ B] + P[B\A]
= P[A ∩ B^c] + P[A ∩ B] + P[B ∩ A^c]
(a) = P[A ∩ B^c] + P[A ∩ B] + P[B ∩ A^c] + P[A ∩ B] − P[A ∩ B]
(b) = P[A ∩ (B^c ∪ B)] + P[(A^c ∪ A) ∩ B] − P[A ∩ B]
= P[A ∩ Ω] + P[Ω ∩ B] − P[A ∩ B]
= P[A] + P[B] − P[A ∩ B],
where in (a) we added and subtracted a term P[A ∩ B], and in (b) we used finite additivity so that, e.g., P[A ∩ B^c] + P[A ∩ B] = P[(A ∩ B^c) ∪ (A ∩ B)] = P[A ∩ (B^c ∪ B)].

The above proof is a rigorous way of deriving the result. A simpler way to visualize it is to draw a Venn diagram and observe that A ∪ B contains an overlapping part A ∩ B which needs to be subtracted.
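Corollary 4 is easy to verify numerically with the fair-die probability law sketched earlier (again, `prob` is our illustrative helper):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Fair-die probability law: |A| / |Omega|."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}   # even numbers
B = {1, 2}      # less than 3

# Inclusion-exclusion (Corollary 4): the overlap {2} is subtracted once.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Union bound (Corollary 5, which follows): P[A ∪ B] ≤ P[A] + P[B].
assert prob(A | B) <= prob(A) + prob(B)

print(prob(A | B))  # 2/3
```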

Corollary 5 (Union Bound). For any A and B,
P[A ∪ B] ≤ P[A] + P[B].

Proof. Since P[A ∪ B] = P[A] + P[B] − P[A ∩ B] and, by the non-negativity axiom, P[A ∩ B] ≥ 0, we must have P[A ∪ B] ≤ P[A] + P[B].

The union bound is a common tool for analyzing probabilities when the intersection A ∩ B is difficult to evaluate.

Corollary 6. If A ⊆ B, then P[A] ≤ P[B].

Proof. If A ⊆ B, then there exists a set B\A such that B = A ∪ (B\A). Therefore, by finite additivity we have P[B] = P[A] + P[B\A] ≥ P[A].

This corollary is useful when comparing two events of different sizes. For example, in the bus-waiting example, if we let A = {t ≤ 5} and B = {t ≤ 10}, then A ⊆ B and so P[A] ≤ P[B], because waiting at most 5 minutes implies waiting at most 10 minutes.

2.2 Conditional Probability

Definition 5. Assume P[B] > 0. The conditional probability of A given B is
P[A | B] = P[A ∩ B] / P[B].    (2.1)

Pictorially, a conditional probability is the proportion of P[A ∩ B] relative to P[B]. It is the probability that A happens when we know that B has already happened. The difference between P[A | B] and P[A ∩ B] is the denominator they carry:
P[A | B] = P[A ∩ B] / P[B]  and  P[A ∩ B] = P[A ∩ B] / P[Ω].
Since P[B] ≤ P[Ω] = 1, P[A | B] is always larger than or equal to P[A ∩ B].

Conditional probabilities are ubiquitous in this course and beyond. They concern the likelihood that one event happens given that another event has happened. This notion of conditioning is common in our daily lives. The following are some examples.

Example. Throw a die. Let A = {get 3} and B = {odd numbers}.

Figure 2.2: Illustration of conditional probability and its comparison with P[A ∩ B].

Clearly, P[A] = 1/6 and P[B] = 1/2. It is also not difficult to see that P[A ∩ B] = P[A] = 1/6, because A ⊆ B and so A ∩ B = A. The conditional probability of A given B is
P[A | B] = P[A ∩ B] / P[B] = (1/6)/(1/2) = 1/3.
In words, if we know that we have an odd number, then the probability of obtaining a 3 has to be computed over {1, 3, 5}, which gives us a probability of 1/3. If we do not know that we have an odd number, then the probability of obtaining a 3 has to be computed over the sample space {1, 2, 3, 4, 5, 6}, which gives us 1/6.

Example. Let
A = {eat 2 burgers} and B = {finish a football game}.
In this example,
P[A] = probability that you eat 2 burgers,
P[B] = probability that you have just finished a football game,
P[A ∩ B] = probability that you have just finished a football game and you eat 2 burgers,
P[A | B] = probability that you eat 2 burgers given that you have just finished a football game.
Without knowing that you have just finished a football game, you may not be hungry, and so the probability of eating 2 burgers (i.e., P[A]) could be low. However, if we know that you have finished a football game, then it is quite likely that you are hungry and want to eat 2 burgers. This is the conditional probability P[A | B].
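Returning to the die example, the conditional probability can be reproduced by enumeration. A minimal sketch reusing the helper style from earlier (the function names are ours):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Fair-die probability law: |A| / |Omega|."""
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    """P[A | B] = P[A ∩ B] / P[B], defined only when P[B] > 0."""
    assert prob(B) > 0, "conditioning event must have positive probability"
    return prob(A & B) / prob(B)

A = {3}          # get 3
B = {1, 3, 5}    # odd numbers

print(cond_prob(A, B))  # 1/3: restricted to the odd outcomes {1, 3, 5}
print(prob(A))          # 1/6: without conditioning, 3 is one of six outcomes
```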

Example. Let
A = {Purdue gets the Big Ten championship} and B = {Purdue wins 15 games consecutively}.
In this example,
P[A] = probability that Purdue gets the championship,
P[B] = probability that Purdue wins 15 games consecutively,
P[A ∩ B] = probability that Purdue gets the championship and wins 15 games consecutively,
P[A | B] = probability that Purdue gets the championship given that we win 15 games consecutively.
If we do not know whether Purdue has won 15 games consecutively, then it is unlikely that we will get the championship, because the sample space of all possible competition results is large. However, if we have already won 15 games consecutively, then the denominator of the probability becomes much smaller. In this case, the conditional probability is high.

Proposition 1. Let P[B] > 0. The conditional probability P[A | B] satisfies Axiom I to Axiom III.

Proof. Let us check the axioms:
Axiom I: P[A | B] = P[A ∩ B] / P[B]. Since P[B] > 0 and Axiom I requires P[A ∩ B] ≥ 0, we therefore have P[A | B] ≥ 0.
Axiom II: P[Ω | B] = P[Ω ∩ B] / P[B] = P[B] / P[B] = 1.
Axiom III: Consider two disjoint sets A and C. Then,
P[A ∪ C | B] = P[(A ∪ C) ∩ B] / P[B] = P[(A ∩ B) ∪ (C ∩ B)] / P[B] (a)= P[A ∩ B] / P[B] + P[C ∩ B] / P[B] = P[A | B] + P[C | B],
where (a) holds because if A and C are disjoint, then A ∩ B and C ∩ B are also disjoint.

The implication of Proposition 1 is that conditional probabilities are legitimate probabilities. In proving the proposition, note that the set B is present and fixed in all three axioms.
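Proposition 1 can also be sanity-checked numerically: fix a conditioning event B and verify that P[· | B] behaves like a probability law. A minimal sketch under the fair-die assumption (helper names are ours):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    return prob(A & B) / prob(B)

B = {1, 3, 5}      # fixed conditioning event with P[B] > 0
A, C = {1}, {3}    # two disjoint events

assert cond_prob(A, B) >= 0          # Axiom I: non-negativity
assert cond_prob(omega, B) == 1      # Axiom II: normalization
assert cond_prob(A | C, B) == cond_prob(A, B) + cond_prob(C, B)  # Axiom III
```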

2.3 Independence

Definition 6. Two events A and B are statistically independent if
P[A ∩ B] = P[A] P[B].

Disjoint vs. Independent. It should be cautioned that disjoint and independent are two different concepts; disjointness does not imply independence:
If A and B are disjoint, then A ∩ B = ∅. This only implies that P[A ∩ B] = 0. It says nothing about whether P[A ∩ B] can be factorized into P[A] P[B].
If A and B are independent, then we have P[A ∩ B] = P[A] P[B]. But this does not imply that P[A ∩ B] = 0.
The only case in which a disjoint pair A and B is also independent is when P[A] = 0 or P[B] = 0.

Example. Throw a die twice. Let A = {1st die is 3} and B = {2nd die is 4}. Are A and B independent? We can show that
P[A ∩ B] = P[{(3, 4)}] = 1/36, P[A] = 1/6, and P[B] = 1/6.
So P[A ∩ B] = P[A] P[B]. Thus, A and B are independent.

Example. Throw a die twice. Let A = {1st die is 1} and B = {sum is 7}. Are A and B independent? Note that
P[A ∩ B] = P[{(1, 6)}] = 1/36, P[A] = 1/6, and
P[B] = P[{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}] = 6/36 = 1/6.
So P[A ∩ B] = P[A] P[B]. Thus, A and B are independent.

Example. Throw a die twice. Let A = {max is 2} and B = {min is 2}. Are A and B independent? Let us first list out A and B:
A = {(1, 2), (2, 1), (2, 2)}
B = {(2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 2), (4, 2), (5, 2), (6, 2)}.

Therefore, the probabilities are
P[A] = 3/36, P[B] = 9/36, and P[A ∩ B] = P[{(2, 2)}] = 1/36.
Clearly, P[A ∩ B] ≠ P[A] P[B], and so A and B are dependent.

Independence via Conditional Probability. Recall that P[A | B] = P[A ∩ B] / P[B]. If A and B are independent, then P[A ∩ B] = P[A] P[B], and so
P[A | B] = P[A ∩ B] / P[B] = P[A] P[B] / P[B] = P[A].
This suggests an interpretation of independence: if the occurrence of B provides no additional information about the occurrence of A, then A and B are independent. However, we do not define independence via conditional probability, because P[A ∩ B] = P[A] P[B] holds even when P[B] = 0, whereas the conditional probability P[A | B] requires P[B] > 0.

2.4 Bayes Theorem and Law of Total Probability

Theorem 1 (Bayes Theorem). For any two events A and B such that P[A] > 0 and P[B] > 0, it holds that
P[A | B] = P[B | A] P[A] / P[B].

Proof. By the definition of conditional probability, we have
P[A | B] = P[A ∩ B] / P[B] and P[B | A] = P[A ∩ B] / P[A].
Rearranging the terms yields P[A ∩ B] = P[B | A] P[A], which gives the desired result after dividing both sides by P[B].

Bayes Theorem provides two views of the intersection P[A ∩ B] using two different conditional probabilities. See Figure 2.3 for a pictorial illustration. We call P[B | A] the conditional probability of B given A, and P[A | B] the posterior probability of A given B.
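The three die-pair examples above can be double-checked by brute-force enumeration of the 36 equally likely outcomes. A minimal sketch (the helper `indep` is ours):

```python
from fractions import Fraction
from itertools import product

# Sample space of two die throws: 36 equally likely ordered pairs.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def indep(A, B):
    """Check the definition of independence: P[A ∩ B] = P[A] P[B]."""
    return prob(A & B) == prob(A) * prob(B)

A1 = {w for w in omega if w[0] == 3}
B1 = {w for w in omega if w[1] == 4}
A2 = {w for w in omega if w[0] == 1}
B2 = {w for w in omega if sum(w) == 7}
A3 = {w for w in omega if max(w) == 2}
B3 = {w for w in omega if min(w) == 2}

print(indep(A1, B1))  # True
print(indep(A2, B2))  # True
print(indep(A3, B3))  # False: {max is 2} and {min is 2} are dependent
```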

Figure 2.3: Bayes Theorem provides two views of P[A ∩ B] using P[A | B] and P[B | A].

Theorem 2 (Law of Total Probability). Let {A_1, A_2, ..., A_n} be a partition of Ω, i.e., A_1, ..., A_n are disjoint and Ω = A_1 ∪ A_2 ∪ ... ∪ A_n. Then, for any B ⊆ Ω,
P[B] = Σ_{i=1}^n P[B | A_i] P[A_i].

Proof. We start from the right-hand side:
Σ_{i=1}^n P[B | A_i] P[A_i] (a)= Σ_{i=1}^n P[B ∩ A_i] (b)= P[⋃_{i=1}^n (B ∩ A_i)] (c)= P[B ∩ (⋃_{i=1}^n A_i)] (d)= P[B ∩ Ω] = P[B],
where (a) follows from the definition of conditional probability, (b) is due to Axiom III, (c) holds because of the distributive property of sets, and (d) results from the partition property of {A_1, A_2, ..., A_n}.

Interpretation. The law of total probability can be understood as follows. If the sample space Ω consists of disjoint subsets A_1, ..., A_n, then we can compute P[B] by summing the conditional probabilities P[B | A_1], ..., P[B | A_n]. However, how likely each A_i is to occur is determined by P[A_1], ..., P[A_n]. Therefore, when performing the sum we need to weight each P[B | A_i] by P[A_i]. See Figure 2.4 for an illustration.

Corollary 7. Let {A_1, A_2, ..., A_n} be a partition of Ω, i.e., A_1, ..., A_n are disjoint and Ω = A_1 ∪ A_2 ∪ ... ∪ A_n. Then, for any B ⊆ Ω,
P[A_j | B] = P[B | A_j] P[A_j] / Σ_{i=1}^n P[B | A_i] P[A_i].

Figure 2.4: The law of total probability decomposes P[B] into multiple conditional probabilities P[B | A_i], each weighted by P[A_i].

Proof. We just need to apply Bayes Theorem and the Law of Total Probability:
P[A_j | B] = P[B | A_j] P[A_j] / P[B] = P[B | A_j] P[A_j] / Σ_{i=1}^n P[B | A_i] P[A_i].

Example. Consider the communication channel shown in Figure 2.5. The probability of sending a 1 is p and the probability of sending a 0 is 1 − p. Given that 1 is sent, the probability of receiving 1 is 1 − η. Given that 0 is sent, the probability of receiving 0 is 1 − ε. We want to find the probability that a 1 is correctly received. Define the events
S_0 = {0 is sent}, and R_0 = {0 is received},
S_1 = {1 is sent}, and R_1 = {1 is received}.
Then, the probability that 1 is received is P[R_1]. However, P[R_1] ≠ 1 − η, because 1 − η is the conditional probability that 1 is received given that 1 is sent; it is also possible that we receive a 1 as a result of an error when 0 is sent. Therefore, we need to consider the probabilities of having S_0 and S_1. Using the law of total probability, we have
P[R_1] = P[R_1 | S_1] P[S_1] + P[R_1 | S_0] P[S_0] = (1 − η)p + ε(1 − p).
Now, suppose that we have received a 1. What is the probability that a 1 was originally sent? This asks for the posterior probability P[S_1 | R_1], which can be found using Bayes Theorem:
P[S_1 | R_1] = P[R_1 | S_1] P[S_1] / P[R_1] = (1 − η)p / ((1 − η)p + ε(1 − p)).
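These two formulas are straightforward to evaluate numerically. Below is a minimal sketch with illustrative parameter values of our choosing (p, η, ε are the symbols from the example):

```python
def channel(p, eta, eps):
    """Binary channel: return P[R1] and the posterior P[S1 | R1].

    p:   probability of sending a 1
    eta: probability of flipping a sent 1 (so P[R1 | S1] = 1 - eta)
    eps: probability of flipping a sent 0 (so P[R1 | S0] = eps)
    """
    p_r1 = (1 - eta) * p + eps * (1 - p)   # law of total probability
    p_s1_given_r1 = (1 - eta) * p / p_r1   # Bayes Theorem
    return p_r1, p_s1_given_r1

# Illustrative numbers (not from the notes): p = 0.5, eta = 0.1, eps = 0.2.
p_r1, post = channel(0.5, 0.1, 0.2)
print(p_r1)   # 0.55
print(post)   # 0.818...: receiving a 1 makes "1 was sent" quite likely
```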

Figure 2.5: A two-channel communication system.

Example. Consider a tennis tournament. Your probability of winning the game is
0.3 against 1/2 of the players (Event A),
0.4 against 1/4 of the players (Event B),
0.5 against 1/4 of the players (Event C).
What is the probability of winning the game? Let W be the event that you win the game. Then, by the law of total probability, we have
P[W] = P[W | A] P[A] + P[W | B] P[B] + P[W | C] P[C]
= (0.3)(0.5) + (0.4)(0.25) + (0.5)(0.25) = 0.375.
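As a final check, the weighted sum above is a one-liner. A minimal sketch:

```python
from fractions import Fraction

# (P[W | group], P[group]) pairs from the tennis example.
groups = [(Fraction(3, 10), Fraction(1, 2)),   # Event A
          (Fraction(4, 10), Fraction(1, 4)),   # Event B
          (Fraction(5, 10), Fraction(1, 4))]   # Event C

# Law of total probability: P[W] = sum of P[W | A_i] * P[A_i].
p_win = sum(cond * prior for cond, prior in groups)
print(p_win)         # 3/8
print(float(p_win))  # 0.375
```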