Probability Theory Review


Cogsci 118A: Natural Computation I                         Lecture 2 (01/07/10)
Lecturer: Angela Yu                                        Scribe: Joseph Schilz

Lecture Summary

1. Set theory: terms and operators. In this section, we provide definitions for the set theory terms and operators used in our review of probability.

2. Probability spaces. We define and explain the components of a probability space.

3. Joint probability, conditional probability, and marginal probability. We define and explain concepts in probability involving multiple random variables.

1 Set theory: terms and operators

To avoid inconsistencies and ambiguities, we will pursue a formal understanding of probability. This formal understanding will be based on the theory of sets, and this lecture requires familiarity with the following set theory terms and operators:

1. Naming conventions: We generally denote singular, non-set items by lower case letters a, b, c, d, ..., x, y, z. We denote sets of items by upper case letters A, B, C, D, .... Calligraphic letters A, B, C, ... are often used to denote sets which contain other sets.

2. Set membership: ∈. We denote that an element x is a member of the set A by x ∈ A. It is also possible for one set to be a member of another set. If the set E is a member of F, then we write E ∈ F.

3. Subset/superset relations: ⊂, ⊃, ⊆, ⊇. If all elements of the set A can also be found in B, we say that A is a subset of B and write A ⊂ B. It is then also true that B is a superset of A, and we may write B ⊃ A. Formally, we write A ⊂ B iff ∀x ∈ A (x ∈ B). Note: Some authors use ⊂ and ⊃ interchangeably with ⊆ and ⊇. Other authors may write A ⊂ B only in cases where A ⊆ B and A ≠ B. In this course, ⊂ and ⊃ will be used in the former sense, though we may use ⊆ and ⊇ to emphasize the possible equality of the sets in question.

4. The empty set: ∅. The empty set is the set which contains no elements. It is denoted by ∅ or by set brackets enclosing no elements: {}. For any set A, it is true that ∅ ⊆ A.

5. Union: ∪. For sets A and B, the union of A and B is the set containing exactly those elements in either A or B. This is denoted A ∪ B. Formally, A ∪ B = {x : x ∈ A ∨ x ∈ B}.

6. Intersection: ∩. For sets A and B, the intersection of A and B is the set containing exactly those elements in both A and B. This is denoted A ∩ B. Formally, A ∩ B = {x : x ∈ A ∧ x ∈ B}.

7. Mutually exclusive: Two sets A and B are said to be mutually exclusive if they contain no elements in common. More than two sets, for example A_1, A_2, A_3, ..., are said to be mutually exclusive if no two of the sets contain any elements in common. Formally, A_1, A_2, A_3, ... are mutually exclusive iff ∀i ∀j (i ≠ j ⟹ A_i ∩ A_j = ∅).

8. Complementation: A^C. In the context of a set U, where A ⊆ U, the complement of A is the set of all those elements in U that are not in A. This is denoted A^C. Throughout this lecture, the set which fulfills the role of U will generally be Ω.
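As a quick sanity check on the operators above, the short Python sketch below mirrors each of them with Python's built-in set type. The particular sets A, B, and U are illustrative choices, not anything from the lecture:

    # Illustrative sets; U plays the role of the universe (our Omega).
    A = {1, 2, 3}
    B = {3, 4}
    U = {1, 2, 3, 4, 5, 6}

    print(2 in A)                # set membership: 2 is an element of A -> True
    print(A <= U)                # subset relation: A is a subset of U -> True
    print(A | B)                 # union: {1, 2, 3, 4}
    print(A & B)                 # intersection: {3}
    print(U - A)                 # complement of A within U: {4, 5, 6}
    print(A.isdisjoint({4, 5}))  # mutually exclusive sets -> True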

2 Probability Spaces

This section discusses probability spaces. A probability space is a set of three elements: a sample space Ω, an event space F, and a probability measure P, each obeying special properties. Sections 2.1, 2.2, and 2.3 discuss these elements respectively. Together, these elements define a mathematical experiment: they list its possible outcomes, the events to which we can assign probability, and the actual probabilities that we assign to those events.

Example: We take the single roll of a numbered, six-sided die as an example of a mathematical experiment. We expand upon this example as this section progresses.

2.1 Sample Space

The sample space is the set of all possible outcomes of our mathematical experiment. It is denoted Ω. The sample space may be finite or it may be infinite, depending upon the experiment. It cannot be empty; that is, Ω ≠ ∅. The sample space may also be called the outcome space. The variable ω is often used to denote the elements of a sample space.

Example: If Ω is the set of all possible outcomes of rolling a six-sided die, then Ω = {1, 2, 3, 4, 5, 6}.

2.2 Event Space

The event space is the set of all events to which we can assign probabilities. In other words, it is the set of all events of which we are able to measure the probability. The event space is denoted by F.

An event is a subset of Ω. An event is often denoted by the letter E, with some identifying subscript. The event which contains every element of Ω may be denoted E_Ω, and by definition then E_Ω = Ω. The event which contains no elements may be denoted E_0 and is equal to ∅. An event which contains a single outcome from Ω is called an atomic event.

Example: In our example, Ω = {1, 2, 3, 4, 5, 6}. Consider the atomic event in which a four is rolled: we might call this E_4, where E_4 = {4}. Let E_o = {1, 3, 5} be the event in which some odd number is rolled and E_e = {2, 4, 6} be the event in which some even number is rolled. Then we might have E_4, E_o, E_e, E_Ω ∈ F.

(HW) Including the null event {} and the all-inclusive event of the sample space itself, Ω, how many total events could there be in our event space F?

An event space must also satisfy the three additional properties of a sigma algebra. These properties are:

1. ∅ ∈ F. The empty set must be a member of F. This ensures that F will not be empty.

2. E ∈ F ⟹ E^C ∈ F. For any event we find in F, we must also find its complement E^C in F. This requirement is called closure under complements.

3. E_1, E_2, E_3, ... ∈ F ⟹ ∪_i E_i ∈ F. For any finite or countably infinite group of events in F, we must also find the set which is the union of those events in F. This requirement is called closure under countable unions.

(HW) Demonstrate that closure under complementation and closure under countable union together imply closure under countable intersection.

Example: Consider F = {E_4, E_o, E_e, E_Ω}. F does not meet condition (1) of a sigma algebra, as ∅ ∉ F. We can add ∅, which we denote E_0, to F to produce F' = {E_0, E_4, E_o, E_e, E_Ω}. Though F' satisfies condition (1), it does not satisfy closure under complements. For example, E_4^C = {4}^C = {1, 2, 3, 5, 6} is not in F'.

Consider F'' = {E_0, E_o, E_e, E_Ω}. F'' satisfies condition (2), as E_0^C = {}^C = {1, 2, 3, 4, 5, 6} = E_Ω is in F'', E_o^C = {1, 3, 5}^C = {2, 4, 6} = E_e is in F'', and so on. F'' also satisfies closure under countable unions. For example, E_0 ∪ E_o = {} ∪ {1, 3, 5} = {1, 3, 5} = E_o is in F'', as is any other arbitrary union. Thus, F'' is a sigma algebra with respect to Ω.

Though in common speech we might use "event" and "outcome" interchangeably, they have distinct definitions in probability theory.

Example: Suppose our die is fair. Having no number come up is not a possible outcome of rolling a die, thus it does not belong in our sample space Ω. However, the event in which no number comes up when rolling a die is an event to which we can assign probability. Thus, E_0 = {} = ∅ is a valid element of F. Later we will see that the probability we assign to E_0 is zero.
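Since Ω here is finite, the sigma-algebra properties can be checked mechanically. The Python sketch below (not part of the original notes; the function name is our own) verifies the example of F' and F'' above. Events are frozensets so that an event space can itself be a Python set of events:

    omega = frozenset({1, 2, 3, 4, 5, 6})
    E0 = frozenset()                 # the null event
    E4 = frozenset({4})
    Eo = frozenset({1, 3, 5})
    Ee = frozenset({2, 4, 6})

    def is_sigma_algebra(F, omega):
        if frozenset() not in F:                        # property (1)
            return False
        if any(omega - E not in F for E in F):          # property (2)
            return False
        # Property (3): for a finite collection, closure under pairwise
        # unions implies closure under all finite unions.
        if any(E1 | E2 not in F for E1 in F for E2 in F):
            return False
        return True

    F_prime = {E0, E4, Eo, Ee, omega}
    print(is_sigma_algebra(F_prime, omega))         # False: {4}^C is missing
    F_double_prime = {E0, Eo, Ee, omega}
    print(is_sigma_algebra(F_double_prime, omega))  # True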

2.3 Probability Measure

A probability measure P is a function from F into the real numbers. It is the function that assigns to each event in F a numerical probability. Furthermore, P is subject to the following conditions:

1. ∀E ∈ F (0 ≤ P(E) ≤ 1). For any event in our event space, the probability of that event occurring must be between zero and one, inclusive. We can then define our probability measure as P : F → [0, 1].

2. P(Ω) = 1. The probability of the event Ω (also denoted E_Ω) occurring must be equal to one. Recall, E_Ω is the event in which any one of the possible outcomes of the experiment occurs. Thus, (2) states that when the experiment is performed we are certain to see an outcome from the sample space.

3. For mutually exclusive events E_1, E_2, ..., E_n, P(∪_i E_i) = Σ_i P(E_i). By (3), we can easily determine the probability of an event E if we know the probabilities of a mutually exclusive group of simpler events which together compose E.

Example: Given P(E_e) = 1/2 and P(E_5) = 1/6, we may determine the probability that the die lands on an even number or five by (3). E_e and E_5 are mutually exclusive, so P(E_e ∪ E_5) = P(E_e) + P(E_5) = 1/2 + 1/6 = 2/3.

Probability measures have additional properties, which can be derived from the three stated above.

1. P(∅) = 0. The probability of no outcome occurring as a result of the experiment is equal to zero.

Proof: By (2) we have P(Ω) = 1. Since Ω and ∅ are mutually exclusive, we also find that P(Ω) = P(Ω ∪ ∅) = P(Ω) + P(∅) = 1 + P(∅), using (3) for the second equality and (2) for the third. The left side of the equation is equal to one, and so P(∅) must equal zero.

2. Monotonicity: A ⊆ B ⟹ P(A) ≤ P(B).

Example: Suppose our die may be unfair. Even so, under no parameters would it be more probable to roll a one than to roll either a one or a two. Formally, we can write this as: {1} ⊆ {1, 2} ⟹ P({1}) ≤ P({1, 2}).

(HW) Prove that monotonicity follows from the definition of a probability measure.

3. Sub-additivity: A ⊆ ∪_i A_i ⟹ P(A) ≤ Σ_i P(A_i).

Example: Even if our die is unfair, we may be confident that P({6}) ≤ P({4, 5, 6}) + P({5, 6}), because {6} ⊆ {4, 5, 6} ∪ {5, 6}.

(HW) Prove that sub-additivity follows from the definition of a probability measure.

Note: A probability measure may sometimes be denoted by a regular upper-case P. Though in one context this upper-case P may refer to a probability measure, in a different context it may refer to a cumulative distribution function, or CDF. The symbol ℙ unambiguously refers to a probability measure.
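For the fair die, the measure and its defining conditions can be made concrete in a few lines of Python. This is a sketch under the assumption of a uniform die, with names of our choosing:

    from fractions import Fraction

    omega = frozenset({1, 2, 3, 4, 5, 6})

    def P(E):
        # Fair die: each outcome has probability 1/6, and by (3) the
        # probability of an event is the sum over its atomic events.
        return Fraction(len(E), 6)

    Ee = frozenset({2, 4, 6})
    E5 = frozenset({5})
    print(P(omega))                    # condition (2): P(Omega) = 1
    print(P(frozenset()))              # derived property (1): P(empty set) = 0
    print(P(Ee | E5), P(Ee) + P(E5))   # both 2/3, illustrating condition (3)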

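Monotonicity and sub-additivity hold for any probability measure, not just the uniform one. The sketch below spot-checks both on an arbitrarily weighted die (the weights are made up for illustration); it is a numerical check, not a proof, so it does not answer the (HW) items above:

    from fractions import Fraction

    # Made-up weights for an unfair die; they sum to one.
    weights = {1: Fraction(1, 2), 2: Fraction(1, 10), 3: Fraction(1, 10),
               4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 10)}
    assert sum(weights.values()) == 1

    def P(E):
        return sum(weights[o] for o in E)

    # Monotonicity: {1} is a subset of {1, 2}.
    print(P({1}) <= P({1, 2}))                 # True
    # Sub-additivity: {6} is a subset of {4, 5, 6} union {5, 6}.
    print(P({6}) <= P({4, 5, 6}) + P({5, 6}))  # True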
3 Joint Probability, Conditional Probability, and Marginal Probability

3.1 Random Variables

For this course, we will consider a random variable (r.v.) to be a variable representing experimental outcomes which are numbers. That is, we assume that Ω ⊆ ℝ (or ℝ^n). For an arbitrary sample space Ω, we construct an Ω' ⊆ ℝ by defining a function x : Ω → ℝ and letting Ω' be the image of Ω under x. We define a new probability measure P' by P'(x ∈ A) = P({ω : x(ω) ∈ A}). In practice, we then refer to Ω' and P' as Ω and P respectively.

Example: Consider the toss of a coin. Let Ω = {"heads", "tails"}. Then we may define x by x("heads") = 1 and x("tails") = 0. Then Ω' = {1, 0}. If the coin is fair, then P'(1) = P("heads") = 1/2 and P'(0) = P("tails") = 1/2.

We generally consider P(x ∈ E), the probability that the outcome of our experiment x is a member of the event E. As done above, we may sometimes write probabilities of the form P(E) or P(x).

Example: Consider the toss of a single fair coin. We might write that P(x ∈ {0}) = 1/2; however, it is often equally clear to write that P(0) = 1/2. Now consider tossing two fair coins, the outcomes of which are given by random variables x_1 and x_2. If the coins are independent (see Section 3.2.1), then it is true that P(x_1 ∈ A, x_2 ∈ B) = P(x_1 ∈ A) P(x_2 ∈ B) no matter our choices of A and B. If we simply wished to emphasize this attribute, we might write: P(x_1, x_2) = P(x_1) P(x_2).

3.2 Joint Probability

Joint probability is the probability of two events both occurring as a result of performing an experiment. Using random variables, we write the probability of our outcome x being both a member of event E_1 and event E_2 as P(x ∈ E_1, x ∈ E_2). We define this as P(x ∈ E_1, x ∈ E_2) = P(x ∈ E_1 ∩ E_2).

3.2.1 Independence

We say that events E_1, E_2 are independent if P(x ∈ E_1, x ∈ E_2) = P(x ∈ E_1) P(x ∈ E_2).

Example: Consider drawing a single card from a fair deck of 52. We find the probability that the card is both a heart and a two, written P("hearts", "two"). We find that P("heart") P("two") = (1/4)(1/13) = 1/52. We reason that drawing a card which is both a heart and a two will happen iff we have drawn the two of hearts, which will occur with probability 1/52. Thus we confirm the independence of the two events.

Consider the probability of drawing a card which is both a heart and red, written P("hearts", "red"). We find that P("heart") P("red") = (1/4)(1/2) = 1/8. However, we reason that drawing a card which is both a heart and red will happen iff we have drawn a heart. So P("hearts", "red") = P("hearts") = 1/4. Thus we confirm that the two events are not independent.

(HW) Show that if events A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B).
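The construction of Ω' and P' in Section 3.1 amounts to pushing the original measure forward through x. A minimal Python sketch of the coin example, with names of our choosing:

    from fractions import Fraction

    P = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}  # measure on atomic events
    x = {"heads": 1, "tails": 0}                            # the map x : Omega -> R

    def P_prime(A):
        # P'(x in A) = P({omega : x(omega) in A})
        return sum(p for omega, p in P.items() if x[omega] in A)

    print(P_prime({1}))      # P'(1) = P("heads") = 1/2
    print(P_prime({0, 1}))   # 1: x is certain to land in its image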

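The card example lends itself to brute-force verification: enumerate the deck, count, and compare the joint probability with the product of the marginals. A sketch, with our own encoding of the deck:

    from fractions import Fraction
    from itertools import product

    ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
    suits = ["hearts", "diamonds", "clubs", "spades"]
    deck = list(product(ranks, suits))   # 52 equally likely cards

    def P(event):
        return Fraction(sum(1 for c in deck if event(c)), len(deck))

    heart = lambda c: c[1] == "hearts"
    two = lambda c: c[0] == "2"
    red = lambda c: c[1] in ("hearts", "diamonds")

    # Heart and two: the joint equals the product, so they are independent.
    print(P(lambda c: heart(c) and two(c)) == P(heart) * P(two))   # True
    # Heart and red: 1/4 versus 1/8, so they are not independent.
    print(P(lambda c: heart(c) and red(c)) == P(heart) * P(red))   # False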
3.3 Conditional Probability

Conditional probability is the probability of an event occurring given the assurance that a separate event occurs. We denote the probability of A given B as P(A|B). We define conditional probability as:

    P(A|B) = P(A, B) / P(B).

Example: Suppose we've tossed a fair die, and we know only that an even number has come up. We find the probability that a two has come up using conditional probability:

    P({2} | {2, 4, 6}) = P({2}, {2, 4, 6}) / P({2, 4, 6})
                       = P({2} ∩ {2, 4, 6}) / P({2, 4, 6})
                       = P({2}) / P({2, 4, 6})
                       = (1/6) / (1/2) = 1/3.

3.4 Marginal Probability

Marginal probability is a property relating the unconditional probability of a single event to a sum of joint probabilities. Specifically, for a group of mutually exclusive events E_1, E_2, E_3, ..., such that ∪_i E_i = Ω,

    P(x) = Σ_i P(x, y ∈ E_i).

Example: Consider tossing two fair coins, the outcomes of which are given by random variables x and y respectively. Knowing that P(x = 1, y = 1) = P(x = 1, y = 0) = 1/4, we can find P(x = 1) as P(x = 1, y = 1) + P(x = 1, y = 0) = 1/4 + 1/4 = 1/2.
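Both of the closing definitions reduce to short computations. The sketch below reproduces the conditional-probability die example and the two-coin marginalization; all names are our own:

    from fractions import Fraction

    # Conditional probability on the fair die: P(A|B) = P(A, B) / P(B).
    def P_die(E):
        return Fraction(len(E), 6)

    A, B = {2}, {2, 4, 6}
    print(P_die(A & B) / P_die(B))   # 1/3

    # Marginal probability: sum the joint over the other variable's outcomes.
    joint = {(1, 1): Fraction(1, 4), (1, 0): Fraction(1, 4),
             (0, 1): Fraction(1, 4), (0, 0): Fraction(1, 4)}
    print(sum(p for (x, y), p in joint.items() if x == 1))   # P(x = 1) = 1/2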