MAS 108 Probability I

Notes 1    Autumn 2005

Sample space, events

The general setting is: we perform an experiment which can have a number of different outcomes. The sample space is the set of all possible outcomes of the experiment. We usually call it S.

It is important to be able to list the outcomes clearly. For example, if I plant ten bean seeds and count the number that germinate, the sample space is

    S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

(This notation means "the set whose members are 0, 1, ..., 9, 10". See the handout on mathematical notation.)

If I toss a coin three times and record the results of the three tosses, the sample space is

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

where (for example) HTH means "heads on the first toss, then tails, then heads again".

If I toss a fair coin three times and simply count the number of heads obtained, this is a different experiment, with a different sample space, namely

    S = {0, 1, 2, 3}.

Sometimes we can assume that all the outcomes are equally likely. Don't assume this unless either you are told to, or there is some physical reason for assuming it. In the beans example, it is most unlikely. In the coins example, the assumption will hold if the coin is "fair": this means that there is no physical reason for it to favour one side over the other. Of course, in the last example, the four outcomes would definitely not be equally likely!

If all outcomes are equally likely, then each has probability 1/|S|. (Remember that |S| is the number of elements in the set S.)

On this point, Albert Einstein wrote, in his 1905 paper "On a heuristic point of view concerning the production and transformation of light" (for which he was awarded the Nobel Prize),

    In calculating entropy by molecular-theoretic methods, the word "probability" is often used in a sense differing from the way the word is defined in probability theory. In particular, "cases of equal probability" are often hypothetically stipulated when the theoretical methods employed are definite enough to permit a deduction rather than a stipulation.

In other words: don't just assume that all outcomes are equally likely, especially when you are given enough information to calculate their probabilities!

An event is a subset of S. We can specify an event by listing all the outcomes that make it up. In the above example, let A be the event "more heads than tails" and B the event "tails on last throw". Then

    A = {HHH, HHT, HTH, THH},
    B = {HHT, HTT, THT, TTT}.

The probability of an event is calculated by adding up the probabilities of all the outcomes comprising that event. So, if all outcomes are equally likely, we have

    P(A) = |A| / |S|.

In our example, both A and B have probability 4/8 = 1/2.

An event is simple if it consists of just a single outcome, and is compound otherwise. In the example, A and B are compound events, while the event "heads on every throw" is simple (as a set, it is {HHH}). If A = {a} is a simple event, then the probability of A is just the probability of the outcome a, and we usually write P(a), which is simpler to write than P({a}). (Note that a is an outcome, while {a} is an event, indeed a simple event.)

We can build new events from old ones:

    A ∪ B (read "A union B") consists of all the outcomes in A or in B (or both!);
    A ∩ B (read "A intersection B") consists of all the outcomes in both A and B;
    A \ B (read "A minus B") consists of all the outcomes in A but not in B;
    A′ (read "complement of A") consists of all outcomes not in A (that is, S \ A);
    ∅ (read "empty set") is the event which doesn't contain any outcomes.

Note the backward-sloping slash \ in A \ B; this is not the same as either a vertical slash | or a forward slash /. We also write

    A ⊆ B (read "A is a subset of B" or "A is contained in B")

to show that every outcome in A is also in B. In the example, A′ is the event "at least as many tails as heads", and A ∩ B is the event {HHT}. Note that P(A ∩ B) = 1/8; this is not equal to P(A) × P(B), despite what you read in some books!
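Everything in this example is small enough to enumerate by machine. Here is a minimal Python sketch (my addition, not part of the notes) that builds the three-toss sample space as a set and checks the probabilities just quoted:

```python
from itertools import product

# Sample space: all sequences of three tosses, e.g. ('H', 'T', 'H') for HTH.
S = set(product("HT", repeat=3))                     # |S| = 8

# Events are subsets of S.
A = {s for s in S if s.count("H") > s.count("T")}    # more heads than tails
B = {s for s in S if s[-1] == "T"}                   # tails on last throw

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return len(event) / len(S)

print(P(A), P(B))       # 0.5 0.5
print(P(A & B))         # 0.125, i.e. 1/8: the event {HHT}
print(P(A) * P(B))      # 0.25, so P(A ∩ B) is indeed not P(A) x P(B) here
```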

What is probability?

Nobody really knows the answer to this question. Some people think of it as "limiting frequency". That is, to say that the probability of getting heads when a coin is tossed is 1/2 means that, if the coin is tossed many times, it is likely to come down heads about half the time. But if you toss a coin 1000 times, you are not likely to get exactly 500 heads. You wouldn't be surprised to get only 495. But what about 450, or 100?

Some people would say that you can work out probability by physical arguments, like the one we used for a fair coin. But this argument doesn't work in all cases (for example, if one side of the coin is made of lead and the other side of aluminium); and it doesn't explain what probability means.

Some people say it is subjective. You say that the probability of heads in a coin toss is 1/2 because you have no reason for thinking either heads or tails more likely; you might change your view if you knew that the owner of the coin was a magician or a con man. But we can't build a theory on something subjective.

We regard probability as a way of modelling uncertainty. We start off with a mathematical construction satisfying some axioms (devised by the Russian mathematician A. N. Kolmogorov). We develop ways of doing calculations with probability, so that (for example) we can calculate how unlikely it is to get 480 or fewer heads in 1000 tosses of a fair coin. The answer agrees well with experiment.
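That benchmark calculation can in fact be done exactly: with a fair coin all 2^1000 toss sequences are equally likely, and exactly k heads can occur in C(1000, k) ways. A one-line Python computation (my illustration, not part of the notes; math.comb needs Python 3.8 or later):

```python
from math import comb

# P(at most 480 heads in 1000 tosses of a fair coin)
# = sum over k = 0, ..., 480 of C(1000, k) / 2^1000.
p = sum(comb(1000, k) for k in range(481)) / 2**1000
print(p)   # about 0.109: 480 or fewer heads happens roughly 11% of the time
```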

Kolmogorov's Axioms

Remember that an event is a subset of the sample space S. Two events A and B are called disjoint if A ∩ B = ∅, that is, no outcome is contained in both. A number of events, say A1, A2, ..., are called mutually disjoint or pairwise disjoint if Ai ∩ Aj = ∅ for any two of the events Ai and Aj; that is, no two of the events overlap.

According to Kolmogorov's axioms, each event A has a probability P(A), which is a number. These numbers satisfy three axioms:

Axiom 1: For any event A, we have P(A) ≥ 0.

Axiom 2: P(S) = 1.

Axiom 3: If the events A1, A2, ... are pairwise disjoint, then

    P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ···

Note that in Axiom 3, we have the union of events and the sum of numbers. Don't mix these up; never write P(A1) ∪ P(A2), for example.

Sometimes we separate Axiom 3 into two parts: Axiom 3a, for the case where there are only finitely many events A1, A2, ..., An, so that we have

    P(A1 ∪ ··· ∪ An) = ∑_{i=1}^{n} P(Ai),

and Axiom 3b for infinitely many. We will only use Axiom 3a, but 3b is important later on. Notice that we write ∑_{i=1}^{n} P(Ai) for P(A1) + P(A2) + ··· + P(An).

Proving things from the axioms

You can prove simple properties of probability from the axioms. That means that every step must be justified by appealing to an axiom. These properties seem obvious, just as obvious as the axioms; but the point of this game is that we assume only the axioms, and build everything else from that.

Here are some examples of things proved from the axioms. There is really no difference between a theorem, a proposition, a lemma, and a corollary; they are all fancy words used by mathematicians for things that have to be proved. Usually, a theorem is a big, important statement; a proposition a rather smaller statement; a lemma is a stepping-stone on the way to a theorem; and a corollary is something that follows quite easily from a theorem or proposition that came before.

Proposition 1: P(A′) = 1 − P(A) for any event A.

Proof: Let A1 = A and A2 = A′ (the complement of A). Then A1 ∩ A2 = ∅ (that is, the events A1 and A2 are disjoint), and A1 ∪ A2 = S. So

    P(A1) + P(A2) = P(A1 ∪ A2)   (Axiom 3)
                  = P(S) = 1     (Axiom 2).

So P(A′) = P(A2) = 1 − P(A1) = 1 − P(A).

Once we have proved something, we can use it on the same basis as an axiom to prove further facts.

Corollary: P(A) ≤ 1 for any event A.

Proof: We have 1 − P(A) = P(A′) by Proposition 1, and P(A′) ≥ 0 by Axiom 1; so 1 − P(A) ≥ 0, from which we get P(A) ≤ 1.

Remember that if you ever calculate a probability to be less than 0 or more than 1, you have made a mistake!

Corollary: P(∅) = 0.

Proof: For ∅ = S′, so P(∅) = 1 − P(S) by Proposition 1; and P(S) = 1 by Axiom 2, so P(∅) = 0.

Here is another result.

Proposition 2: If A ⊆ B, then P(A) ≤ P(B).

Proof: Take A1 = A, A2 = B \ A. Again we have A1 ∩ A2 = ∅ (since the elements of B \ A are, by definition, not in A), and A1 ∪ A2 = B. So by Axiom 3,

    P(A1) + P(A2) = P(A1 ∪ A2) = P(B).

In other words, P(A) + P(B \ A) = P(B). Now P(B \ A) ≥ 0 by Axiom 1; so

    P(A) ≤ P(B),

as we had to show.
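These facts are theorems, so they need no checking; still, on a finite sample space it is a nice exercise to confirm them mechanically. A small Python sketch of mine that tests Propositions 1 and 2 and both corollaries over every event of a four-outcome space with equally likely outcomes:

```python
from itertools import combinations

S = frozenset({"a", "b", "c", "d"})   # a small sample space, outcomes equally likely

def P(event):
    return len(event) / len(S)

def events(space):
    """All subsets of the space, i.e. all possible events."""
    return [frozenset(c) for r in range(len(space) + 1)
            for c in combinations(space, r)]

for A in events(S):
    assert P(S - A) == 1 - P(A)    # Proposition 1: P(A') = 1 - P(A)
    assert 0 <= P(A) <= 1          # Axiom 1 plus the first corollary
    for B in events(S):
        if A <= B:                 # if A is a subset of B ...
            assert P(A) <= P(B)    # ... then P(A) <= P(B) (Proposition 2)

print("all checks passed")
```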

Finite number of outcomes

Although the outcome x is not the same as the event {x}, we are often lazy and write P(x) where we should write P({x}).

Proposition 3: If the event B contains only a finite number of outcomes, say B = {b1, b2, ..., bn}, then

    P(B) = P(b1) + P(b2) + ··· + P(bn).

Proof: To prove the proposition, we define a new event Ai containing only the outcome bi, that is, Ai = {bi}, for i = 1, ..., n. Then A1, ..., An are pairwise disjoint (each contains only one element, which is in none of the others), and A1 ∪ A2 ∪ ··· ∪ An = B; so by Axiom 3, we have

    P(B) = P(A1) + P(A2) + ··· + P(An) = P(b1) + P(b2) + ··· + P(bn).

Corollary: If the sample space S is finite, say S = {s1, ..., sn}, then

    P(s1) + P(s2) + ··· + P(sn) = 1.

Proof: For P(s1) + P(s2) + ··· + P(sn) = P(S) by Proposition 3, and P(S) = 1 by Axiom 2.

Now we see that, if all the n outcomes are equally likely, and their probabilities sum to 1, then each has probability 1/n, that is, 1/|S|. Going back to Proposition 3, we see that, if all outcomes are equally likely, then

    P(A) = |A| / |S|

for any event A, justifying the principle we used earlier.
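Proposition 3 is also the recipe when the outcomes are not equally likely: an event's probability is still the sum of its outcomes' probabilities. A short Python sketch (my illustration, using a made-up loaded die):

```python
# A loaded six-sided die: the outcome probabilities need not be equal,
# but by the Corollary above they must still sum to 1.
p = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}
assert abs(sum(p.values()) - 1) < 1e-12

def P(event):
    """Proposition 3: add up the probabilities of the outcomes in the event."""
    return sum(p[x] for x in event)

even = {2, 4, 6}
print(P(even))   # 0.7 for this loaded die

# With equal probabilities the same rule reduces to P(A) = |A|/|S|.
q = {x: 1 / 6 for x in range(1, 7)}
print(sum(q[x] for x in even), len(even) / 6)   # both 0.5, up to float rounding
```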

Inclusion-Exclusion Principle

[Venn diagram of two overlapping sets A and B]

A Venn diagram for two sets A and B suggests that, to find the size of A ∪ B, we add the size of A and the size of B; but then we have included the size of A ∩ B twice, so we have to take it off. In terms of probability:

Proposition 4: For any two events A and B,

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof: We now prove this from the axioms, using the Venn diagram as a guide. We see that A ∪ B is made up of three parts, namely

    A1 = A ∩ B,   A2 = A \ B,   A3 = B \ A.

Indeed we do have A ∪ B = A1 ∪ A2 ∪ A3, since anything in A ∪ B is in both A and B, or just the first, or just the second. Similarly we have A1 ∪ A2 = A and A1 ∪ A3 = B.

The sets A1, A2, A3 are pairwise disjoint. (We have three pairs of sets to check. Now A1 ∩ A2 = ∅, since all elements of A1 belong to B but no elements of A2 do. The arguments for the other two pairs are similar; you should do them yourself.) So, by Axiom 3, we have

    P(A) = P(A1) + P(A2),
    P(B) = P(A1) + P(A3),
    P(A ∪ B) = P(A1) + P(A2) + P(A3).

Substituting from the first two equations into the third, we obtain

    P(A ∪ B) = P(A1) + (P(A) − P(A1)) + (P(B) − P(A1))
             = P(A) + P(B) − P(A1)
             = P(A) + P(B) − P(A ∩ B),

as required.
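Returning to the three-coin example, Proposition 4 is easy to confirm numerically; a brief Python sketch (mine, not the notes'):

```python
from itertools import product

S = set(product("HT", repeat=3))
A = {s for s in S if s.count("H") > s.count("T")}   # more heads than tails
B = {s for s in S if s[-1] == "T"}                  # tails on last throw

def P(event):
    return len(event) / len(S)

# Proposition 4 on the running example: 1/2 + 1/2 - 1/8 = 7/8.
print(P(A | B))                     # 0.875
print(P(A) + P(B) - P(A & B))       # 0.875
```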

The Inclusion-Exclusion Principle extends to more than two events, but gets more complicated. Here it is for three events.

[Venn diagram of three overlapping sets A, B and C]

To calculate P(A ∪ B ∪ C), we first add up P(A), P(B), and P(C). The parts in common have been counted twice, so we subtract P(A ∩ B), P(A ∩ C) and P(B ∩ C). But then we find that the outcomes lying in all three sets have been taken off completely, so they must be put back; that is, we add P(A ∩ B ∩ C).

Proposition 5: For any three events A, B, C, we have

    P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

Proof: Put D = A ∪ B. Then Proposition 4 shows that

    P(A ∪ B ∪ C) = P(D ∪ C)
                 = P(D) + P(C) − P(D ∩ C)
                 = P(A ∪ B) + P(C) − P(D ∩ C)
                 = P(A) + P(B) − P(A ∩ B) + P(C) − P(D ∩ C),

using Proposition 4 again. The distributive law (see below, and also the Phrasebook) tells us that

    D ∩ C = (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C),

so yet another use of Proposition 4 gives

    P(D ∩ C) = P((A ∩ C) ∪ (B ∩ C))
             = P(A ∩ C) + P(B ∩ C) − P((A ∩ C) ∩ (B ∩ C))
             = P(A ∩ C) + P(B ∩ C) − P(A ∩ B ∩ C).

Substituting this into the previous expression for P(A ∪ B ∪ C) gives the result.

Can you extend this to any number of events?

Other results about sets

There are other standard results about sets which are often useful in probability theory. Here are some examples.

Proposition: Let A, B, C be subsets of S.

Distributive laws: (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C) and (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).

De Morgan's Laws: (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′.

We will not give formal proofs of these. You should draw Venn diagrams and convince yourself that they work. You should also look in the Phrasebook.
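As a closing aside (mine, not the notes'), Proposition 5 and both pairs of set laws can be spot-checked over every triple of events of a small sample space:

```python
from itertools import combinations

S = frozenset(range(5))   # a small sample space, outcomes equally likely

def events(space):
    return [frozenset(c) for r in range(len(space) + 1)
            for c in combinations(space, r)]

def P(E):
    return len(E) / len(S)

def comp(A):
    """Complement relative to the sample space: A' = S \\ A."""
    return S - A

for A in events(S):
    for B in events(S):
        # De Morgan's laws
        assert comp(A | B) == comp(A) & comp(B)
        assert comp(A & B) == comp(A) | comp(B)
        for C in events(S):
            # Distributive laws
            assert (A & B) | C == (A | C) & (B | C)
            assert (A | B) & C == (A & C) | (B & C)
            # Proposition 5: three-event inclusion-exclusion
            lhs = P(A | B | C)
            rhs = (P(A) + P(B) + P(C)
                   - P(A & B) - P(A & C) - P(B & C)
                   + P(A & B & C))
            assert abs(lhs - rhs) < 1e-12

print("Proposition 5, distributive and De Morgan laws all verified")
```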