LECTURE 1

1 Introduction

The first part of our adventure is a highly selective review of probability theory, focusing especially on the things that are most useful in statistics.

1.1 Sample spaces and events

A typical way to go about defining things is to suppose that we conduct an experiment. An experiment is a measurement of a random (stochastic) process. Our measurements take values in some set $\Omega$: this is the sample space. The sample space effectively defines all possible outcomes of our measurement. Examples:

- Suppose I toss a coin: in this case the sample space is $\Omega = \{H, T\}$.
- If I measure the reaction time to some stimulus, the sample space is $\Omega = (0, \infty)$.
- Suppose I toss a coin twice: what is the sample space?

An event is a subset $A \subseteq \Omega$, i.e., it is a set of possible outcomes of our experiment. We say that an event $A$ occurs if the outcome of our experiment lies in the set $A$.

1.1.1 A quick aside on set-theoretic notation

Here is some basic notation for set operations.

- (Subset) $A \subseteq B$ means that all elements of $A$ are also in $B$.
- (Complement) $A^c$: elements that are not in $A$.
- (Empty set) $\Omega^c = \emptyset$.
- (Union) $A \cup B$: elements that are either in $A$ or in $B$, or both.
- (Intersection) $A \cap B$: elements that are in both $A$ and $B$. Sometimes we write $AB$ for brevity.
- (Set difference) $A \setminus B = A \cap B^c$: elements that are in $A$ but not in $B$.
- (Symmetric difference) $A \triangle B = (A \setminus B) \cup (B \setminus A)$: elements that are in $A$ or in $B$, but not both.
- (Cardinality) $|A|$ denotes the number of elements of $A$.

Exercise: Prove that $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$ (De Morgan's laws).
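These set operations map directly onto Python's built-in set type, so the exercise is easy to sanity-check numerically. Below is a minimal sketch; the sample space, events, and helper function are illustrative choices, not part of the notes:

```python
# Model a finite sample space and events as Python sets (illustrative example).
omega = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a die
A = {1, 2, 3}                # event: the roll is at most 3
B = {2, 4, 6}                # event: the roll is even

def complement(E, omega):
    """Complement E^c of an event E relative to the sample space omega."""
    return omega - E

print(A | B)   # union:                {1, 2, 3, 4, 6}
print(A & B)   # intersection:         {2}
print(A - B)   # set difference:       {1, 3}
print(A ^ B)   # symmetric difference: {1, 3, 4, 6}

# De Morgan's laws from the exercise:
assert complement(A | B, omega) == complement(A, omega) & complement(B, omega)
assert complement(A & B, omega) == complement(A, omega) | complement(B, omega)
```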

1.2 Probability Distributions

A probability distribution is a mapping from events to real numbers that satisfies certain axioms. We denote this mapping by $P : \mathcal{A} \to \mathbb{R}$, where $\mathcal{A}$ is the collection of events. The axioms are:

1. Non-negativity: $P(A) \geq 0$ for all $A \subseteq \Omega$.
2. Unity of $\Omega$: $P(\Omega) = 1$.
3. Countable additivity: for a collection $A_1, A_2, \ldots$ of disjoint sets we must have
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$

We can use these axioms to show several useful and intuitive properties of probability distributions:

- $P(\emptyset) = 0$.
- $A \subseteq B \implies P(A) \leq P(B)$.
- $0 \leq P(A) \leq 1$.
- $P(A^c) = 1 - P(A)$.
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

All of these properties can be understood via a Venn diagram.

Example: Suppose I toss a fair coin twice, and denote by $H_1$ the event that the first coin lands heads, and by $H_2$ the event that the second coin lands heads. Calculate $P(H_1 \cup H_2)$.

We can use the last formula above:
$$P(H_1 \cup H_2) = P(H_1) + P(H_2) - P(H_1 \cap H_2) = 0.5 + 0.5 - 0.25 = 0.75.$$

Exercise: Derive an analogous formula for $P(\bigcup_{i=1}^{n} A_i)$.
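The same calculation can be checked by brute-force enumeration of the four equally likely outcomes. A small sketch (the outcome encoding and helper names are my own):

```python
from itertools import product

# Check P(H1 ∪ H2) = P(H1) + P(H2) - P(H1 ∩ H2) for two fair coin tosses.
omega = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform distribution."""
    return len(event) / len(omega)

H1 = {w for w in omega if w[0] == "H"}  # first toss lands heads
H2 = {w for w in omega if w[1] == "H"}  # second toss lands heads

lhs = prob(H1 | H2)
rhs = prob(H1) + prob(H2) - prob(H1 & H2)
print(lhs, rhs)                          # 0.75 0.75
assert lhs == rhs
```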

1.3 Counting and the uniform distribution on discrete sets

Suppose we toss a die twice. There are 36 possible outcomes:
$$\Omega = \{(t_1, t_2) : t_1, t_2 \in \{1, 2, 3, 4, 5, 6\}\}.$$
If the die is fair, then each outcome is equally likely. This is an example of a uniform distribution on a discrete set. The general rule for calculating the probability of an event under the uniform distribution on a finite sample space is
$$P(A) = \frac{|A|}{|\Omega|}.$$
For example, let $A$ be the event that the sum of the two tosses is less than five. Then
$$A = \{(1,1), (1,2), (1,3), (2,1), (2,2), (3,1)\},$$
and thus $P(A) = 6/36 = 1/6$.

Example: There are two black balls and three white balls in a bag. Two balls are randomly drawn, without replacement, from the bag. What is the probability that the two balls have different colors? What is the probability if the balls are drawn with replacement?

When drawing without replacement, the sample space $\Omega$ has cardinality $|\Omega| = 5 \times 4 = 20$, and the event $A$ has cardinality $|A| = 2 \times 3 + 3 \times 2 = 12$ (first white and second black, or first black and second white). So $P(A) = 12/20 = 0.6$. When drawing with replacement, the sample space has cardinality $5 \times 5 = 25$, and $A$ still has cardinality 12. Then $P(A) = 12/25 = 0.48$.

More generally, calculating probabilities under the uniform distribution on a discrete set comes down to counting.

2 Conditional probability

Definition 1 (Conditional probability) If $P(B) > 0$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) := \frac{P(A \cap B)}{P(B)}. \qquad (1)$$
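Both counting examples lend themselves to direct enumeration. Here is a minimal sketch, labeling each ball individually so that all ordered draws are equally likely (the labels and helper names are mine):

```python
from itertools import permutations, product

# Dice: P(sum of two fair dice < 5) by direct enumeration.
dice = [(t1, t2) for t1 in range(1, 7) for t2 in range(1, 7)]
print(sum(t1 + t2 < 5 for t1, t2 in dice) / len(dice))   # 6/36 ≈ 0.1667

# Balls: two black (B) and three white (W), drawn in order.
balls = ["B1", "B2", "W1", "W2", "W3"]

def different_colors(draw):
    return draw[0][0] != draw[1][0]        # compare the color letters 'B'/'W'

omega_without = list(permutations(balls, 2))   # 5 * 4 = 20 outcomes
omega_with = list(product(balls, repeat=2))    # 5 * 5 = 25 outcomes
print(sum(map(different_colors, omega_without)) / len(omega_without))  # 0.6
print(sum(map(different_colors, omega_with)) / len(omega_with))        # 0.48
```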

Example: Consider tossing a fair die. Let $A$ be the event that the result is an odd number, and let $B = \{1, 2, 3\}$. Then $P(A \mid B) = 2/3$, while $P(A) = 1/2$. In this example $A$ and $B$ are not independent.

Remark: In general, $P(A \mid B) \neq P(B \mid A)$.

The chain rule: A simple rewriting of the definition above yields the so-called chain rule:
$$P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A).$$
More generally,
$$P(A_1 \cap A_2 \cap A_3 \cap \cdots) = P(A_1) P(A_2 \mid A_1) P(A_3 \mid A_2, A_1) \cdots.$$

3 Independence of Events

Independence roughly asks whether one event provides any information about another. For example, if we toss a fair coin twice and let $H_i$ be the event that the $i$th toss lands heads ($i = 1, 2$), then intuitively, knowing whether $H_1$ occurred provides no information about $H_2$. The formal definition of independence is:

Definition 2 (Independence) Two events $A$ and $B$ are called independent if
$$P(A \cap B) = P(A) P(B). \qquad (2)$$
A set of events $A_j$ ($j \in I$) is called mutually independent if
$$P\Big(\bigcap_{j \in J} A_j\Big) = \prod_{j \in J} P(A_j)$$
for any finite subset $J$ of $I$.

Conditional probability gives another interpretation of independence: $A$ and $B$ are independent if the unconditional probability of $A$ equals its conditional probability given $B$.

Example: We can formally verify the coin toss example: $P(H_1) = P(H_2) = 1/2$, and $P(H_1 \cap H_2) = 1/4$.

Example: For a less obvious example, consider tossing a fair die once. Let $A = \{1, 2, 3, 4\}$ and $B = \{2, 4, 6\}$. Then $A \cap B = \{2, 4\}$, so $P(A) = 2/3$, $P(B) = 1/2$, and $P(A \cap B) = 1/3 = P(A) P(B)$. We conclude that $A$ and $B$ are independent.

Here are some simple facts about independence:

1. $\Omega$ is independent of any other event. The same holds for $\emptyset$.
2. If $A$ and $B$ are disjoint and both have positive probability, then $A$ and $B$ cannot be independent.
3. If $A$ and $B$ are independent, then $A^c$ and $B$ are also independent.
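Both die examples can be verified with exact arithmetic. A minimal sketch (the helper functions are my own):

```python
from fractions import Fraction

omega = set(range(1, 7))                     # one roll of a fair die

def prob(event):
    """Uniform probability, computed exactly as a fraction."""
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(A & B) / prob(B)

# Conditional-probability example: A = odd result, B = {1, 2, 3}.
A, B = {1, 3, 5}, {1, 2, 3}
print(cond_prob(A, B), prob(A))              # 2/3 versus 1/2: not independent

# Independence example: A = {1, 2, 3, 4}, B = {2, 4, 6}.
A, B = {1, 2, 3, 4}, {2, 4, 6}
print(prob(A & B) == prob(A) * prob(B))      # True: independent
```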

When combined with other properties of probability, independence sometimes allows very easy calculation of the probability of certain events.

Example: When tossing a fair coin, what is the probability of at least one head in the first 10 tosses? Let $A$ be the event of at least one head in 10 tosses. Then $A^c$ is the event of no heads in 10 tosses, or equivalently, of all 10 tosses being tails. Since the tosses are independent,
$$P(A) = 1 - P(A^c) = 1 - 2^{-10}.$$

4 Bayes Rule

Roughly, Bayes rule allows us to calculate the probability of $B$ given $A$ from the probability of $A$ given $B$. As a preliminary we need the following:

Theorem 3 (Law of total probability) Let $A_1, \ldots, A_k$ be a partition of $\Omega$. Then for any event $B$,
$$P(B) = \sum_{i=1}^{k} P(B \mid A_i) P(A_i).$$

Proof. The claim follows by observing that the sets $A_i \cap B$ ($i = 1, \ldots, k$) form a partition of $B$, and that $P(A_i \cap B) = P(B \mid A_i) P(A_i)$.

The law of total probability is a combination of additivity and conditional probability. It leads to the very useful Bayes theorem.

Theorem 4 (Bayes Rule) Let $A_1, \ldots, A_k$ be a partition of $\Omega$. Then
$$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j) P(A_j)}.$$

The proof is simple: the numerator is just $P(A_i \cap B)$, and by the law of total probability the denominator is $P(B)$. Bayes rule is useful when $P(A_i \mid B)$ is not obvious to calculate but $P(B \mid A_i)$ and $P(A_i)$ are easy to find. A typical application is classification.
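The complement trick can be confirmed by enumerating all $2^{10}$ equally likely outcomes. A quick sketch:

```python
from itertools import product

n = 10
p_formula = 1 - 2 ** -n                 # independence plus the complement rule

# Brute force: count outcomes containing at least one head.
outcomes = list(product("HT", repeat=n))
p_enum = sum("H" in w for w in outcomes) / len(outcomes)

print(p_formula, p_enum)                # both 0.9990234375
assert p_formula == p_enum
```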

Example: Suppose there are three types of emails: $A_1$ = spam, $A_2$ = low priority, $A_3$ = high priority. Based on previous experience, $P(A_1) = 0.85$, $P(A_2) = 0.1$, $P(A_3) = 0.05$. Let $B$ be the event that an email contains the word "free"; then $P(B \mid A_1) = 0.9$, $P(B \mid A_2) = 0.1$, $P(B \mid A_3) = 0.1$. A new incoming email contains the word "free"; what is the probability that it is spam?

Answer:
$$P(A_1 \mid B) = \frac{P(B \mid A_1) P(A_1)}{P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + P(B \mid A_3) P(A_3)} = \frac{0.85 \times 0.9}{0.85 \times 0.9 + 0.1 \times 0.1 + 0.05 \times 0.1} \approx 0.98.$$
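Wrapping the calculation in a small function makes the structure of Bayes rule explicit. A minimal sketch (the function and variable names are my own, not from the notes):

```python
def posterior(priors, likelihoods):
    """P(A_i | B) for each class i, from priors P(A_i) and likelihoods P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B | A_i) P(A_i)
    total = sum(joint)                                      # P(B), by total probability
    return [j / total for j in joint]

priors = [0.85, 0.10, 0.05]       # spam, low priority, high priority
likelihoods = [0.90, 0.10, 0.10]  # P(contains "free" | class)

post = posterior(priors, likelihoods)
print(round(post[0], 4))          # 0.9808: the email is almost certainly spam
```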