Lecture 1: An introduction to probability theory

Econ 514: Probability and Statistics

Random Experiments

Random experiment: an experiment, phenomenon, action, or mechanism with an outcome that is not (fully) predictable.

Examples:
- Classical games of chance: rolling a die, tossing a coin, selecting a card from a deck.
- Random sampling from a population: a study of the household income distribution in LA. (Why is selection at random preferred?)
- Unpredictable social phenomena: the S&P 500 index tomorrow.
- Unpredictable behavior: a soccer player's choice of direction for a penalty kick (a randomized strategy).

Why is the outcome of a random experiment unpredictable?

- The mechanism that generates the outcome is too complicated or poorly understood: rolling a die, tossing a coin, the S&P 500.
- The mechanism that generates the outcome is designed to be unpredictable: shuffling cards (see below), sampling at random from a population, a randomized strategy in a game.
- Coincidences/independent chains of events: getting wet on entering a building because a glass is emptied from a window above the entrance.

It requires effort to get a really unpredictable outcome.

Example: card shuffling (Aldous and Diaconis, American Mathematical Monthly, 1986, pp. 333-348). Top-in-at-random shuffle: take the top card from a deck and insert it at a random position in the deck. How many repeats do you need to shuffle a deck of n cards? Answer: about n log n.

Follow the card at the bottom of the deck. After about T_1 = n insertions a card will have been inserted below the original bottom card, and after about T_2 = n + n/2 insertions a second card will be below the original bottom card (the second card can go below either the original or the new bottom card; hence the expected number of additional insertions is halved). The two cards below the original bottom card can be in order low-high or high-low, and these orders are equally likely. After T_{n-1} insertions the original bottom card is at the top of the deck. The n−1 cards below it are then in random order, and if the bottom card is then inserted at random, all n cards are in random order. We have

T = n + n/2 + n/3 + ... + n/(n−1) + 1 ≈ n log n.

For instance, for n = 52 we need about 205 insertions.
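A quick numeric check of this count (a sketch, not part of the notes): the exact expected number of insertions is the harmonic-type sum above, while n log n is the asymptotic approximation the lecture quotes.

```python
import math

def expected_insertions(n):
    # Exact expectation: waiting times n/1 + n/2 + ... + n/(n-1)
    # for cards to land below the original bottom card, plus the final insertion.
    return sum(n / i for i in range(1, n)) + 1

n = 52
print(expected_insertions(n))  # ~236, the exact expected count
print(n * math.log(n))         # ~205, the n log n approximation quoted above
```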

Randomness can also be generated by a complicated mechanism: random number generators. Compare:

a. Select a real number at random from [0, 1] (see figure).

b. Compute, for n = 0, 1, 2, ...,

x_{n+1} = (a x_n + b) modulo c

with x_0, a, b, c positive integers, and let u_n = x_n / c. For good choices of a, b, c, with c very large, the sequence u_1, u_2, ... behaves like a sequence of numbers picked at random from [0, 1]. This is the linear congruential method for random number generation. It is not perfect; see e.g. Press et al., Numerical Recipes, Chapter 7.
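A minimal sketch of the linear congruential method in Python; the constants a, b, c below are illustrative choices (a commonly cited parameterization, not necessarily the one Press et al. recommend):

```python
def lcg(x0, a=1664525, b=1013904223, c=2**32):
    """Yield pseudo-random u_n in [0, 1) via x_{n+1} = (a*x_n + b) mod c."""
    x = x0
    while True:
        x = (a * x + b) % c
        yield x / c

gen = lcg(x0=12345)
print([round(next(gen), 4) for _ in range(5)])  # looks like draws from [0, 1)
```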

Probabilities

An important step in the analysis of random experiments is the use of probabilities. A probability is a number in [0, 1] that measures the likelihood of an outcome or a set of outcomes. How do we assign probabilities to (sets of) outcomes?

- Symmetry: assume that all outcomes are equally likely, e.g. rolling a die.
- Experimental method: if the random experiment can be repeated, approximate the probability of an outcome by its relative frequency over the repetitions.
- Subjective method: assign probabilities using (complete or incomplete) knowledge of the random experiment.
- Market method: offer a bet that pays $1 if the outcome occurs. If buyers of bets are risk neutral, they will offer $p for this bet, with p the probability of the outcome.

The subjective method is the most common: probabilities are assigned using a probability model of the random experiment. The probability model gives a formula from which the probabilities of (sets of) outcomes can be computed. Probability theory is the branch of mathematics that analyzes random experiments formally.
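The experimental method is easy to illustrate by simulation; a sketch (not from the notes) estimating the probability of rolling a six by its relative frequency, which approaches 1/6 ≈ 0.167 as the number of repetitions grows:

```python
import random

repetitions = 100_000
count = sum(1 for _ in range(repetitions) if random.randint(1, 6) == 6)
print(count / repetitions)  # relative frequency, close to 1/6
```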

Remember: a formally correct analysis using a probability model that is a bad representation of the random experiment may yield wrong conclusions!

Probability theory

The starting point is the probability space, which consists of the triple:
- a sample space,
- a class of events,
- a probability function/measure.

We consider these elements in order. The sample (or outcome) space is the set of all possible outcomes of a random experiment. We denote this set by Ω or S. An outcome is ω ∈ Ω. An event is a collection of outcomes, i.e. a subset E of Ω. A probability function/measure is a function from a collection A of subsets of Ω to the interval [0, 1], i.e. P : A → [0, 1]. Both the collection of events A and the probability measure P must satisfy certain requirements.

Classes of events

Events E_1, E_2, ... can be related in the usual set-theoretic ways, e.g. E_1 ⊆ E_2, or E_1 = E_2, i.e. E_1 ⊆ E_2 and E_2 ⊆ E_1. New events can be created by the set operations

E_3 = E_1 ∪ E_2
E_3 = E_1 ∩ E_2
E_3 = E_1^c
E_3 = E_1 \ E_2 = E_1 ∩ E_2^c

Example: rolling a single die. The sample space is Ω = {1, 2, 3, 4, 5, 6}. For the events

E_1 = {1, 2, 3}
E_2 = {2, 4, 6}

we have

E_1 ∪ E_2 = {1, 2, 3, 4, 6}
E_1 ∩ E_2 = {2}
E_2^c = {1, 3, 5}
E_1 \ E_2 = {1, 3}
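These operations map directly onto Python's built-in set type; a quick check of the die example:

```python
omega = {1, 2, 3, 4, 5, 6}
E1, E2 = {1, 2, 3}, {2, 4, 6}

print(E1 | E2)     # union: {1, 2, 3, 4, 6}
print(E1 & E2)     # intersection: {2}
print(omega - E2)  # complement of E2: {1, 3, 5}
print(E1 - E2)     # difference E1 \ E2: {1, 3}
```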

Special relations between events

Two events E_1, E_2 are disjoint if E_1 ∩ E_2 = ∅. A collection of events E_1, E_2, ... is called a partition of the sample space Ω if the events are pairwise disjoint and ⋃_{i=1}^∞ E_i = Ω.

In measure and probability theory, a collection of events called a sigma (σ) field or σ-algebra or Borel field (these names are used interchangeably) is particularly important.

Definition: a collection A of events (subsets of Ω) is a σ-field if
(i) ∅ ∈ A;
(ii) E ∈ A ⟹ E^c ∈ A (closed under complementation);
(iii) E_1, E_2, ... ∈ A ⟹ ⋃_{i=1}^∞ E_i ∈ A (closed under countable unions).

Examples:
- The trivial σ-field A = {∅, Ω}.
- The largest σ-field: the set of all subsets of Ω (the power set of Ω).
- Rolling a single die with Ω = {1, 2, 3, 4, 5, 6}. Two σ-fields:
  A_1 = {∅, Ω, {1}, {2, 3, 4, 5, 6}},
  and the set of all subsets of Ω (2^6 = 64 subsets).
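For a finite Ω the three defining conditions can be verified by brute force; a sketch (is_sigma_field is a hypothetical helper, not from the notes; events are stored as frozensets so they can be collected in a set):

```python
from itertools import combinations

def is_sigma_field(A, omega):
    """Check the sigma-field conditions for a collection A of events on a finite omega."""
    A = {frozenset(e) for e in A}
    if frozenset() not in A:                              # (i) contains the empty set
        return False
    if any(frozenset(omega - e) not in A for e in A):     # (ii) closed under complements
        return False
    # (iii) on a finite omega, closure under countable unions reduces to pairwise unions
    return all(e1 | e2 in A for e1, e2 in combinations(A, 2))

omega = {1, 2, 3, 4, 5, 6}
A1 = [set(), omega, {1}, {2, 3, 4, 5, 6}]
print(is_sigma_field(A1, omega))  # True
```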

Often a σ-field is obtained from a set of events E that is itself not a σ-field. We define the smallest σ-field that contains E as

σ(E) = {E' ⊆ Ω : E' ∈ A for every σ-field A with E ⊆ A}.

This definition works because the collection of σ-fields A_i that contain E, i.e. E ⊆ A_i, is not empty (it contains the power set) and their intersection, which is σ(E), is itself a σ-field (check this yourself!).

Example: Ω = {1, 2, 3, 4, 5} and E = {E_1, E_2} with E_1 = {1, 2, 3} and E_2 = {3, 4, 5}. Now σ(E) consists of ∅, Ω, E_1, E_2 and

F_1 = E_1 ∩ E_2^c = {1, 2}
F_2 = E_1 ∩ E_2 = {3}
F_3 = E_1^c ∩ E_2 = {4, 5}
F_4 = F_1 ∪ F_3 = {1, 2, 4, 5}.

Hence σ(E) = {∅, Ω, E_1, E_2, F_1, F_2, F_3, F_4}.
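For finite Ω, σ(E) can be computed by repeatedly closing the collection under complements and unions until nothing new appears; a sketch (generated_sigma_field is a hypothetical helper) that reproduces the 8 events of the example:

```python
def generated_sigma_field(E, omega):
    """Smallest sigma-field on a finite omega containing the events in E."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(e) for e in E}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            for b in list(sigma):
                for new in (omega - a, a | b):  # close under complement and union
                    if new not in sigma:
                        sigma.add(new)
                        changed = True
    return sigma

F = generated_sigma_field([{1, 2, 3}, {3, 4, 5}], {1, 2, 3, 4, 5})
print(len(F))                        # 8
print(sorted(sorted(s) for s in F))  # the 8 events listed above
```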

An important sample space is Ω = R. The usual σ-field on R is the Borel σ-field B, the smallest σ-field that contains all the open subsets of R. It can be generated by other collections E, a particularly important one being E = {(−∞, x] : x ∈ R}.

We show that σ(E) = B. First we show σ(E) ⊆ B by showing that E ⊆ B: we have (−∞, x] = ⋂_{n=1}^∞ (−∞, x + 1/n) ∈ B. Next, to show B ⊆ σ(E), we must show that each open set is in σ(E). Remember that each open set B in R can be written as B = ⋃_{i=1}^∞ (a_i, b_i), and (a, b) = (−∞, b) ∩ ((−∞, a])^c with (−∞, b) = ⋃_{n=1}^∞ (−∞, b − 1/n].

There are other classes of subsets (events) that can be considered. These are important if one wants to show that all sets in a σ-field A have a certain property. This is done by a generating class argument:
- Show that all sets in E have the property.
- Show that A ⊆ σ(E).
- Show that A_0 = {A ∈ A : A has the property} is itself a σ-field; since E ⊆ A_0, it follows that σ(E) ⊆ A_0, because σ(E) is the smallest σ-field containing E.
- Then A ⊆ σ(E) ⊆ A_0 ⊆ A, so every set in A has the property.

Often it is too difficult to show directly that A_0 is a σ-field. As we shall see, it is enough to show that A_0 is a λ-system, provided E is closed under finite intersections (a weaker requirement than closure under countable intersections).

A_0 is a λ-system if
(i) Ω ∈ A_0;
(ii) if D_1, D_2 ∈ A_0 and D_2 ⊆ D_1, then D_1 \ D_2 = D_1 ∩ D_2^c ∈ A_0;
(iii) if D_n is an increasing sequence of sets in A_0, then ⋃_{i=1}^∞ D_i ∈ A_0.

You should show that a λ-system is a σ-field if and only if it is closed under finite intersections.

Theorem 1. If E is closed under finite intersections, and if A_0 is a λ-system with E ⊆ A_0, then σ(E) ⊆ A_0.

Proof: Let A_1 be the smallest λ-system that contains E. It is the intersection of all λ-systems that contain E, one of them being A_0 (show that this intersection is itself a λ-system). We have to show that D_1 ∩ D_2 ∈ A_1 for all D_1, D_2 ∈ A_1, because then A_1 is a σ-field, and hence σ(E) ⊆ A_1 ⊆ A_0.

Define A_2 = {A ∈ A_1 : A ∩ E ∈ A_1 for all E ∈ E}. Because E is closed under finite intersections, E ⊆ A_2. If A_2 is a λ-system, then A_1 ⊆ A_2. You should show that A_2 is a λ-system by using (D_1 \ D_2) ∩ E = (D_1 ∩ E) \ (D_2 ∩ E) and (⋃_i D_i) ∩ E = ⋃_i (D_i ∩ E). Because A_1 ⊆ A_2, we have that D_1 ∩ E ∈ A_1 for all D_1 ∈ A_1 and all E ∈ E.

Define A_3 = {B ∈ A_1 : B ∩ D ∈ A_1 for all D ∈ A_1}. By the previous step E ⊆ A_3, and if A_3 is a λ-system (show this), then A_1 ⊆ A_3. Hence D_1 ∩ D_2 ∈ A_1 for all D_1, D_2 ∈ A_1.

The choice of the σ-field is usually determined by the nature of the outcome space. The main cases:
- Discrete, i.e. countable, outcome set: the σ-field is the set of all subsets of Ω.
- Continuous outcome set, i.e. the real line or a subset thereof: the σ-field is the Borel σ-field.

Probability measure

A probability measure is a function P : A → R, with A a σ-field, such that
(i) for all E ∈ A, P(E) ≥ 0;
(ii) P(Ω) = 1;
(iii) if E_1, E_2, ... are pairwise disjoint, then P(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ P(E_i).

Except for (iii) these assumptions are uncontroversial; in (iii) the only possible problem is with the infinite union and sum. The definition imposes some obvious restrictions on how we assign probability to events, but it does not specify what numbers should be assigned. The usual way to specify probabilities using a probability model is to first assign probabilities to some simple collection of events E. These assignments can then be extended to the σ-field σ(E). Usually (but this has to be shown) that extension is unique. If so, then we have found a probability measure on σ(E).

Example of the construction of a probability measure for a discrete outcome space: rolling a single die. Take as σ-field A the set of all subsets of Ω and define the probability measure by

P(E) = #E / 6.

This applies to all sets in A and satisfies the conditions for a probability measure. For discrete outcome spaces with a finite or countably infinite number of outcomes we can use the same construction: start by assigning probabilities p_i to the single outcomes and define P(E) = ∑_{i ∈ E} p_i.
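A sketch of this discrete construction (make_measure is a hypothetical helper): assign p_i to the elementary outcomes and sum over an event.

```python
def make_measure(p):
    """Given probabilities p[i] for the elementary outcomes, return P(E) = sum of p[i], i in E."""
    assert all(v >= 0 for v in p.values()) and abs(sum(p.values()) - 1) < 1e-12
    return lambda E: sum(p[i] for i in E)

# Fair die: p_i = 1/6, so P(E) = #E / 6.
P = make_measure({i: 1 / 6 for i in range(1, 7)})
print(P({1, 2, 3}))  # 0.5
```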

For continuous outcome spaces, e.g. Ω = R, the situation is more complicated. Example of the construction of a probability measure on the Borel σ-field B:

1. Assign probabilities to E = {(−∞, x] : x ∈ R}.
2. Note that this collection is closed under finite intersections.
3. Let P, P′ be two probability measures that agree on E. We must show that P(B) = P′(B) for all B ∈ B.
4. To show this, consider B_0 = {B ∈ B : P(B) = P′(B)}. This is a λ-system. Consider e.g. condition (iii) with an increasing sequence B_n of events in B_0. The increasing and bounded sequences P(B_n), P′(B_n) have limits, and P(B) = lim P(B_n) = lim P′(B_n) = P′(B), where B = ⋃_{n=1}^∞ B_n. (Show that lim P(B_n) = P(B).) The other conditions for a λ-system are easily verified.
5. Invoke Theorem 1 to conclude that P(B) = P′(B) for all B ∈ B.
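The limit claim lim P(B_n) = P(B) in step 4 is continuity from below; a short standard derivation (an added detail, not spelled out in the notes): since the B_n are increasing,

B = ⋃_{n=1}^∞ B_n = B_1 ∪ ⋃_{n=1}^∞ (B_{n+1} \ B_n),

a disjoint union, so by countable additivity and telescoping

P(B) = P(B_1) + ∑_{n=1}^∞ P(B_{n+1} \ B_n) = lim_{N→∞} [P(B_1) + ∑_{n=1}^{N−1} P(B_{n+1} \ B_n)] = lim_{N→∞} P(B_N).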

It is useful to derive some results for probability measures from the definition:

(i) P(E^c) = 1 − P(E).
Proof: 1 = P(Ω) = P(E^c ∪ E) = P(E^c) + P(E), using axioms (ii) and (iii).

(ii) P(∅) = 0.
Proof: use (i) with E = Ω.

(iii) P(E) ≤ 1.
Proof: P(E) = 1 − P(E^c) ≤ 1, because P(E^c) ≥ 0.

(iv) P(E_2 \ E_1) = P(E_2) − P(E_2 ∩ E_1).
Proof: because E_2 = E_2 ∩ (E_1 ∪ E_1^c) = (E_2 ∩ E_1) ∪ (E_2 ∩ E_1^c) and (E_2 ∩ E_1) ∩ (E_2 ∩ E_1^c) = ∅,
P(E_2) = P(E_2 ∩ E_1) + P(E_2 ∩ E_1^c),
or, by the definition of E_2 \ E_1,
P(E_2 \ E_1) = P(E_2) − P(E_2 ∩ E_1).

(v) P(E_1 ∪ E_2) = P(E_1) + P(E_2) − P(E_1 ∩ E_2).
Proof: because E_1 ∪ E_2 = E_1 ∪ (E_2 \ E_1) and E_1 ∩ (E_2 \ E_1) = ∅,
P(E_1 ∪ E_2) = P(E_1) + P(E_2 \ E_1) = P(E_1) + P(E_2) − P(E_1 ∩ E_2).

(vi) E_1 ⊆ E_2 ⟹ P(E_1) ≤ P(E_2).
Proof: if E_1 ⊆ E_2, then E_1 ∩ E_2 = E_1 and E_2 = E_1 ∪ (E_2 \ E_1), so that P(E_2) = P(E_1) + P(E_2 \ E_1) ≥ P(E_1).

(vii) P(E_1 ∪ E_2) ≤ P(E_1) + P(E_2).
Proof: follows from (v).

(viii) P(E_1 ∩ E_2) ≥ P(E_1) + P(E_2) − 1 (Bonferroni's inequality).
Proof: follows directly from (v), since P(E_1 ∪ E_2) ≤ 1.

(ix) If E_1, E_2, ... is a partition of Ω, then P(A) = ∑_{i=1}^∞ P(A ∩ E_i) (law of total probability).
Proof: A = A ∩ Ω = A ∩ (⋃_{i=1}^∞ E_i) = ⋃_{i=1}^∞ (A ∩ E_i), which is a countable union of disjoint sets, and the result follows.

(x) P(⋃_{i=1}^∞ E_i) ≤ ∑_{i=1}^∞ P(E_i) (Boole's inequality).
Proof: by mathematical induction. P(E_1) ≤ P(E_1). If P(⋃_{i=1}^n E_i) ≤ ∑_{i=1}^n P(E_i), then by (vii)
P(⋃_{i=1}^{n+1} E_i) = P((⋃_{i=1}^n E_i) ∪ E_{n+1}) ≤ ∑_{i=1}^n P(E_i) + P(E_{n+1}).
Hence P(⋃_{i=1}^n E_i) ≤ ∑_{i=1}^n P(E_i) for all n. Because the sequence on the left-hand side is increasing and bounded, it has a limit, so we can let n → ∞ to get the result.
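A numerical sanity check of (iv), (v) and (viii) on the die events from before (a sketch; equally likely outcomes, as in the next section):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(omega))  # counting measure on a fair die
E1, E2 = {1, 2, 3}, {2, 4, 6}

assert P(E2 - E1) == P(E2) - P(E2 & E1)          # (iv)
assert P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2)  # (v)
assert P(E1 & E2) >= P(E1) + P(E2) - 1           # (viii) Bonferroni
print("all checks pass")
```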

Random experiments with equally likely outcomes

Consider a random experiment with I outcomes, Ω = {ω_1, ..., ω_I}. Without loss of generality we choose ω_i = i. If all outcomes are equally likely, then P({i}) = 1/I and P(E) = #E / I, i.e. we assign probabilities by counting the number of outcomes in E.

Example: in a lottery you must correctly pick 4 numbers from 1-10. The probability of winning is

1 / #distinct lottery tickets.

The number of distinct tickets depends on the rules of the lottery:
- the same number can appear repeatedly or not;
- the order of the numbers matters or not.

The number of distinct lottery tickets equals the number of selections of 4 numbers from the numbers 1, ..., 10. In general: a selection of k elements from n distinct elements. There are four possibilities:

a. Selection without replacement, i.e. no duplication among the k selections, or with replacement, i.e. duplication among the k selections is allowed.
b. The order among the k selections matters (ordered selection) or not (unordered selection).

1. Ordered, without replacement:
# selections = n(n−1)···(n−k+1) = n! / (n−k)!

2. Ordered, with replacement:
# selections = n · n ··· n = n^k

3. Unordered, without replacement:
# selections = (# ordered selections without replacement) / (# permutations of each selection) = n! / (k!(n−k)!) = (n choose k)

4. Unordered, with replacement:
Equivalent: put k objects in n bins, with more than one object in a bin allowed; labeling the objects according to their bins gives the selection. Code the result of putting k objects in n bins as a sequence of 0s and 1s, with a 1 for each object and a 0 indicating that all the objects in a bin have been accounted for. If there is no object in a bin, use a single 0. Do not use a 0 after the objects in bin n, and in particular do not use a 0 for the final bin if it is empty. You can think of the 0s as bin boundaries (n−1 boundaries for n bins, because the left boundary of bin 1 and the right boundary of bin n can be omitted).

Example: n = 3, k = 2, one object in bin 2 and one in bin 3. Code this as 0101.

In general there are n−1 0s and k 1s, so a selection is equivalent to picking the positions of the k 1s among the n+k−1 positions. Because we cannot distinguish the 1s, the order does not matter and

# selections = (n+k−1 choose k)
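The four counts for the lottery example (n = 10 numbers, k = 4 picks) in Python; math.perm and math.comb exist in Python 3.8+:

```python
import math

n, k = 10, 4

ordered_without = math.perm(n, k)         # n!/(n-k)! = 5040
ordered_with = n ** k                     # n^k = 10000
unordered_without = math.comb(n, k)       # (n choose k) = 210
unordered_with = math.comb(n + k - 1, k)  # (n+k-1 choose k) = 715

for count in (ordered_without, ordered_with, unordered_without, unordered_with):
    print(count, "tickets -> winning probability", 1 / count)
```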

Application of counting methods: a debate between Samuel Pepys (1633-1703) and Isaac Newton (1642-1727).

Question: in a random experiment in which a die is rolled repeatedly, is

P(at least 1 six in 6 rolls) ≥ P(at least 2 sixes in 12 rolls)?

a. P(at least 1 six in 6 rolls). With 6 rolls of a die,

# outcomes = # elements of the sample space = 6^6.

Let E be the event of at least 1 six in 6 rolls. Then E^c is the event of no six in 6 rolls, with # outcomes in E^c = 5^6. Hence

P(E) = 1 − P(E^c) = 1 − (# elements in E^c)/(# elements in Ω) = 1 − 5^6/6^6 ≈ .665.

b. P(at least 2 sixes in 12 rolls). Now the sample space has 6^12 outcomes. Define E as the event of at least 2 sixes in 12 rolls. Then E^c occurs if there are 0 sixes or 1 six in 12 rolls:

# outcomes with 0 sixes = 5^12
# outcomes with 1 six = 12 · 5^11,

because we can select the roll that gives a six in 12 ways, and the non-six outcomes on the other 11 rolls can be chosen in 5^11 ways. Hence

P(E) = 1 − P(E^c) = 1 − (5^12 + 12 · 5^11)/6^12 ≈ .619.
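Both probabilities can be verified with exact arithmetic; a sketch:

```python
from fractions import Fraction

# a. At least one six in 6 rolls.
p_a = 1 - Fraction(5**6, 6**6)
# b. At least two sixes in 12 rolls: subtract the 0-six and 1-six outcomes.
p_b = 1 - Fraction(5**12 + 12 * 5**11, 6**12)

print(float(p_a))  # ~0.665
print(float(p_b))  # ~0.619
print(p_a > p_b)   # True: one six in 6 rolls is the better bet, as Newton argued
```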