Elementary Probability. Exam Number 38119


1. Introduction

Consider any experiment whose outcome is unknown, for example tossing a coin, the daily number of customers in a supermarket, or the duration of a phone call in a service office. Each of these experiments has a more or less wide variety of possible outcomes. The set of all these outcomes is called the sample space and is denoted $\Omega$. In the examples above we have $\Omega_1 = \{\text{heads}, \text{tails}\}$, $\Omega_2 = \mathbb{N}$ and $\Omega_3 = (0, \infty)$. We cannot forecast with certainty which outcome the experiment will have, but we can say something about the probability of certain outcomes $\omega \in \Omega$. Often we are interested not in single outcomes but in subsets $A \subseteq \Omega$ containing several outcomes. We denote the collection of all such subsets of interest by $\mathcal{A}$; its elements are called events.

Definition 1.1. A set $\mathcal{A} \subseteq \mathcal{P}(\Omega)$ is called a $\sigma$-algebra over $\Omega$ if
(1) $\Omega \in \mathcal{A}$,
(2) $A^c \in \mathcal{A}$ for all $A \in \mathcal{A}$ and
(3) $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ for all $(A_n) \in \mathcal{A}^{\mathbb{N}}$.

For $\mathcal{E} \subseteq \mathcal{P}(\Omega)$ let $\mathcal{A}(\mathcal{E})$ be the smallest $\sigma$-algebra containing $\mathcal{E}$, that is
\[ \mathcal{A}(\mathcal{E}) = \bigcap \{ \mathcal{A}' \mid \mathcal{A}' \text{ is a } \sigma\text{-algebra and } \mathcal{A}' \supseteq \mathcal{E} \}. \]

In Theorem 1.5 we will see that it is not always possible to choose $\mathcal{A} = \mathcal{P}(\Omega)$. For countable $\Omega$ we will usually choose $\mathcal{A} = \mathcal{P}(\Omega)$, but for uncountable $\Omega \subseteq \mathbb{R}^1$ we need the Borel $\sigma$-algebra $\mathcal{B}^1 = \mathcal{A}(\{ (a, b] \mid a, b \in \mathbb{R}^1 \})$ or some sub-$\sigma$-algebra of it. For $\Omega \subseteq \mathbb{R}^n$ we have the $n$-dimensional Borel $\sigma$-algebra $\mathcal{B}^n = \mathcal{A}(\{ \prod_{i=1}^{n} B_i \mid B_i \in \mathcal{B}^1 \})$.

On $\mathcal{A}$ we can now define a probability distribution $P$. We want $P$ to be "realistic"; therefore we require some basic properties of it.

Definition 1.2. A function $P : \mathcal{A} \to [0, 1]$ is called a probability distribution or probability measure on $\mathcal{A}$ if
(1) $P(\Omega) = 1$ and
(2) $P(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} P(A_n)$ for all pairwise disjoint $(A_n) \in \mathcal{A}^{\mathbb{N}}$.

Definition 1.3. The triplet $(\Omega, \mathcal{A}, P)$, where $\mathcal{A}$ is a $\sigma$-algebra over $\Omega$ and $P$ is a probability distribution on $\mathcal{A}$, is called a probability space.

We will show some basic properties of probability distributions in the following

Lemma 1.4. Let $A, B \in \mathcal{A}$ and $(A_n) \in \mathcal{A}^{\mathbb{N}}$. Then we have
(1) $P(A^c) = 1 - P(A)$,
(2) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$,

(3) $P(A) \le P(B)$ if $A \subseteq B$,
(4) $P(B \setminus A) = P(B) - P(A)$ if $A \subseteq B$ and
(5) $P(\lim A_n) = \lim P(A_n)$ if $(A_n)$ is isotonic or antitonic.

Proof. Properties (1)-(4) follow directly from the definition of a probability distribution. To show (5) we first assume that $(A_n)$ is isotonic. Define $A_0 = \emptyset$ and $B_n = A_n \setminus A_{n-1}$ for $n \in \mathbb{N}$. Then the $B_n$ are pairwise disjoint and $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} B_n$. Therefore
\[ P(\lim A_n) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = P\Big(\bigcup_{n=1}^{\infty} B_n\Big) = \sum_{n=1}^{\infty} P(A_n \setminus A_{n-1}) = \lim_{k} \sum_{n=1}^{k} \big( P(A_n) - P(A_{n-1}) \big) = \lim_{k} P(A_k). \]
Now let $(A_n)$ be antitonic. Then $(A_n^c)$ is isotonic and we have
\[ P(\lim A_n) = P\Big(\bigcap_{n=1}^{\infty} A_n\Big) = 1 - P\Big(\bigcup_{n=1}^{\infty} A_n^c\Big) = 1 - \lim P(A_n^c) = \lim \big( 1 - P(A_n^c) \big) = \lim P(A_n). \qquad \square \]

Theorem 1.5. There exists no function $P : \mathcal{P}([0, 1]) \to [0, 1]$ such that
(1) $P([a, b]) = b - a$ for $0 \le a \le b \le 1$,
(2) $P$ is translation invariant, that is $P(A + x) = P(A)$ for all $A \in \mathcal{P}([0, 1])$ and $x \in [0, 1]$ such that $A + x \in \mathcal{P}([0, 1])$, and
(3) $P(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} P(A_n)$ for all pairwise disjoint $(A_n) \in \mathcal{P}([0, 1])^{\mathbb{N}}$.

Proof. Suppose such a function $P$ does exist. Consider the relation $\sim$ on $[2/5, 3/5]$ with $x \sim y \iff x - y \in \mathbb{Q}$. We have $x \sim x$ for all $x$, as $x - x = 0 \in \mathbb{Q}$; therefore $\sim$ is reflexive. The relation is symmetric, as $y - x = -(x - y) \in \mathbb{Q}$ whenever $x - y \in \mathbb{Q}$. It is also transitive, as from $x - y \in \mathbb{Q}$ and $y - z \in \mathbb{Q}$ it follows that $x - z = (x - y) + (y - z) \in \mathbb{Q}$. Hence $\sim$ is an equivalence relation. By the Axiom of Choice there exists a set $A \subseteq [2/5, 3/5]$ containing exactly one element of each equivalence class. Let the sequence $(q_n)$ enumerate the rational numbers in the interval $[-1/3, 1/3]$ and let $A_n = A + q_n$

for all $n \in \mathbb{N}$; note $A_n \subseteq [2/5 - 1/3, \, 3/5 + 1/3] \subseteq [0, 1]$. We now show that the $A_n$ are pairwise disjoint. Let $a \in A_k \cap A_l$ for some $k, l \in \mathbb{N}$. This means that both $a - q_k \in A$ and $a - q_l \in A$. Since $(a - q_k) - (a - q_l) = q_l - q_k \in \mathbb{Q}$, these two representatives are equivalent and hence equal, so $q_k = q_l$ and therefore $A_k = A_l$.

Let $x \in [2/5, 3/5]$. Then we can find some $a \in A$ such that $x \sim a$, and since $x - a \in [-1/5, 1/5] \subseteq [-1/3, 1/3]$ we find some $m \in \mathbb{N}$ such that $x - a = q_m$. It follows that $x \in A + q_m$. Therefore we have
\[ [2/5, 3/5] \subseteq \bigcup_{n=1}^{\infty} A_n \subseteq [0, 1]. \]
It follows that
\[ 1 \ge P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} P(A), \]
which means $P(A) = 0$. This leads to the contradiction
\[ 1/5 \le P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = 0. \qquad \square \]

2. Examples of Probability Spaces

In this section we will examine some examples of probability spaces. First we look at discrete probability distributions, that is, $\Omega$ is countable. In this case we can define $p_\omega = P(\omega) = P(\{\omega\})$ for all $\omega \in \Omega$; it is called the density, and in this special case of a countable sample space we call it a discrete density function. The $p_\omega$ suffice to describe a distribution, since for every $A \in \mathcal{A}$ we have $P(A) = \sum_{\omega \in A} p_\omega$.

The simplest distribution for finite $\Omega$ and $\mathcal{A} = \mathcal{P}(\Omega)$ is given by $P(A) = |A| / |\Omega|$ for all $A \in \mathcal{A}$. The fact that $P$ is a distribution follows directly from the definition. It is called the Laplace distribution.

Let $p \in [0, 1]$, $\Omega = \{0, 1, \ldots, n\}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $p_i = \binom{n}{i} p^i (1 - p)^{n-i}$ for $0 \le i \le n$. As
\[ \sum_{i=0}^{n} p_i = (p + (1 - p))^n = 1, \]
we have that $(p_i)_{i \in \Omega}$ is a density. Therefore $(\Omega, \mathcal{A}, P)$ is a probability space, where $P$ is the corresponding distribution. This distribution is called the binomial distribution and is denoted $b(n, p)$.

For $p \in (0, 1]$, $\Omega = \mathbb{N}$ and $\mathcal{A} = \mathcal{P}(\Omega)$ we can define a density by $p_i = p (1 - p)^{i-1}$ for $i \in \mathbb{N}$, since
\[ \sum_{i=1}^{\infty} p_i = p \sum_{i=1}^{\infty} (1 - p)^{i-1} = p \cdot \frac{1}{1 - (1 - p)} = 1. \]
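These normalization checks are easy to reproduce numerically. The following Python sketch is our own addition (the function names are ours, not from the text):

```python
from math import comb

def binomial_pmf(n, p):
    # p_i = C(n, i) p^i (1-p)^(n-i) for 0 <= i <= n
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def geometric_pmf(p, terms=10_000):
    # p_i = p (1-p)^(i-1) for i in N, truncated after `terms` values
    return [p * (1 - p)**(i - 1) for i in range(1, terms + 1)]

# Both weight families sum to 1 (the geometric one up to truncation error).
assert abs(sum(binomial_pmf(10, 0.3)) - 1) < 1e-12
assert abs(sum(geometric_pmf(0.3)) - 1) < 1e-12
```

The assertions mirror the two displayed identities: the binomial theorem for $b(n, p)$ and the geometric series for $\mathrm{geo}(p)$.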

The corresponding distribution is called the geometric distribution and is denoted $\mathrm{geo}(p)$.

Let $a$ be a positive real number, $\Omega = \mathbb{N}_0$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $p_i = e^{-a} a^i / i!$ for $i \in \mathbb{N}_0$. We have
\[ \sum_{i=0}^{\infty} p_i = e^{-a} \sum_{i=0}^{\infty} \frac{a^i}{i!} = e^{-a} e^{a} = 1. \]
Therefore the $p_i$ form a density. The corresponding distribution is called the Poisson distribution and is denoted $\mathrm{Poi}(a)$.

Now we give some examples of continuous distributions. In this case we have $\Omega = \mathbb{R}^1$ and $\mathcal{A} = \mathcal{B}^1$. Given a Riemann-integrable function $f : \mathbb{R}^1 \to \mathbb{R}^1$ with $\int f(x) \, dx = 1$, we define $P(A) = \int_A f(x) \, dx$, which is then a distribution. The function $f$ is called its density or probability density function. A useful tool for describing continuous densities is the indicator function (also called the characteristic function)
\[ 1_A : \Omega \to \mathbb{R}^1, \qquad 1_A(\omega) = \begin{cases} 1 & \omega \in A, \\ 0 & \omega \notin A \end{cases} \]
for some $A \in \mathcal{A}$.

The function $f = 1_{[a,b]} / (b - a)$, where $a < b$, defines a continuous density since
\[ \int f(x) \, dx = \frac{1}{b - a} \int_a^b 1 \, dx = 1. \]
Its corresponding distribution is called the uniform distribution and is denoted $U[a, b]$.

Let the function $f$ be defined as $f(x) = \lambda e^{-\lambda x} 1_{[0, \infty)}(x)$ for some positive real $\lambda$. Since
\[ \int f(x) \, dx = \lambda \int_0^{\infty} e^{-\lambda x} \, dx = 1, \]
we have a density. Its corresponding distribution is called the exponential distribution and is denoted $\exp(\lambda)$.

The normal distribution is denoted $N(\mu, \sigma^2)$ and has the density
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(x - \mu)^2 / 2\sigma^2}. \]
For $\int f(x) \, dx = 1$ see [Sch]. More examples of probability distributions can be found in [Sch] and in Appendix A.

Both discrete and continuous distributions are special cases of a more general concept. A basic knowledge of measure theory, which can be found in [Bau], is needed to define the general probability distribution.

Definition 2.1. Let $\Omega$ be an arbitrary set with a $\sigma$-algebra $\mathcal{A}$, and let $\mu$ be a measure on $\mathcal{A}$. A nonnegative $\mu$-integrable function $f$ with $\int f \, d\mu = 1$ is called a $\mu$-density

of the probability measure $P$ with
\[ P(A) = \int_A f \, d\mu. \tag{2.1} \]
The Radon-Nikodym theorem (see [Bau]) yields for continuous distributions $f = dP/d\lambda^1$, where $\lambda^1$ is the one-dimensional Lebesgue measure, and for discrete distributions $f = dP/d\mu$, where $\mu$ is the counting measure on $\Omega$. Inserting this into (2.1) gives us $P(A) = \int_A dP = \int 1_A \, dP$.

3. Random Variables and Random Vectors

In an experiment the complete outcome is very often not important or not interesting. For example, when throwing $n$ coins we might not be interested in the sequence of heads and tails $\omega = (\omega_1, \ldots, \omega_n)$ but in the number of heads $X(\omega) = |\{ i \mid \omega_i \text{ is heads} \}|$ or in the relative frequency of heads $\bar{X}(\omega) = X(\omega) / n$.

Definition 3.1. Let $(\Omega_1, \mathcal{A}_1, P)$ be a probability space and $\mathcal{A}_2$ a $\sigma$-algebra over the set $\Omega_2$. A function $X : \Omega_1 \to \Omega_2$ is called a random variable if $X^{-1}(A) \in \mathcal{A}_1$ for all $A \in \mathcal{A}_2$. If $X$ is a random variable we write $X : (\Omega_1, \mathcal{A}_1) \to (\Omega_2, \mathcal{A}_2)$. If $\Omega_2 \subseteq \mathbb{R}^n$ with $n \ge 2$, then $X$ is also called a random vector. For $A \in \mathcal{A}_2$ we define
\[ P^X(A) = P\big( X^{-1}(A) \big) = P(X \in A) = P(\{ \omega \in \Omega_1 \mid X(\omega) \in A \}). \]

Lemma 3.2. The function $P^X$ is a probability distribution on $\mathcal{A}_2$. This gives us the new probability space $(\Omega_2, \mathcal{A}_2, P^X)$.

Proof. We have $P^X(\Omega_2) = P(X^{-1}(\Omega_2)) = P(\Omega_1) = 1$, which is the first condition of a probability distribution. Let $(A_n) \in \mathcal{A}_2^{\mathbb{N}}$ be pairwise disjoint. Then the preimages $X^{-1}(A_n)$ are pairwise disjoint as well, and we have
\[ P^X\Big(\bigcup_{n=1}^{\infty} A_n\Big) = P\Big( X^{-1}\Big(\bigcup_{n=1}^{\infty} A_n\Big) \Big) = P\Big(\bigcup_{n=1}^{\infty} X^{-1}(A_n)\Big) = \sum_{n=1}^{\infty} P\big( X^{-1}(A_n) \big) = \sum_{n=1}^{\infty} P^X(A_n). \qquad \square \]

If $P^X$ is $b(n, p)$-distributed we also write $X \sim b(n, p)$. We use the same notation for the other distributions introduced in Section 2 and Appendix A.
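Lemma 3.2 can be illustrated on a finite example: pushing the Laplace distribution on $\Omega_1 = \{0, 1\}^n$ forward under the head-count map $X$ of this section yields exactly the $b(n, 1/2)$ distribution. A minimal Python sketch (our own illustration, not from the source):

```python
from itertools import product
from math import comb

n = 4
omega1 = list(product([0, 1], repeat=n))    # all 2^n coin sequences
P = {w: 1 / len(omega1) for w in omega1}    # Laplace distribution on Omega_1

# Pushforward P^X({k}) = P(X^{-1}({k})), where X counts the ones ("heads").
PX = {}
for w in omega1:
    k = sum(w)
    PX[k] = PX.get(k, 0.0) + P[w]

# P^X agrees with the binomial distribution b(n, 1/2).
for k in range(n + 1):
    assert abs(PX[k] - comb(n, k) * 0.5**n) < 1e-12
```

Since the map $A \mapsto P(X^{-1}(A))$ merely regroups the atoms of $\Omega_1$, the total mass 1 is preserved, exactly as the lemma asserts.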

4. Independence

Definition 4.1. Let $(\Omega, \mathcal{A}, P)$ be a probability space. The events $A_1, \ldots, A_n$ are called independent if
\[ P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}) \]
for all $1 \le i_1 < \ldots < i_k \le n$. The sequence $(A_n) \in \mathcal{A}^{\mathbb{N}}$ is called independent if $A_1, \ldots, A_n$ are independent for all $n$.

With this definition we can formulate and prove the following

Theorem 4.2 (Borel-Cantelli). Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_n) \in \mathcal{A}^{\mathbb{N}}$. If $\sum_{n=1}^{\infty} P(A_n) < \infty$ then $P(\limsup A_n) = 0$. If $(A_n)$ is independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$ then $P(\limsup A_n) = 1$.

Proof. Consider
\[ \lim_{k} P\Big(\bigcup_{n=k}^{\infty} A_n\Big) \le \lim_{k} \sum_{n=k}^{\infty} P(A_n) = 0. \]
Lemma 1.4 yields
\[ P\big(\limsup A_n\big) = P\Big(\bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} A_n\Big) = \lim_{k} P\Big(\bigcup_{n=k}^{\infty} A_n\Big) = 0. \]
To prove the second statement consider $(A_n^c)$, which is independent, as $(A_n)$ is independent. From this follows
\[ P\big(\limsup A_n\big) = 1 - P\big(\liminf A_n^c\big) = 1 - \lim_{k} P\Big(\bigcap_{n=k}^{\infty} A_n^c\Big) = 1 - \lim_{k} \prod_{n=k}^{\infty} \big(1 - P(A_n)\big) = 1. \]
To see the last equality let $p_n = P(A_n)$ for $n \in \mathbb{N}$. If $p_n = 1$ for infinitely many $n$ this equality is trivial. So let $p_n < 1$ for all $n \ge k$ for some $k \in \mathbb{N}$. Then, as $\log x \le x - 1$ for positive $x$, we have
\[ \prod_{n=k}^{\infty} (1 - p_n) = \exp\Big( \sum_{n=k}^{\infty} \log(1 - p_n) \Big) \le \exp\Big( -\sum_{n=k}^{\infty} p_n \Big) = 0, \]
as $\sum p_n = \infty$. $\square$

Consider, for example, throwing a coin infinitely often. Then $\Omega = \{0, 1\}^{\mathbb{N}}$. We want to know the probability of throwing a 1 infinitely often. Let $A_n =$

$\{ \omega \in \Omega \mid \omega_n = 1 \}$; these events are independent. Therefore we have $P(\limsup A_n) = 1$, as
\[ \sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} \frac{1}{2} = \infty. \]

Similarly to the independence of events we can define the independence of a sequence of random variables $(X_n)$ with $X_n : (\Omega, \mathcal{A}) \to (\Omega_n, \mathcal{A}_n)$. Such a sequence is called independent if the events $(X_n^{-1}(A_n))$ are independent for every sequence $(A_n)$ with $A_n \in \mathcal{A}_n$. In [Sch] it is proved that if this condition holds for events $A_n$ taken from intersection-stable generators of the $\mathcal{A}_n$, then $(X_n)$ is already independent. Therefore we have the following easy-to-check criteria for independence for discrete and continuous distributions. A sequence $(X_n)$ of discrete random variables is independent iff
\[ P\Big(\bigcap_{n=1}^{N} \{X_n = x_n\}\Big) = \prod_{n=1}^{N} P(X_n = x_n) \]
for all $N$ and all $x_n$. A sequence $(X_n)$ of continuous random variables is independent iff
\[ P\Big(\bigcap_{n=1}^{N} \{X_n \le x_n\}\Big) = \prod_{n=1}^{N} P(X_n \le x_n) \]
for all $N$ and all $x_n$.

5. Expected Value and Variance

Often it is of great interest to know the mean value a random variable is expected to take, for example when calculating the fair price for a game or forecasting prices on the stock market. In this article we only examine the expected value of random variables $X : (\Omega, \mathcal{A}) \to (\mathbb{R}^1, \mathcal{B}^1)$, although it is possible to define the expected value also for complex random variables and for random vectors.

Definition 5.1. If the expression
\[ EX = \int X \, dP = \int_{\mathbb{R}^1} \mathrm{id} \; dP^X \]
exists, it is called the expected value of $X$.

For discrete random variables it follows immediately that $EX = \sum_{n=1}^{\infty} x_n P(X = x_n)$, and for continuous random variables $EX = \int_{\mathbb{R}^1} x f(x) \, dx$.

The following example demonstrates that the expected value need not exist. Consider a series of infinitely many coin throws. The game ends with the first 0 being thrown; if this happens in the $n$-th throw we win $2^{n-1}$. Let $(X_n)$ be independent, identically $b(1, \frac{1}{2})$-distributed on $\{0, 1\}^{\mathbb{N}}$. For $\omega \in \Omega$ let $Y(\omega)$ be the first position at which a 0 is thrown. Hence we have $P(Y = n) = 1/2^n$. Let $X$ denote the amount of money we win, that is $X =$

$2^{Y-1}$. The expected amount of money won in this game would therefore be
\[ EX = \sum_{n=1}^{\infty} 2^{n-1} \, P(X = 2^{n-1}) = \sum_{n=1}^{\infty} 2^{n-1} \, P(Y = n) = \sum_{n=1}^{\infty} 2^{n-1} \frac{1}{2^n} = \sum_{n=1}^{\infty} \frac{1}{2} = \infty. \]

We say a statement $A$ holds $P$-almost surely, and denote this by $[P]$, if there exists a set $N \in \mathcal{A}$ such that $P(N) = 0$ and $A$ is true for all $\omega \in N^c$. With this notation we have the following

Lemma 5.2. If $EX$ and $EY$ exist then
(1) $E(\alpha X + \beta Y) = \alpha EX + \beta EY$ for all $\alpha, \beta \in \mathbb{R}^1$,
(2) $E\alpha = \alpha$ for all $\alpha \in \mathbb{R}^1$,
(3) $EX \le EY$ if $X \le Y$ $[P]$ and
(4) $EX = EY$ if $X = Y$ $[P]$.

These properties follow directly from the general properties of the measure integral, which can be found in [Bau], and are basically the same as for sums and Riemann integrals.

Another value of interest is how much a random variable is expected to differ from its expected value.

Definition 5.3. If $\mathrm{Var}\,X = E(X - EX)^2$ exists, it is called the variance of $X$. We call $\sqrt{\mathrm{Var}\,X}$ the standard deviation of $X$.

Some examples of expected values and variances are given in the tables in Appendix A.

6. Conclusion

In this article we have been introduced to probability theory with almost no references to measure theory. Measure and integration theory provides the key to understanding probability theory; the first two chapters of [Bau] give a very good introduction to this topic. Further studies of probability theory could include taking a closer look at the expected value and the variance, proving the Chebyshev inequality
\[ P(|X - EX| \ge \varepsilon) \le \frac{\mathrm{Var}\,X}{\varepsilon^2} \]
and examining the convergence properties of sequences of random variables. In [Sch] we can find a wide variety of topics in probability theory such as conditional probabilities, densities of random vectors, sums of random variables, characteristic functions, the Central Limit Theorem, the Laws of Large Numbers, stochastic processes and martingales. It provides the knowledge to go deeper into

special topics like simulated annealing and genetic algorithms, which are a very good alternative for approaching NP-complete problems like the travelling salesman problem and finding approximately optimal solutions, or stochastic differential equations and Itô integrals, which can be used to forecast prices on the stock market.

Appendix A. Overview of Some Probability Distributions

A.1. Discrete Distributions.

Binomial distribution:
  Notation: $X \sim b(n, p)$
  Density: $p_i = \binom{n}{i} p^i (1 - p)^{n-i}$ for $0 \le i \le n$
  Expected value: $EX = np$
  Variance: $\mathrm{Var}\,X = np(1 - p)$

Geometric distribution:
  Notation: $X \sim \mathrm{geo}(p)$
  Density: $p_i = p (1 - p)^{i-1}$ for $i \in \mathbb{N}$
  Expected value: $EX = \frac{1}{p}$
  Variance: $\mathrm{Var}\,X = \frac{1 - p}{p^2}$

Poisson distribution:
  Notation: $X \sim \mathrm{Poi}(a)$
  Density: $p_i = e^{-a} \frac{a^i}{i!}$ for $i \in \mathbb{N}_0$
  Expected value: $EX = a$
  Variance: $\mathrm{Var}\,X = a$

Negative binomial distribution:
  Notation: $X \sim \bar{b}(n, p)$
  Density: $p_i = \binom{n + i - 1}{i} p^n (1 - p)^i$ for $i \in \mathbb{N}_0$
  Expected value: $EX = \frac{n(1 - p)}{p}$
  Variance: $\mathrm{Var}\,X = \frac{n(1 - p)}{p^2}$

A.2. Continuous Distributions.

Uniform distribution:
  Notation: $X \sim U[a, b]$
  Density: $f(x) = \frac{1}{b - a} 1_{[a,b]}(x)$
  Expected value: $EX = \frac{a + b}{2}$
  Variance: $\mathrm{Var}\,X = \frac{(b - a)^2}{12}$

Exponential distribution:
  Notation: $X \sim \exp(\lambda)$
  Density: $f(x) = \lambda e^{-\lambda x} 1_{[0, \infty)}(x)$
  Expected value: $EX = \frac{1}{\lambda}$
  Variance: $\mathrm{Var}\,X = \frac{1}{\lambda^2}$

Normal distribution:
  Notation: $X \sim N(\mu, \sigma^2)$
  Density: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$
  Expected value: $EX = \mu$
  Variance: $\mathrm{Var}\,X = \sigma^2$

Chi-squared distribution:
  Notation: $X \sim \chi^2_1$
  Density: $f(x) = \frac{1}{\sqrt{2\pi x}} e^{-x/2} 1_{(0, \infty)}(x)$
  Expected value: $EX = 1$
  Variance: $\mathrm{Var}\,X = 2$

References

[Bau] H. Bauer, Maß- und Integrationstheorie (de Gruyter, 1992)
[Sch] N. Schmitz, Vorlesungen über Wahrscheinlichkeitstheorie (Teubner, 1996)
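As a closing illustration (our own addition, not part of the original article), the expected values listed in the discrete tables above can be spot-checked with truncated sums:

```python
from math import comb, exp, factorial, isclose

n, p, a = 10, 0.3, 2.5

# Binomial b(n, p): sum of i * p_i equals n * p exactly.
ex_binom = sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))
assert isclose(ex_binom, n * p)

# Poisson Poi(a): sum of i * e^-a a^i / i!, truncated at 100 terms, approaches a.
ex_poisson = sum(i * exp(-a) * a**i / factorial(i) for i in range(100))
assert isclose(ex_poisson, a)

# Geometric geo(p): sum of i * p (1-p)^(i-1), truncated, approaches 1/p.
ex_geo = sum(i * p * (1 - p)**(i - 1) for i in range(1, 10_000))
assert isclose(ex_geo, 1 / p)
```

The truncation bounds are generous enough that the remaining tails are far below the default tolerance of `math.isclose`.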