ECE 6010 Lecture 1: Introduction; Review of Random Variables

Readings from G&S: Chapter 1; Sections 2.1, 2.3, 2.4, 3.1, 3.2, 3.5, 4.1, 4.2, 4.4, 4.5.

Why study probability?

1. Communication systems: noise, information.
2. Control systems: noise in observations, noise in interference.
3. Computer systems: random loads, networks, random packet arrival times.

Probability can become a powerful engineering tool. One way of viewing it is as quantified common sense. Great success will come to those whose tool is sharp!

Set theory

Probability is intrinsically tied to set theory, so we review some set-theoretic concepts. We will use ^c to denote complementation of a set with respect to its universe Ω.

Union: A ∪ B is the set of elements that are in A or B.
Intersection: A ∩ B is the set of elements that are in A and B. We will also denote this as AB.
a ∈ A: a is an element of the set A.
A ⊂ B: A is a subset of B.
A = B: A ⊂ B and B ⊂ A.

Note that Ω = A ∪ A^c (where Ω is the universe).

Notation for some special sets:
R   set of all real numbers
Z   set of all integers
Z+  set of all positive integers
N   set of all natural numbers, 0, 1, 2, ...
R^n set of all n-tuples of real numbers
C   set of complex numbers

Definition 1 A field (or algebra) of sets is a collection of sets that is closed under complementation and finite union. That is, if F is a field and A ∈ F, then A^c must also be in F (closed under complementation); and if A and B are in F (which we will write as A, B ∈ F), then A ∪ B ∈ F.

Note: the properties of a field imply that F is also closed under finite intersection. (De Morgan's law: AB = (A^c ∪ B^c)^c.)

Definition 2 A σ-field (or σ-algebra) of sets is a field that is also closed under countable unions (and intersections).

What do we mean by countable? A set with a finite number of elements is countable. A set whose elements can be matched one-for-one with Z+ is countable (even if it has infinitely many elements!). Are there non-countable sets?
Note: For any collection F of sets, there is a smallest σ-field containing F, denoted by σ(F). This is called the σ-field generated by F.
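For a finite sample space, the field generated by a collection can be computed by brute-force closure under complement and union (and for finite Ω this field is automatically a σ-field). A minimal sketch; the function name `generated_field` is ours, not from the notes:

```python
from itertools import combinations

def generated_field(omega, collection):
    """Close a collection of subsets of a finite omega under
    complementation and finite union; for finite omega the result
    is the sigma-field generated by the collection."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection} | {frozenset(), omega}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for s in current:
            c = omega - s                # complement
            if c not in sets:
                sets.add(c); changed = True
        for s, t in combinations(current, 2):
            u = s | t                    # finite union
            if u not in sets:
                sets.add(u); changed = True
    return sets

# sigma-field on die outcomes generated by the "even throw" event:
F = generated_field({1, 2, 3, 4, 5, 6}, [{2, 4, 6}])
```

Here F comes out as the four sets {∅, {2,4,6}, {1,3,5}, Ω}: closing under complement forces the odd throws in, and no new unions arise.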

Definition of probability

We now formally define what we mean by a probability space. A probability space has three components. The first is the sample space, the collection of all possible outcomes of some experiment. The sample space is frequently denoted by Ω.

Example 1 Suppose the experiment involves throwing a die. Then Ω = {1, 2, 3, 4, 5, 6}.

We deal with subsets of Ω. For example, we might have an event which is "all even throws of the die" or "all outcomes ≤ 4". We denote the collection of subsets of interest as F. The elements of F (that is, the subsets of Ω) are called events, and F is called the event class. We will restrict F to be a σ-field.

Example 2 Let Ω = {1, 2, 3, 4, 5, 6}, and let F = { {1, 2, 3, 4, 5, 6}, {2, 4, 6}, {1, 3, 5}, ... }. (What do we need to finish this off?)

Example 3 Ω = {1, 2, 3}. We could take F to be the set of all subsets of Ω:
   F = { ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3} }.
This is frequently denoted as F = 2^Ω, and is called the power set of Ω.

Example 4 Ω = R. Here F is restricted to something smaller than the set of all subsets of Ω (so that probabilities can be applied consistently). F could be the smallest σ-field which contains all intervals of R. This is called the Borel field, B.

The tuple (Ω, F) is called a pre-probability space (because we haven't assigned probabilities yet). This brings us to the third element of a probability space. Given a pre-probability space (Ω, F), a probability distribution (or measure) on (Ω, F) is a mapping P from F to R (which we will write P : F → R) with the properties:

1. P(Ω) = 1. (This is a normalization that is always applied for probabilities, but there are other measures which don't use this.)
2. P(A) ≥ 0 for all A ∈ F. (Measures are nonnegative.)
3. If A_1, A_2, ... ∈ F are such that A_i ∩ A_j = ∅ for all i ≠ j (that is, the sets are disjoint, or mutually exclusive), then
   P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
   This is called the σ-additive, or countably additive, property.

These three properties are called the axioms of probability.
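The axioms above are easy to check numerically on a finite space. A small sketch, using a fair die with the equally-likely measure P(A) = |A|/6 (our choice of illustration, not from the notes):

```python
from fractions import Fraction

# Fair-die probability space: Omega = {1,...,6}, F = all subsets,
# P(A) = |A| / |Omega| using exact rational arithmetic.
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(A):
    return Fraction(len(frozenset(A)), len(omega))

evens, odds = {2, 4, 6}, {1, 3, 5}

assert P(omega) == 1                          # axiom 1: normalization
assert all(P({w}) >= 0 for w in omega)        # axiom 2: nonnegativity
assert P(evens | odds) == P(evens) + P(odds)  # axiom 3 for two disjoint events
```

Using `Fraction` rather than floats keeps the checks exact, which matters when verifying equalities like additivity.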
The triple (Ω, F, P) is called a probability space:
- Ω tells what individual outcomes are possible,
- F tells what sets of outcomes (events) are possible,
- P tells what the probabilities of these events are.

Some properties of probabilities (which follow from the axioms):

1. P(A^c) = 1 − P(A).
2. P(∅) = 0.
3. A ⊂ B ⇒ P(A) ≤ P(B).
4. P(A ∪ B) = P(A) + P(B) − P(AB).
5. If A_1, A_2, ... ∈ F, then P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).

There is also the continuity of probability property. Suppose A_1 ⊂ A_2 ⊂ A_3 ⊂ ... (this is called an increasing sequence). Define
   lim_{n→∞} A_n = ∪_{i=1}^∞ A_i = A.
We write this as A_n ↑ A (A_n converges up to A). Similarly, for A_1 ⊃ A_2 ⊃ A_3 ⊃ ... (a decreasing sequence), define
   lim_{n→∞} A_n = ∩_{i=1}^∞ A_i = A,
written A_n ↓ A.

If A_n ↑ A, then P(A_n) ↑ P(A). If A_n ↓ A, then P(A_n) ↓ P(A).

Example 5 Let (Ω, F) = (R, B).
1. Take A_n = (−∞, 1/n], n = 1, 2, .... Then A_n ↓ (−∞, 0]. So P((−∞, 0]) = lim_{n→∞} P((−∞, 1/n]) = lim_{x↓0} P((−∞, x]).
2. Take A_n = (−∞, −1/n], n = 1, 2, .... Then A_n ↑ (−∞, 0). So P((−∞, 0)) = lim_{n→∞} P((−∞, −1/n]) = lim_{x↑0} P((−∞, x]).

We will introduce more properties later.

Some examples of probability spaces

1. Ω = {ω_1, ω_2, ..., ω_n} (a discrete set of outcomes), F = 2^Ω. Let p_1, p_2, ..., p_n be a set of nonnegative numbers satisfying Σ_{i=1}^n p_i = 1. Define the function P : F → R by
   P(A) = Σ_{ω_i ∈ A} p_i.
   Then (Ω, F, P) is a probability space.
2. Uniform distribution: Let Ω = [0, 1], F = B_[0,1] = the smallest σ-field containing all intervals in [0, 1]. We can take (without proving that this actually works, but it does) P([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1, and P(union of disjoint intervals) = sum of the probabilities of the individual intervals.
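The uniform distribution on [0, 1] makes the continuity property concrete: the decreasing sets A_n = [0, 1/n] shrink to {0}, and their probabilities 1/n decrease to P({0}) = 0. A sketch under the interval-length measure from example 2 above (the function name `P` is ours):

```python
# Uniform distribution on [0, 1]: P([a, b]) = b - a for 0 <= a <= b <= 1.
def P(a, b):
    """Probability of the interval [a, b] under the uniform measure on [0, 1]."""
    a, b = max(a, 0.0), min(b, 1.0)   # clip to the sample space
    return max(b - a, 0.0)

assert P(0.25, 0.75) == 0.5

# Continuity of probability (cf. Example 5): A_n = [0, 1/n] decreases to {0},
# and P(A_n) = 1/n decreases to P({0}) = 0.
probs = [P(0.0, 1.0 / n) for n in (1, 10, 100, 1000)]
assert probs == [1.0, 0.1, 0.01, 0.001]
```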

Conditional probability and independence

Conditional probability is perhaps the most important probability concept from an engineering point of view. It allows us to describe mathematically how our information changes when we are given a measurement. We define conditional probability as follows. Suppose (Ω, F, P) is a probability space and A, B ∈ F with P(B) > 0. Define the conditional probability of A given B as
   P(A | B) = P(AB) / P(B).
Essentially what we are saying is that the sample space is restricted from Ω down to B; dividing by P(B) provides the correct normalization for this probability measure.

Some properties of conditional probability (consequences of the axioms of probability) are as follows:
1. P(A | B) ≥ 0.
2. P(Ω | B) = 1.
3. For A_1, A_2, ... ∈ F with A_i ∩ A_j = ∅ for i ≠ j,
   P(∪_{i=1}^∞ A_i | B) = Σ_{i=1}^∞ P(A_i | B).
4. AB = ∅ ⇒ P(A | B) = 0.
5. P(B | B) = 1.
6. A ⊂ B ⇒ P(A | B) ≥ P(A).
7. B ⊂ A ⇒ P(A | B) = 1.

Definition 3 A_1, A_2, ..., A_n ∈ F is a partition of Ω if A_i ∩ A_j = ∅ for i ≠ j, ∪_{i=1}^n A_i = Ω, and P(A_i) > 0.

Example 6 Let Ω = {1, 2, 3, 4, 5, 6}, and A_1 = {1}, A_2 = {2, 5, 6}, A_3 = {3, 4}.

The Law of Total Probability: If A_1, ..., A_n is a partition of Ω and A ∈ F, then
   P(A) = Σ_{i=1}^n P(A | A_i) P(A_i).
(Draw picture.)

Bayes' formula is a simple formula for turning around the conditioning. Because conditioning is so important in engineering, Bayes' formula turns out to be a tremendously important tool (even though it is very simple). We will see applications of this throughout the semester. Suppose A, B ∈ F, P(A) > 0 and P(B) > 0. Then
   P(A | B) = P(B | A) P(A) / P(B).
Why?

Definition 4 The events A and B are independent if P(AB) = P(A)P(B).
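Bayes' formula and the law of total probability combine naturally in a communication setting. A sketch on a binary symmetric channel; the numbers (prior 0.5, crossover 0.1) are illustrative, not from the notes:

```python
# A = "bit 0 was sent", B = "bit 0 was received" on a binary symmetric channel.
p_A = 0.5            # prior P(A): 0s and 1s equally likely
p_B_given_A = 0.9    # P(B | A): bit passes correctly
p_B_given_Ac = 0.1   # P(B | A^c): crossover (bit flip) probability

# Law of total probability over the partition {A, A^c}:
#   P(B) = P(B | A) P(A) + P(B | A^c) P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' formula turns the conditioning around:
#   P(A | B) = P(B | A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

With a symmetric prior, P(B) = 0.5 and the posterior P(A | B) equals the channel reliability 0.9: observing a received 0 makes "0 was sent" nine times as likely as "1 was sent".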

Is "independent" the same as "disjoint"? No: if A and B are disjoint and both have positive probability, then P(AB) = 0 ≠ P(A)P(B), so they cannot be independent.

Note: For P(B) > 0, if A and B are independent then P(A | B) = P(A). (Since they are independent, B can provide no information about A, so the probability remains unchanged.) If P(B) = 0, then B is independent of A for any other event A ∈ F. (Why?)

Definition 5 A_1, ..., A_n ∈ F are independent if for each k = 2, ..., n and each subset {i_1, ..., i_k} of {1, ..., n},
   P(∩_{j=1}^k A_{i_j}) = Π_{j=1}^k P(A_{i_j}).

Example 7 Take n = 3. A_1, A_2, A_3 are independent if:
   P(A_1 A_2) = P(A_1)P(A_2)
   P(A_1 A_3) = P(A_1)P(A_3)
   P(A_2 A_3) = P(A_2)P(A_3)
and
   P(A_1 A_2 A_3) = P(A_1)P(A_2)P(A_3).

The next idea is important in a lot of practical problems of engineering interest.

Definition 6 A_1 and A_2 are conditionally independent given B ∈ F if
   P(A_1 A_2 | B) = P(A_1 | B) P(A_2 | B).
(Draw picture to illustrate the idea.)

Random variables

Up to this point, the outcomes in Ω could be anything: they could be elephants, computers, or mitochondria, since Ω is simply expressed in terms of sets. But we frequently deal with numbers, and want to describe events associated with sets of numbers. This leads to the idea of a random variable.

Definition 7 Given a probability space (Ω, F, P), a random variable is a function X mapping Ω to R (that is, X : Ω → R), such that for each a ∈ R, {ω : X(ω) ≤ a} ∈ F.

A function X : Ω → R such that {ω : X(ω) ≤ a} ∈ F, that is, such that the events involved are in F, is said to be measurable with respect to F. That is, F is divided into sufficiently small pieces that the events in it can describe all of the sets associated with X.

Example 8 Let Ω = {1, 2, 3, 4, 5, 6}, F = { {1, 2, 3}, {4, 5, 6}, ∅, Ω }. Define
   X(ω) = 0 if ω is odd, 1 otherwise.
Then for X to be a random variable, we must have {ω : X(ω) ≤ a}
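Example 7 is worth probing numerically: the three pairwise conditions can all hold while the triple condition fails. A sketch of the classic two-coin counterexample (our construction, not from the notes):

```python
from itertools import product
from fractions import Fraction

# Toss two fair coins; outcomes are pairs in {H, T}^2, each with probability 1/4.
omega = set(product("HT", repeat=2))
P = lambda A: Fraction(len(A), len(omega))   # uniform measure

A1 = {w for w in omega if w[0] == "H"}       # first coin heads
A2 = {w for w in omega if w[1] == "H"}       # second coin heads
A3 = {w for w in omega if w[0] == w[1]}      # the two coins agree

# Each pair satisfies P(Ai Aj) = P(Ai) P(Aj) ...
assert P(A1 & A2) == P(A1) * P(A2)
assert P(A1 & A3) == P(A1) * P(A3)
assert P(A2 & A3) == P(A2) * P(A3)
# ... but the triple condition fails, so A1, A2, A3 are NOT independent:
assert P(A1 & A2 & A3) != P(A1) * P(A2) * P(A3)
```

All four events A1 ∩ A2, A1 ∩ A3, A2 ∩ A3, and A1 ∩ A2 ∩ A3 are the single outcome (H, H), so each pairwise product 1/4 checks out while the triple product 1/8 does not. This is why Definition 5 requires the product rule for every subset, not just pairs.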

to be an event in F:
   {ω : X(ω) ≤ a} = ∅          for a < 0,
                  = {1, 3, 5}   for 0 ≤ a < 1,
                  = Ω           for a ≥ 1.
Since {1, 3, 5} ∉ F, X is not a random variable: it is not measurable with respect to F. The field F is too coarse to "measure" whether ω is odd.

On the other hand, let us now define
   X(ω) = 0 if ω ≤ 3, 1 if ω > 3.
Is this a random variable?

We observe that a random variable cannot generate partitions of the underlying sample space which are not events in the σ-field F. Another way of saying that X is measurable: for any B ∈ B, {ω : X(ω) ∈ B} ∈ F. Recalling the idea of the Borel sets B associated with the real line, we see that a random variable is a measurable function from (Ω, F) to (R, B):
   X : (Ω, F) → (R, B).
We will abbreviate the term random variable as "r.v.", and we will use a notational shorthand for random variables.

Definition 8 Suppose (Ω, F, P) is a probability space and X : (Ω, F) → (R, B). For each B ∈ B we define
   P(X ∈ B) = P({ω : X(ω) ∈ B}).

By this definition, we can identify a new probability space. Using (R, B) as the pre-probability space, we use the measure P_X(B) = P(X ∈ B). So we get the probability space (R, B, P_X). As a matter of practicality, if the sample space is R with the Borel field, most mappings to R will be random variables. To summarize:
   X : (Ω, F, P) → (R, B, P_X),
where P_X(B) = P({ω : X(ω) ∈ B}) for B ∈ B.

Distribution functions

The cumulative distribution function (cdf) of an r.v. X is defined for each a ∈ R as
   F_X(a) = P(X ≤ a) = P({ω : X(ω) ≤ a}) = P_X((−∞, a]).

Properties of the cdf:
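For a discrete r.v., the induced measure P_X of Definition 8 can be tabulated directly by pushing the outcome probabilities forward through X. A sketch using the fair die and the measurable map X(ω) = 0 for ω ≤ 3, 1 for ω > 3 from Example 8:

```python
from fractions import Fraction
from collections import defaultdict

# Fair-die space and the map X(w) = 0 if w <= 3 else 1.
omega = {1, 2, 3, 4, 5, 6}
p = {w: Fraction(1, 6) for w in omega}
X = lambda w: 0 if w <= 3 else 1

# Push-forward measure P_X(B) = P({w : X(w) in B}); for a discrete r.v.
# it suffices to accumulate the probability of each value of X.
P_X = defaultdict(Fraction)   # Fraction() == 0, so missing values have mass 0
for w in omega:
    P_X[X(w)] += p[w]
```

The result is P_X(0) = P_X(1) = 1/2, which is exactly the probability space (R, B, P_X) restricted to the two points X can take.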

1. F_X is non-decreasing: if a < b then F_X(a) ≤ F_X(b).
2. lim_{a→∞} F_X(a) = 1.
3. lim_{a→−∞} F_X(a) = 0.
4. F_X is right-continuous: lim_{b↓a} F_X(b) = F_X(a).
(Draw typical picture.)

These four properties completely characterize the family of cdfs on the real line: any function which satisfies them has a corresponding probability distribution.

5. For b > a: P(a < X ≤ b) = F_X(b) − F_X(a).
6. P(X = a_0) = F_X(a_0) − lim_{a↑a_0} F_X(a). Thus, if F_X is continuous at a_0, then P(X = a_0) = 0.

From these properties, we can assign probabilities to all intervals from knowledge of the cdf, and thus extend to all Borel sets. Hence F_X determines a unique probability distribution on (R, B), so F_X and P_X are uniquely related.

Pure types of r.v.s
1. Discrete r.v.s: an r.v. whose possible values can be enumerated.
2. Continuous r.v.s: an r.v. whose distribution function can be written as the (regular) integral of another function.
3. Singular, but not discrete: any other r.v.

Discrete r.v.s

A random variable that can take on at most a countable number of possible values is said to be a discrete r.v.: X ∈ {x_1, x_2, ...}.

Definition 9 For a discrete r.v. X, we define the probability mass function (pmf) (or discrete density function) by
   p_X(a) = P(X = a), a ∈ R,
where p_X(a) = 0 if a ≠ x_i for every outcome x_i.

Properties of pmfs:
1. Nonnegativity: p_X(a) ≥ 0 for a ∈ {x_1, x_2, ...}, and p_X(a) = 0 otherwise.
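Properties 5 and 6 of the cdf are easy to see on a discrete example, where F_X is a staircase and the jump at each support point equals the pmf there. A sketch with a fair die (our construction):

```python
from fractions import Fraction

# cdf of a fair-die r.v.: F(a) = sum of the pmf over outcomes <= a.
support = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in support}

def F(a):
    return sum(p for x, p in pmf.items() if x <= a)

# Property 5: P(a < X <= b) = F(b) - F(a); here P(X in {3, 4}) = 2/6.
assert F(4) - F(2) == Fraction(2, 6)

# Property 6: the jump of F at a support point equals the pmf there.
# There is no mass in (2.5, 3), so F(2.5) serves as the left limit at 3.
assert F(3) - F(Fraction(5, 2)) == pmf[3]
```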

2. Total probability: Σ_{i=1}^∞ p_X(x_i) = 1.
(These two properties completely characterize the class of all pmfs on a given set {x_1, x_2, ...}.)
3. Relation to cdf: p_X(a) = F_X(a) − lim_{b↑a} F_X(b).
(So the pmf and the cdf contain the same information. Note that for a discrete r.v., the cdf is piecewise constant. Draw picture.)
4. F_X(a) = Σ_{x_i : x_i ≤ a} p_X(x_i).

Example 9 Bernoulli with parameter π (0 ≤ π ≤ 1): X ∈ {0, 1},
   p_X(a) = 1 − π for a = 0; π for a = 1; 0 otherwise.
   F_X(a) = 0 for a < 0; 1 − π for 0 ≤ a < 1; 1 for a ≥ 1.
The Bernoulli is often used to model bits, bit errors, random coin flips, etc.

Example 10 Binomial(n, π) (0 ≤ π ≤ 1, n ∈ Z+): X ∈ {0, 1, ..., n},
   p_X(a) = [n! / ((n − a)! a!)] π^a (1 − π)^{n−a} for a = 0, 1, ..., n; 0 otherwise.
The binomial(n, π) can be viewed as the sum of n repeated Bernoulli trials. We can ask such questions as: what is the probability of k bits out of n being flipped, if the probability of an individual bit being flipped is π? How do we show the total probability property? Use the binomial theorem:
   (a + b)^n = Σ_{k=0}^n C(n, k) a^k b^{n−k}.
(Plot the probability function.)

Example 11 Poisson(λ): X ∈ N,
   p_X(a) = λ^a e^{−λ} / a! for a ∈ N; 0 otherwise.
The Poisson distribution is often used to model the number of rare events occurring in a length of time (for example, the number of photon emissions from a substance over a period of time, or the number of cars passing on a road). Later in the course we will clarify the assumptions and derive this expression. How do we find P(X ≥ k)?
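The three pmfs above can be sketched directly, with the binomial theorem supplying the total-probability check; the parameter values below are illustrative:

```python
from math import comb, exp, factorial

def bernoulli_pmf(a, pi):
    return {0: 1 - pi, 1: pi}.get(a, 0.0)

def binomial_pmf(a, n, pi):
    # n! / ((n-a)! a!) = comb(n, a)
    return comb(n, a) * pi**a * (1 - pi)**(n - a) if 0 <= a <= n else 0.0

def poisson_pmf(a, lam):
    return lam**a * exp(-lam) / factorial(a)

# Total probability for the binomial is the binomial theorem with
# a = pi, b = 1 - pi, so the sum collapses to (pi + (1 - pi))^n = 1.
n, pi = 10, 0.3
assert abs(sum(binomial_pmf(a, n, pi) for a in range(n + 1)) - 1.0) < 1e-12

# P(X >= k) for a Poisson r.v.: complement of a finite sum.
lam, k = 2.0, 3
p_at_least_k = 1.0 - sum(poisson_pmf(a, lam) for a in range(k))
```

The last two lines answer the question posed for the Poisson: since the support is infinite, P(X ≥ k) is computed as 1 minus the finite sum Σ_{a<k} p_X(a).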

Continuous r.v.s

An r.v. X is said to be continuous if there is a function f_X : R → R such that
   F_X(a) = ∫_{−∞}^{a} f_X(x) dx for all a ∈ R.
In this case, F_X is an absolutely continuous function.¹

Properties:
1. f_X(x) ≥ 0.
2. ∫_{−∞}^{∞} f_X(x) dx = 1.
(These two properties completely characterize the f_X's.) f_X is called the probability density function (pdf) of X.
3. P(X ∈ B) = ∫_B f_X(x) dx = P_X(B) for B ∈ B.
4. P(X = a) = 0 for all a ∈ R.
5. P(X ∈ [x_0, x_0 + Δx]) ≈ f_X(x_0) Δx. Informally, P(X ∈ dx) = P_X(dx) = f_X(x) dx.

Example 12 Uniform on (α, β) (with β > α):
   f_X(x) = 1/(β − α) for α < x < β (or α ≤ x ≤ β); 0 otherwise.
(Plot pdf and cdf.) Uses: random number generation; phase distributions.

Example 13 Exponential(λ) (λ > 0):
   f_X(x) = λ e^{−λx} for x ≥ 0; 0 otherwise.
(Plot pdf and cdf.) Uses: waiting times. (We'll see later what we mean by this.)

Example 14 Gaussian (normal) N(µ, σ²):
   f_X(x) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)).
(Plot.) Uses: noise; sums of random variables. This is the most important distribution we will deal with!

The cdf: Let Z ~ N(0, 1). Define
   Φ(x) = F_Z(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−z²/2} dz.
We also use
   Q(x) = 1 − Φ(x) = P(Z > x) = ∫_{x}^{∞} (1/√(2π)) e^{−z²/2} dz.

¹ A function F is said to be absolutely continuous if for every ε > 0 there exists a δ such that for each finite collection of nonoverlapping intervals [a_i, b_i], i = 1, ..., k, we have Σ_{i=1}^k |F(b_i) − F(a_i)| < ε whenever Σ_{i=1}^k |b_i − a_i| < δ [1, p. 433]. Being absolutely continuous is a stronger property than being continuous.
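Φ and Q have no closed form, but both can be written in terms of the error function via the standard identity Φ(x) = (1 + erf(x/√2))/2. A sketch:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf, via the identity Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Q(x):
    """Gaussian tail probability Q(x) = P(Z > x) = 1 - Phi(x)."""
    return 1.0 - Phi(x)

# Sanity checks: symmetry puts half the mass below 0, and Phi(-x) = 1 - Phi(x).
assert abs(Phi(0.0) - 0.5) < 1e-12
assert abs(Phi(-1.3) - (1.0 - Phi(1.3))) < 1e-12
```

These two functions are what Property 3 of the Gaussian r.v. below reduces everything to: F_X(a) = Φ((a − µ)/σ) for any X ~ N(µ, σ²).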

Properties of Gaussian Random Variables

Let X ~ N(µ, σ²).
1. If Y = αX + β, then Y ~ N(αµ + β, α²σ²).
2. (X − µ)/σ ~ N(0, 1).
3. F_X(a) = Φ((a − µ)/σ).
4. Φ(−x) = 1 − Φ(x).
5. A Gaussian r.v. is completely characterized by its first two moments (i.e., the mean and variance).

We will also see, and make use of, many other properties of Gaussian random processes, such as the fact that uncorrelated jointly Gaussian random variables are also independent.

References

[1] P. Billingsley, Probability and Measure. New York: Wiley, 1986.