Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Outline: Probability Spaces; Conditional Probability and Independence; Random Variables; Functions of a Random Variable; Generation of a Random Variable; Jointly Distributed Random Variables; Scalar Detection.

Probability Theory

Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage.

Basic elements of probability theory:
- Sample space Ω: the set of all possible elementary (finest-grain) outcomes of the random experiment.
- Set of events F: a set of subsets of Ω (not necessarily all of them; see below). An event A ⊆ Ω occurs if the outcome ω ∈ A.
- Probability measure P: a function over F that assigns probabilities to events according to the axioms of probability (see below).

Formally, a probability space is the triple (Ω, F, P).

Axioms of Probability

A probability measure P satisfies the following axioms:
1. P(A) ≥ 0 for every event A in F
2. P(Ω) = 1
3. If A_1, A_2, ... are disjoint events, i.e., A_i ∩ A_j = ∅ for all i ≠ j, then P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)

Notes:
- P is a measure in the same sense as mass, length, area, and volume, all of which satisfy axioms 1 and 3.
- Unlike these other measures, P is bounded by 1 (axiom 2).
- This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects such as conditioning and independence are unique to probability.

Discrete Probability Spaces

A sample space Ω is said to be discrete if it is countable. Examples:
- Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
- Flipping a coin n times: Ω = {H, T}^n, sequences of heads/tails of length n
- Flipping a coin until the first heads occurs: Ω = {H, TH, TTH, TTTH, ...}

For discrete sample spaces, the set of events F can be taken to be the set of all subsets of Ω, sometimes called the power set of Ω. Example: for a single coin flip, F = {∅, {H}, {T}, Ω}. F does not have to be the entire power set (more on this later).

For a discrete sample space, the probability measure P can be defined by assigning probabilities to the individual outcomes (the single-outcome events {ω}) so that:
- P({ω}) ≥ 0 for every ω ∈ Ω
- Σ_{ω∈Ω} P({ω}) = 1

The probability of any other event A is then simply P(A) = Σ_{ω∈A} P({ω}).

Example: For the die-rolling experiment, assign P({i}) = 1/6 for i = 1, 2, ..., 6. The probability of the event "the outcome is even", A = {2, 4, 6}, is P(A) = P({2}) + P({4}) + P({6}) = 3/6 = 1/2.

Continuous Probability Spaces

A continuous sample space Ω has an uncountable number of elements. Examples:
- Random number between 0 and 1: Ω = (0, 1]
- Point in the unit disk: Ω = {(x, y) : x^2 + y^2 ≤ 1}
- Arrival times of n packets: Ω = (0, ∞)^n

For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes. To see why, consider assigning a uniform probability measure over (0, 1]. In this case the probability of each single-outcome event is zero. How then do we find the probability of an event such as A = [0.25, 0.75]?
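Before turning to that question, here is a minimal numerical sketch (not part of the original notes) of the discrete case above: assign P({ω}) = 1/6 to each die outcome and compute the probability of any event by summing over its outcomes.

```python
# Minimal sketch (illustrative, not from the notes): the die-rolling probability space.
# Each outcome gets probability 1/6; the probability of any event A is the sum
# of the probabilities of the outcomes in A.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                       # sample space
P_outcome = {w: Fraction(1, 6) for w in omega}   # P({w}) = 1/6

def prob(event):
    """P(A) = sum of P({w}) over w in A."""
    return sum(P_outcome[w] for w in event)

A = {2, 4, 6}                                    # "the outcome is even"
print(prob(A))                                   # 1/2
```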

Another difference for continuous Ω: we cannot take the set of events F to be the power set of Ω. (To learn why, you need to study measure theory, which is beyond the scope of this course.)

The set of events F cannot be an arbitrary collection of subsets of Ω. It must make sense, e.g., if A is an event, then its complement A^c must also be an event, the union of two events must be an event, and so on. Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the following axioms:
1. ∅ ∈ F
2. If A ∈ F, then A^c ∈ F
3. If A_1, A_2, ... ∈ F, then ⋃_{i=1}^∞ A_i ∈ F

Of course, the power set is a sigma algebra. But we can define smaller σ-algebras. For example, for rolling a die, we could define the set of events as F = {∅, odd, even, Ω}.

For Ω = R = (-∞, ∞) (or (0, ∞), (0, 1), etc.), F is typically defined as the family of sets obtained by starting from the intervals and taking countable unions, intersections, and complements. The resulting F is called the Borel field. Note: amazingly, there are subsets of R that cannot be generated in this way! (Not ones that you are likely to encounter in your life as an engineer, or even as a mathematician.)

To define a probability measure over a Borel field, we first assign probabilities to the intervals in a consistent way, i.e., in a way that satisfies the axioms of probability. For example, to define the uniform probability measure over (0, 1), we first assign P((a, b)) = b - a to all intervals.

In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of) knowing what they are.

Useful Probability Laws

Union of Events Bound:
P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i)

Law of Total Probability: Let A_1, A_2, A_3, ... be events that partition Ω, i.e., disjoint (A_i ∩ A_j = ∅ for i ≠ j) with ⋃_i A_i = Ω. Then for any event B,
P(B) = Σ_i P(A_i ∩ B)
The Law of Total Probability is very useful for finding probabilities of sets.

Conditional Probability

Let B be an event such that P(B) ≠ 0. The conditional probability of event A given B is defined to be
P(A | B) = P(A ∩ B) / P(B) = P(A, B) / P(B)
The function P(· | B) is a probability measure over F, i.e., it satisfies the axioms of probability.

Chain rule: P(A, B) = P(A) P(B | A) = P(B) P(A | B) (this can be generalized to n events).

The probability of event A given B, a nonzero probability event (the a posteriori probability of A), is related to the unconditional probability of A (the a priori probability) by
P(A | B) = [P(B | A) / P(B)] P(A)
This follows directly from the definition of conditional probability.
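The following small sketch (my own toy example with two fair dice, not from the notes) checks the union bound and the chain rule numerically.

```python
# Illustrative check of the union bound and the chain rule on the sample space
# of two fair dice (36 equally likely outcomes).
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))          # all 36 outcomes

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] == 7                         # "the sum is 7"
B = lambda w: w[0] == 3                                # "the first die shows 3"

# Union bound: P(A or B) <= P(A) + P(B)
print(prob(lambda w: A(w) or B(w)) <= prob(A) + prob(B))   # True

# Chain rule: P(A, B) = P(B) P(A | B), with P(A | B) computed by counting within B
P_AB = prob(lambda w: A(w) and B(w))
P_A_given_B = Fraction(sum(1 for w in omega if A(w) and B(w)),
                       sum(1 for w in omega if B(w)))
print(P_AB == prob(B) * P_A_given_B)                        # True
```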

Bayes Rule

Let A_1, A_2, ..., A_n be nonzero probability events that partition Ω, and let B be a nonzero probability event. We know P(A_i) and P(B | A_i), i = 1, 2, ..., n, and want to find the a posteriori probabilities P(A_j | B), j = 1, 2, ..., n.

We know that P(A_j | B) = [P(B | A_j) / P(B)] P(A_j). By the law of total probability,
P(B) = Σ_{i=1}^n P(A_i ∩ B) = Σ_{i=1}^n P(A_i) P(B | A_i)
Substituting, we obtain Bayes rule:
P(A_j | B) = [P(B | A_j) / Σ_{i=1}^n P(A_i) P(B | A_i)] P(A_j),  j = 1, 2, ..., n
Bayes rule also applies to a (countably) infinite number of events.

Independence

Two events are said to be statistically independent if P(A, B) = P(A) P(B). When P(B) ≠ 0, this is equivalent to P(A | B) = P(A). In other words, knowing whether B occurs does not change the probability of A.

The events A_1, A_2, ..., A_n are said to be independent if for every subset A_{i_1}, A_{i_2}, ..., A_{i_k} of the events,
P(A_{i_1}, A_{i_2}, ..., A_{i_k}) = Π_{j=1}^k P(A_{i_j})
Note: P(A_1, A_2, ..., A_n) = Π_{i=1}^n P(A_i) alone is not sufficient for independence.
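As a concrete numerical illustration (the numbers below are made up for illustration, not from the notes), Bayes rule can be applied mechanically to a three-event partition.

```python
# Hedged numerical illustration of Bayes rule: three events A1, A2, A3 partition
# Omega with known priors P(A_i) and likelihoods P(B | A_i); compute P(A_j | B).
priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}            # P(A_i)
likelihoods = {"A1": 0.01, "A2": 0.05, "A3": 0.10}    # P(B | A_i)

# Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
P_B = sum(priors[a] * likelihoods[a] for a in priors)

# Bayes rule: P(A_j | B) = P(B | A_j) P(A_j) / P(B)
posteriors = {a: likelihoods[a] * priors[a] / P_B for a in priors}
print(posteriors)                                     # a posteriori probabilities
print(abs(sum(posteriors.values()) - 1.0) < 1e-12)    # they sum to 1
```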

Random Variables

A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω, i.e., X : Ω → R.

Notation:
- We use upper case letters for random variables: X, Y, Z, Φ, Θ, ...
- We use lower case letters for values of random variables: X = x means that random variable X takes on the value x, i.e., X(ω) = x where ω is the outcome.

Specifying a Random Variable

Specifying a random variable means being able to determine the probability that X ∈ A for any Borel set A ⊆ R, in particular, for any interval (a, b]. To do so, consider the inverse image of A under X, i.e., {ω : X(ω) ∈ A}. Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},
P({X ∈ A}) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}
Shorthand: P({set description}) = P{set description}

Cumulative Distribution Function (CDF)

We need to be able to determine P{X ∈ A} for any Borel set A ⊆ R, i.e., any set generated by starting from intervals and taking countable unions, intersections, and complements. Hence, it suffices to specify P{X ∈ (a, b]} for all intervals; the probability of any other Borel set can then be determined by the axioms of probability. Equivalently, it suffices to specify the cumulative distribution function (cdf)
F_X(x) = P{X ≤ x} = P{X ∈ (-∞, x]},  x ∈ R

Properties of the cdf:
- F_X(x) ≥ 0
- F_X(x) is monotonically nondecreasing, i.e., if a > b then F_X(a) ≥ F_X(b)
- Limits: lim_{x→+∞} F_X(x) = 1 and lim_{x→-∞} F_X(x) = 0
- F_X(x) is right continuous, i.e., F_X(a^+) = lim_{x→a^+} F_X(x) = F_X(a)
- P{X = a} = F_X(a) - F_X(a^-), where F_X(a^-) = lim_{x→a^-} F_X(x)
- For any Borel set A, P{X ∈ A} can be determined from F_X(x)

Notation: X ~ F_X(x) means that X has cdf F_X(x).

Probability Mass Function (PMF)

A random variable is said to be discrete if F_X(x) consists only of steps over a countable set 𝒳. Hence, a discrete random variable can be completely specified by the probability mass function (pmf)
p_X(x) = P{X = x} for every x ∈ 𝒳
Clearly p_X(x) ≥ 0 and Σ_{x∈𝒳} p_X(x) = 1.

Notation: We use X ~ p_X(x), or simply X ~ p(x), to mean that the discrete random variable X has pmf p_X(x) or p(x).

Famous discrete random variables:
- Bernoulli: X ~ Bern(p) for 0 ≤ p ≤ 1 has the pmf p_X(1) = p and p_X(0) = 1 - p
- Geometric: X ~ Geom(p) for 0 ≤ p ≤ 1 has the pmf p_X(k) = p (1 - p)^{k-1}, k = 1, 2, 3, ...
- Binomial: X ~ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf p_X(k) = (n choose k) p^k (1 - p)^{n-k}, k = 0, 1, ..., n
- Poisson: X ~ Poisson(λ) for λ > 0 has the pmf p_X(k) = (λ^k / k!) e^{-λ}, k = 0, 1, 2, ...

Remark: Poisson is the limit of Binomial with np = λ as n → ∞, i.e., for every k = 0, 1, 2, ..., the Binom(n, λ/n) pmf p_X(k) → (λ^k / k!) e^{-λ} as n → ∞.
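The Poisson limit in the remark above is easy to check numerically. The sketch below (illustrative only; λ = 2 and k = 3 are arbitrary choices) compares the Binom(n, λ/n) pmf to the Poisson(λ) pmf for growing n.

```python
# Quick numerical check of the remark: the Binom(n, lambda/n) pmf at a fixed k
# approaches the Poisson(lambda) pmf as n grows.
from math import comb, exp, factorial

lam, k = 2.0, 3                                   # lambda and the point of comparison

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

poisson_pk = lam**k / factorial(k) * exp(-lam)
for n in (10, 100, 1000, 10000):
    print(n, binom_pmf(n, lam / n, k), poisson_pk)
# The binomial values converge to the Poisson value as n increases
```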

Probability Density Function (PDF)

A random variable is said to be continuous if its cdf is a continuous function. If F_X(x) is continuous and differentiable (except possibly over a countable set), then X can be completely specified by a probability density function (pdf) f_X(x) such that
F_X(x) = ∫_{-∞}^{x} f_X(u) du
If F_X(x) is differentiable everywhere, then (by definition of the derivative)
f_X(x) = dF_X(x)/dx = lim_{Δx→0} [F_X(x + Δx) - F_X(x)] / Δx = lim_{Δx→0} P{x < X ≤ x + Δx} / Δx

Properties of the pdf:
- f_X(x) ≥ 0
- ∫_{-∞}^{∞} f_X(x) dx = 1
- For any event (Borel set) A ⊆ R, P{X ∈ A} = ∫_{x∈A} f_X(x) dx; in particular, P{x_1 < X ≤ x_2} = ∫_{x_1}^{x_2} f_X(x) dx

Important note: f_X(x) should not be interpreted as the probability that X = x. In fact, f_X(x) is not a probability measure, since it can be > 1.

Notation: X ~ f_X(x) means that X has pdf f_X(x).

Famous continuous random variables:
- Uniform: X ~ U[a, b], where a < b, has pdf f_X(x) = 1/(b - a) if a ≤ x ≤ b, and 0 otherwise
- Exponential: X ~ Exp(λ), where λ > 0, has pdf f_X(x) = λ e^{-λx} if x ≥ 0, and 0 otherwise
- Laplace: X ~ Laplace(λ), where λ > 0, has pdf f_X(x) = (1/2) λ e^{-λ|x|}
- Gaussian: X ~ N(µ, σ^2), with parameters µ (the mean) and σ^2 (the variance; σ is the standard deviation), has pdf
  f_X(x) = (1/√(2πσ^2)) e^{-(x-µ)^2 / (2σ^2)}

The cdf of the standard normal random variable N(0, 1) is
Φ(x) = ∫_{-∞}^{x} (1/√(2π)) e^{-u^2/2} du
Define the function Q(x) = 1 - Φ(x) = P{X > x}, the N(0, 1) tail probability to the right of x.

The Q(·) function is used to compute P{Y > y} for any Gaussian r.v. Y: given Y ~ N(µ, σ^2), we represent it using the standard X ~ N(0, 1) as Y = σX + µ. Then
P{Y > y} = P{X > (y - µ)/σ} = Q((y - µ)/σ)
The complementary error function is erfc(x) = 2 Q(√2 x).
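In code, Gaussian tail probabilities are conveniently computed through the erfc relation above, which rearranges to Q(x) = erfc(x/√2)/2. A small sketch (standard library only; the numbers µ = 1, σ = 2, y = 4 are arbitrary):

```python
# Computing Gaussian tail probabilities with the Q function, using
# Q(x) = erfc(x / sqrt(2)) / 2, which follows from erfc(x) = 2 Q(sqrt(2) x).
from math import erfc, sqrt

def Q(x):
    """Q(x) = P{X > x} for X ~ N(0, 1)."""
    return 0.5 * erfc(x / sqrt(2))

# P{Y > y} for Y ~ N(mu, sigma^2) via standardization: Q((y - mu) / sigma)
mu, sigma, y = 1.0, 2.0, 4.0
print(Q((y - mu) / sigma))      # P{Y > 4} for Y ~ N(1, 4), roughly 0.0668
```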

Functions of a Random Variable

Suppose we are given a r.v. X with known cdf F_X(x) and a function y = g(x). What is the cdf of the random variable Y = g(X)? We use
F_Y(y) = P{Y ≤ y} = P{x : g(x) ≤ y},
the probability of the inverse image {x : g(x) ≤ y}.

Example (quadratic function): Let X ~ F_X(x) and Y = X^2. We wish to find F_Y(y). If y < 0, then clearly F_Y(y) = 0. For y ≥ 0,
F_Y(y) = P{-√y < X ≤ √y} = F_X(√y) - F_X(-√y)
If X is continuous with density f_X(x), then
f_Y(y) = [f_X(+√y) + f_X(-√y)] / (2√y)
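A quick Monte Carlo sanity check of the quadratic example (illustrative; taking X ~ N(0,1) as a concrete case, which the notes do not specify):

```python
# Monte Carlo check of F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) for Y = X^2, X ~ N(0,1).
import random
from math import erf, sqrt

def Phi(x):                       # standard normal cdf
    return 0.5 * (1 + erf(x / sqrt(2)))

random.seed(0)
samples = [random.gauss(0, 1) ** 2 for _ in range(200_000)]

for y in (0.5, 1.0, 2.0):
    empirical = sum(s <= y for s in samples) / len(samples)
    exact = Phi(sqrt(y)) - Phi(-sqrt(y))
    print(y, round(empirical, 4), round(exact, 4))    # the two columns should agree closely
```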

Remark: In general, let X ~ f_X(x) and Y = g(X) with g differentiable. Then
f_Y(y) = Σ_{i=1}^{k} f_X(x_i) / |g′(x_i)|,
where x_1, x_2, ..., x_k are the solutions of the equation y = g(x) and g′(x_i) is the derivative of g evaluated at x_i.

Example (limiter): Let X ~ Laplace(1), i.e., f_X(x) = (1/2) e^{-|x|}, and let Y be defined by the limiter function of X shown in the figure: Y = -a for X ≤ -1, Y = aX for -1 < X < 1, and Y = +a for X ≥ 1. Find the cdf of Y.

To find the cdf of Y, we consider the following cases:
- y < -a: Here clearly F_Y(y) = 0
- y = -a: Here F_Y(-a) = F_X(-1) = ∫_{-∞}^{-1} (1/2) e^{x} dx = (1/2) e^{-1}

- -a < y < a: Here
  F_Y(y) = P{Y ≤ y} = P{aX ≤ y} = P{X ≤ y/a} = F_X(y/a) = (1/2) e^{-1} + ∫_{-1}^{y/a} (1/2) e^{-|x|} dx
- y ≥ a: Here F_Y(y) = 1

Combining the results gives the cdf of Y sketched in the figure: F_Y(y) is 0 for y < -a, rises continuously on (-a, a), and has jumps of height (1/2) e^{-1} at y = -a and y = a.

Generation of Random Variables

Generating a r.v. with a prescribed distribution is often needed for performing simulations involving random phenomena, e.g., noise or random arrivals.

First let X ~ F(x), where the cdf F(x) is continuous and strictly increasing. Define Y = F(X), a real-valued random variable that is a function of X. What is the cdf of Y? Clearly, F_Y(y) = 0 for y < 0 and F_Y(y) = 1 for y > 1. For 0 ≤ y ≤ 1, note that by assumption F has an inverse F^{-1}, so
F_Y(y) = P{Y ≤ y} = P{F(X) ≤ y} = P{X ≤ F^{-1}(y)} = F(F^{-1}(y)) = y
Thus Y ~ U[0, 1], i.e., Y is a uniformly distributed random variable.

Note: F(x) does not need to be invertible. If F(x) = a is constant over some interval, then the probability that X lies in this interval is zero. Without loss of generality, we can take F^{-1}(a) to be the leftmost point of the interval.

Conclusion: We can generate a U[0, 1] r.v. from any continuous r.v.

Now let us consider the more useful scenario where we are given X ~ U[0, 1] (a random number generator) and wish to generate a random variable Y with a prescribed cdf F(y), e.g., Gaussian or exponential. (The figure shows the relation x = F(y), i.e., y = F^{-1}(x).)

If F is continuous and strictly increasing, set Y = F^{-1}(X). To show that Y ~ F(y):
F_Y(y) = P{Y ≤ y} = P{F^{-1}(X) ≤ y} = P{X ≤ F(y)} = F(y),
since X ~ U[0, 1] and 0 ≤ F(y) ≤ 1.

Example: To generate Y ~ Exp(λ), set Y = -(1/λ) ln(1 - X).

Note: F does not need to be continuous for the above to work. For example, to generate Y ~ Bern(p), we set Y = 0 if X ≤ 1 - p, and Y = 1 otherwise.

Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
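The two examples above translate directly into code. The sketch below (illustrative; it only checks the sample means against 1/λ and p) generates Exp(λ) and Bern(p) variates from a U[0, 1] source by the inverse transform.

```python
# Inverse-transform generation from a U[0,1] source, following the two examples above.
import random
from math import log

random.seed(1)

def exp_rv(lam):
    """Y = -(1/lambda) ln(1 - X), X ~ U[0,1]  =>  Y ~ Exp(lambda)."""
    return -log(1 - random.random()) / lam

def bern_rv(p):
    """Y = 0 if X <= 1 - p, else 1  =>  Y ~ Bern(p)."""
    return 0 if random.random() <= 1 - p else 1

n = 100_000
print(sum(exp_rv(2.0) for _ in range(n)) / n)   # about 1/lambda = 0.5
print(sum(bern_rv(0.3) for _ in range(n)) / n)  # about p = 0.3
```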

Jointly Distributed Random Variables

A pair of random variables defined over the same probability space are specified by their joint cdf
F_{X,Y}(x, y) = P{X ≤ x, Y ≤ y},  x, y ∈ R
F_{X,Y}(x, y) is the probability of the region of R^2 to the lower left of the point (x, y) (the shaded region in the figure).

Properties of the joint cdf:
- F_{X,Y}(x, y) ≥ 0
- If x_1 ≤ x_2 and y_1 ≤ y_2, then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
- lim_{y→-∞} F_{X,Y}(x, y) = 0 and lim_{x→-∞} F_{X,Y}(x, y) = 0
- lim_{y→∞} F_{X,Y}(x, y) = F_X(x) and lim_{x→∞} F_{X,Y}(x, y) = F_Y(y); F_X(x) and F_Y(y) are the marginal cdfs of X and Y
- lim_{x,y→∞} F_{X,Y}(x, y) = 1

X and Y are independent if for every x and y,
F_{X,Y}(x, y) = F_X(x) F_Y(y)

Joint, Marginal, and Conditional PMFs

Let X and Y be discrete random variables on the same probability space. They are completely specified by their joint pmf
p_{X,Y}(x, y) = P{X = x, Y = y},  x ∈ 𝒳, y ∈ 𝒴
By the axioms of probability, Σ_{x∈𝒳} Σ_{y∈𝒴} p_{X,Y}(x, y) = 1.

To find p_X(x), the marginal pmf of X, we use the law of total probability:
p_X(x) = Σ_{y∈𝒴} p_{X,Y}(x, y),  x ∈ 𝒳

The conditional pmf of X given Y = y is defined as
p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y),  p_Y(y) ≠ 0, x ∈ 𝒳

Chain rule: p_{X,Y}(x, y) = p_X(x) p_{Y|X}(y | x) = p_Y(y) p_{X|Y}(x | y)

Independence: X and Y are said to be independent if for every (x, y) ∈ 𝒳 × 𝒴,
p_{X,Y}(x, y) = p_X(x) p_Y(y),
which is equivalent to p_{X|Y}(x | y) = p_X(x) for every x ∈ 𝒳 and y ∈ 𝒴 such that p_Y(y) ≠ 0.
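A small worked example (the joint pmf values below are made up for illustration) of computing marginals and conditionals from a joint pmf and testing independence:

```python
# Marginal and conditional pmfs from a joint pmf, plus an independence check.
from fractions import Fraction as F

p_XY = {(0, 0): F(1, 8), (0, 1): F(3, 8),     # p_{X,Y}(x, y)
        (1, 0): F(1, 8), (1, 1): F(3, 8)}

xs = {x for x, _ in p_XY}
ys = {y for _, y in p_XY}

p_X = {x: sum(p_XY[x, y] for y in ys) for x in xs}   # marginal of X (total probability)
p_Y = {y: sum(p_XY[x, y] for x in xs) for y in ys}   # marginal of Y

# Conditional pmf of X given Y = y
p_X_given_Y = {(x, y): p_XY[x, y] / p_Y[y] for x in xs for y in ys}

# Independence check: p_{X,Y}(x, y) == p_X(x) p_Y(y) for every (x, y)
print(all(p_XY[x, y] == p_X[x] * p_Y[y] for x in xs for y in ys))   # True for this joint pmf
```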

Joint, Marginal, and Conditional PDFs

X and Y are jointly continuous random variables if their joint cdf is continuous in both x and y. In this case, we can define their joint pdf, provided that it exists, as the function f_{X,Y}(x, y) such that
F_{X,Y}(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f_{X,Y}(u, v) dv du,  x, y ∈ R
If F_{X,Y}(x, y) is differentiable in x and y, then
f_{X,Y}(x, y) = ∂^2 F_{X,Y}(x, y) / ∂x ∂y = lim_{Δx,Δy→0} P{x < X ≤ x + Δx, y < Y ≤ y + Δy} / (Δx Δy)

Properties of f_{X,Y}(x, y):
- f_{X,Y}(x, y) ≥ 0
- ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) dx dy = 1

The marginal pdf of X can be obtained from the joint pdf via the law of total probability:
f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dy

X and Y are independent iff f_{X,Y}(x, y) = f_X(x) f_Y(y) for every x, y.

Conditional cdf and pdf: Let X and Y be continuous random variables with joint pdf f_{X,Y}(x, y). We wish to define
F_{Y|X}(y | X = x) = P{Y ≤ y | X = x}
We cannot define this conditional probability as P{Y ≤ y, X = x} / P{X = x}, because both numerator and denominator are equal to zero. Instead, we define conditional probability for continuous random variables as a limit:
F_{Y|X}(y | x) = lim_{Δx→0} P{Y ≤ y | x < X ≤ x + Δx}
             = lim_{Δx→0} P{Y ≤ y, x < X ≤ x + Δx} / P{x < X ≤ x + Δx}
             = lim_{Δx→0} [∫_{-∞}^{y} f_{X,Y}(x, u) du · Δx] / [f_X(x) Δx]
             = ∫_{-∞}^{y} [f_{X,Y}(x, u) / f_X(x)] du

We then define the conditional pdf in the usual way as
f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)  if f_X(x) ≠ 0
Thus
F_{Y|X}(y | x) = ∫_{-∞}^{y} f_{Y|X}(u | x) du,
which shows that f_{Y|X}(y | x) is a pdf for Y given X = x, i.e., Y | {X = x} ~ f_{Y|X}(y | x).

Independence: X and Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y) for every (x, y).

Mixed Random Variables

Let Θ be a discrete random variable with pmf p_Θ(θ). For each Θ = θ with p_Θ(θ) ≠ 0, let Y be a continuous random variable, i.e., F_{Y|Θ}(y | θ) is continuous for all θ. We define f_{Y|Θ}(y | θ) in the usual way. The conditional pmf of Θ given Y = y can then be defined as a limit:
p_{Θ|Y}(θ | y) = lim_{Δy→0} P{Θ = θ, y < Y ≤ y + Δy} / P{y < Y ≤ y + Δy}
             = lim_{Δy→0} [p_Θ(θ) f_{Y|Θ}(y | θ) Δy] / [f_Y(y) Δy]
             = f_{Y|Θ}(y | θ) p_Θ(θ) / f_Y(y)

Bayes Rule for Random Variables

Bayes rule for pmfs: Given p_X(x) and p_{Y|X}(y | x), then
p_{X|Y}(x | y) = [p_{Y|X}(y | x) / Σ_{x′∈𝒳} p_{Y|X}(y | x′) p_X(x′)] p_X(x)

Bayes rule for densities: Given f_X(x) and f_{Y|X}(y | x), then
f_{X|Y}(x | y) = [f_{Y|X}(y | x) / ∫ f_X(u) f_{Y|X}(y | u) du] f_X(x)

Bayes rule for mixed r.v.s: Given p_Θ(θ) and f_{Y|Θ}(y | θ), then
p_{Θ|Y}(θ | y) = [f_{Y|Θ}(y | θ) / Σ_{θ′} p_Θ(θ′) f_{Y|Θ}(y | θ′)] p_Θ(θ)
Conversely, given f_Y(y) and p_{Θ|Y}(θ | y), then
f_{Y|Θ}(y | θ) = [p_{Θ|Y}(θ | y) / ∫ f_Y(y′) p_{Θ|Y}(θ | y′) dy′] f_Y(y)

Example: Additive Gaussian Noise Channel

Consider a communication channel with additive noise Z ~ N(0, N). The signal transmitted is a binary random variable Θ with
Θ = +1 with probability p, and Θ = -1 with probability 1 - p
The received signal, also called the observation, is Y = Θ + Z, where Θ and Z are independent. Given Y = y is received (observed), find p_{Θ|Y}(θ | y), the a posteriori pmf of Θ.

Solution: We use Bayes rule for mixed r.v.s:
p_{Θ|Y}(θ | y) = [f_{Y|Θ}(y | θ) / Σ_{θ′} p_Θ(θ′) f_{Y|Θ}(y | θ′)] p_Θ(θ)
We are given p_Θ(θ): p_Θ(+1) = p and p_Θ(-1) = 1 - p, and f_{Y|Θ}(y | θ) = f_Z(y - θ):
Y | {Θ = +1} ~ N(+1, N) and Y | {Θ = -1} ~ N(-1, N)
Therefore
p_{Θ|Y}(1 | y) = [p (1/√(2πN)) e^{-(y-1)^2/(2N)}] / [p (1/√(2πN)) e^{-(y-1)^2/(2N)} + (1 - p) (1/√(2πN)) e^{-(y+1)^2/(2N)}]
             = p e^{y/N} / [p e^{y/N} + (1 - p) e^{-y/N}],  -∞ < y < ∞

Scalar Detection

Consider the following general digital communication system: a signal Θ ∈ {θ_0, θ_1} is sent over a noisy channel, and a decoder observes Y and produces the estimate Θ̂(Y) ∈ {θ_0, θ_1}. The signal sent is
Θ = θ_0 with probability p, and Θ = θ_1 with probability 1 - p,
and the observation (received signal) satisfies Y | {Θ = θ} ~ f_{Y|Θ}(y | θ), θ ∈ {θ_0, θ_1}.

We wish to find the estimate Θ̂(Y) (i.e., design the decoder) that minimizes the probability of error:
P_e = P{Θ̂ ≠ Θ}
    = P{Θ = θ_0, Θ̂ = θ_1} + P{Θ = θ_1, Θ̂ = θ_0}
    = P{Θ = θ_0} P{Θ̂ = θ_1 | Θ = θ_0} + P{Θ = θ_1} P{Θ̂ = θ_0 | Θ = θ_1}
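Returning to the additive Gaussian noise example for a moment, the posterior derived above can be coded directly (a small sketch; the values of p, N, and y below are arbitrary). Note how, for p = 1/2, the posterior exceeds 1/2 exactly when y > 0, which anticipates the decoding rule discussed next.

```python
# The a posteriori pmf derived above for the additive Gaussian noise channel:
# p_{Theta|Y}(1 | y) = p e^{y/N} / (p e^{y/N} + (1 - p) e^{-y/N}).
from math import exp

def posterior_plus_one(y, p=0.5, N=1.0):
    """P{Theta = +1 | Y = y} for Y = Theta + Z, Z ~ N(0, N)."""
    num = p * exp(y / N)
    return num / (num + (1 - p) * exp(-y / N))

for y in (-2.0, 0.0, 0.5, 2.0):
    print(y, round(posterior_plus_one(y), 4))
# With p = 1/2, the posterior of +1 is greater than 1/2 exactly when y > 0
```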

We define the maximum a posteriori probability (MAP) decoder as
Θ̂(y) = θ_0 if p_{Θ|Y}(θ_0 | y) > p_{Θ|Y}(θ_1 | y), and θ_1 otherwise

The MAP decoding rule minimizes P_e, since
P_e = 1 - P{Θ̂(Y) = Θ} = 1 - ∫ f_Y(y) P{Θ̂(y) = Θ | Y = y} dy,
and the integral is maximized when, for each y, we pick the Θ̂(y) with the largest P{Θ̂(y) = Θ | Y = y}, which is precisely the MAP decoder.

If p = 1/2, i.e., equally likely signals, then using Bayes rule the MAP decoder reduces to the maximum likelihood (ML) decoder
Θ̂(y) = θ_0 if f_{Y|Θ}(y | θ_0) > f_{Y|Θ}(y | θ_1), and θ_1 otherwise

Additive Gaussian Noise Channel

Consider the additive Gaussian noise channel with signal
Θ = +√P with probability 1/2, and -√P with probability 1/2,
noise Z ~ N(0, N) (Θ and Z independent), and output Y = Θ + Z.

The MAP decoder is
Θ̂(y) = +√P if P{Θ = +√P | Y = y} / P{Θ = -√P | Y = y} > 1, and -√P otherwise
Since the two signals are equally likely, the MAP decoding rule reduces to the ML decoding rule
Θ̂(y) = +√P if f_{Y|Θ}(y | +√P) / f_{Y|Θ}(y | -√P) > 1, and -√P otherwise

Using the Gaussian pdf, the ML decoder reduces to the minimum distance decoder
Θ̂(y) = +√P if (y - √P)^2 < (y - (-√P))^2, and -√P otherwise
From the figure (the two conditional densities f_{Y|Θ}(y | +√P) and f_{Y|Θ}(y | -√P), which cross at y = 0), this simplifies to
Θ̂(y) = +√P if y > 0, and -√P if y < 0
Note: The decision when y = 0 is arbitrary.

Now, to find the minimum probability of error, consider
P_e = P{Θ̂(Y) ≠ Θ}
    = P{Θ = √P} P{Θ̂(Y) = -√P | Θ = √P} + P{Θ = -√P} P{Θ̂(Y) = √P | Θ = -√P}
    = (1/2) P{Y ≤ 0 | Θ = √P} + (1/2) P{Y > 0 | Θ = -√P}
    = (1/2) P{Z ≤ -√P} + (1/2) P{Z > √P}
    = Q(√(P/N)) = Q(√SNR)
The probability of error is a decreasing function of P/N, the signal-to-noise ratio (SNR).
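A Monte Carlo check of this result (an illustrative sketch; the choices P = 4 and N = 1, i.e., SNR = 4, are arbitrary) simulates the antipodal signals in Gaussian noise, applies the sign decoder, and compares the empirical error rate with Q(√(P/N)).

```python
# Monte Carlo check that the minimum-distance (sign) decoder achieves
# P_e = Q(sqrt(P/N)) for antipodal signals +-sqrt(P) in N(0, N) noise.
import random
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2))

random.seed(2)
P_sig, N = 4.0, 1.0                  # signal power and noise variance (SNR = 4)
trials, errors = 200_000, 0
for _ in range(trials):
    theta = sqrt(P_sig) if random.random() < 0.5 else -sqrt(P_sig)
    y = theta + random.gauss(0, sqrt(N))
    theta_hat = sqrt(P_sig) if y > 0 else -sqrt(P_sig)   # decide by the sign of y
    errors += (theta_hat != theta)

print(errors / trials)               # empirical error rate
print(Q(sqrt(P_sig / N)))            # theoretical Q(sqrt(SNR)), about 0.0228
```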