Random Variables


Random variables (discrete or continuous) are used to derive the output statistical properties of a system whose input is random in nature.

Definition

Consider an experiment whose outcomes are elements of the universal set S. To every outcome e (elementary event) we assign a real number X(e), so that corresponding to each element of S there is a real number. The probability of an elementary event is not changed by the mapping from e to X(e), so that

P(x) = P[X(e)] = P(e). (1)

The rule of assignment (outcome e maps to a real number X(e)) is called a random variable. The mapping from the outcomes to points on the real line is quite arbitrary, and any convenient correspondence may be chosen. The reason for the mapping is that real numbers are much easier to analyse. The value of X(e) is called the value of the random variable. We will use the notation X to denote the random variable and x = X(e) to denote a particular value of the random variable X. When x = X(e) can take only a discrete set of values, X is a discrete random variable. When x = X(e) can take a continuous set of values over an interval, X is said to be a continuous random variable. The interval may be finite or infinite.

Distribution and Density Function

The cumulative distribution function (CDF) of the random variable X is denoted by F_X(x), where

P(X ≤ x) = F_X(x). (2)

P(X ≤ x) denotes the probability that the value of the random variable X, after the rule of assignment, is less than or equal to x. When x = a, where a is a fixed number, we simply substitute a into equation (2). The important properties of F_X(x) are as follows:

1. F_X(x) is non-decreasing, so that
   F_X(x + h) ≥ F_X(x) (3)
   for h > 0.

2. F_X(x) is right continuous, so that
   lim_{h→0, h>0} F_X(x + h) = F_X(x). (4)

3. lim_{x→−∞} F_X(x) = 0. (5)

4. lim_{x→+∞} F_X(x) = 1, (6)
   assuming that the random variable X can take only finite values.

X can be discrete or continuous. It will be assumed throughout the following discussion that X takes on only finite values, and the following notation will be used. Equation (4) becomes

lim_{h→0, h>0} F_X(x + h) = F_X(x⁺) (7)

or

lim_{h→0, h>0} F_X(x − h) = F_X(x⁻). (8)

Equation (5) becomes

lim_{x→−∞} F_X(x) = F_X(−∞). (9)

Equation (6) becomes

lim_{x→+∞} F_X(x) = F_X(+∞). (10)

The following are some important relationships:

1. 0 ≤ F_X(a) ≤ 1 (11)
2. F_X(b) − F_X(a) = P(a < X ≤ b) (12)
3. F_X(a) − F_X(a⁻) = P(X = a) (13)
4. F_X(a⁻) = P(X < a) (14)
5. F_X(a) = P(X ≤ a), (15)
   i.e., F_X(a) = P(X = x, where x ≤ a).

A discrete random variable X can take on only the discrete values x_i, for i = 1, 2, ..., n. Its cumulative distribution function (CDF) remains constant between two adjacent values of x (i.e., between x_i and x_{i+1}) and jumps at each x_i (see Figure 1). For a discrete random variable X whose possible values are x_i for i = 1, 2, ..., n,

P(X = x_i) > 0 (16)

Σ_{i=1}^{n} P(X = x_i) = 1 (17)

P(X ≤ x_i) = Σ_{all x ≤ x_i} P(X = x) (18)

P(X ≤ x_i) = F_X(x_i),  P(X = x_i) = F_X(x_i) − F_X(x_i⁻). (19)

A continuous random variable X can take on any value over one or more intervals, and its cumulative distribution function (CDF) F_X(x) is continuous in nature (see Figure 2). For a continuous random variable,

P(X = x_i) = 0. (20)

Furthermore, there are now an infinite number of possible values of the random variable X. Thus, instead of considering the probability of the random variable having a particular discrete value, we must consider the probability of it having a value within some range. Let f_X(x_i) dx be the probability that the continuous random variable lies in the range of values from x_i − dx to x_i, so that

P(x_i − dx < X ≤ x_i) = f_X(x_i) dx, (21)

where x_i may be any one of an infinite set of values. f_X(x_i) is called the probability density function (PDF) of the random variable X. The probability that the continuous random variable lies in one of the n separate ranges x_1 − dx to x_1, x_2 − dx to x_2, ..., x_n − dx to x_n is

f_X(x_1) dx + f_X(x_2) dx + ... + f_X(x_n) dx = [f_X(x_1) + f_X(x_2) + ... + f_X(x_n)] dx. (22)

This is because in any experiment the random variable can have only one value: the value cannot lie in the range x_i − dx to x_i if it lies in the range x_j − dx to x_j, where i ≠ j. The n events are therefore mutually exclusive, and the probability of any one of them occurring is the sum of their individual probabilities. It can now be seen that

P(a < X ≤ b) = ∫_a^b f_X(x) dx. (23)

But from equation (12),

P(a < X ≤ b) = F_X(b) − F_X(a), (24)

where F_X(x) is the cumulative distribution function of the continuous random variable X. Thus

F_X(b) − F_X(a) = ∫_a^b f_X(x) dx. (25)

From Figure 2 it can be deduced that F_X(−∞) = 0. Therefore

P(−∞ < X ≤ b) = F_X(b) − F_X(−∞) = F_X(b), (26)

and similarly

P(−∞ < X ≤ b) = ∫_{−∞}^{b} f_X(x) dx,  so that  F_X(a) = ∫_{−∞}^{a} f_X(x) dx. (27)

Differentiating F_X(a) with respect to a, we obtain f_X(a) = dF_X(a)/da, so that

f_X(x) = dF_X(x)/dx (28)

and

F_X(x) = ∫_{−∞}^{x} f_X(u) du. (29)

From equation (21),

P(a − dx < X ≤ a) = f_X(a) dx, (30)

so that

f_X(a) = P(a − dx < X ≤ a) / dx. (31)

Hence f_X(a) is the probability density of the random variable X at the value a.
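The relationship between the PDF and the CDF in equations (28) and (29) can be checked numerically. The following minimal sketch is an added illustration, not part of the original notes; the exponential density and rate are illustrative assumptions.

```python
import numpy as np

# Sketch: numerically check equations (28) and (29) for an assumed
# exponential density f_X(x) = lam * exp(-lam * x), x >= 0 (hypothetical example).
lam = 2.0
x = np.linspace(0.0, 5.0, 2001)
f = lam * np.exp(-lam * x)                 # assumed PDF
F_exact = 1.0 - np.exp(-lam * x)           # its exact CDF

# Equation (29): F_X(x) is the integral of f_X from -inf (here 0) up to x.
F_numeric = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(x))))

# Equation (28): f_X(x) is the derivative of F_X(x).
f_numeric = np.gradient(F_exact, x)

print("max |F_numeric - F_exact| :", np.max(np.abs(F_numeric - F_exact)))
print("max |f_numeric - f|       :", np.max(np.abs(f_numeric - f)))
# Both maxima are small, confirming that the CDF is the integral of the PDF
# and the PDF is the derivative of the CDF.
```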

In summary, a function f_X(x) is a continuous density function if it has the following properties:

1. f_X(x) ≥ 0, −∞ < x < ∞ (32)
2. ∫_{−∞}^{∞} f_X(x) dx = 1 (33)
3. P(X ≤ a) = F_X(a) = ∫_{−∞}^{a} f_X(x) dx (34)
4. f_X(x) = dF_X(x)/dx (35)

Transformation of Random Variables

Let Y = h(X). The continuous random variable Y is a function of the continuous random variable X. We solve the equation y = h(x) for x in terms of y, where x is the value of X and y is the value of the random variable Y. A function h(x) is monotonically increasing if h(x_2) > h(x_1) when x_2 > x_1, and monotonically decreasing if h(x_2) < h(x_1) when x_2 > x_1. For monotonic functions there is a one-to-one correspondence between h(x) and x, which means that for each value of the random variable X there is a unique value of h(x).

1. Consider the case where Y = h(X) is monotonically increasing and differentiable. Let the random variable X have density function f_X(x) and CDF F_X(x), while the random variable Y has density function f_Y(y) and CDF F_Y(y). Let b = h(a). Since Y = h(X) is a monotonically increasing, one-to-one mapping, there is no change in probability. Then

P(Y ≤ b) = P(X ≤ a) (36)

or

F_Y(b) = F_X(a). (37)

Also,

dF_Y(b)/db = dF_X(a)/db (38)

or

dF_Y(b)/db = [dF_X(a)/da] (da/db). (39)

Therefore,

f_Y(b) = f_X(a) (da/db) (40)

or, replacing a by x and b by y,

f_Y(y) = f_X(x) (dx/dy). (41)

2. Consider the case where Y = h(X) is monotonically decreasing and differentiable. If

b = h(a), (42)

then

P(Y ≤ b) = P(X ≥ a) (43)

and

F_Y(b) = 1 − F_X(a). (44)

Thus

dF_Y(b)/db = −[dF_X(a)/da] (da/db) (45)

or

f_Y(b) = −f_X(a) (da/db). (46)

Replacing a by x and b by y,

f_Y(y) = −f_X(x) (dx/dy). (47)

However, the derivative dx/dy is negative for a monotonically decreasing and differentiable transformation, whereas in equation (41) dx/dy is always positive for a monotonically increasing and differentiable transformation. Thus equations (41) and (47) may be combined to give the following general equation for monotonic functions:

f_Y(y) = f_X(x) |dx/dy|, (48)

where Y = h(X) is a one-to-one continuously differentiable function of X.

3. Consider the case where Y = h(X) is not monotone, so that a given value b of Y corresponds to several values a_1, a_2, ..., a_n of X, all of which are real. Thus

b = h(a_1) = h(a_2) = ... = h(a_n). (49)

Since several values of X may correspond to a single value of Y,

P(Y ≤ b) = P(all values of X for which h(X) ≤ b) (50)

and

F_Y(b) = ∫_{{x : h(x) ≤ b}} f_X(x) dx, (51)

where f_X(x) is integrated over all values of X for which Y ≤ b. F_Y(b) may be expressed as a sum of n definite integrals of f_X(x), the i-th of which is

∫_{k_i}^{a_i} f_X(x) dx  or  ∫_{a_i}^{k_i} f_X(x) dx, (52)

where k_i is a constant such that dk_i/db = 0. This is shown in Figure 3.

Suppose k_1 = −∞. Then

∫_{−∞}^{a_1} f_X(x) dx = F_X(a_1) (53)

and

dF_X(a_1)/db = f_X(a_1) (da_1/db), (54)

where F_X is differentiated to give f_X, evaluated at x = a_1. Since

∫_{a_2}^{k_2} f_X(x) dx = ∫_{−∞}^{k_2} f_X(x) dx − ∫_{−∞}^{a_2} f_X(x) dx = F_X(k_2) − F_X(a_2), (55)

it follows that

d[F_X(k_2) − F_X(a_2)]/db = 0 − dF_X(a_2)/db, (56)

where

dF_X(a_2)/db = f_X(a_2) (da_2/db). (57)

Since da_2/db is negative (see Figure 3), it follows that

f_Y(b) = f_X(a_1) |da_1/db| + ... + f_X(a_n) |da_n/db|. (58)

In general, for a given value y of Y, X has the values x_1, x_2, ..., x_n, so that

y = h(x_1) = h(x_2) = ... = h(x_n), (59)

and then

f_Y(y) = f_X(x_1) |dx_1/dy| + ... + f_X(x_n) |dx_n/dy|. (60)

Example 1

Consider the transformation Y = cX + d. We write the value of Y as y and the value of X as x; therefore y = cx + d. Thus x = (y − d)/c and dx/dy = 1/c. From equation (48),

f_Y(y) = f_X(x) |dx/dy| = f_X[(y − d)/c] (1/|c|). Q.E.D.

Example 2

Find f_Y(y) where Y = X². For y > 0 the equation y = x² has the two roots x_{1,2} = ±√y with |dx_{1,2}/dy| = 1/(2√y), so by equation (60) f_Y(y) = [f_X(√y) + f_X(−√y)] / (2√y); a numerical check of this result is sketched after the summary below.

Summary

1. Transformation of random variables is useful in finding the characteristics of the output signal of a communication system whose input signal is random in nature.

2. f_Y(y) = Σ_{i=1}^{n} f_X(x_i) |dx_i/dy|.

3. The technique can be extended to two continuous random variables X and Y with joint probability density function f_{X,Y}(x, y) and Z = g(X, Y), W = h(X, Y):

   f_{Z,W}(z, w) = Σ_{i=1}^{n} f_{X,Y}(x_i, y_i) |J_i|,

   where the Jacobian is

   J_i = det [ ∂x/∂z  ∂x/∂w ; ∂y/∂z  ∂y/∂w ], evaluated at (x_i, y_i).
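As a quick sanity check of equation (60), the following sketch (added here, not part of the original notes) assumes a standard Gaussian input X, an illustrative choice, and compares a histogram of Y = X² with the density [f_X(√y) + f_X(−√y)]/(2√y) obtained in Example 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed input: standard Gaussian X (an illustrative choice, not from the notes).
x_samples = rng.standard_normal(200_000)
y_samples = x_samples**2                      # transformation Y = X^2

def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Density predicted by equation (60): f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 sqrt(y)).
y = np.linspace(0.05, 4.0, 40)
f_Y_formula = (f_X(np.sqrt(y)) + f_X(-np.sqrt(y))) / (2 * np.sqrt(y))

# Empirical density of Y from a histogram of the transformed samples.
hist, edges = np.histogram(y_samples, bins=200, range=(0.0, 4.0), density=True)
centers = (edges[:-1] + edges[1:]) / 2
f_Y_empirical = np.interp(y, centers, hist)

print("max |empirical - formula| :", np.max(np.abs(f_Y_empirical - f_Y_formula)))
```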

Expectation

P(X = x_i) and f_X(x) give a complete description of a discrete and a continuous random variable, respectively. Alternatively, it is often simpler to describe a random variable in terms of its statistical averages (expectations) and moments. Consider the discrete random variable X whose possible values are {x_i}. The expected value of X is defined to be

E[X] = Σ_i x_i P(X = x_i), (61)

i.e.,

µ_X = Σ_i x_i P(x_i) (62)

in the simplified notation. More generally, for a function h(X) of the discrete random variable X, E[h(X)] = Σ_i h(x_i) P(X = x_i).

Consider now the continuous random variable X. The expected value of X is defined to be

E[X] = ∫_{−∞}^{∞} x f_X(x) dx. (63)

If h(X) is a function of the continuous random variable X, then the expected value of h(X) is defined to be

E[h(X)] = ∫_{−∞}^{∞} h(x) f_X(x) dx, (64)

i.e.,

µ_{h(X)} = ∫_{−∞}^{∞} h(x) f_X(x) dx (65)

in the simplified notation. The expected value of a random variable has some obvious properties:

1. E[h_1(X) + h_2(X)] = E[h_1(X)] + E[h_2(X)] (66)
2. E[k h(X)] = k E[h(X)] (67)
3. E[k] = k, if k is a constant (68)
4. E[h(X)] ≥ 0 (69)
   if h(x) ≥ 0 for all x in the continuous case, or h(x_i) ≥ 0 for all {x_i} in the discrete case.

Moments

In the case of a discrete random variable X, the k-th moment about the mean is defined to be

η_k = E[(X − µ_X)^k] = Σ_i (x_i − µ_X)^k P(X = x_i). (70)

The second moment about the mean is called the variance σ_X², and its square root σ_X is known as the standard deviation.

σ_X² = E[(X − µ_X)²]. (71)

Example 1

η_1 = E[X − µ_X] = E[X] − E[µ_X] = E[X] − µ_X = µ_X − µ_X = 0, (72)

i.e., the first moment about the mean µ_X is zero.

Example 2

σ_X² = E[(X − µ_X)²] = E[X² − 2Xµ_X + µ_X²] = E[X²] − 2µ_X E[X] + µ_X² = E[X²] − 2µ_X² + µ_X² = E[X²] − µ_X².

Therefore,

E[X²] = σ_X² + µ_X², (73)

i.e., the second moment about zero (the origin) is equal to the variance plus the square of the mean µ_X². E[X²] is called the mean square value.

E[X²] - mean square value
µ_X² - square of the mean
σ_X² - variance
σ_X - standard deviation

In the case of a continuous random variable X, the k-th moment about the mean is defined to be

η_k = E[(X − µ_X)^k] = ∫_{−∞}^{∞} (x − µ_X)^k f_X(x) dx. (74)

The second moment about the mean is

σ_X² = E[(X − µ_X)²] = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx. (75)

Equations (72) and (73) also hold in the continuous random variable case. In general, the following are two important properties of moments.

1. The first moment about the mean is zero:

η_1 = E[X − µ_X] = E[X] − E[µ_X] = µ_X − µ_X = 0.

2. The second moment about the origin is equal to the variance plus the square of the mean:

σ_X² = E[(X − µ_X)²] = E[X²] − µ_X², i.e., E[X²] = σ_X² + µ_X².

Example 3

Given Y = aX + b, where X and Y are continuous random variables and a and b are constants, find µ_Y and σ_Y² in terms of a, b, µ_X and σ_X².

E[Y] = E[aX + b]
µ_Y = E[aX] + E[b] = a E[X] + b
µ_Y = a µ_X + b

σ_Y² = E[(Y − µ_Y)²] = E[(aX + b − aµ_X − b)²] = E[a²(X − µ_X)²] = a² E[(X − µ_X)²]
σ_Y² = a² σ_X².

A numerical check of these two results, and of equation (73), is sketched below.
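The following minimal sketch (added, not from the original notes) estimates these moments from samples; the choice of an exponential X and the constants a and b are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumption: X exponential with rate 1, so mu_X = 1 and var_X = 1.
x = rng.exponential(scale=1.0, size=500_000)
a, b = 3.0, -2.0
y = a * x + b

mu_X, var_X = x.mean(), x.var()
mu_Y, var_Y = y.mean(), y.var()

print("E[X^2] vs var + mean^2 :", (x**2).mean(), var_X + mu_X**2)   # equation (73)
print("mu_Y vs a*mu_X + b     :", mu_Y, a * mu_X + b)               # Example 3
print("var_Y vs a^2 * var_X   :", var_Y, a**2 * var_X)              # Example 3
```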

Tchebycheff's Inequality

Consider a continuous random variable X with mean µ_X and variance σ_X². Define a random variable Y by

Y = 0 for |X − µ_X| < λσ_X,
Y = λ²σ_X² for |X − µ_X| ≥ λσ_X, (76)

for any real λ > 0. The expected value of Y is

E[Y] = ∫ y f_Y(y) dy (77)

E[Y] = 0 · P(Y = 0) + λ²σ_X² · P(Y = λ²σ_X²)
     = 0 · P(|X − µ_X| < λσ_X) + λ²σ_X² · P(|X − µ_X| ≥ λσ_X)

E[Y] = λ²σ_X² P(|X − µ_X| ≥ λσ_X). (78)

Whenever the value y of the random variable Y takes on λ²σ_X², we have |X − µ_X| ≥ λσ_X, i.e., (X − µ_X)² ≥ λ²σ_X², so that (X − µ_X)² ≥ Y in every case. Thus

E[(X − µ_X)²] ≥ E[Y], i.e., σ_X² ≥ E[Y].

Using equation (78),

σ_X² ≥ λ²σ_X² P(|X − µ_X| ≥ λσ_X). (79)

Therefore,

1/λ² ≥ P(|X − µ_X| ≥ λσ_X). (80)

Let K = λσ_X; then equation (80) becomes

σ_X²/K² ≥ P(|X − µ_X| ≥ K), (81)

which is Tchebycheff's inequality. It is also valid for a discrete random variable. The inequality can be used to obtain bounds on the probability of finding X outside the interval µ_X ± K; a simulation check is sketched below.
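A small simulation sketch (added here, not in the original notes) comparing the tail probability of an assumed exponential X with the Tchebycheff bound σ_X²/K² from equation (81).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative assumption: X exponential with mean 1 (so sigma_X = 1 as well).
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

for lam in (1.5, 2.0, 3.0):
    K = lam * sigma
    tail = np.mean(np.abs(x - mu) >= K)      # P(|X - mu_X| >= K), estimated
    bound = sigma**2 / K**2                  # Tchebycheff bound, equation (81)
    print(f"K = {K:.2f}: P(|X - mu| >= K) = {tail:.4f} <= bound {bound:.4f}")
```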

Weak Law of Large Numbers

Let x_1, x_2, ..., x_n be statistically independent sample values of the continuous random variable X, which has mean value µ_X and variance σ_X². Let y be the mean of these n sample values, so that

y = (1/n) Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} x_i / n. (82)

Because X is a continuous random variable, the {x_i} are not the possible values of a discrete random variable. Each x_i/n has mean value µ_X/n and variance (σ_X/n)², and the {x_i/n} are statistically independent random variables. Hence

µ_Y = E[Y] = n [µ_X / n] = µ_X, (83)

i.e., the sum of the means of the {x_i/n}, and

σ_Y² = n (σ_X / n)² = σ_X²/n, (84)

i.e., the sum of the variances of the {x_i/n}. From equations (83) and (84), the larger the value of n, the smaller σ_Y² is, and the closer the value y of Y is likely to be to µ_Y. This can also be shown by Tchebycheff's inequality as follows:

1/λ² ≥ P(|Y − µ_Y| ≥ λσ_Y). (85)

Let ε = λσ_Y, so that λ² = ε²/σ_Y² = nε²/σ_X². Equation (85) becomes

σ_X²/(nε²) ≥ P(|Y − µ_Y| ≥ ε),

and from equation (83), µ_Y = µ_X, so

σ_X²/(nε²) ≥ P(|Y − µ_X| ≥ ε). (86)

For a fixed value of ε (a positive quantity), the probability that the sample mean y differs from µ_X by more than ε is bounded by σ_X²/(nε²), which is inversely proportional to n. Thus y becomes a better estimate of µ_X as n increases, and the discrepancy between y and µ_X decreases; a simulation sketch follows.
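A minimal sketch (not part of the notes) showing the sample mean of an assumed uniform random variable tightening around µ_X as n grows, together with the bound from equation (86).

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative assumption: X uniform on [0, 1], so mu_X = 0.5 and var_X = 1/12.
mu_X, var_X = 0.5, 1.0 / 12.0
eps = 0.02
trials = 5_000

for n in (10, 100, 1000):
    samples = rng.random((trials, n))
    y = samples.mean(axis=1)                         # sample mean of n values, per trial
    p_exceed = np.mean(np.abs(y - mu_X) >= eps)      # P(|Y - mu_X| >= eps), estimated
    bound = var_X / (n * eps**2)                     # bound from equation (86)
    print(f"n = {n:5d}: P(|Y - mu_X| >= {eps}) = {p_exceed:.4f}, bound = {bound:.3f}")
```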

Characteristic Functions

The characteristic function of a random variable X is defined to be

φ_X(ω) = E[e^{jωX}], j² = −1. (87)

For a discrete random variable X,

φ_X(ω) = Σ_i e^{jωx_i} P(X = x_i), (88)

and for a continuous random variable X,

φ_X(ω) = ∫_{−∞}^{∞} e^{jωx} f_X(x) dx. (89)

Also, for a continuous random variable X,

f_X(x) = (1/2π) ∫_{−∞}^{∞} φ_X(ω) e^{−jωx} dω, (90)

so f_X(x) and φ_X(ω) form a Fourier transform pair.

Application: to determine the probability density of the sum of two independent random variables, it is often easier to first determine the characteristic function of the sum and then obtain the probability density function from it.

Example

Let Y = X_1 + X_2, where X_1 and X_2 are independent. Then

φ_Y(ω) = E[e^{jωY}] = E[e^{jωX_1 + jωX_2}] = E[e^{jωX_1}] E[e^{jωX_2}]
φ_Y(ω) = φ_{X_1}(ω) φ_{X_2}(ω)

and

f_Y(y) = (1/2π) ∫_{−∞}^{∞} φ_Y(ω) e^{−jωy} dω. Q.E.D.

A numerical illustration of this result is sketched below.
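Multiplying characteristic functions corresponds to convolving densities. The sketch below (an added illustration, not part of the notes) therefore convolves two assumed uniform(0, 1) densities numerically and compares the result with the known triangular density of their sum.

```python
import numpy as np

# Added illustration: the density of Y = X1 + X2 for independent X1, X2.
# Multiplying characteristic functions, as in the example above, is equivalent
# to convolving the two densities, which is done numerically here.
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f1 = np.ones_like(x)            # assumed density of X1: uniform on [0, 1)
f2 = np.ones_like(x)            # assumed density of X2: uniform on [0, 1)

f_sum = np.convolve(f1, f2) * dx            # numerical convolution -> density of Y
y = np.arange(len(f_sum)) * dx              # support of Y is [0, 2)

f_exact = np.where(y <= 1.0, y, 2.0 - y)    # known triangular density of the sum

print("area under f_sum      :", np.sum(f_sum) * dx)
print("max |f_sum - f_exact| :", np.max(np.abs(f_sum - f_exact)))
```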

Joint Distribution

We will now consider two random variables X and Y that take on the values x_1, x_2, ..., x_n and y_1, y_2, ..., y_n. The joint distribution function of the two random variables is defined as

F_{XY}(x, y) = P(X ≤ x and Y ≤ y). (91)

F_{XY}(x, y) has the following properties:

1. F_{XY}(x, y) is monotonic non-decreasing in both variables.
2. F_{XY}(x, y) → 0 if either x or y approaches −∞.
3. F_{XY}(x, y) → 1 if both x and y approach +∞.
4. F_{XY}(x, ∞) = F_X(x).
5. F_{XY}(∞, y) = F_Y(y),

where F_X(x) and F_Y(y) are the distribution functions of X and Y, respectively.

For discrete random variables, the joint probability is

P(X ≤ x, Y ≤ y) = P(X ≤ x and Y ≤ y) (92)

P(X ≤ x, Y ≤ y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} P(X = x_i, Y = y_j) (93)

and satisfies the following equations:

1. Σ_{i=1}^{n} Σ_{j=1}^{n} P(X = x_i, Y = y_j) = 1. (94)

2. When the value of X is x_i,

   P(X = x_i) = Σ_{j=1}^{n} P(X = x_i, Y = y_j) (95)

   P(X = x_i) = Σ_{j=1}^{n} P(X = x_i | Y = y_j) P(Y = y_j). (96)

3. P(X = x_i | Y = y_j) = P(X = x_i, Y = y_j) / P(Y = y_j). (97)

For continuous random variables, the joint probability density function f_{XY}(x, y) satisfies the following equations:

1. f_{XY}(x, y) ≥ 0, (98)
   where −∞ < x < ∞ and −∞ < y < ∞.

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{XY}(x, y) dx dy = 1. (99)

3. F_{XY}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{XY}(u, v) du dv. (100)

4. f_{XY}(x, y) = ∂²F_{XY}(x, y) / ∂x ∂y, (101)
   where F_{XY}(x, y) is continuous and differentiable.

Expectation

The expectation of a function g(X, Y) of the discrete random variables X and Y is

E[g(X, Y)] = Σ_i Σ_j g(x_i, y_j) P(X = x_i, Y = y_j). (102)

The expectation of a function g(X, Y) of the continuous random variables X and Y is

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{XY}(x, y) dx dy. (103)

In the following analysis only continuous random variables will be considered, but the corresponding relationships apply throughout to discrete random variables. The expected value (expectation) of functions of two random variables has the following properties:

1. E[g_1(X, Y) + g_2(X, Y)] = E[g_1(X, Y)] + E[g_2(X, Y)] (104)
2. E[k g(X, Y)] = k E[g(X, Y)] (105)
3. E[k] = k, if k is a constant (106)
4. E[g(X, Y)] ≥ 0, if g(X, Y) ≥ 0 for all X and Y. (107)

Two random variables X and Y are independent if

P(X ≤ x_i, Y ≤ y_j) = P(X ≤ x_i) P(Y ≤ y_j). (108)

Thus

F_{XY}(x, y) = F_X(x) F_Y(y). (109)

The above equation implies that

f_{XY}(x, y) = f_X(x) f_Y(y), (110)

where f_{XY}(x, y) is the joint density function. When X and Y are independent,

E[g(X) g(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) g(y) f_{XY}(x, y) dx dy (111)

E[g(X) g(Y)] = ∫_{−∞}^{∞} g(x) f_X(x) dx ∫_{−∞}^{∞} g(y) f_Y(y) dy (112)

E[g(X) g(Y)] = E[g(X)] E[g(Y)]. (113)

In particular, if x and y are values of two independent random variables X and Y,

E[X] E[Y] = E[XY]. (114)

Moments

The kl-th moment about the mean is defined to be

η_{kl} = E[(X − µ_X)^k (Y − µ_Y)^l], (115)

where µ_X is the mean of X and µ_Y is the mean of Y. In particular, the variance of X is

η_{20} = E[(X − µ_X)²] = σ_X² (116)

and the variance of Y is

η_{02} = E[(Y − µ_Y)²] = σ_Y². (117)

The covariance of X and Y is defined to be

η_{11} = E[(X − µ_X)(Y − µ_Y)] = σ_{XY}, (118)

and the correlation coefficient of X and Y is defined to be

ρ_{XY} = η_{11} / √(η_{20} η_{02}) = σ_{XY} / (σ_X σ_Y). (119)

It follows that −1 ≤ ρ_{XY} ≤ 1. ρ_{XY} gives a measure of the dependence between the random variables X and Y; for example, ρ_{XY} = 0.9 and ρ_{XY} = −0.9 indicate the same strength of dependence, with opposite signs of correlation. In general,

η_{11} = E[(X − µ_X)(Y − µ_Y)] = E[XY − µ_X Y − µ_Y X + µ_X µ_Y] = E[XY] − µ_X µ_Y. (120)

Therefore E[XY] = η_{11} + µ_X µ_Y, where E[XY] is the (k = l = 1) moment about zero (the origin). If X and Y are independent, then

E[XY] = E[X] E[Y]. (121)

Thus

η_{11} = E[X]E[Y] − µ_X µ_Y = 0, (122)

and ρ_{XY} is also equal to zero.

Summary

Two random variables X and Y are

statistically independent if f_{XY}(x, y) = f_X(x) f_Y(y),
uncorrelated if E[XY] = E[X] E[Y],
orthogonal if E[XY] = 0 (for uncorrelated random variables this occurs when E[X] = 0 or E[Y] = 0).

Hence statistically independent random variables form a subset of the uncorrelated ones (Figure 4).

If X and Y are statistically independent, then functions of X and Y (i.e., g(X) and g(Y)) are also independent:

f[g(X), g(Y)] = f[g(X)] f[g(Y)], (123)

where f[g(X), g(Y)] is the joint probability density of g(X) and g(Y). If X and Y are statistically independent, then functions of X and Y are also uncorrelated:

E[g(X) g(Y)] = E[g(X)] E[g(Y)]. (124)

If X and Y are uncorrelated, then their covariance and correlation coefficient are both zero:

σ_{XY} = E[(X − µ_X)(Y − µ_Y)]
       = E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
       = E[X]E[Y] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y
       = µ_X µ_Y − µ_X µ_Y − µ_X µ_Y + µ_X µ_Y
       = 0,

and ρ_{XY} = σ_{XY} / (σ_X σ_Y) = 0. This implies that the random variables (X − µ_X) and (Y − µ_Y) are orthogonal. A numerical check of these sample statistics is sketched below.
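The sketch below (added, not from the notes) estimates σ_XY and ρ_XY from samples of a correlated pair constructed for illustration and from an independent pair, illustrating equations (118) to (122).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

def cov_and_corr(x, y):
    """Sample covariance, equation (118), and correlation coefficient, equation (119)."""
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return sxy, sxy / (x.std() * y.std())

# Correlated pair (illustrative construction): Y = 0.8*X + noise.
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)
print("correlated pair :", cov_and_corr(x, y))

# Independent pair: covariance and correlation should be near zero, as in (122).
u = rng.standard_normal(n)
v = rng.standard_normal(n)
print("independent pair:", cov_and_corr(u, v))
```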

For any two random variables X and Y,

E[X + Y] = E[X] + E[Y] = µ_X + µ_Y, (125)

so that the mean of their sum is equal to the sum of their means. The variance of the sum is

σ²_{X+Y} = E[{(X + Y) − (µ_X + µ_Y)}²]
        = E[(X − µ_X)² + (Y − µ_Y)² + 2(X − µ_X)(Y − µ_Y)]
        = σ_X² + σ_Y² + 2σ_{XY}. (126)

If X and Y are uncorrelated, σ_{XY} = 0 and equation (126) becomes

σ²_{X+Y} = σ_X² + σ_Y². (127)

Thus if two random variables are uncorrelated, the variance of their sum equals the sum of their variances. If X and Y are orthogonal, it can be seen that

E[(X + Y)²] = E[X²] + E[Y²], (128)

because E[XY] = 0 for orthogonal random variables. If in addition E[X] = 0 or E[Y] = 0, then E[XY] = 0 = E[X]E[Y], so such orthogonal random variables are also uncorrelated. In general, if X and Y are statistically independent then X and Y must also be uncorrelated; however, uncorrelated random variables need not be statistically independent.

Example

If X and Y are statistically independent, then E[g(X) g(Y)] = E[g(X)] E[g(Y)] for any functions g; if they are merely uncorrelated, then in general E[g(X) g(Y)] ≠ E[g(X)] E[g(Y)]. Equations (125) to (127) are checked numerically below.
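A brief sketch (an added illustration) verifying equations (125) to (127) with sample moments; the correlated pair is the same illustrative construction used in the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

# Correlated pair (illustrative): Y = 0.8*X + noise, as in the previous sketch.
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)
s = x + y

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print("mean of sum  :", s.mean(), "vs", x.mean() + y.mean())                # (125)
print("var of sum   :", s.var(), "vs", x.var() + y.var() + 2 * cov_xy)      # (126)

# Uncorrelated (independent) pair: variance of the sum is the sum of variances (127).
u, v = rng.standard_normal(n), rng.standard_normal(n)
print("uncorrelated :", (u + v).var(), "vs", u.var() + v.var())
```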

Examples of Various Distributions

1. Binomial

Consider a sequence of repeated independent trials of a given random experiment with only two possible outcomes on each trial and with the probabilities of the two outcomes fixed throughout the trials. Such sequences are called Bernoulli trials. It is conventional to call one of the two outcomes S (success) and the other F (failure). The probability of a success is p and the probability of a failure is q, where

p + q = 1. (129)

Thus, in any one trial,

P(S) = p (130)
P(F) = 1 − p. (131)

Since the individual trials are independent, the probability of obtaining a given sequence of successes and failures in n trials, where there are k successes and n − k failures, is

P_k = p^k (1 − p)^(n−k). (132)

In a sequence of n trials there are C(n, k) = n!/[k!(n − k)!] different sequences containing k successes and n − k failures. These sequences are mutually exclusive and each has probability P_k. Thus the probability of obtaining k successes in a sequence of n independent trials, where any sequence of k successes and n − k failures is acceptable, is

b_k = C(n, k) p^k (1 − p)^(n−k). (133)

b_k is the (k+1)-th term in the binomial expansion of (p + q)^n, where

(p + q)^n = Σ_{k=0}^{n} C(n, k) p^k q^(n−k) = Σ_{k=0}^{n} b_k. (134)

Since q = 1 − p, equation (134) becomes

Σ_{k=0}^{n} b_k = [p + (1 − p)]^n = 1.

In addition, b_k is non-negative; it is therefore a probability function, written as P_k, of the discrete random variable k. The variation of b_k with k, for a given n, gives the Binomial Distribution with probability

P(k) = C(n, k) p^k (1 − p)^(n−k) (135)

for k = 0, 1, 2, ..., n.
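A small sketch (added) that builds the probability function of equation (135) directly from C(n, k) and compares it with the observed frequencies of k successes in simulated Bernoulli trials; n and p are illustrative choices.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(6)

n, p = 12, 0.3                                   # illustrative parameters
k = np.arange(n + 1)
pmf = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in k])   # equation (135)

print("sum of P(k)       :", pmf.sum())          # equation (134): sums to 1

# Empirical frequencies of k successes in repeated sequences of n Bernoulli trials.
trials = rng.random((100_000, n)) < p
counts = trials.sum(axis=1)
freq = np.bincount(counts, minlength=n + 1) / 100_000
print("max |freq - P(k)| :", np.max(np.abs(freq - pmf)))
```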

It may be shown that the mean value E[k] of the binomial distribution is

E[k] = µ_k = n p (136)

and the variance σ_k² = E[(k − µ_k)²] is

σ_k² = n p (1 − p). (137)

2. Poisson

Consider the binomial distribution when n becomes very large and p very small, but n p remains of reasonable size. Suppose that as n → ∞, p → 0 and

n p = λ, (138)

where λ is a finite, non-zero, positive constant. The first term of the binomial distribution is

b_0 = C(n, 0) p^0 (1 − p)^n = (1 − p)^n. (139)

Thus

lim_{n→∞} b_0 = lim_{n→∞} (1 − λ/n)^n = e^{−λ}. (140)

The second term of the binomial distribution is

b_1 = C(n, 1) p^1 (1 − p)^(n−1) = n p (1 − p)^n / (1 − p) = [λ / (1! (1 − λ/n))] b_0, (141)

and

lim_{n→∞} b_1 = lim_{n→∞} [λ / (1! (1 − λ/n))] e^{−λ} = (λ / 1!) e^{−λ}. (142)

In general,

lim_{n→∞} b_k = (λ^k / k!) e^{−λ} (143)

for k = 0, 1, 2, ....
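The limit in equation (143) can be seen numerically; the sketch below (added) compares binomial probabilities with n p = λ held fixed against λ^k e^{−λ}/k! for an assumed λ.

```python
import numpy as np
from math import comb, exp, factorial

lam = 3.0                                        # assumed value of lambda
k = np.arange(0, 15)
poisson = np.array([lam**i * exp(-lam) / factorial(i) for i in k])   # limit (143)

for n in (10, 100, 1000):
    p = lam / n                                   # keep n*p = lambda fixed
    binom = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in k])
    print(f"n = {n:4d}: max |binomial - Poisson| = {np.max(np.abs(binom - poisson)):.5f}")
```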

Thus if n → ∞ and p → 0 in such a way that n p = λ, where λ is a finite, non-zero, positive constant, in the limit b_k becomes the (k+1)-th term p_k of the Poisson Distribution, where

p_k = (λ^k / k!) e^{−λ} (144)

for k = 0, 1, 2, .... In addition, p_k is non-negative, and

Σ_{k=0}^{∞} p_k = e^{−λ} Σ_{k=0}^{∞} λ^k / k! = e^{−λ} e^{λ} = 1. (145)

Thus p_k is the probability function P(k) of the discrete random variable k. It can also be shown that the mean µ_k and the variance σ_k² of p_k are each equal to λ:

E[k] = µ_k = λ (146)
E[(k − µ_k)²] = σ_k² = λ. (147)

3. Gaussian or Normal Distribution

The random variable X is binomially distributed if the probability that it has the value x is

P(X = x) = C(n, x) p^x (1 − p)^(n−x) (148)

for x = 0, 1, 2, ..., n. As n increases to a very large value, and assuming that n p (1 − p) >> 1, the probability that the random variable has the value x becomes

P(X = x) ≈ [1 / √(2π n p (1 − p))] exp[ −(x − n p)² / (2 n p (1 − p)) ]. (149)

The random variable is here discrete and not continuous. In the limit as n → ∞, the random variable X can for practical purposes be considered to be continuous from −∞ to ∞, with probability density function

f_X(x) = [1 / √(2πσ_X²)] exp[ −(x − µ_X)² / (2σ_X²) ], −∞ < x < ∞. (150)

Figure 4.1 illustrates this density. The approximation is remarkably good even for small n (n > 5), provided that p is not too far from 0.5 and x is not too many standard deviations σ_X away from the mean. Equation (150) is now defined as the Gaussian probability density function of the random variable X. Its mean is µ_X and its variance is σ_X², and f_X(x) is symmetrical about µ_X.

P(X > a) = ∫_a^∞ [1 / √(2πσ_X²)] exp[ −(x − µ_X)² / (2σ_X²) ] dx (151)

P(X > a) = ∫_{(a − µ_X)/σ_X}^{∞} (1/√(2π)) exp(−z²/2) dz. (152)

To evaluate this integral we use tabulated values of the Q function, which is defined as

Q(y) = (1/√(2π)) ∫_y^∞ exp(−z²/2) dz, y > 0. (153)

Thus

P(X > a) = Q(y), (154)

where y = (a − µ_X)/σ_X. A Gaussian probability density function is completely defined by µ_X and σ_X²; a numerical evaluation of P(X > a) via Q(y) is sketched at the end of this subsection.

Important Results:

1. The sum of two or more statistically independent Gaussian random variables is also a Gaussian random variable.
2. If two jointly Gaussian random variables are uncorrelated, then they are also statistically independent. The Gaussian case is the only one in which uncorrelatedness implies independence.
3. µ_X = 0 gives a white Gaussian random variable (WGRV).
4. If µ_X = 0 and σ_X² = 1, then f_X(x) is said to be the unit (standard) normal distribution.
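A minimal sketch (added) of equations (153) and (154): Q(y) written in terms of the complementary error function, Q(y) = erfc(y/√2)/2, and checked against a Monte Carlo estimate of P(X > a); the mean, standard deviation and threshold are illustrative assumptions.

```python
import numpy as np
from math import erfc, sqrt

def Q(y):
    """Gaussian tail probability of equation (153), via Q(y) = erfc(y / sqrt(2)) / 2."""
    return 0.5 * erfc(y / sqrt(2.0))

# Illustrative parameters (assumptions): mean 1, standard deviation 2, threshold a = 4.
mu_X, sigma_X, a = 1.0, 2.0, 4.0
y = (a - mu_X) / sigma_X                      # equation (154)

rng = np.random.default_rng(7)
x = mu_X + sigma_X * rng.standard_normal(1_000_000)

print("Q(y)              :", Q(y))
print("Monte Carlo P(X>a):", np.mean(x > a))
```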

Central Limit Theorem

The theorem states that if X_1, X_2, ..., X_n are statistically independent random variables and Y = Σ_i X_i, then, as n becomes very large, the distribution of Y approaches a Gaussian distribution. It can now be seen why random noise can be modelled by the Gaussian probability density function: noise in communication systems is often made up of a very large number of independent noise sources. A simulation sketch of the theorem follows.
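A short sketch (added) illustrating the central limit theorem: sums of independent uniform random variables (an illustrative choice) are compared with the Gaussian CDF having the same mean and variance.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(8)

# Y = sum of n independent uniform(0, 1) variables: mean n/2, variance n/12.
n, trials = 30, 200_000
y = rng.random((trials, n)).sum(axis=1)
mu_Y, sigma_Y = n / 2.0, sqrt(n / 12.0)

def gaussian_cdf(t):
    return 0.5 * (1.0 + erf((t - mu_Y) / (sigma_Y * sqrt(2.0))))

# Compare the empirical CDF of Y with the Gaussian CDF at a few points.
for t in (mu_Y - 2 * sigma_Y, mu_Y, mu_Y + 2 * sigma_Y):
    print(f"t = {t:5.2f}: empirical {np.mean(y <= t):.4f}  gaussian {gaussian_cdf(t):.4f}")
```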

[Figure 1: staircase CDF F_X(x) of a discrete random variable, with jumps at x_1, ..., x_6.]
[Figure 2: continuous CDF F_X(x) rising from 0 to 1.]
[Figure 3: a non-monotonic transformation Y = h(x); the level Y = b is reached at x = a_1 and x = a_2, with fixed reference points k_1 and k_2.]

[Figure 4: statistically independent random variables shown as a subset of uncorrelated random variables.]
[Figure 4.1: Gaussian density f_X(x); the shaded area is P{X > µ_X + yσ_X}.]