Advanced Random Processes


Text: PROBABILITY AND RANDOM PROCESSES, by Davenport

Course Outline
- Basic Probability Theory; Random Variables and Vectors; Conditional Probability and Densities; Expectation; Conditional Expectation
- Estimation with Static Models
- Random Processes; Stationarity; Power Spectral Density
- Mean-Square Calculus; Linear Systems
- Kalman Filter
- Wiener Integrals; Wiener Filter

Grade: Mid-term (40 %), Final (40 %), Homework (20 %)

2. SAMPLE POINTS AND SAMPLE SPACES

Sample point
  A sample point is a representation of a possible outcome of an experiment.

Sample space
  A sample space is the totality of all possible sample points, that is, the representation of all possible outcomes of an experiment.

Event
  An event is an outcome or a collection of outcomes. It is also defined as the corresponding sample point or set of sample points, respectively.

Event defined by listing
  A = {s_1, s_2, ..., s_n},  B = {s_1, s_2, s_3, ...}

Event defined by description
  A = {s : prop(s) is true}, where prop(s) is some proposition about s, for example, s < 1.

Implication or inclusion
  A ⊂ B  ⟺  (s ∈ A ⟹ s ∈ B)

Equality
  A = B  ⟺  A ⊂ B and B ⊂ A

Union
  A ∪ B ≜ {s : s ∈ A or s ∈ B or both}
  A ⊂ A ∪ B and B ⊂ A ∪ B
  A ⊂ B  ⟺  A ∪ B = B

Intersection
  A ∩ B ≜ {s : s ∈ A and s ∈ B}
  A ∩ B ⊂ A and A ∩ B ⊂ B
  A ⊂ B  ⟺  A ∩ B = A

Distributive laws
  A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)  and  A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Complement
  Aᶜ ≜ {s : s ∈ S and s ∉ A}
  (Aᶜ)ᶜ = A
  A ⊂ B  ⟹  Bᶜ ⊂ Aᶜ

Relative complement
  B − A ≜ {s : s ∈ B and s ∉ A}
  B − A = B ∩ Aᶜ

Null set ∅
  A ∩ Aᶜ = ∅
  Sᶜ = ∅
  A ∪ ∅ = A and A ∩ ∅ = ∅
  ∅ ⊂ A for any A ⊂ S

Disjoint events
  The events A and B are disjoint if and only if A ∩ B = ∅.

De Morgan's rules
  (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

Partitions of S
  {A_1, A_2, ..., A_n} where A_i ∩ A_j = ∅ for i ≠ j and ∪_{i=1}^{n} A_i = S

3. PROBABILITY

Probability Space
  A probability space is a triple (S, A, P).
  S = sample space,  A = σ-algebra on S,  P = probability measure

σ-Algebra
  A is a nonempty class of subsets of S such that
  (i) A ∈ A ⟹ Aᶜ ∈ A
  (ii) A, B ∈ A ⟹ A ∪ B ∈ A
  (iii) A_1, A_2, A_3, ... ∈ A ⟹ ∪_{i=1}^{∞} A_i ∈ A

Probability
  Probability is a set function P : A → R which satisfies the following axioms:
  (i) P(A) ≥ 0
  (ii) P(S) = 1

  (iii) A_i ∩ A_j = ∅ for i ≠ j  ⟹  P[∪_{i=1}^{∞} A_i] = Σ_{i=1}^{∞} P[A_i]  (countably additive)

Elementary properties of probability
  P[Aᶜ] = 1 − P[A]
  P[∅] = 0
  P[A] ≤ 1
  P[B − A] = P[B] − P[A ∩ B]
  A ⊂ B ⟹ P[B − A] = P[B] − P[A]
  A ⊂ B ⟹ P[A] ≤ P[B]
  P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
  P[A ∪ B] ≤ P[A] + P[B]

Joint probability (total probability)
  If the sample space S is partitioned by the collection of events {A_1, A_2, ..., A_m}, then
  P[B] = P[B ∩ S] = P[B ∩ (∪_{j=1}^{m} A_j)] = P[∪_{j=1}^{m} (B ∩ A_j)] = Σ_{j=1}^{m} P[B ∩ A_j] = Σ_{j=1}^{m} P[B | A_j] P[A_j]

Conditional probability
  P[B | A] ≜ P[A ∩ B] / P[A], so long as P[A] > 0.

Bayes' rule
  Let B be an arbitrary event in a sample space S. Suppose that the events A_1, A_2, ..., A_m partition S and that P[A_i] > 0 for all i. Then
  P[A_i | B] = P[A_i ∩ B] / P[B] = P[B | A_i] P[A_i] / Σ_{j=1}^{m} P[B | A_j] P[A_j]
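The total-probability and Bayes formulas above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the priors P[A_i] and the likelihoods P[B | A_i] are made-up numbers, not from the text.

```python
# Hypothetical three-event partition {A1, A2, A3} of S.
priors = [0.5, 0.3, 0.2]          # P[A_i]; must sum to 1
likelihoods = [0.10, 0.60, 0.30]  # P[B | A_i]

# Total probability: P[B] = sum_j P[B | A_j] P[A_j]
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' rule: P[A_i | B] = P[B | A_i] P[A_i] / P[B]
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]

print("P[B] =", p_b)
print("P[A_i | B] =", posteriors, " (sum =", sum(posteriors), ")")
```

The posteriors sum to one, as they must for any partition of S.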

Independent events
  The events A and B are said to be statistically independent if P[A ∩ B] = P[A] P[B].
  The events A_1, A_2, ..., A_n are said to be mutually independent if and only if the relations
    P[A_i ∩ A_j] = P[A_i]P[A_j]
    P[A_i ∩ A_j ∩ A_k] = P[A_i]P[A_j]P[A_k]
    ...
    P[A_1 ∩ A_2 ∩ ... ∩ A_n] = P[A_1]P[A_2]⋯P[A_n]
  hold for all combinations of the indices such that 1 ≤ i < j < k < ... ≤ n.

Independent experiments
  Suppose that we are concerned with the outcomes of n different experiments E_1, E_2, ..., E_n. Suppose further that the sample space S_k of the kth of these n experiments is partitioned by the m_k events A_{k i_k}, i_k = 1, 2, ..., m_k. The n given experiments are then said to be statistically independent if the equation
    P[A_{1 i_1} ∩ A_{2 i_2} ∩ ... ∩ A_{n i_n}] = P[A_{1 i_1}]P[A_{2 i_2}]⋯P[A_{n i_n}]
  holds for every possible set of n integers {i_1, i_2, ..., i_n}, where the index i_k ranges from 1 to m_k.

4. RANDOM VARIABLES

Set-indicator function
  I_A(s) = 1 if s ∈ A, and I_A(s) = 0 if s ∉ A

Inverse image
  X⁻¹(A) = {s ∈ S : X(s) ∈ A}

A-measurable
  A map X : S → R is A-measurable if {s : X(s) < a} ∈ A for all a ∈ R.

Random variable
  A random variable X is an A-measurable function from the sample space S to R.

Induced probability
  P[X ∈ A] ≜ P[X⁻¹(A)] = P[{s ∈ S : X(s) ∈ A}]
  P[X < a] ≜ P[{s ∈ S : X(s) < a}]

Range
  R_X = range of X = {a ∈ R : a = X(s) for some s ∈ S}

Probability distribution function
  F_X(x) ≜ P[X ≤ x] = P[{s ∈ S : X(s) ≤ x}]

Properties of the probability distribution function
  F_X(+∞) = 1 and F_X(−∞) = 0
  b > a ⟹ F_X(b) ≥ F_X(a)  (monotone non-decreasing)
  F_X(a) = F_X(a + 0) = lim_{ε→+0} F_X(a + ε)  (right continuous)
  F_X(a − 0) + P[X = a] = F_X(a)

Decomposition of distribution functions
  F_X(x) = D_X(x) + C_X(x)
  where D_X is a step function and hence may be expressed as D_X(x) = Σ_i P[X = x_i] U(x − x_i), and where C_X is continuous everywhere.

Interval probability
  P[X ∈ (a, b]] ≜ P[a < X ≤ b] = F_X(b) − F_X(a)

Probability density
  f_X(x) ≜ dF_X(x)/dx

Properties of the probability density function
  f_X(x) ≥ 0
  F_X(x) = ∫_{−∞}^{x} f_X(ξ) dξ
  ∫_{−∞}^{+∞} f_X(ξ) dξ = 1

Calculation of probability
  P[X ∈ A] = ∫_A f_X(x) dx

Uniform density function
  f_X(x) = 1/(b − a) for a ≤ x ≤ b; 0 otherwise

Exponential density function
  f_X(x) = a e^{−ax} for x ≥ 0; 0 otherwise  (a > 0)

Normal density function
  f_X(x) = (1/√(2π)) e^{−x²/2}

Rayleigh density function
  f_X(x) = (x/b) e^{−x²/(2b)} for x ≥ 0; 0 otherwise  (b > 0)

Cauchy density function
  f_X(x) = (a/π) · 1/(a² + x²)  (a > 0)
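As a quick sanity check, each density in the catalogue above should integrate to one. A minimal numerical sketch (the parameter values a = 2, b = 1 and the grid are arbitrary illustrative choices; the Cauchy tails converge slowly, so its value is slightly below 1 on a finite grid):

```python
import numpy as np

x = np.linspace(-500, 500, 1_000_001)   # wide grid, spacing 0.001
dx = x[1] - x[0]
a, b = 2.0, 1.0                          # arbitrary parameters for illustration

densities = {
    "uniform on [0, 3]":  np.where((x >= 0) & (x <= 3), 1 / 3, 0.0),
    "exponential, a = 2": np.where(x >= 0, a * np.exp(-a * np.maximum(x, 0)), 0.0),
    "standard normal":    np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "Rayleigh, b = 1":    np.where(x >= 0, (x / b) * np.exp(-x**2 / (2 * b)), 0.0),
    "Cauchy, a = 2":      (a / np.pi) / (a**2 + x**2),
}

for name, f in densities.items():
    # simple Riemann sum as a stand-in for the integral over the real line
    print(f"{name:18s} integral ≈ {np.sum(f) * dx:.3f}")
```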

5. RANDOM VECTORS

Random vector
  X(s) = [X_1(s), X_2(s), ..., X_n(s)]ᵀ
  where X_i(s), i = 1, ..., n, is a random variable defined on S. Thus, a random vector is a finite family of random variables.

Joint-probability distribution function
  F_{X,Y}(x, y) ≜ P[X ≤ x, Y ≤ y] ≜ P[{s ∈ S : X(s) ≤ x and Y(s) ≤ y}]

Properties of the joint-probability distribution function
  F_{X,Y}(−∞, y) = 0,  F_{X,Y}(x, −∞) = 0
  F_{X,Y}(+∞, +∞) = 1
  F_{X,Y}(x, +∞) = F_X(x),  F_{X,Y}(+∞, y) = F_Y(y)  (marginal distributions)

Joint-probability density
  f_{X,Y}(x, y) ≜ ∂²F_{X,Y}(x, y)/∂x∂y

Properties of the joint-probability density function
  f_{X,Y}(x, y) ≥ 0
  ∫_{−∞}^{+∞}∫_{−∞}^{+∞} f_{X,Y}(x, y) dx dy = 1
  F_{X,Y}(x, y) = ∫_{−∞}^{x}∫_{−∞}^{y} f_{X,Y}(ξ, η) dη dξ
  F_X(x) = ∫_{−∞}^{x}∫_{−∞}^{+∞} f_{X,Y}(ξ, η) dη dξ,   f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, η) dη
  F_Y(y) = ∫_{−∞}^{+∞}∫_{−∞}^{y} f_{X,Y}(ξ, η) dη dξ,   f_Y(y) = ∫_{−∞}^{+∞} f_{X,Y}(ξ, y) dξ

Two-dimensional normal (or gaussian) density
  f_{X,Y}(x, y) = 1/(2π√(1 − ρ²)) · exp[ −(x² − 2ρxy + y²) / (2(1 − ρ²)) ]   (|ρ| < 1)

Probability calculation
  P[(X, Y) ∈ A] = ∫∫_A f_{X,Y}(x, y) dx dy

Ex:
  f_{X,Y}(x, y) = (1/8)(x + y), 0 ≤ x, y ≤ 2; 0 otherwise
  f_X(x) = ∫_{−∞}^{+∞} f_{X,Y}(x, y) dy = ∫_0^2 (1/8)(x + y) dy = (1/4)(x + 1), 0 ≤ x ≤ 2; 0 otherwise
  P[|X − Y| > 1] = 2 ∫_1^2 ∫_0^{x−1} (1/8)(x + y) dy dx = 1/4
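The value P[|X − Y| > 1] = 1/4 in the example can be confirmed by Monte Carlo. The sketch below samples from f_{X,Y}(x, y) = (x + y)/8 on [0, 2]² by rejection (the density is bounded by 1/2 on that square) and then estimates the probability; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_xy(n):
    """Rejection sampling from f(x, y) = (x + y)/8 on [0,2]x[0,2] (max value 1/2)."""
    out = np.empty((0, 2))
    while len(out) < n:
        cand = rng.uniform(0, 2, size=(n, 2))                     # uniform proposals on the square
        accept = rng.uniform(0, 0.5, size=n) < (cand[:, 0] + cand[:, 1]) / 8
        out = np.vstack([out, cand[accept]])
    return out[:n]

xy = sample_xy(1_000_000)
est = np.mean(np.abs(xy[:, 0] - xy[:, 1]) > 1)
print(f"Monte Carlo estimate of P[|X - Y| > 1]: {est:.4f}   (exact value 1/4)")
```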

Conditional-probability distribution function
  F_X(x | Y ∈ B) ≜ P[X ≤ x | Y ∈ B] = P[X ≤ x, Y ∈ B] / P[Y ∈ B], whenever P[Y ∈ B] > 0
  F_X(−∞ | Y ∈ B) = 0,  F_X(+∞ | Y ∈ B) = 1

Conditional-probability density
  f_X(x | Y ∈ B) ≜ dF_X(x | Y ∈ B)/dx
  f_X(x | Y ∈ B) ≥ 0
  F_X(x | Y ∈ B) = ∫_{−∞}^{x} f_X(ξ | Y ∈ B) dξ
  ∫_{−∞}^{+∞} f_X(ξ | Y ∈ B) dξ = 1
  P[X ∈ A | Y ∈ B] = ∫_A f_X(ξ | Y ∈ B) dξ

Point-conditioning conditional-probability distribution function
  F_{X|Y}(x | y) ≜ F_X(x | Y = y) = lim_{dy→0} [ ∫_y^{y+dy}∫_{−∞}^{x} f_{X,Y}(ξ, η) dξ dη ] / [ f_Y(y) dy ]
                = [ dy ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ ] / [ f_Y(y) dy ] = ∫_{−∞}^{x} f_{X,Y}(ξ, y) dξ / f_Y(y)

Point-conditioning conditional-probability density
  f_{X|Y}(x | y) ≜ dF_{X|Y}(x | y)/dx = f_{X,Y}(x, y) / f_Y(y)
  P[X ∈ A | Y = y] = ∫_A f_{X|Y}(ξ | y) dξ
  P[X ∈ A] = ∫_{−∞}^{+∞} P[X ∈ A | Y = y] f_Y(y) dy

Independent random variables
  The random variables X and Y are statistically independent if
    F_{X,Y}(x, y) = F_X(x)F_Y(y)  or  f_{X,Y}(x, y) = f_X(x)f_Y(y)
  X and Y independent ⟹ f_{X|Y}(x | y) = f_X(x)

6. Functions of random variables

  X — random variable with F_X(x);  Y ≜ g(X);  Y is a random variable with F_Y(y), provided
    {s ∈ S : g(X(s)) ≤ a} ∈ A for all a ∈ R
  F_Y(y) ≜ P[{x : g(x) ≤ y}],   f_Y(y) = dF_Y(y)/dy = ?

g increasing
  F_Y(y) = P[Y ≤ y] = P[X ≤ g⁻¹(y)] = F_X(g⁻¹(y))
  f_Y(y) = dF_Y(y)/dy = dF_X(g⁻¹(y))/dy = f_X(h(y)) dh/dy,  where h(y) ≜ g⁻¹(y)

g decreasing
  F_Y(y) = P[Y ≤ y] = P[X ≥ g⁻¹(y)] = 1 − F_X(g⁻¹(y))
  f_Y(y) = dF_Y(y)/dy = −f_X(h(y)) dh/dy

g is one-to-one
  f_Y(y) = f_X(h(y)) |dh/dy|

Ex: Y = sin X with f_X(x) = 1/π for −π/2 < x < π/2; 0 otherwise
  f_Y(y) = f_X(sin⁻¹ y) |d(sin⁻¹ y)/dy| = f_X(sin⁻¹ y) · 1/√(1 − y²)
  f_X(sin⁻¹ y) = 1/π for −1 < y < 1
  ⟹ f_Y(y) = 1/(π√(1 − y²)) for −1 < y < 1; 0 otherwise

g is NOT one-to-one
  f_Y(y) = Σ_j f_X(g_j⁻¹(y)) |dg_j⁻¹(y)/dy|

  With f_X(x) = 1/π for −π/2 < x < π/2; 0 otherwise, and the two branches g_1⁻¹(y) = sin⁻¹ y and g_2⁻¹(y) = −sin⁻¹ y:
  f_Y(y) = f_X(sin⁻¹ y) |d(sin⁻¹ y)/dy| + f_X(−sin⁻¹ y) |d(−sin⁻¹ y)/dy|
         = ( f_X(sin⁻¹ y) + f_X(−sin⁻¹ y) ) · 1/√(1 − y²)

  f_X(sin⁻¹ y) = 1/π for 0 < y < 1, and f_X(−sin⁻¹ y) = 1/π for 0 < y < 1
  ⟹ f_Y(y) = 2/(π√(1 − y²)) for 0 < y < 1; 0 otherwise

Z ≜ X + Y
  F_Z(z) = P[X + Y ≤ z] = ∫_{−∞}^{+∞} [ ∫_{−∞}^{z−x} f_{X,Y}(x, y) dy ] dx
  f_Z(z) = dF_Z(z)/dz = ∫_{−∞}^{+∞} d/dz [ ∫_{−∞}^{z−x} f_{X,Y}(x, y) dy ] dx = ∫_{−∞}^{+∞} f_{X,Y}(x, z − x) dx
  X and Y independent ⟹ f_Z(z) = ∫_{−∞}^{+∞} f_X(x) f_Y(z − x) dx  (convolution)

Z ≜ XY

  F_Z(z) = P[XY ≤ z] = ∫_{−∞}^{0} [ ∫_{z/x}^{+∞} f_{X,Y}(x, y) dy ] dx + ∫_{0}^{+∞} [ ∫_{−∞}^{z/x} f_{X,Y}(x, y) dy ] dx
  f_Z(z) = ∫_{−∞}^{0} (−1/x) f_{X,Y}(x, z/x) dx + ∫_{0}^{+∞} (1/x) f_{X,Y}(x, z/x) dx = ∫_{−∞}^{+∞} (1/|x|) f_{X,Y}(x, z/x) dx
  X and Y independent ⟹ f_Z(z) = ∫_{−∞}^{+∞} (1/|x|) f_X(x) f_Y(z/x) dx

Vector case: X = [X_1, X_2]ᵀ, Y = [Y_1, Y_2]ᵀ, Y = g(X) (one-to-one), X = h(Y)
  f_Y(y) = f_X(h(y)) |det(∂h/∂y)|,  where ∂h/∂y is the Jacobian matrix with entries ∂h_i/∂y_j, i, j = 1, 2
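The convolution rule for the density of Z = X + Y with X and Y independent is easy to verify numerically. The sketch below uses an illustrative choice not taken from the text, two independent uniform(0, 1) variables (whose sum has the triangular density), and compares a Monte Carlo histogram with a discrete convolution of the marginal densities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Marginal densities on a grid: X, Y ~ uniform(0, 1), independent (illustrative choice)
dx = 0.001
x = np.arange(0, 1, dx)
f_x = np.ones_like(x)          # f_X on [0, 1)
f_y = np.ones_like(x)          # f_Y on [0, 1)

# f_Z(z) = integral f_X(x) f_Y(z - x) dx  ->  discrete convolution scaled by dx
f_z = np.convolve(f_x, f_y) * dx
z = np.arange(len(f_z)) * dx   # support of Z is [0, 2)

# Monte Carlo check
samples = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)
hist, edges = np.histogram(samples, bins=50, range=(0, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

for c, h in list(zip(centers, hist))[::10]:
    print(f"z = {c:.2f}   histogram {h:.3f}   convolution {np.interp(c, z, f_z):.3f}")
```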

7. Statistical averages

Statistical average or Expectation
  E[X] ≜ Σ_k x_k P[X = x_k]   (discrete)
  E[X] ≜ ∫ x f_X(x) dx        (continuous)
  E[g(X)] ≜ Σ_k g(x_k) P[X = x_k]
  E[g(X)] = ∫ g(x) f_X(x) dx

Random vectors
  X = [X_1, X_2, ..., X_n]ᵀ
  E[X] ≜ [E[X_1], E[X_2], ..., E[X_n]]ᵀ
  E[g(X)] = Σ_{k_1} Σ_{k_2} ⋯ Σ_{k_n} g(x_{k_1}, x_{k_2}, ..., x_{k_n}) P[X_1 = x_{k_1}, X_2 = x_{k_2}, ..., X_n = x_{k_n}]

  E[g(X)] = ∫⋯∫ g(x_1, x_2, ..., x_n) f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) dx_1 dx_2 ⋯ dx_n

General properties of the Expectation
  E[I_A(X)] = P[X ∈ A], where I_A is the set indicator of the event A ⊂ R
  E[aX] = aE[X], for any real a
  E[a_1X_1 + a_2X_2] = a_1E[X_1] + a_2E[X_2], for any real a_1 and a_2
  E[AX] = AE[X], for any real matrix A
  |E[X]| ≤ E[|X|]
  X(s) ≥ 0 for every s ∈ S ⟹ E[X] ≥ 0
  X_1(s) ≤ X_2(s) for every s ∈ S ⟹ E[X_1] ≤ E[X_2]

  X_1 and X_2 independent ⟹ E[X_1X_2] = E[X_1]E[X_2]

kth Moments
  kth moment of X: E[X^k]

Variance
  Σ_X ≜ E[(X − E[X])²] = E[X²] − E[X]²

Covariance
  Σ_X ≜ E[(X − E[X])(X − E[X])ᵀ] = E[XXᵀ] − E[X]E[X]ᵀ
  Σ_{XY} ≜ E[(X − E[X])(Y − E[Y])ᵀ] = E[XYᵀ] − E[X]E[Y]ᵀ

Uncorrelated random variables
  X and Y uncorrelated ⟺ Σ_{XY} = 0 ⟺ E[XYᵀ] = E[X]E[Y]ᵀ

Orthogonal random variables
  X and Y orthogonal ⟺ E[XYᵀ] = 0

Properties of the variance and the covariance
  Σ_Xᵀ = Σ_X  (symmetric)

  bᵀ Σ_X b ≥ 0 for all b ∈ Rⁿ  (positive semidefinite)
  Σ_{AX+b} = AΣ_X Aᵀ, for any real A and b
  E[(X − c)²] ≥ Σ_X, for any real c
  X_1, X_2, X_3 pairwise uncorrelated ⟹ Σ_{X_1+X_2+X_3} = Σ_{X_1} + Σ_{X_2} + Σ_{X_3}
  Σ_{YX} = Σ_{XY}ᵀ
  Σ_{AX+BY,Z} = AΣ_{X,Z} + BΣ_{Y,Z}, for any real A and B
  Σ_{AX+BY} = Σ_{AX+BY,AX+BY} = AΣ_{X,AX+BY} + BΣ_{Y,AX+BY}
            = A(Σ_{AX+BY,X})ᵀ + B(Σ_{AX+BY,Y})ᵀ
            = A(AΣ_{X,X} + BΣ_{Y,X})ᵀ + B(AΣ_{X,Y} + BΣ_{Y,Y})ᵀ
            = AΣ_X Aᵀ + AΣ_{X,Y}Bᵀ + BΣ_{Y,X}Aᵀ + BΣ_Y Bᵀ
  X, Y uncorrelated ⟹ Σ_{AX+BY} = AΣ_X Aᵀ + BΣ_Y Bᵀ

Bernoulli random variables
  P[X = 1] = p and P[X = 0] = 1 − p
  E[X] = p and E[X^k] = p, k = 1, 2, 3, ...
  Σ_X = p(1 − p)

Binomial random variables
  P[Y = k] = C(n, k) p^k q^{n−k}, k = 0, 1, 2, ..., n, where q = 1 − p
  E[Y] = np
  Σ_Y = npq

Poisson random variables
  P[X = k] = e^{−M} M^k / k!, k = 0, 1, 2, ...
  E[X] = M = Σ_X

Uniform random variables
  Let X be uniformly distributed over the interval [a, b]. Then
  E[X] = (a + b)/2
  E[X^k] = (1/(k + 1)) (b^k + b^{k−1}a + ... + ba^{k−1} + a^k)

Exponential random variables
  f_T(t) = a e^{−at} for t ≥ 0; 0 for t < 0
  E[T] = 1/a
  E[T^k] = k!/a^k

Gaussian or normal random variables
  f_X(x) = (1/(√(2π) σ)) exp[ −(x − m)²/(2σ²) ]
  E[X] = m,  Σ_X = σ²
  f_{X_1,X_2}(x_1, x_2) = 1/(2πσ_1σ_2√(1 − ρ²)) · exp{ −[ ((x_1 − m_1)/σ_1)² − 2ρ((x_1 − m_1)/σ_1)((x_2 − m_2)/σ_2) + ((x_2 − m_2)/σ_2)² ] / (2(1 − ρ²)) }

  X ∈ Rⁿ:  f_X(x) = 1/((2π)^{n/2} |Σ_X|^{1/2}) · exp[ −(1/2)(x − m_X)ᵀ Σ_X⁻¹ (x − m_X) ]
  (bivariate case: E[X_i] = m_i, Σ_{X_i} = σ_i², Σ_{X_1,X_2} = ρσ_1σ_2)

Rayleigh random variables
  f_R(r) = (r/b) e^{−r²/(2b)} for r ≥ 0; 0 otherwise
  E[R] = √(bπ/2),  E[R²] = 2b

Chebyshev inequality
  Let X be a r.v. with E[|X|^r] < ∞ for some r > 0. Then
    P[|X| ≥ ε] ≤ E[|X|^r]/ε^r, for any ε > 0.
  PF: Define
    Y = 0 if |X| < ε,  Y = ε^r if |X| ≥ ε
  Then E[Y] = 0 · P[Y = 0] + ε^r P[Y = ε^r] = ε^r P[|X| ≥ ε]

  Y ≤ |X|^r ⟹ E[Y] ≤ E[|X|^r] ⟹ P[|X| ≥ ε] = E[Y]/ε^r ≤ E[|X|^r]/ε^r

  Special case: X = Z − E[Z] and r = 2:
    P[|Z − E[Z]| ≥ ε] ≤ Σ_Z/ε², for any ε > 0

Cauchy-Schwarz inequality
  Let the real random variables X and Y have finite second moments. Then
    E[XY]² ≤ E[X²]E[Y²]
  PF: For any real λ,
    0 ≤ E[(λX + Y)²] = λ²E[X²] + 2λE[XY] + E[Y²]
  Since this quadratic in λ is nonnegative for all λ, its discriminant is nonpositive, i.e. E[XY]² ≤ E[X²]E[Y²].

Conditional Expectation
  E[X | Y = y] ≜ Σ_j x_j P[X = x_j | Y = y]
  E[X | Y = y] ≜ ∫ x f_{X|Y}(x | y) dx
  E[g(X) | Y = y] ≜ ∫ g(x) f_{X|Y}(x | y) dx = h(y)

  E[g(X) | Y] ≜ h(Y)
  Σ_{X|Y} ≜ E[(X − E[X|Y])(X − E[X|Y])ᵀ | Y] = E[XXᵀ | Y] − E[X|Y]E[X|Y]ᵀ

Properties of the Conditional Expectation
  E[I_A(Y) | X = x] = P[Y ∈ A | X = x]
  E[g(Y)E[h(X)|Y]] = E[g(Y)h(X)]
  PF: E[g(Y)E[h(X)|Y]] = ∫ g(y)E[h(X)|Y = y] f_Y(y) dy
      = ∫ g(y) [ ∫ h(x) f_{X|Y}(x|y) dx ] f_Y(y) dy
      = ∫∫ g(y)h(x) f_{X|Y}(x|y) f_Y(y) dx dy
      = ∫∫ g(y)h(x) f_{X,Y}(x, y) dx dy = E[g(Y)h(X)]
  Special case: g(Y) = 1, h(X) = X:  E[E[X|Y]] = E[X]
  E[AX | Y] = AE[X | Y]
  E[X + Z | Y] = E[X | Y] + E[Z | Y]

  E[g(Y)X | Y = y] = g(y)E[X | Y = y]
  E[g(Y)X | Y] = g(Y)E[X | Y]
  X and Y independent ⟹ E[h(X) | Y = y] = E[h(X)]
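The smoothing property E[E[X|Y]] = E[X] is easy to see numerically. A minimal sketch with an illustrative hierarchical model that is not from the text: Y ~ uniform(0, 1) and, given Y = y, X ~ normal with mean y² and unit variance, so that E[X|Y] = Y² and E[X] = E[Y²] = 1/3.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Illustrative model: Y ~ U(0,1), X | Y = y ~ N(y^2, 1)
y = rng.uniform(size=n)
x = rng.normal(loc=y**2, scale=1.0)

cond_mean = y**2                      # E[X | Y] is known in closed form for this model
print("E[X] by direct averaging      :", x.mean())
print("E[E[X|Y]] by averaging Y^2    :", cond_mean.mean())
print("theoretical value E[Y^2] = 1/3:", 1 / 3)
```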

8. ESTIMATION, SAMPLING, AND PREDICTION

  X & Y jointly distributed; X is to be estimated; Y is observed.

Question: Given the value Y = y, what is the best estimate x̂ of the value of X, i.e., the x that minimizes, over all x,
  E{‖X − x‖² | Y = y} = E{(X − x)ᵀ(X − x) | Y = y} ?

Theorem: x̂ = E{X | Y = y}, and the minimum value of the mean-squared error is
  E{‖X − x̂‖² | Y = y} = E{(X − x̂)ᵀ(X − x̂) | Y = y}
                      = E{tr (X − x̂)(X − x̂)ᵀ | Y = y}
                      = tr E{(X − x̂)(X − x̂)ᵀ | Y = y} = tr Σ_{X|Y=y}

Proof.
  E{(X − x)ᵀ(X − x) | Y = y} = E{XᵀX − xᵀX − Xᵀx + xᵀx | Y = y}
    = E{XᵀX | Y = y} − xᵀE{X | Y = y} − E{Xᵀ | Y = y}x + xᵀx
    = E{XᵀX | Y = y} − 2xᵀE{X | Y = y} + xᵀx + ‖E{X | Y = y}‖² − ‖E{X | Y = y}‖²
    = ‖x − E{X | Y = y}‖² + E{XᵀX | Y = y} − ‖E{X | Y = y}‖²

Remark: For an n × n matrix A = {a_ij}, trace(A) = tr A ≜ Σ_{i=1}^{n} a_ii.
  (a) A is a scalar ⟹ tr A = A
  (b) tr(AB) = tr(BA)
  (c) tr(A + B) = tr A + tr B

Terminology:
  (a) For any value y of Y, the Best Estimate is x̂ = E{X | Y = y}.
  (b) Let y vary. X̂ = E{X | Y} is the Best Estimator. Thus, the Best Estimator is a random variable.

Theorem: The estimator of X in terms of Y that minimizes E{‖X − g(Y)‖²} over all functions g is X̂ = E{X | Y}.
Proof. See I.B. Rhodes, "A Tutorial Introduction to Estimation and Filtering," IEEE Transactions on Automatic Control, Vol. 16, No. 6, 1971.

Properties of the Best Estimator:
  (a) Linear: E{AX + BZ + C | Y} = AE{X | Y} + BE{Z | Y} + C
  (b) Unbiased: E{X̂} = E[E{X | Y}] = E{X}

  (c) Projection Theorem: the error X̃ ≜ X − X̂ is orthogonal to the r.v. g(Y) for any scalar function g, i.e.,
        E{g(Y) X̃ᵀ} = 0
      PF: E{g(Y) X̃ᵀ} = E[E{g(Y) X̃ᵀ | Y}] = E[g(Y)E{Xᵀ − E{Xᵀ | Y} | Y}] = E[g(Y)(E{Xᵀ | Y} − E{Xᵀ | Y})] = 0

Definition:
  (a) X & Y are L₂-orthogonal if E{XᵀY} = 0 (denoted X ⊥ Y).
      (Reminder: X & Y are orthogonal if E{XYᵀ} = 0; note E{XᵀY} = tr E{XYᵀ}.)
  (b) Let M be a subspace of X (e.g., M = all n-vector-valued functions f(Y)).
      M^⊥ ≜ {X ∈ X : X ⊥ Y for all Y ∈ M}

Projection Theorem: Let M be a subspace of X. Then there exists a unique pair of maps P : X → M and Q : X → M^⊥ such that X = PX + QX for all X ∈ X. Also:

  (a) X ∈ M ⟹ PX = X and QX = 0;  X ∈ M^⊥ ⟹ PX = 0 and QX = X
  (b) For all X ∈ X, ‖X − PX‖ = min_{X′ ∈ M} ‖X − X′‖,
      i.e., the projection of X on M gives minimum error over all points in M.
  (c) P & Q are linear.
  (d) ‖X‖² = ‖PX‖² + ‖QX‖²

Problem: Find the best linear estimator X* = A*Y + b* that minimizes E{‖X − AY − b‖²} over all n × m matrices A and n × 1 vectors b.

Sol.) First, assume that X & Y have zero mean. Let M = all random vectors of the form AY + b. By the Projection Theorem,
  X − A*Y − b* ⊥ M
That is, for all A & b,
  E{(AY + b)ᵀ(X − A*Y − b*)} = tr E{(X − A*Y − b*)(AY + b)ᵀ} = tr[Σ_{XY}Aᵀ − A*Σ_Y Aᵀ − b*bᵀ] = tr[(Σ_{XY} − A*Σ_Y)Aᵀ] − b*ᵀb = 0

Thus
  A* = Σ_{XY}Σ_Y⁻¹ and b* = 0
which implies that X* = Σ_{XY}Σ_Y⁻¹Y.

Now assume non-zero mean. Then
  (X − m_X)* = Σ_{XY}Σ_Y⁻¹(Y − m_Y)
Thus
  X* = m_X + Σ_{XY}Σ_Y⁻¹(Y − m_Y)

Basic Properties of the Best Linear Estimator:
  (a) Unbiased: E{X*} = E{X} = m_X
  (b) Let X̃ = X − X*. Then the error covariance is
      Σ_X̃ = E{(X − X*)(X − X*)ᵀ} = Σ_X − Σ_{XY}Σ_Y⁻¹Σ_{YX}

Remark:
  (a) If uncorrelated (Σ_{XY} = 0), the best linear estimator is X* = m_X and Σ_X̃ = Σ_X.
  (b) If independent,

      the best linear estimator is X* = m_X (independent ⟹ uncorrelated), and the best estimator is X̂ = m_X.

Example:
  f_{XY}(x, y) = 3 for 0 ≤ y ≤ 1, 0 ≤ x ≤ y²; 0 otherwise.
  Estimate X by (a) a constant, (b) a linear estimator, and (c) a nonlinear estimator.

Sol)
  f_X(x) = 3(1 − √x), 0 ≤ x ≤ 1; 0 otherwise

(a) E[X] = ∫ x f_X(x) dx = ∫_0^1 3x(1 − √x) dx = [(3/2)x² − (6/5)x^{5/2}]_0^1 = 3/10
    Error var. = E[(X − m_X)²] = E[X²] − m_X² = 1/7 − 9/100 = 37/700 ≈ 0.053

(b) f_Y(y) = 3y², 0 ≤ y ≤ 1; 0 otherwise
    m_Y = 3/4,  var(Y) = 3/80,  cov(X, Y) = 1/40

    X* = m_X + (cov(X, Y)/var(Y))(Y − 3/4) = (2/3)Y − 1/5
    Error var. = var(X) − cov(X, Y)²/var(Y) = 37/700 − 1/60 = 19/525 ≈ 0.036

(c) f_{X|Y}(x | y) = 1/y², 0 ≤ x ≤ y²; 0 otherwise
    E[X | Y = y] = ∫_0^{y²} x (1/y²) dx = y²/2
    X̂ = E[X | Y] = Y²/2
    Error var. = E[(X − X̂)²] = E[var(X | Y)] = E[Y⁴/12] = 1/28 ≈ 0.036

Matrix Inversion Lemma:
  (P⁻¹ + HᵀR⁻¹H)⁻¹ = P − PHᵀ(HPHᵀ + R)⁻¹HP
  (A + XᵀY)⁻¹ = A⁻¹ − A⁻¹Xᵀ(I + YA⁻¹Xᵀ)⁻¹YA⁻¹
  PF: exercise.

Best Linear Minimum-Variance Estimator of X given Y:
  X* = X*_Y = E*[X | Y] = m_X + Σ_{XY}Σ_Y⁻¹(Y − m_Y)
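The three estimators in the example above can be compared by simulation. The sketch below draws (X, Y) from f_{XY} = 3 on {0 ≤ x ≤ y², 0 ≤ y ≤ 1} (since f_Y(y) = 3y², the inverse-CDF method gives Y = U^{1/3} for U uniform, and X is then uniform on [0, Y²]) and reports the empirical mean-squared errors; the "exact" values are those computed above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Sample (X, Y) with f_XY(x, y) = 3 on 0 <= x <= y^2, 0 <= y <= 1:
# f_Y(y) = 3 y^2  =>  Y = U^(1/3);  given Y, X is uniform on [0, Y^2].
y = rng.uniform(size=n) ** (1 / 3)
x = rng.uniform(size=n) * y**2

est_const  = 3 / 10                 # (a) best constant: E[X]
est_linear = (2 / 3) * y - 1 / 5    # (b) best linear estimator
est_cond   = y**2 / 2               # (c) conditional mean E[X | Y]

for name, est, exact in [("constant", est_const, 37 / 700),
                         ("linear  ", est_linear, 19 / 525),
                         ("E[X|Y]  ", est_cond, 1 / 28)]:
    print(f"{name}: empirical MSE {np.mean((x - est)**2):.4f}   exact {exact:.4f}")
```

As expected, the conditional mean gives the smallest error, the linear estimator is only slightly worse here, and the constant is the worst of the three.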

More Properties of the Best Linear Estimator:
  (1) It depends only on the 1st and 2nd moments.
  (2) X & Y jointly Gaussian ⟹ E*[X | Y] = E[X | Y].
  (3) E*[X | Y] is linear in its first argument.
  (4) Assume that Y & Z are uncorrelated, and let X, Y, & Z have zero mean.
      (a) E*[X | Y, Z] = E*[X | Y] + E*[X | Z]
          Let X̃_Y ≜ X − E*[X | Y] and X̃_{Y,Z} ≜ X − E*[X | Y, Z]. Then
            Σ_{X̃_Y} = Σ_X − Σ_{XY}Σ_Y⁻¹Σ_{YX}
            Σ_{X̃_{Y,Z}} = Σ_X − Σ_{XY}Σ_Y⁻¹Σ_{YX} − Σ_{XZ}Σ_Z⁻¹Σ_{ZX}
      (b) E*[X | Y, Z] = E*[X | Y] + E*[X̃_Y | Z]
  (5) Let X, Y, & Z have zero mean. Then
        E*[X | Y, Z] = E*[X | Y, Z̃_Y] = E*[X | Y] + E*[X | Z̃_Y]   (by 4(a))
                     = E*[X | Y] + E*[X̃_Y | Z̃_Y]                  (by 4(b))

      Σ_{X̃_{Y,Z}} = Σ_{X̃_Y} − Σ_{X̃_Y, Z̃_Y} Σ_{Z̃_Y}⁻¹ Σ_{Z̃_Y, X̃_Y}
      Z̃_Y = innovation in Z w.r.t. Y
  (6) X, Y_1, ..., Y_{k+1} zero mean. Denote E*[X | Y_1, ..., Y_{k+1}] ≜ X̂_{k+1}. Then
        X̂_{k+1} = X̂_k + E*[X̃_k | Ỹ_{k+1|k}]
      where X̃_k = X − X̂_k and
        Ỹ_{k+1|k} = Y_{k+1} − E*[Y_{k+1} | Y_1, ..., Y_k] = innovation in Y_{k+1} w.r.t. Y_1, ..., Y_k
  (7) {Y_1, Y_2, Y_3, ..., Y_{k+1}} are linearly related to {Y_1, Ỹ_{2|1}, Ỹ_{3|2}, ..., Ỹ_{k+1|k}}  (Gram-Schmidt orthogonalization).

Sample Mean
  Estimator of E(X) = m:
    m̂_n ≜ (1/n) Σ_{i=1}^{n} X_i
    E[m̂_n] = m
  Assume independence: {X_i, i = 1, 2, 3, ..., n} independent ⟹ var(m̂_n) = (1/n) var(X)
    lim_{n→∞} P[|m̂_n − m| ≥ ε] = 0   (the weak law of large numbers)

Relative frequency
  Suppose that we sample a random variable, say X, and that we determine for each sample whether or not some given event A occurs. The random variable n_A/n characterizing the relative frequency of occurrence of the event A has the statistical properties
    E[n_A/n] = p and var(n_A/n) = p(1 − p)/n
  where p ≜ P[X ∈ A].

  P[|n_A/n − p| ≥ ε] ≤ 1/(4nε²)
  lim_{n→∞} P[|n_A/n − p| ≥ ε] = 0   (Bernoulli theorem)
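The relative-frequency bound is easy to simulate. In the sketch below the event probability p = 0.3, the tolerance ε, and the number of repetitions are arbitrary illustrative values; the empirical probability of a large deviation is compared with the bound 1/(4nε²).

```python
import numpy as np

rng = np.random.default_rng(4)
p, eps = 0.3, 0.05          # illustrative event probability and tolerance
trials = 20_000             # independent repetitions of the n-sample experiment

for n in (50, 200, 800, 3200):
    freqs = rng.binomial(n, p, size=trials) / n       # relative frequencies n_A / n
    emp = np.mean(np.abs(freqs - p) >= eps)           # empirical P[|n_A/n - p| >= eps]
    bound = 1 / (4 * n * eps**2)
    print(f"n = {n:5d}:  empirical {emp:.4f}   bound 1/(4 n eps^2) = {min(bound, 1):.4f}")
```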

9. Random Processes

Random Process
  An indexed family of random variables {X_t, t ∈ T}, where T denotes the set of possible values of the index t. If T is a countably infinite set, the process is called a discrete-parameter random process; if T is a continuum, the process is called a continuous-parameter random process.

Bernoulli process
  A random process {X_n, n = 1, 2, 3, ...} in which the random variables X_n are Bernoulli random variables, for example, where P[X_n = 1] = p and P[X_n = 0] = 1 − p, and where the X_n are statistically independent random variables.
  E[X_n] = p,  var(X_n) = pq = p(1 − p)

Binomial counting process
  A random process {Y_n, n = 1, 2, 3, ...} with
  Y_n ≜ Σ_{i=1}^{n} X_i

  where {X_i, i = 1, 2, 3, ...} is an independent Bernoulli random process.
  P[Y_n = k] = C(n, k) p^k (1 − p)^{n−k}, for k = 0, 1, 2, ..., n
  E[Y_n] = np,  var(Y_n) = npq
  cov(Y_m, Y_n) = pq · min(m, n)
  var(Y_m − Y_n) = |m − n| pq

Sine wave process
  X_t ≜ V sin(Ωt + Φ), t ∈ R, where V, Ω, and Φ are r.v.'s.

Stationarity (strict sense)
  A random process {X_t, t ∈ T} is stationary (in the strict sense) if and only if all of the finite-dimensional probability distribution functions are invariant under shifts of the time origin.

Mean function
  m_X(t) ≜ E[X_t]

Autocorrelation function
  R_X(t_1, t_2) ≜ E[X_{t_1}X_{t_2}]

Covariance function
  K_X(t_1, t_2) ≜ cov(X_{t_1}, X_{t_2}) = R_X(t_1, t_2) − m_X(t_1)m_X(t_2)

Cross-correlation function
  R_{XY}(t_1, t_2) ≜ E[X_{t_1}Y_{t_2}]

Cross-covariance function
  K_{XY}(t_1, t_2) ≜ cov(X_{t_1}, Y_{t_2}) = R_{XY}(t_1, t_2) − m_X(t_1)m_Y(t_2)

Stationary random processes
  Let {X_t, −∞ < t < +∞} be a strictly stationary real random process. It then follows that
  m_X(t) = E[X_t] = E[X_0] = const

  R_X(t, t − τ) = R_X(0, −τ)
  We generally write in this case
    E[X_t] = E[X] = m_X
    R_X(t, t − τ) = R_X(τ), t ∈ R
    R_X(−τ) = R_X(τ)
    |R_X(τ)| ≤ R_X(0)

Wide-sense stationarity (wss)
  Let {X_t, −∞ < t < +∞} be a real random process such that
    E[X_t] = E[X_0], for all t ∈ R
    R_X(t, t − τ) = R_X(0, 0 − τ), for all t ∈ R, τ ∈ R
  Then the given random process is said to be stationary in the wide sense.

Jointly wide-sense stationary random processes
  We say that the random processes {X_t, −∞ < t < +∞} and {Y_t, −∞ < t < +∞} are jointly wss if each is wss and
    R_{XY}(t, t − τ) = R_{XY}(0, −τ), for all t ∈ R, τ ∈ R

Sample mean
  Consider the wide-sense stationary random process {X_t, −∞ < t < +∞} whose second moment is finite. Suppose that we sample that process at the n time instants t_1, t_2, ..., t_n. The estimator
    m̂_n ≜ (1/n) Σ_{i=1}^{n} X_i,  where X_i ≜ X_{t_i},
  is called the sample mean.
    E[m̂_n] = m_X
    var(m̂_n) = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} K_X(t_i, t_j)
  Special cases are:
    X_i pairwise uncorrelated ⟹ var(m̂_n) = K_X(0, 0)/n = σ²/n
      ⟹ lim_{n→∞} P[|m̂_n − m| > ε] = 0   (the weak law of large numbers)
    X_i highly correlated ⟹ var(m̂_n) = σ²

Periodic sampling
  Let the wss random process {X_t, −∞ < t < +∞} be sampled periodically throughout the interval 0 ≤ t ≤ T in such a way that there are n sampling instants equally spaced throughout that interval (the last at t = T). The variance of the sample mean is given in this case by the formula
    var(m̂_n) = σ²/n + (2/n) Σ_{k=1}^{n−1} (1 − k/n) K_X(k∆t)
  where ∆t ≜ T/n. It therefore follows that
    lim_{n→∞} var(m̂_n) = (2/T) ∫_0^T (1 − τ/T) K_X(τ) dτ
  if we pass to the limit n → ∞ while keeping T fixed.

LINEAR TRANSFORMATIONS

n-dimensional case
  Suppose that the m-dimensional real random vector Y = (Y_1, Y_2, ..., Y_m) is generated from the n-dimensional real random vector X = (X_1, X_2, ..., X_n) by the transformation g, that is, Y = g(X). We say that g is a linear transformation if and only if it satisfies the relation
    g(aW + bZ) = ag(W) + bg(Z), for all a, b ∈ R
  Componentwise,
    Y_1 = g_11 X_1 + g_12 X_2 + ... + g_1n X_n
    Y_2 = g_21 X_1 + g_22 X_2 + ... + g_2n X_n
    ⋮
    Y_m = g_m1 X_1 + g_m2 X_2 + ... + g_mn X_n

  Y_i = Σ_{j=1}^{n} g_ij X_j, i = 1, 2, ..., m
  E[Y_i] = Σ_{j=1}^{n} g_ij E[X_j]
  cov(Y_i, Y_k) = Σ_{j=1}^{n} Σ_{r=1}^{n} g_ij g_kr cov(X_j, X_r)

Matrix formulation (here with n = 2 and m = 3)
  X = [X_1, X_2]ᵀ,  Y = [Y_1, Y_2, Y_3]ᵀ,  G = [g_11 g_12; g_21 g_22; g_31 g_32]
  Y = GX
  E[Y] = GE[X]
  Σ_Y = GΣ_X Gᵀ

Time averages
  {X_t, −∞ < t < +∞} a random process:
    Y_t ≜ (1/T) ∫_{t−T}^{t} X_τ dτ
    E[Y_t] = (1/T) ∫_{t−T}^{t} E[X_τ] dτ

Output autocorrelation function
  R_Y(t_1, t_2) = E[Y_{t_1}Y_{t_2}] = E[ (1/T)∫_{t_1−T}^{t_1} X_{α_1} dα_1 · (1/T)∫_{t_2−T}^{t_2} X_{α_2} dα_2 ]
               = (1/T²) ∫_{t_1−T}^{t_1} ∫_{t_2−T}^{t_2} R_X(α_1, α_2) dα_1 dα_2

{X_t, −∞ < t < +∞} wss:
  E[Y_t] = m_X ≜ E[X_t]
  R_Y(t_1, t_2) = (1/T²) ∫_0^T ∫_0^T R_X(τ_1 + t_1 − T, τ_2 + t_2 − T) dτ_1 dτ_2
               = (1/T²) ∫_0^T ∫_0^T R_X(t_1 − t_2 + τ_1 − τ_2) dτ_1 dτ_2

  R_Y(t, t) = (1/T²) ∫_0^T ∫_0^T R_X(τ_1 − τ_2) dτ_1 dτ_2
            = (2/T²) ∫_0^T ∫_{α_1}^{T} R_X(α_1) dα_2 dα_1
            = (2/T²) ∫_0^T (T − α_1) R_X(α_1) dα_1 = (2/T) ∫_0^T (1 − τ/T) R_X(τ) dτ
  var(Y_t) = R_Y(t, t) − m_X² = (2/T) ∫_0^T (1 − τ/T)[R_X(τ) − m_X²] dτ = (2/T) ∫_0^T (1 − τ/T) K_X(τ) dτ
  ∫_{−∞}^{+∞} |K_X(τ)| dτ < C  ⟹  var(Y_t) < C/T

Weighting functions
  Time-invariant linear system, h(t) = system weighting function:
    y(t) = ∫_{−∞}^{+∞} h(τ) x(t − τ) dτ

Output moments
  X_t random input:
    Y_t = ∫_{−∞}^{+∞} h(τ) X_{t−τ} dτ

  E[Y_t] = ∫ h(τ) E[X_{t−τ}] dτ
  K_Y(t_1, t_2) = ∫∫ h(τ_1) h(τ_2) K_X(t_1 − τ_1, t_2 − τ_2) dτ_1 dτ_2

X_t wss:
  E[Y_t] = m_X ∫_{−∞}^{+∞} h(τ) dτ
  K_Y(τ) = ∫∫ h(τ_1) h(τ_2) K_X(τ − τ_1 + τ_2) dτ_1 dτ_2
  R_{YX}(τ) = ∫ h(t′) R_X(τ − t′) dt′

System correlation function
  R_h(τ) ≜ ∫ h(t) h(t − τ) dt
  R_Y(τ) = ∫ R_h(t′) R_X(τ − t′) dt′
  var(Y_t) = ∫ R_h(t′) K_X(t′) dt′
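The same relations are easy to verify in discrete time. The sketch below pushes white noise through a short FIR weighting sequence h and compares the empirical output covariance K_Y(m) with σ²R_h(m), the discrete analogue of the system-correlation-function formula above; the particular h, variance, and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 1.0
h = np.array([1.0, 0.7, 0.4, 0.1])          # illustrative FIR weighting sequence

# White-noise input (K_X(m) = sigma^2 * delta[m]) filtered by h
x = rng.normal(scale=np.sqrt(sigma2), size=2_000_000)
y = np.convolve(x, h, mode="valid")

# System correlation function R_h(m) = sum_k h[k] h[k - m], m = 0, 1, 2, ...
R_h = np.correlate(h, h, mode="full")[len(h) - 1:]

for m in range(len(h)):
    k_emp = np.mean(y[:len(y) - m] * y[m:]) - y.mean() ** 2
    print(f"lag {m}: empirical K_Y {k_emp:.4f}   sigma^2 * R_h {sigma2 * R_h[m]:.4f}")
```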

SPECTRAL ANALYSIS

Fourier transforms
  X(f) ≜ ∫_{−∞}^{+∞} x(t) e^{−i2πft} dt
  x(t) = ∫_{−∞}^{+∞} X(f) e^{i2πft} df

System functions
  h(t) = the weighting function of a stable, linear, time-invariant system. The system function is
    H(f) ≜ ∫_{−∞}^{+∞} h(τ) e^{−i2πfτ} dτ
    h(τ) = ∫_{−∞}^{+∞} H(f) e^{i2πfτ} df
  y(t) = ∫_{−∞}^{+∞} h(τ) x(t − τ) dτ  ⟹  Y(f) = X(f)H(f)

Spectral density
  S_X(f) ≜ ∫_{−∞}^{+∞} R_X(τ) e^{−i2πfτ} dτ
  R_X(τ) = ∫_{−∞}^{+∞} S_X(f) e^{i2πfτ} df
  S_X(0) = ∫_{−∞}^{+∞} R_X(τ) dτ
  E[X_t²] = R_X(0) = ∫_{−∞}^{+∞} S_X(f) df
  S_X(f) ≥ 0, for all f
  X_t real ⟹ S_X(−f) = S_X(f)

Spectral analysis of linear systems
  X_t wss input, Y_t wss output:
    S_Y(f) = |H(f)|² S_X(f)
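The relation S_Y(f) = |H(f)|²S_X(f) can be illustrated with the same discrete-time setup used above: white noise (flat spectral density σ²) through an FIR filter, with the output spectrum estimated by averaging periodograms. All numerical choices below (filter taps, segment length, number of segments) are illustrative, and the match is approximate because the periodogram is only an asymptotically unbiased estimate.

```python
import numpy as np

rng = np.random.default_rng(6)
h = np.array([1.0, 0.7, 0.4, 0.1])      # illustrative FIR filter
sigma2 = 1.0
nfft, nseg = 256, 4000

x = rng.normal(scale=np.sqrt(sigma2), size=nfft * nseg)
y = np.convolve(x, h, mode="same")

# Averaged-periodogram estimate of S_Y(f)
segs = y.reshape(nseg, nfft)
S_est = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) / nfft

# Theory: S_Y(f) = |H(f)|^2 S_X(f), with S_X(f) = sigma^2 for white noise
H = np.fft.rfft(h, n=nfft)
S_theory = np.abs(H) ** 2 * sigma2

f = np.fft.rfftfreq(nfft)                # normalized frequency, cycles/sample
for i in range(0, len(f), 32):
    print(f"f = {f[i]:.3f}   estimate {S_est[i]:.3f}   |H|^2 sigma^2 {S_theory[i]:.3f}")
```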

  E[Y_t²] = ∫_{−∞}^{+∞} |H(f)|² S_X(f) df
  If H has the value unity over a narrow band of width ∆f centered about a frequency f_1, then
    E[Y_t²] ≈ 2S_X(f_1)∆f

Cross-spectral density
  S_{XY}(f) ≜ ∫_{−∞}^{+∞} R_{XY}(τ) e^{−i2πfτ} dτ
  R_{XY}(τ) = ∫_{−∞}^{+∞} S_{XY}(f) e^{i2πfτ} df

SUMS OF INDEPENDENT RANDOM VARIABLES

Independent-increment process
  The real random process {Y_t, t ≥ 0} is said to be an independent-increment process if, for every set of time instants 0 < t_1 < t_2 < ... < t_n, the increments
    (Y_{t_1} − Y_0), (Y_{t_2} − Y_{t_1}), ..., (Y_{t_n} − Y_{t_{n−1}})
  are mutually independent random variables and Y_0 ≜ 0, where
    Y_{t_n} = Σ_{i=1}^{n} X_i,  X_i ≜ Y_{t_i} − Y_{t_{i−1}}, i = 1, 2, ..., n

Independent-increment process with stationary increments
  F_{Y_{t_2+τ} − Y_{t_1+τ}}(x) = F_{Y_{t_2} − Y_{t_1}}(x), for all τ ∈ R
  E[Y_{t_2+t_1} − Y_{t_1}] = E[Y_{t_2} − Y_0] = E[Y_{t_2}]
  E[Y_{t_2+t_1}] = E[Y_{t_1}] + E[Y_{t_2}]

  E[Y_t] = mt,  where m ≜ E[Y_t]|_{t=1}
  R_Y(t_2 + t_1, t_1) = E[Y_{t_2+t_1}Y_{t_1}] = E[(Y_{t_2+t_1} − Y_{t_1} + Y_{t_1})Y_{t_1}]
                      = E[Y_{t_2+t_1} − Y_{t_1}]E[Y_{t_1}] + E[Y_{t_1}²] = m²t_2t_1 + E[Y_{t_1}²]
  E[(Y_{t_2} − Y_0)²] = E[(Y_{t_2+t_1} − Y_{t_1})²]
                      = R_Y(t_2+t_1, t_2+t_1) − 2R_Y(t_2+t_1, t_1) + R_Y(t_1, t_1)
                      = R_Y(t_2+t_1, t_2+t_1) − 2m²t_2t_1 − R_Y(t_1, t_1)
  K_Y(t_2+t_1, t_2+t_1) = R_Y(t_2+t_1, t_2+t_1) − m²(t_2 + t_1)²
                        = R_Y(t_2, t_2) + 2m²t_2t_1 + R_Y(t_1, t_1) − m²t_2² − m²t_1² − 2m²t_2t_1
                        = K_Y(t_2, t_2) + K_Y(t_1, t_1)
  where var(Y_t) = K_Y(t, t) = σ²t,  σ² ≜ var(Y_t)|_{t=1}

  t_2 ≥ t_1:
    R_Y(t_2, t_1) = E[Y_{t_2}Y_{t_1}] = E[(Y_{t_2} − Y_{t_1} + Y_{t_1})Y_{t_1}]
                  = E[Y_{t_2} − Y_{t_1}]E[Y_{t_1}] + E[Y_{t_1}²] = m²t_2t_1 + σ²t_1
    K_Y(t_2, t_1) = R_Y(t_2, t_1) − m²t_2t_1 = σ²t_1
  K_Y(t_2, t_1) = σ² min(t_2, t_1)
  t_2 ≥ t_1:
    var(Y_{t_2} − Y_{t_1}) = K_Y(t_2, t_2) + K_Y(t_1, t_1) − 2K_Y(t_2, t_1) = σ²t_2 + σ²t_1 − 2σ²t_1 = σ²(t_2 − t_1)
  var(Y_{t_2} − Y_{t_1}) = σ²|t_2 − t_1|

Characteristic function
  φ_X(v) ≜ E[e^{ivX}] = ∫ f_X(x) e^{ivx} dx   (Fourier transform)
  f_X(x) = (1/2π) ∫ φ_X(v) e^{−ivx} dv
  φ_X(v) = Σ_k P[X = x_k] e^{ivx_k}   (discrete case)

  |φ_X(v)| ≤ φ_X(0) = 1

Sums of independent random variables
  Y_n ≜ Σ_{i=1}^{n} X_i, where X_1, X_2, ..., X_n are mutually independent.
  φ_{Y_n}(v) = E[e^{iv Σ_{i=1}^{n} X_i}] = Π_{i=1}^{n} E[e^{ivX_i}] = Π_{i=1}^{n} φ_{X_i}(v)
  f_{Y_n}(y) = f_{X_1}(y) * f_{X_2}(y) * ... * f_{X_n}(y)   (convolution)

Linear functions
  Y ≜ aX + b, a, b ∈ R:
  φ_Y(v) = E[e^{iv(aX+b)}] = e^{ivb} φ_X(av)

Gaussian random variables
  Let X be a gaussian random variable.
    φ_X(v) = exp( ivE[X] − v²σ_X²/2 )
  Let Y be a sum of n mutually independent gaussian random variables X_k.
    φ_Y(v) = exp( ivm − v²σ²/2 )
  where m ≜ Σ_{k=1}^{n} m_k and σ² ≜ Σ_{k=1}^{n} σ_k², and where m_k and σ_k² are the mean and variance, respectively, of X_k.

Cauchy random variables
  f_X(x) = 1/(π(1 + x²)),  φ_X(v) = e^{−|v|}
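The product rule for characteristic functions of independent sums can be seen numerically: for a sum of independent Gaussians, the empirical characteristic function (a sample average of e^{ivY}) should agree with exp(ivm − v²σ²/2). A minimal sketch with two illustrative components (the means, standard deviations, and grid of v values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)
m1, s1, m2, s2 = 1.0, 2.0, -0.5, 1.5     # illustrative means and standard deviations
n = 1_000_000

y = rng.normal(m1, s1, n) + rng.normal(m2, s2, n)   # sum of independent Gaussians

m, var = m1 + m2, s1**2 + s2**2
for v in (0.2, 0.5, 1.0):
    phi_emp = np.mean(np.exp(1j * v * y))            # empirical E[e^{ivY}]
    phi_th = np.exp(1j * v * m - v**2 * var / 2)     # exp(ivm - v^2 sigma^2 / 2)
    print(f"v = {v}: empirical {phi_emp:.3f}   theory {phi_th:.3f}")
```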

Chi-squared random variables
  f_X(x) = x^{(n−2)/2} e^{−x/2} / (2^{n/2} Γ(n/2)) for x ≥ 0; 0 for x < 0, where n is a positive integer.
  φ_X(v) = (1 − i2v)^{−n/2}

Poisson random variables
  P[X = k] = λ^k e^{−λ}/k!, for k = 0, 1, 2, ...
  φ_X(v) = exp[λ(e^{iv} − 1)]

Moment-generating property
  E[X^k] = (−i)^k φ_X^{(k)}(0)
  φ_X(v) = Σ_{k=0}^{∞} E[X^k] (iv)^k / k!

Joint characteristic functions
  φ_{X_1,X_2}(v_1, v_2) ≜ E[ exp( i Σ_{k=1}^{2} v_k X_k ) ]
  φ_X(v) ≜ E[exp(ivᵀX)],  v = [v_1, v_2, ..., v_n]ᵀ,  X = [X_1, X_2, ..., X_n]ᵀ (n × 1)
  |φ_X(v)| ≤ φ_X(0) = 1
  f_X(x) = f_{X_1,X_2,...,X_n}(x_1, ..., x_n)
  φ_{X_1,...,X_n}(v_1, ..., v_n) = ∫⋯∫ exp( i Σ_{k=1}^{n} v_k x_k ) f_{X_1,...,X_n}(x_1, ..., x_n) dx_1 ⋯ dx_n
  f_{X_1,...,X_n}(x_1, ..., x_n) = (1/(2π)^n) ∫⋯∫ exp( −i Σ_{k=1}^{n} v_k x_k ) φ_{X_1,...,X_n}(v_1, ..., v_n) dv_1 ⋯ dv_n

Independent random variables
  X_k's mutually independent ⟹ φ_{X_1,X_2,...,X_n}(v_1, v_2, ..., v_n) = Π_{k=1}^{n} φ_{X_k}(v_k)

Moment-generating properties
  E[X_1^m X_2^k] = (−i)^{m+k} ∂^{m+k} φ_{X_1,X_2}(v_1, v_2) / ∂v_1^m ∂v_2^k |_{v_1=v_2=0}
  φ_{X_1,X_2}(v_1, v_2) = Σ_{m=0}^{∞} Σ_{k=0}^{∞} E[X_1^m X_2^k] (iv_1)^m (iv_2)^k / (m! k!)

Independent-increment processes
  Let {Y_t, t ≥ 0} be a real random process with stationary and independent increments and let Y_0 = 0. If, given the time instants 0 = t_0 < t_1 < t_2 < ... < t_n,
    X_k ≜ Y_{t_k} − Y_{t_{k−1}}, for k = 1, 2, ..., n, then

  Y_{t_n} = Σ_{k=1}^{n} X_k
  φ_{Y_{t_n}}(v) = Π_{k=1}^{n} φ_{X_k}(v)
  φ_{Y_{t_1},Y_{t_2},...,Y_{t_n}}(v_1, v_2, ..., v_n) = Π_{k=1}^{n} φ_{X_k}( Σ_{j=k}^{n} v_j )

Probability generating function
  Let X be a discrete random variable with nonnegative integer possible values.
  ψ_X(z) ≜ E[z^X] = Σ_k P[X = k] z^k
  E[X] = ψ′_X(1)
  E[X(X − 1)⋯(X − n + 1)] = ψ_X^{(n)}(1)

Joint-probability generating functions
  Each of the X_k is a nonnegative, integer-valued random variable.

  ψ_X(z) ≜ E[ z_1^{X_1} z_2^{X_2} ⋯ z_n^{X_n} ] = E[ Π_{k=1}^{n} z_k^{X_k} ]
  X_k's mutually independent ⟹ ψ_X(z) = Π_{k=1}^{n} ψ_{X_k}(z_k)

The Poisson process

Poisson process
  Let {N_t, 0 ≤ t < +∞} be a counting random process such that:
  a. N_t assumes only nonnegative integer values and N_0 ≜ 0,
  b. the process has stationary and independent increments,
  c. P[N_{t+∆t} − N_t = 1] = λ∆t + o(∆t)   (λ > 0),
  d. P[N_{t+∆t} − N_t > 1] = o(∆t),
  where lim_{∆t→0} o(∆t)/∆t = 0. It then follows that
    P[N_{t+∆t} − N_t = 0] = 1 − λ∆t + o(∆t)
  and that

  P[N_t = k] = e^{−λt}(λt)^k / k!,  k = 0, 1, 2, ...;
that is, N_t is a Poisson random variable. The counting process {N_t, 0 ≤ t < ∞} is then called a Poisson counting process.
  E[N_t] = λt,  var(N_t) = λt

Arrival times
  Let T_k be the random variable which describes the arrival time of the kth event counted by the counting random process {N_t, 0 ≤ t < ∞}. Then
    F_{T_k}(t) = 1 − F_{N_t}(k − 1)
  If {N_t, 0 ≤ t < ∞} is a Poisson counting process, then
    F_{T_k}(t) = 1 − e^{−λt} Σ_{j=0}^{k−1} (λt)^j / j!, t ≥ 0;  0, t < 0
    f_{T_k}(t) = λe^{−λt}(λt)^{k−1}/(k − 1)!, t ≥ 0;  0, t < 0

that is, T_k has an Erlang probability density. In this case:
  E[T_k] = k/λ,  var(T_k) = k/λ²
  φ_{T_k}(v) = 1/(1 − iv/λ)^k

Interarrival times
  Let {N_t, 0 ≤ t < +∞} be a counting process with the arrival times {T_k, k = 1, 2, 3, ...}. The durations
    Z_1 ≜ T_1,  Z_k ≜ T_k − T_{k−1}, k = 2, 3, 4, ...
  are called the interarrival times of the counting process. We then have
    F_{Z_k}(τ) = 1 − P[N_{t_{k−1}+τ} − N_{t_{k−1}} = 0]
  If the counting process has stationary increments, then, for all k,

  F_{Z_k}(τ) = 1 − P[N_τ = 0]
Further, if the given counting process is Poisson, then
  F_{Z_k}(τ) = 1 − e^{−λτ}, τ ≥ 0;  0, τ < 0
  f_{Z_k}(τ) = λe^{−λτ}, τ ≥ 0;  0, τ < 0
In this case E[Z_k] = 1/λ, k = 1, 2, 3, ... In any case, E[T_k] = kE[Z_k].

Renewal counting processes
  Let {N_t, 0 ≤ t < ∞} be a counting process. If the interarrival times of this counting process are mutually independent random variables, all with the same probability distribution function, then the given process is called a renewal counting process. The renewal function m(t) of a renewal counting process is the expected value of that process; that is,

  m(t) ≜ E[N_t]
and its derivative
  λ(t) ≜ dm(t)/dt
is called the renewal intensity of the process. It then follows that
  m(t) = Σ_{k=1}^{∞} F_{T_k}(t)
and, if the various derivatives exist,
  λ(t) = Σ_{k=1}^{∞} f_{T_k}(t)
On defining Λ(v) to be the Fourier transform of the renewal intensity, that is,
  Λ(v) ≜ ∫_{−∞}^{+∞} λ(t) e^{ivt} dt
it then follows that
  Λ(v) = φ_Z(v) / (1 − φ_Z(v))

where φ_Z is the common characteristic function of the interarrival times.

Unordered arrival times
  Let {N_t, 0 ≤ t < ∞} be a Poisson counting process and suppose that N_t = k; that is, suppose that k events occur by time t. The unordered arrival times U_1, U_2, ..., U_k of those k events are then mutually independent random variables, each of which is uniformly distributed over the interval (0, t]:
    f_{U_i}(u_i | N_t = k) = 1/t, 0 < u_i ≤ t;  0, otherwise
  for all i = 1, 2, ..., k.

Filtered Poisson processes
  Let {N_t, 0 ≤ t < ∞} be a Poisson counting process. The random process {X_t, 0 ≤ t < ∞} in which
    X_t ≜ Σ_{j=1}^{N_t} h(t − U_j)
  where an event which occurs at time u_j generates an outcome h(t − u_j) at time t, and where the random variables U_j are the unordered arrival times of the events which occur during the interval (0, t], is called a filtered Poisson process. The mean of a filtered Poisson process is

  E[X_t] = λ ∫_0^t h(u) du
the variance is
  var(X_t) = λ ∫_0^t h(u)² du
and the characteristic function of the random variable X_t is
  φ_{X_t}(v) = exp[ λ ∫_0^t ( e^{ivh(u)} − 1 ) du ]

Random partitioning
  Let {N_t, 0 ≤ t < ∞} be a Poisson counting process and let {X_t, 0 ≤ t < ∞} be the corresponding filtered Poisson process in which
    X_t ≜ Σ_{j=1}^{N_t} h(t − U_j)
  We say that the random process {Z_t, 0 ≤ t < ∞} is a randomly partitioned filtered Poisson random process if
    Z_t ≜ Σ_{j=1}^{N_t} Y_j h(t − U_j)

where the partitioning random variables Y_j are mutually independent random variables which are independent of the unordered arrival times U_j, and where each of the Y_j has the same Bernoulli probability distribution
  P[Y_j = 1] = p and P[Y_j = 0] = q ≜ 1 − p
where 0 < p < 1. In this case,
  E[Z_t] = pλ ∫_0^t h(u) du = pE[X_t]
The characteristic function of the randomly partitioned random variable Z_t is
  φ_{Z_t}(v) = exp[ pλ ∫_0^t ( e^{ivh(u)} − 1 ) du ]
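Both the Poisson counting law and the filtered-Poisson (shot-noise) moment formulas above can be checked by simulation. A minimal sketch using the unordered-arrival-time representation X_t = Σ h(t − U_j); the rate λ = 3, the horizon t = 4, and the impulse response h(u) = e^{−u} are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, t, runs = 3.0, 4.0, 50_000

def h(u):
    return np.exp(-u)                      # illustrative impulse response

counts = np.empty(runs, dtype=int)
shot = np.empty(runs)
for r in range(runs):
    n = rng.poisson(lam * t)               # N_t ~ Poisson(lambda * t)
    u = rng.uniform(0.0, t, size=n)        # unordered arrival times, uniform on (0, t]
    counts[r] = n
    shot[r] = h(t - u).sum()               # X_t = sum_j h(t - U_j)

print("E[N_t]  :", counts.mean(), "  theory:", lam * t)
print("var(N_t):", counts.var(),  "  theory:", lam * t)
print("E[X_t]  :", shot.mean(),   "  theory:", lam * (1 - np.exp(-t)))          # lam * int_0^t h(u) du
print("var(X_t):", shot.var(),    "  theory:", lam * (1 - np.exp(-2 * t)) / 2)  # lam * int_0^t h(u)^2 du
```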

Gaussian random process

Gaussian random vectors
  Y = (Y_1, Y_2, ..., Y_m):
  φ_Y(v) = exp( i m_Yᵀ v − (1/2) vᵀΣ_Y v )
  f_Y(y) = exp[ −(1/2)(y − m_Y)ᵀ Σ_Y⁻¹ (y − m_Y) ] / ( (2π)^{m/2} |Σ_Y|^{1/2} )

Gaussian random processes
  The real random process {Y_t, t ∈ T} is said to be a gaussian random process if for every finite set of time instants t_j ∈ T, the corresponding random variables Y_{t_j} are jointly gaussian random variables.

Narrowband random processes
  The random process {X_t, −∞ < t < ∞} is said to be a narrowband random process if it has a zero mean, is stationary in the wide sense, and if its spectral density S_X differs from zero only in some narrow band of width ∆f centered about some frequency f_0, where f_0 >> ∆f. A narrowband random process may be represented in terms of an envelope random process {V_t, −∞ < t < ∞} and a phase random process {Φ_t, −∞ < t < ∞} by using the relation

  X_t = V_t cos(ω_0 t + Φ_t)
where ω_0 = 2πf_0. Alternatively, a narrowband random process may also be represented in terms of cosine and sine component random processes {X_ct, −∞ < t < ∞} and {X_st, −∞ < t < ∞}, respectively, by using the relation
  X_t = X_ct cos ω_0 t − X_st sin ω_0 t
The relations between these two representations are given by the formulas
  X_ct = V_t cos Φ_t and X_st = V_t sin Φ_t
which have the inverses
  V_t = √(X_ct² + X_st²) and Φ_t = tan⁻¹(X_st / X_ct)
The random variables X_ct, X_st, X_c(t+τ), and X_s(t+τ) have the covariance matrix
  R(τ) = [  R_X(0)      0         R_c(τ)    R_cs(τ)
            0           R_X(0)   −R_cs(τ)   R_c(τ)
            R_c(τ)     −R_cs(τ)   R_X(0)    0
            R_cs(τ)     R_c(τ)    0         R_X(0)  ]

where
  R_c(τ) = 2 ∫_0^{+∞} S_X(f) cos[2π(f − f_0)τ] df
and
  R_cs(τ) = 2 ∫_0^{+∞} S_X(f) sin[2π(f − f_0)τ] df

Narrowband gaussian processes
  The cosine- and sine-component random variables X_ct and X_st of a gaussian narrowband random process are independent random variables with zero means, each with a variance equal to R_X(0), and a joint-probability density
    f_{X_ct X_st}(x, y) = exp[ −(x² + y²)/(2R_X(0)) ] / (2πR_X(0))
  The envelope and phase random variables V_t and Φ_t of a gaussian narrowband random process are also independent random variables. The envelope has the Rayleigh probability density
    f_{V_t}(v) = (v/R_X(0)) exp[ −v²/(2R_X(0)) ], v ≥ 0;  0, otherwise
  and the phase is uniformly distributed over [0, 2π]:

    f_{Φ_t}(φ) = 1/(2π), 0 ≤ φ ≤ 2π;  0, otherwise


More information

Introduction to Probability and Stocastic Processes - Part I

Introduction to Probability and Stocastic Processes - Part I Introduction to Probability and Stocastic Processes - Part I Lecture 1 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark

More information

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay 1 / 13 Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay August 8, 2013 2 / 13 Random Variable Definition A real-valued

More information

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 You can t see this text! Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2 Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Probability Review - Part 2 1 /

More information

Week 2. Review of Probability, Random Variables and Univariate Distributions

Week 2. Review of Probability, Random Variables and Univariate Distributions Week 2 Review of Probability, Random Variables and Univariate Distributions Probability Probability Probability Motivation What use is Probability Theory? Probability models Basis for statistical inference

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

Stat 5101 Notes: Algorithms

Stat 5101 Notes: Algorithms Stat 5101 Notes: Algorithms Charles J. Geyer January 22, 2016 Contents 1 Calculating an Expectation or a Probability 3 1.1 From a PMF........................... 3 1.2 From a PDF...........................

More information

Math-Stat-491-Fall2014-Notes-I

Math-Stat-491-Fall2014-Notes-I Math-Stat-491-Fall2014-Notes-I Hariharan Narayanan October 2, 2014 1 Introduction This writeup is intended to supplement material in the prescribed texts: Introduction to Probability Models, 10th Edition,

More information

Deterministic. Deterministic data are those can be described by an explicit mathematical relationship

Deterministic. Deterministic data are those can be described by an explicit mathematical relationship Random data Deterministic Deterministic data are those can be described by an explicit mathematical relationship Deterministic x(t) =X cos r! k m t Non deterministic There is no way to predict an exact

More information

Actuarial Science Exam 1/P

Actuarial Science Exam 1/P Actuarial Science Exam /P Ville A. Satopää December 5, 2009 Contents Review of Algebra and Calculus 2 2 Basic Probability Concepts 3 3 Conditional Probability and Independence 4 4 Combinatorial Principles,

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

More on Distribution Function

More on Distribution Function More on Distribution Function The distribution of a random variable X can be determined directly from its cumulative distribution function F X. Theorem: Let X be any random variable, with cumulative distribution

More information

Summary of Fourier Transform Properties

Summary of Fourier Transform Properties Summary of Fourier ransform Properties Frank R. Kschischang he Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of oronto January 7, 207 Definition and Some echnicalities

More information

where r n = dn+1 x(t)

where r n = dn+1 x(t) Random Variables Overview Probability Random variables Transforms of pdfs Moments and cumulants Useful distributions Random vectors Linear transformations of random vectors The multivariate normal distribution

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Lecture 4: Probability and Discrete Random Variables

Lecture 4: Probability and Discrete Random Variables Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1

More information

EE4601 Communication Systems

EE4601 Communication Systems EE4601 Communication Systems Week 2 Review of Probability, Important Distributions 0 c 2011, Georgia Institute of Technology (lect2 1) Conditional Probability Consider a sample space that consists of two

More information

ECE534, Spring 2018: Solutions for Problem Set #5

ECE534, Spring 2018: Solutions for Problem Set #5 ECE534, Spring 08: s for Problem Set #5 Mean Value and Autocorrelation Functions Consider a random process X(t) such that (i) X(t) ± (ii) The number of zero crossings, N(t), in the interval (0, t) is described

More information

Lecture 19: Properties of Expectation

Lecture 19: Properties of Expectation Lecture 19: Properties of Expectation Dan Sloughter Furman University Mathematics 37 February 11, 4 19.1 The unconscious statistician, revisited The following is a generalization of the law of the unconscious

More information

Chapter 3: Random Variables 1

Chapter 3: Random Variables 1 Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

3. Review of Probability and Statistics

3. Review of Probability and Statistics 3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

Solutions to Homework Set #5 (Prepared by Lele Wang) MSE = E [ (sgn(x) g(y)) 2],, where f X (x) = 1 2 2π e. e (x y)2 2 dx 2π

Solutions to Homework Set #5 (Prepared by Lele Wang) MSE = E [ (sgn(x) g(y)) 2],, where f X (x) = 1 2 2π e. e (x y)2 2 dx 2π Solutions to Homework Set #5 (Prepared by Lele Wang). Neural net. Let Y X + Z, where the signal X U[,] and noise Z N(,) are independent. (a) Find the function g(y) that minimizes MSE E [ (sgn(x) g(y))

More information

2 Continuous Random Variables and their Distributions

2 Continuous Random Variables and their Distributions Name: Discussion-5 1 Introduction - Continuous random variables have a range in the form of Interval on the real number line. Union of non-overlapping intervals on real line. - We also know that for any

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information