Stochastic Signals

Overview
- Definitions
- Second-order statistics
- Stationarity and ergodicity
- Random signal variability
- Power spectral density
- Linear systems with stationary inputs
- Random signal memory
- Correlation matrices

Introduction
- Discrete-time stochastic processes provide a mathematical framework for working with non-deterministic signals
- Signals that have an exact functional relationship are often called predictable or deterministic, though some stochastic processes are predictable
- I'm going to use the term deterministic to refer to signals that are not affected by the outcome of a random experiment
- I will use the terms stochastic process and random process interchangeably

J. McNames, Portland State University, ECE 538/638, Stochastic Signals, Ver. 1.10

Probability Space
- Conceptually we should imagine a sample space with some number (possibly infinite) of outcomes: Ω = {ζ_1, ζ_2, ...}
- Each outcome has a probability Pr{ζ_k}
- By some rule, each outcome generates a sequence x(n, ζ_k)
- We can think of x(n, ζ_k) as a vector of (possibly) infinite duration
- Note that the entire sequence is generated from a single outcome of the underlying experiment
- x(n, ζ) is called a discrete-time stochastic process or a random sequence

Definitions and Interpretations
- Random variable: x(n, ζ) with n = n_0 fixed and ζ treated as a variable
- Sample sequence: x(n, ζ) with ζ = ζ_k fixed and n treated as an independent (non-random) variable
- Number: x(n, ζ) with both ζ = ζ_k and n = n_0 fixed
- Stochastic process: x(n, ζ) with both ζ and n treated as variables
- Realization: a sample sequence
- Ensemble: the set of all possible sequences, {x(n, ζ)}
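These interpretations can be made concrete with a small simulation. The sketch below (a minimal illustration, not part of the original slides; the Gaussian process and array sizes are arbitrary assumptions) builds an ensemble of realizations; slicing the array one way gives a sample sequence, the other way a random variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ensemble of 1000 realizations of a length-50 random process:
# row k is the sample sequence x(n, zeta_k) for one outcome zeta_k.
ensemble = rng.normal(loc=0.0, scale=1.0, size=(1000, 50))

sample_sequence = ensemble[3, :]   # zeta fixed, n varies: one realization
random_variable = ensemble[:, 10]  # n fixed, zeta varies: a random variable
number = ensemble[3, 10]           # both fixed: a single number

# Averaging down a column (over the ensemble, at a fixed time)
# approximates the ensemble mean E[x(n_0)], here 0.
ensemble_mean_at_n10 = random_variable.mean()
```

Each row is what one observer would actually record; the column view only exists if the experiment can be repeated.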
Probability Functions
- To fully characterize a stochastic process, we must consider the cdf or pdf

  F_x(x_1, ..., x_k; n_1, ..., n_k) = Pr{x(n_1) <= x_1, ..., x(n_k) <= x_k}
  f_x(x_1, ..., x_k; n_1, ..., n_k) = ∂^k F_x(x_1, ..., x_k; n_1, ..., n_k) / (∂x_1 ... ∂x_k)

  for every k >= 1 and any set of sample times {n_1, n_2, ..., n_k}
- Without additional sweeping assumptions, estimation of f_x(·) from a realization is impossible
- Many stochastic processes can be characterized accurately, or at least usefully, by much less information
- To simplify notation, from here on I will mostly use x(n) to denote both random processes and single realizations
- In most cases I will assume x(n) is complex-valued

Second-Order Statistics
- At any time n, we can specify the mean and variance of x(n)

  μ_x(n) ≜ E[x(n)]        σ_x^2(n) ≜ E[|x(n) − μ_x(n)|^2]

- μ_x(n) and σ_x^2(n) are both deterministic sequences
- The expectation is taken over the ensemble
- In general, the second-order statistics at two different times are given by the autocorrelation or autocovariance sequences

  Autocorrelation sequence: r_x(n_1, n_2) = E[x(n_1) x^*(n_2)]
  Autocovariance sequence:  γ_x(n_1, n_2) = E[(x(n_1) − μ_x(n_1)) (x(n_2) − μ_x(n_2))^*]
                                          = r_x(n_1, n_2) − μ_x(n_1) μ_x^*(n_2)

Cross-Correlation and Cross-Covariance

  Cross-correlation: r_xy(n_1, n_2) = E[x(n_1) y^*(n_2)]
  Cross-covariance:  γ_xy(n_1, n_2) = E[(x(n_1) − μ_x(n_1)) (y(n_2) − μ_y(n_2))^*]
                                    = r_xy(n_1, n_2) − μ_x(n_1) μ_y^*(n_2)
  Normalized cross-correlation: ρ_xy(n_1, n_2) = γ_xy(n_1, n_2) / (σ_x(n_1) σ_y(n_2))

More Definitions
- Independent: iff, for every k,

  f_x(x_1, ..., x_k; n_1, ..., n_k) = ∏_{l=1}^{k} f_l(x_l; n_l)

- Uncorrelated: if

  γ_x(n_1, n_2) = { σ_x^2(n_1),  n_1 = n_2 ;  0,  n_1 ≠ n_2 }

- Orthogonal: if

  r_x(n_1, n_2) = { σ_x^2(n_1) + |μ_x(n_1)|^2,  n_1 = n_2 ;  0,  n_1 ≠ n_2 }
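The ensemble interpretation of these second-order statistics can be sketched numerically. This is a minimal illustration (not from the slides); the choice of a real WN(0, 1) process and the sample times n_1, n_2 are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# 5000 realizations, 32 samples each, of a WN(0,1) process.
x = rng.normal(size=(5000, 32))

n1, n2 = 10, 13

# Ensemble estimates of the second-order statistics at two times.
mu_n1 = x[:, n1].mean()                               # mu_x(n1), near 0
r_hat = np.mean(x[:, n1] * np.conj(x[:, n2]))         # r_x(n1, n2), near 0
gamma_hat = r_hat - mu_n1 * np.conj(x[:, n2].mean())  # gamma_x(n1, n2)
power = np.mean(np.abs(x[:, n1]) ** 2)                # r_x(n1, n1), near 1
```

For this uncorrelated process the cross-time terms vanish while r_x(n_1, n_1) recovers the power, matching the "Uncorrelated" and "Orthogonal" definitions above (here μ_x = 0, so the two coincide).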
Still More Definitions
- Wide-sense periodic: if, for all n,

  μ_x(n) = μ_x(n + N)
  r_x(n_1, n_2) = r_x(n_1 + N, n_2) = r_x(n_1, n_2 + N) = r_x(n_1 + N, n_2 + N)

- Statistically independent: iff for every n_1 and n_2

  f_xy(x, y; n_1, n_2) = f_x(x; n_1) f_y(y; n_2)

- Uncorrelated: if for every n_1 and n_2, γ_xy(n_1, n_2) = 0
- Orthogonal: if for every n_1 and n_2, r_xy(n_1, n_2) = 0

Stationarity
- Stationarity of order N: a stochastic process x(n) such that

  f_x(x_1, ..., x_N; n_1, ..., n_N) = f_x(x_1, ..., x_N; n_1 + k, ..., n_N + k)

  for any set of sample times and any shift k
- Any stochastic process that is stationary of order N is also stationary of order M for all M <= N
- Strict-sense stationary (SSS): a stochastic process that is stationary of all orders N

Wide-Sense Stationary
- Stationarity of order 2 requires f_x(x_1, x_2; n_1, n_2) = f_x(x_1, x_2; n_1 + k, n_2 + k)
- Wide-sense stationary (WSS): a stochastic process with a constant mean and an autocorrelation that depends only on the delay between the two sample times

WSS Properties

  E[x(n)] = μ_x
  r_x(n_1, n_2) = r_x(l) = r_x(n_1 − n_2) = E[x(n + l) x^*(n)]
  γ_x(l) = r_x(l) − |μ_x|^2

- This implies the variance is also constant: var[x(n)] = σ_x^2
- All processes that are stationary of order 2 are WSS
- Not all WSS processes are stationary of order 2
- Note this is slightly different from the text

Example 1: Stationarity
- Describe a random process that is stationary
- Describe a second random process that is not stationary
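One possible (hypothetical) answer to Example 1, sketched numerically: white noise is stationary, while its running sum (a random walk) is not, because its variance grows with n. The process choices and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stationary candidate: white noise. Nonstationary candidate: its
# cumulative sum (a random walk), whose variance grows with n.
ens = rng.normal(size=(20000, 64))
walk = np.cumsum(ens, axis=1)

var_white_early = ens[:, 5].var()    # ~1, same at every n
var_white_late = ens[:, 60].var()    # ~1
var_walk_early = walk[:, 5].var()    # ~6  (sum of 6 unit-variance terms)
var_walk_late = walk[:, 60].var()    # ~61 (sum of 61 terms): not stationary
```

Constant second-order statistics are necessary (not sufficient) for stationarity; the random walk fails even this weaker WSS test.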
Stationarity Notes
- SSS implies WSS
- If x(n) is a Gaussian process (all joint pdfs Gaussian), then WSS implies SSS
- The book states that most WSS processes are SSS. True?
- Jointly wide-sense stationary: two random signals x(n) and y(n) are jointly WSS if they are both WSS and

  r_xy(n_1, n_2) = r_xy(n_1 − n_2) = r_xy(l) = E[x(n) y^*(n − l)]
  γ_xy(n_1, n_2) = γ_xy(n_1 − n_2) = γ_xy(l) = r_xy(l) − μ_x μ_y^*

- WSS is a very useful property because it enables us to consider a spectral description
- In practice, we only need the signal to be WSS long enough to estimate the autocorrelation or cross-correlation

Autocorrelation Sequence Properties

  r_x(0) = σ_x^2 + |μ_x|^2
  r_x(0) >= |r_x(l)|
  r_x(l) = r_x^*(−l)
  ∑_{k=1}^{M} ∑_{m=1}^{M} α_k r_x(k − m) α_m^* >= 0   for any M and all sequences α

- Average DC power: |μ_x|^2
- Average AC power: σ_x^2
- Nonnegative definite: a sequence is said to be nonnegative definite if it satisfies the last property
- Positive definite: any sequence that satisfies the last inequality strictly for any α ≠ 0

Comments on Stationarity
- Many real processes are nonstationary
  - Best case: we can determine this from domain knowledge of the process
  - Else: we must rely on statistical methods
- Many nonstationary processes are approximately locally stationary (stationary over short periods of time)
- Much of time-frequency analysis is dedicated to this type of signal
- There is no general mathematical framework for analyzing nonstationary signals
- However, many nonstationary stochastic processes can be understood through linear estimation (e.g., Kalman filters)
- Note that nonstationary is a negative definition: not stationary

Introduction to Ergodicity
- In most practical situations we can only observe one or a few realizations
- If the process is ergodic, we can recover all statistical information from a single realization
- Ensemble averages: repeat the experiment many times
- Time averages:

  ⟨(·)⟩ ≜ lim_{N→∞} (1 / (2N + 1)) ∑_{n=−N}^{N} (·)
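The autocorrelation properties can be checked on a concrete sequence. A minimal sketch (not from the slides), assuming the MA(1) process y(n) = w(n) + 0.5 w(n−1) with w ~ WN(0, 1), whose autocorrelation is r(0) = 1.25, r(±1) = 0.5, and zero elsewhere:

```python
import numpy as np

# Theoretical autocorrelation of the assumed MA(1) process, l = 0..4.
r = np.array([1.25, 0.5, 0.0, 0.0, 0.0])

max_offpeak = np.abs(r[1:]).max()          # check r(0) >= |r(l)|

# Nonnegative definiteness: the double sum over alpha is alpha^H R alpha
# with R[k, m] = r(k - m); for a real process r(-l) = r(l), so R is the
# symmetric Toeplitz matrix below. No negative eigenvalues means the
# quadratic form is >= 0 for every alpha.
M = r.size
R = np.array([[r[abs(k - m)] for m in range(M)] for k in range(M)])
min_eig = np.linalg.eigvalsh(R).min()      # strictly > 0: positive definite
```

Expressing the double-sum property as "the Toeplitz matrix built from r_x has no negative eigenvalues" is exactly how it is verified in practice.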
Time Averages of Interest

  Mean value:        ⟨x(n)⟩
  Mean square:       ⟨|x(n)|^2⟩
  Variance:          ⟨|x(n) − ⟨x(n)⟩|^2⟩
  Autocorrelation:   ⟨x(n) x^*(n − l)⟩
  Autocovariance:    ⟨[x(n) − ⟨x(n)⟩][x(n − l) − ⟨x(n)⟩]^*⟩
  Cross-correlation: ⟨x(n) y^*(n − l)⟩
  Cross-covariance:  ⟨[x(n) − ⟨x(n)⟩][y(n − l) − ⟨y(n)⟩]^*⟩

- Similar to correlation sequences for deterministic power signals
- Both quantities have the same properties
- Difference: time averages are random variables (functions of the experiment outcome)
- In the deterministic case the quantities are fixed numbers

Ergodic Random Processes
- Ergodic random process: a random signal for which the ensemble averages equal the corresponding time averages
- Like stationarity, there are various degrees
- Ergodic in the mean: a random process such that ⟨x(n)⟩ = E[x(n)] = μ_x
- Ergodic in correlation: a random process such that ⟨x(n) x^*(n − l)⟩ = E[x(n) x^*(n − l)] = r_x(l)
- If a process is ergodic in both mean and correlation, it is also WSS
- Only stationary signals can be ergodic
- WSS does not imply any type of ergodicity
- Text: almost all stationary processes are also ergodic. True?
- Our usage: ergodic = ergodic in both the mean and correlation

More on Ergodicity
- Joint ergodicity: two random signals are jointly ergodic iff they are individually ergodic and ⟨x(n) y^*(n − l)⟩ = E[x(n) y^*(n − l)]
- Stationarity ensures time invariance of the statistics
- Ergodicity implies the statistics can be obtained from a single realization with time averaging
- In words: one realization (a single ζ_k) is sufficient to estimate any statistic of the underlying random process

Problems with Ergodicity
- Problem: we never know x(n) for n = −∞ to +∞
- In all real situations, we only have finite records
- The most common estimator is then

  ⟨(·)⟩_N = (1 / (2N + 1)) ∑_{n=−N}^{N} (·)

- Note that it is a random variable
- How good is it?
  - Bias
  - Variance
  - Consistency
  - Confidence intervals
  - Distribution
- This is one of the key topics of this class
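The ergodic idea above can be sketched in a few lines: for an ergodic process, a long time average over a single realization approaches the ensemble value. This is an illustration only, assuming WGN with mean 2 and unit variance (parameters chosen arbitrarily).

```python
import numpy as np

rng = np.random.default_rng(3)

mu = 2.0
x = rng.normal(mu, 1.0, size=100_000)   # ONE realization of WGN(2, 1)

time_mean = x.mean()                    # <x(n)>, approaches mu = 2
time_r1 = np.mean(x[1:] * x[:-1])       # <x(n) x(n-1)>, approaches
                                        # r_x(1) = mu^2 = 4 for white noise
```

Both are finite-record estimators of the kind defined above, and both are random variables; their bias and variance are exactly the questions this class addresses.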
Ergodic Processes vs Deterministic Signals

  r_x(l) = lim_{N→∞} (1 / (2N + 1)) ∑_{n=−N}^{N} x(n) x^*(n − l)

- The autocorrelation of a deterministic power signal and an ergodic process can be calculated with the same infinite summation
- What's the difference then?
- With deterministic signals there is only one signal
- With stochastic signals, we assume it was generated from an underlying random experiment ζ_k
- This enables us to consider the ensemble of possible signals: r_x(l) = E[x(n) x^*(n − l)]
- We can therefore draw inferences and make predictions about the population of possible outcomes, not merely this one signal
- Whether you define a given signal as deterministic or as a single realization of a random process depends largely on the application

Random Processes in the Frequency Domain

Power Spectral Density (PSD)

  R_x(e^{jω}) ≜ F{r_x(l)} = ∑_{l=−∞}^{∞} r_x(l) e^{−jωl}
  r_x(l) = F^{−1}{R_x(e^{jω})} = (1 / 2π) ∫_{−π}^{π} R_x(e^{jω}) e^{jωl} dω

- Stationary random processes have deterministic correlation sequences
- They have a single index (independent variable)
- Note again that the power spectral density can be calculated with the same equation for deterministic and ergodic signals

Periodic and Non-Periodic Processes
- If r_x(l) is periodic, the DTFS is most appropriate (line spectrum)
- If we allow impulses in the PSD, then the PSD of a periodic r_x(l) consists of an impulse train
- If the process x(n) has nonzero mean (i.e., nonzero average DC power), the PSD will contain an impulse at ω = 0
- More generally, a random process can be composed of both deterministic components and non-periodic components

Power Spectral Density Properties
- R_x(e^{jω}) is real-valued
- R_x(e^{jω}) is periodic with period 2π
- R_x(e^{jω}) >= 0, since r_x(l) is nonnegative definite
- R_x(e^{jω}) has nonnegative area and

  (1 / 2π) ∫_{−π}^{π} R_x(e^{jω}) dω = r_x(0) = E[|x(n)|^2]

- If x(n) is real-valued, r_x(l) is real and even, and R_x(e^{jω}) is an even function of ω
- What if x(n) is complex-valued?
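These PSD properties can be verified numerically for a simple process. A sketch (not from the slides), assuming the MA(1) process y(n) = w(n) + 0.5 w(n−1) with w ~ WN(0, 1), so r(0) = 1.25, r(±1) = 0.5:

```python
import numpy as np

# DTFT of the (finite) autocorrelation sequence on a dense grid:
# R(e^{jw}) = sum_l r(l) e^{-jwl} = 1.25 + 2 * 0.5 * cos(w).
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
R = 1.25 + np.cos(omega)

# Properties: real, even, nonnegative, and the normalized integral
# (1/2pi) * int R dw recovers r(0) = E[|x(n)|^2]. On a uniform grid
# covering one period, that integral is just the mean of the samples.
total_power = R.mean()    # should equal r(0) = 1.25
```

The cosine term integrates to zero over a full period, so only r(0) survives, which is exactly the total-power property stated above.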
White Noise
- White noise process: a WSS random sequence w(n) such that

  E[w(n)] = μ_w
  r_w(l) = (σ_w^2 + |μ_w|^2) δ(l)

- Specifically, this is a second-order white process
- Notation: w(n) ~ WN(μ_w, σ_w^2)
- This is not a complete characterization of w(n): the marginal pdf could be anything
- If w(n) is Gaussian, then the white Gaussian process is denoted by w(n) ~ WGN(μ_w, σ_w^2)
- The term white comes from the properties of white light

Harmonic Process
- Harmonic process: any process defined by

  x(n) = ∑_{k=1}^{M} a_k cos(ω_k n + φ_k),   ω_k ≠ 0

  where M, {a_k}_1^M, and {ω_k}_1^M are constant, and the random variables {φ_k}_1^M are pairwise independent and uniformly distributed on the interval [−π, π]
- x(n) is stationary and ergodic with zero mean and autocorrelation

  r_x(l) = (1/2) ∑_{k=1}^{M} a_k^2 cos(ω_k l)

- Note the cosines in the autocorrelation are in phase

Harmonic Process PSD
- The PSD consists of pairs of impulses (a line spectrum) of area π a_k^2 / 2 located at frequencies ±ω_k:

  R_x(e^{jω}) = (π/2) ∑_{k=1}^{M} a_k^2 [δ(ω − ω_k) + δ(ω + ω_k)],   −π <= ω <= π

Harmonic Process Comments
- If all ω_k/(2π) are rational numbers, x(n) is periodic and the impulses are equally spaced; this never happens unless there is a single periodic (perhaps non-sinusoidal) component
- Otherwise the process is almost periodic (this is what always happens)
- It is only stationary if all of the random phases are equally likely (uniformly distributed over all possible angles)
- This is an unusual circumstance where the signal is stationary but is parameterized by one or more random variables that are constant over all n
- In general, x(n) is non-Gaussian
- x(n) is a predictable random sequence! (also highly unusual)
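The harmonic-process autocorrelation can be checked by simulation. This sketch (not from the slides) assumes M = 2 with arbitrarily chosen amplitudes and frequencies; only the phases are random, drawn fresh for each realization.

```python
import numpy as np

rng = np.random.default_rng(4)

a = np.array([1.0, 0.5])      # amplitudes (assumption)
wk = np.array([0.3, 1.1])     # frequencies in rad/sample (assumption)
n = np.arange(16)

n_real = 20000
phi = rng.uniform(-np.pi, np.pi, size=(n_real, 2))

# x has shape (n_real, len(n)): one realization per row.
x = sum(a[k] * np.cos(wk[k] * n + phi[:, k:k + 1]) for k in range(2))

# Ensemble autocorrelation estimates vs. r_x(l) = (1/2) sum_k a_k^2 cos(w_k l)
r0_hat = np.mean(x[:, 10] * x[:, 10])           # r_x(0) = 0.5*(1 + 0.25)
r3_hat = np.mean(x[:, 10] * x[:, 7])            # r_x(3)
r3_theory = 0.5 * np.sum(a**2 * np.cos(wk * 3))
```

The estimate depends only on the lag 10 − 7 = 3, not on the absolute times, consistent with stationarity of the process.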
Cross-Power Spectral Density
- Cross-power spectral density: if x(n) and y(n) are jointly stationary stochastic processes,

  R_xy(e^{jω}) ≜ F{r_xy(l)} = ∑_{l=−∞}^{∞} r_xy(l) e^{−jωl}
  r_xy(l) = (1 / 2π) ∫_{−π}^{π} R_xy(e^{jω}) e^{jωl} dω
  R_xy(e^{jω}) = R_yx^*(e^{jω})

- Also known as the cross-spectrum
- Note that, unlike the PSD, it is not real-valued in general

Normalized Cross-Spectrum

  G_xy(e^{jω}) ≜ R_xy(e^{jω}) / sqrt(R_x(e^{jω}) R_y(e^{jω}))

- Also known as the coherency spectrum or simply coherency
- Similar to the correlation coefficient, but in the frequency domain

Coherence Function

  |G_xy(e^{jω})|^2 ≜ |R_xy(e^{jω})|^2 / (R_x(e^{jω}) R_y(e^{jω}))

- Also known as the coherence or magnitude squared coherence
- If y(n) = h(n) * x(n), then |G_xy(e^{jω})|^2 = 1 for all ω
- If r_xy(l) = 0, then |G_xy(e^{jω})|^2 = 0 for all ω
- 0 <= |G_xy(e^{jω})|^2 <= 1

Linear Transforms and Coherence
- If y(n) is produced from x(n) by a linear system H(z), linear transforms have no effect on coherence
- Similar to the case of random variables: if y = mx + b, then x and y are perfectly correlated (ρ = ±1)
- Now suppose y(n) is produced by passing x(n) through H(z), adding noise w(n) shaped by G(z), and filtering the sum with F(z):

  |G_xy(e^{jω})|^2 = |H(e^{jω})|^2 R_x(e^{jω}) / (|H(e^{jω})|^2 R_x(e^{jω}) + |G(e^{jω})|^2 R_w(e^{jω}))

- Noise w(n) decreases coherence
- The final linear transform F(z) has no effect!
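Both claims can be illustrated with a Welch-based coherence estimate. This is a sketch, not from the slides; the FIR filter taps, noise level, and record length are all assumptions, and the numerical thresholds are loose because the estimator itself is random.

```python
import numpy as np
from scipy.signal import lfilter, coherence

rng = np.random.default_rng(5)

x = rng.normal(size=200_000)

h = [1.0, -0.8, 0.3]                  # arbitrary FIR system (assumption)
y_clean = lfilter(h, 1.0, x)          # purely linear relation to x
y_noisy = y_clean + rng.normal(size=x.size)   # add independent noise

f, C_clean = coherence(x, y_clean, nperseg=256)
f, C_noisy = coherence(x, y_noisy, nperseg=256)

# Estimated |G_xy|^2: near 1 at every frequency for the linear relation,
# pulled toward |H|^2 R_x / (|H|^2 R_x + R_w) once noise is added.
```

Note the estimated coherence of the clean pair is not exactly 1 (segment-edge effects and averaging), but it is close; the noisy pair drops most where |H(e^{jω})|^2 is small.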
Complex Spectral Density Functions

  Complex spectral density:       R_x(z) = ∑_{l=−∞}^{∞} r_x(l) z^{−l},   R_y(z) = ∑_{l=−∞}^{∞} r_y(l) z^{−l}
  Complex cross-spectral density: R_xy(z) = ∑_{l=−∞}^{∞} r_xy(l) z^{−l}

Random Processes and Linear Systems
- If the input to an LTI system is a random process, so is the output:

  y(n, ζ) = ∑_{k=−∞}^{∞} h(k) x(n − k, ζ)

- If the system is BIBO stable and the input process is stationary with E[|x(n, ζ)|] < ∞, then the output converges absolutely with probability one
- In English: the output is stationary
- If E[|x(n, ζ)|^2] < ∞, then E[|y(n, ζ)|^2] < ∞
- If h(n) has finite energy, the output converges in the mean square sense

Linear System Statistics
- Let x(n) be a random process that is the input to an LTI system h(n) with output y(n)

  μ_y = ∑_k h(k) E[x(n − k)] = μ_x ∑_k h(k) = μ_x H(e^{j0})
  r_xy(l) = ∑_k h^*(k) r_x(l + k) = ∑_{m=−∞}^{∞} h^*(−m) r_x(l − m)
  r_xy(l) = h^*(−l) * r_x(l)
  r_yx(l) = h(l) * r_x(l)
  r_y(l) = h(l) * r_xy(l) = h(l) * h^*(−l) * r_x(l) = r_h(l) * r_x(l)

Output Power

  P_y = r_y(0) = [r_h(l) * r_x(l)]|_{l=0} = ∑_k r_h(k) r_x(−k) = ∑_k r_h(k) r_x^*(k)

- If the system is FIR, then P_y = h^H R_x h
- If μ_x = 0, then μ_y = 0 and σ_y^2 = P_y
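The FIR output-power formula P_y = h^H R_x h can be checked against a long simulation. This sketch (not from the slides) assumes a 3-tap filter and a WN(0, 1) input, for which R_x is simply the identity.

```python
import numpy as np

rng = np.random.default_rng(6)

h = np.array([1.0, 0.5, -0.25])       # arbitrary FIR system (assumption)
x = rng.normal(size=500_000)          # WN(0,1) input

y = np.convolve(x, h)[:x.size]        # zero-state LTI filtering

# For WN(0,1), the M x M correlation matrix R_x is the identity, so
# P_y = h^H R_x h = ||h||^2 = 1 + 0.25 + 0.0625 = 1.3125.
R_x = np.eye(h.size)
P_theory = h @ R_x @ h
P_empirical = np.mean(np.abs(y) ** 2)
```

With a correlated input the same quadratic form applies, only with the Toeplitz R_x built from r_x(l) in place of the identity.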
Output Distribution
- In general, it is very difficult to solve for the output pdf (even when x(n) is WSS)
- If x(n) is a Gaussian process, the output is a Gaussian process
- If x(n) is IID, the output is a weighted sum of IID random variables
  - If the distribution of x(n) is stable, then y(n) has the same distribution (even if the mean and variance differ)
  - If many of the largest weights are approximately equal, so that many elements of the input signal have an equal effect on the output, then the CLT applies (approximately) and the output will be approximately Gaussian

z-Domain Analysis

  Z{h^*(−n)} = H^*(1/z^*)
  R_xy(z) = Z{h^*(−l) * r_x(l)} = H^*(1/z^*) R_x(z)
  R_yx(z) = Z{h(l) * r_x(l)} = H(z) R_x(z)
  R_y(z) = H(z) H^*(1/z^*) R_x(z)

- Note that if h(n) is real, then h^*(−n) = h(−n), and h(−n) ↔ H(z^{−1})

Frequency Domain Analysis
- If the system is stable, z = e^{jω} lies in the ROC and the following relations hold

  R_xy(e^{jω}) = H^*(e^{jω}) R_x(e^{jω})
  R_yx(e^{jω}) = H(e^{jω}) R_x(e^{jω})
  R_y(e^{jω}) = |H(e^{jω})|^2 R_x(e^{jω})

Random Signal & System Memory
- Zero-memory process: a process for which r_x(l) = σ_x^2 δ(l)
- Examples: white noise, IID processes
- We can create a signal with memory (dependence) by passing a zero-memory process through an LTI system
- The extent and degree of the imposed dependence depend on h(n)
- Knowing r_y(l) and r_x(l), or the input and output PSDs, is sufficient to determine |H(e^{jω})|
- We cannot estimate the phase of H(e^{jω}) from this information (the second-order statistics)
- Only r_xy(l) or R_xy(e^{jω}) can provide phase information
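The relation R_y(e^{jω}) = |H(e^{jω})|^2 R_x(e^{jω}) can be illustrated with a Welch estimate of the output PSD. This is a sketch (not from the slides); the 2-tap filter and record length are assumptions, and note SciPy's one-sided density is scaled differently from the two-sided R_y used above.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(7)

h = np.array([1.0, -0.6])             # arbitrary FIR system (assumption)
x = rng.normal(size=400_000)          # WN(0,1): R_x(e^{jw}) = 1
y = np.convolve(x, h)[:x.size]

f, Pyy = welch(y, nperseg=1024)       # one-sided PSD estimate, fs = 1

# Theory: R_y(e^{jw}) = |H(e^{jw})|^2 with R_x = 1.
H = h[0] + h[1] * np.exp(-1j * 2 * np.pi * f)
theory = np.abs(H) ** 2               # shape should match Pyy (up to scale)

# The estimate also integrates to the output power var(y) = 1 + 0.36.
est_power = Pyy.sum() * (f[1] - f[0])
```

The estimated spectrum tracks |H(e^{jω})|^2 in shape, which is why the second-order statistics pin down the magnitude response but say nothing about its phase.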
Correlation Length
- Correlation length: given a WSS process,

  L_c ≜ (1 / r_x(0)) ∑_{l=0}^{∞} r_x(l) = ∑_{l=0}^{∞} ρ_x(l)

- Equal to the area under the normalized autocorrelation curve
- Undesirable properties:
  - Why is it one-sided?
  - Lengths should not be negative, in general. Could this be negative?

Short Memory Processes
- Short memory: a WSS process x(n) such that

  ∑_{l=−∞}^{∞} |ρ_x(l)| < ∞

- For example, an autocorrelation that decays exponentially: |ρ_x(l)| <= a^l for large l
- r(l) = [1.0000, 0.3214, 0.7538] for l = 1, 2, 3
- Zero-memory processes have a nonzero correlation length (L_c = 1)

Long Memory Processes
- Long memory: for a WSS signal x(n) with finite variance, if there exists 0 < α < 1 and C_r > 0 such that

  lim_{l→∞} (1 / (C_r σ_x^2)) r_x(l) l^α = 1

- Equivalently, there exists 0 <= β < 1 and C_r > 0 such that

  lim_{ω→0} (1 / (C_r σ_x^2)) R_x(e^{jω}) |ω|^β = 1

- Implies:
  - The autocorrelation has heavy tails
  - The autocorrelation decays as a power law: ρ_x(l) ≈ C_r l^{−α} as l → ∞
  - ∑_{l=−∞}^{∞} |ρ_x(l)| = ∞: the process has infinite correlation length

Correlation Matrices
- Let the random vector x(n) be related to the (possibly nonstationary) random process x(n) as follows:

  x(n) ≜ [x(n)  x(n − 1)  ...  x(n − M + 1)]^T
  E[x(n)] = [μ_x(n)  μ_x(n − 1)  ...  μ_x(n − M + 1)]^T
  R_x(n) ≜ E[x(n) x(n)^H] =
      [ r_x(n, n)          ...  r_x(n, n − M + 1)         ]
      [ ...                ...  ...                        ]
      [ r_x(n − M + 1, n)  ...  r_x(n − M + 1, n − M + 1) ]

- Note that R_x(n) is nonnegative definite and Hermitian, since r_x(n − i, n − j) = r_x^*(n − j, n − i)
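The correlation length is easy to compute for a standard short-memory example. A sketch (not from the slides), assuming the AR(1)-style normalized autocorrelation ρ(l) = a^l with a = 0.9 (parameter chosen arbitrarily):

```python
import numpy as np

a = 0.9
l = np.arange(400)                 # truncated sum; a^400 is negligible
rho = a ** l

# Geometric series: L_c = sum_{l>=0} a^l = 1 / (1 - a) = 10 samples.
L_c = rho.sum()

# A zero-memory process, rho(l) = delta(l), has L_c = 1.
rho_white = (l == 0).astype(float)
L_c_white = rho_white.sum()
```

The slower the decay (a → 1), the larger L_c; a power-law decay with exponent below 1 makes the sum diverge, which is the infinite correlation length of the long-memory case.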
Correlation Matrices
- If x(n) is a stationary process, the correlation matrix becomes

  R_x = [ r_x(0)        r_x(1)        r_x(2)        ...  r_x(M − 1) ]
        [ r_x^*(1)      r_x(0)        r_x(1)        ...  r_x(M − 2) ]
        [ r_x^*(2)      r_x^*(1)      r_x(0)        ...  r_x(M − 3) ]
        [ ...           ...           ...           ...  ...        ]
        [ r_x^*(M − 1)  r_x^*(M − 2)  r_x^*(M − 3)  ...  r_x(0)     ]

- In this case R_x is Hermitian (R_x = R_x^H), Toeplitz (the elements along each diagonal are equal), and nonnegative definite

Conditioning of the Correlation Matrix
- Condition number of a positive definite matrix R_x:

  χ(R_x) ≜ λ_max / λ_min

  where λ_max and λ_min are the largest and smallest eigenvalues of the autocorrelation matrix, respectively
- If x(n) is a WSS random process, then the eigenvalues of the autocorrelation matrix are bounded by the dynamic range of the PSD:

  min_ω R_x(e^{jω}) <= λ_i <= max_ω R_x(e^{jω})

- See the text for the proof
- Interpretation: a large spread in eigenvalues implies the PSD is more variable (less flat); the process is less like white noise (more predictable)
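The eigenvalue bounds and the condition number can be checked on a small Toeplitz correlation matrix. A sketch (not from the slides), assuming the MA(1) process with r(0) = 1.25, r(1) = 0.5, whose PSD is R(e^{jω}) = 1.25 + cos(ω) with dynamic range [0.25, 2.25]:

```python
import numpy as np
from scipy.linalg import toeplitz

M = 8
r = np.zeros(M)
r[0], r[1] = 1.25, 0.5
R = toeplitz(r)                     # symmetric (real Hermitian) Toeplitz

eig = np.linalg.eigvalsh(R)         # ascending eigenvalues
chi = eig[-1] / eig[0]              # condition number lambda_max / lambda_min

w = np.linspace(0, np.pi, 4096)
psd = 1.25 + np.cos(w)              # PSD samples over [0, pi]

# All eigenvalues fall inside [min PSD, max PSD] = [0.25, 2.25],
# and the peakier the PSD, the larger chi.
```

As M grows, the extreme eigenvalues approach the PSD extremes, so χ(R_x) approaches the PSD dynamic range; a flat (white) spectrum gives χ = 1.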