Stochastic Processes

Stochastic: from Greek stochastikos, proceeding by guesswork; literally, skillful in aiming.

A stochastic process is simply a collection of random variables labelled by some parameter:

1. $\{X_t(\zeta),\ t \in [a, b]\}$ is an infinite set of random variables defined over an interval $[a, b]$, all of which map from the same set of events $\{\zeta\}$.

2. $\{X_n(\zeta),\ n = \text{integer}\}$ is a discrete-parameter random process.

A stochastic process is a function of two variables:

1. the parameter $t$ or $n$ in the examples above (e.g. time); and

2. $\zeta \in S$ = the sample or event space.

At a fixed time, $X_t(\zeta)$ is simply a random variable. For a fixed event $\zeta$, $X_t(\zeta)$ is simply a function of time. One of these functions is called a realization of the random process. The collection of all possible realizations (perhaps infinite) is an assembly or ensemble.
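These two views can be made concrete with a small sketch; the array layout, the Gaussian white-noise model, and the NumPy usage are illustrative assumptions, not part of the notes:

import numpy as np

# A discrete-parameter process stored as an array indexed by (event zeta, time index n);
# the model and the sizes are arbitrary choices for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 100))      # 500 realizations, 100 time steps

rv_at_fixed_time = X[:, 10]          # fix n: a random variable over the event space
one_realization = X[7, :]            # fix zeta: an ordinary function of (discrete) time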
Ensemble vs. Realization Averages Revisited

Two kinds of averages may be defined, according to which variable the process is averaged over:

1. Averages over $t$ for a single realization are sample averages or (for $t$ = time) time averages.

2. Averages over $\zeta$ are ensemble averages.

In general, ensemble averages $\neq$ sample averages. If a sample average converges to the ensemble average as the length of the realization tends to infinity, the process is said to be ergodic and to have stationary statistics. The Gibbs ensemble in statistical mechanics has stationary statistics, since it is an infinite assembly of oscillators or systems in equilibrium with a temperature bath.

Why is this important? One reason is that we will be considering estimators for ensemble-average quantities that are based on a single realization (or a finite number of realizations). In some cases the estimator will converge to the ensemble-average quantity; in others it will not.
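A minimal numerical sketch of this point; the Gaussian white-noise model, the sample sizes, and the NumPy usage are illustrative assumptions:

import numpy as np

# Stationary white noise with true (ensemble) mean 2.0.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=(1000, 10_000))   # (realization, time)

time_average = x[0].mean()         # average over t within a single realization
ensemble_average = x[:, 0].mean()  # average over realizations at a fixed t

# Both estimates land near 2.0, consistent with this process being ergodic;
# for a nonstationary process (e.g. a random walk) they generally would not agree.
print(time_average, ensemble_average)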
Recall the graphics that show realizations of time series:

Figure 1: Realizations for stationary stochastic white noise (left) and a nonstationary random walk (right)
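Realizations like those in Figure 1 can be generated in a few lines; the models and the NumPy usage here are assumed for illustration:

import numpy as np

rng = np.random.default_rng(1)
n = 1000
white_noise = rng.normal(size=n)               # stationary: statistics do not depend on t
random_walk = np.cumsum(rng.normal(size=n))    # nonstationary: variance grows with t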
Characterization of Stochastic Processes

To totally specify a random process, we must know the multivariate pdf (or distribution function) of a large number (possibly infinite) of random variables. For a discrete process $\{X(t_j),\ j = 1, \ldots, n\}$ we would need to know the $2n$-dimensional distribution function

$$F_{X(t_1) \cdots X(t_n)}(x_1, \ldots, x_n; t_1, \ldots, t_n) \equiv P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\}.$$

In practice we will be much less ambitious and will be satisfied with knowing (or constraining) only a few low-order moments of the process. These include first-order moments like

$$\langle X^n(t) \rangle = \int dz\, z^n f_{X(t)}(z; t),$$

which are ensemble averages that may be functions of time. Second-order moments include the autocorrelation function

$$R_X(t_1, t_2) \equiv \langle X(t_1) X^*(t_2) \rangle = \int\!\!\int dw\, dz\, w z^* f_{X(t_1)X(t_2)}(w, z; t_1, t_2)$$

and the autocovariance function

$$C_X(t_1, t_2) \equiv \langle [X(t_1) - \langle X(t_1) \rangle][X(t_2) - \langle X(t_2) \rangle]^* \rangle.$$
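With only realizations in hand, these ensemble averages are estimated by averaging across realizations. A sketch, where the stacked-array layout and the random-walk test process are assumptions made for illustration:

import numpy as np

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=(500, 200)), axis=1)    # 500 realizations of a nonstationary process

mean_t = x.mean(axis=0)                               # <X(t)> estimated at each t
t1, t2 = 50, 120
R_12 = np.mean(x[:, t1] * x[:, t2])                   # estimate of R_X(t1, t2)
C_12 = np.mean((x[:, t1] - mean_t[t1]) * (x[:, t2] - mean_t[t2]))   # estimate of C_X(t1, t2)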
Stationarity

If any moments of a process are functions of time, the process is nonstationary. Different orders of stationarity are defined according to the order of moment.

1. Stationarity of order 1: for any $t_1$, $t_2$,
$$F_{X(t_1)}(x; t_1) = F_{X(t_2)}(x; t_2).$$

2. Second-order stationarity: for any $t$,
$$F(x_1, x_2; t_1, t_2) = F(x_1, x_2; t_1 + t, t_2 + t).$$
In particular, for $t = -t_1$, the right-hand side depends only on the difference or lag $t_2 - t_1$.

3. Strict stationarity: time or lag invariance of the distribution function holds for all orders.

4. Wide-sense stationarity (WSS) is defined up to only second order. Note the congruence with the complete determination of Gaussian processes by their first and second moments. The constraints for WSS are

   i. $\langle X^2(t) \rangle < \infty$ for all $t$.
   ii. $\langle X(t) \rangle = $ constant.
   iii. $R(t_1, t_2) = \langle X(t_1) X(t_2) \rangle = R(t_2 - t_1)$, i.e. the autocorrelation function depends only on time differences.
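The WSS conditions can be checked empirically on simulated data. The AR(1) model, its parameters, and the NumPy usage below are assumptions chosen purely to illustrate the check:

import numpy as np

rng = np.random.default_rng(3)
n_real, n_t, a = 2000, 300, 0.9
x = np.zeros((n_real, n_t))
for t in range(1, n_t):
    x[:, t] = a * x[:, t - 1] + rng.normal(size=n_real)   # AR(1): x_t = a x_{t-1} + white noise

lag = 5
mean_100, mean_200 = x[:, 100].mean(), x[:, 200].mean()   # both ~ 0: constant mean
R_early = np.mean(x[:, 100] * x[:, 100 + lag])            # R(t1, t1 + lag) at t1 = 100
R_late  = np.mean(x[:, 200] * x[:, 200 + lag])            # same lag, later t1
# After the start-up transient both estimates agree (~ a**lag / (1 - a**2)),
# so R depends only on the lag, as wide-sense stationarity requires.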
Correlation Functions and Power Spectra of WSS Processes

Autocorrelation functions of WSS processes (distinct from autocorrelations of functions),
$$R(\tau) = \langle X(t) X^*(t + \tau) \rangle,$$
have the properties:

1. Hermiticity: $R_X(-\tau) = R_X^*(\tau)$.
2. $R_X(0) = \langle |X|^2 \rangle$.
3. $|R_X(\tau)| \le R_X(0)$.

Autocorrelation (and autocovariance) functions are useful as:

1. probes of characteristic time (or length or velocity, etc.) scales of a process.
2. quantities used in estimation procedures (via the covariance matrix).
3. a means for calculating the power spectrum of a process.
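As an example of the first use, a short sketch: the estimated ACF of an assumed AR(1) process decays roughly as $a^{|\tau|}$, so the lag where it falls to $1/e$ probes the characteristic time scale (all model choices below are illustrative):

import numpy as np

rng = np.random.default_rng(4)
n, a = 100_000, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + rng.normal()

def acf(x, max_lag):
    # Normalized sample ACF at lags 0 .. max_lag - 1.
    x0 = x - x.mean()
    return np.array([np.mean(x0[:len(x0) - k] * x0[k:]) for k in range(max_lag)]) / x0.var()

rho = acf(x, 40)
tau_corr = np.argmax(rho < 1 / np.e)   # ~ -1 / ln(a) ~ 9.5 samples: the correlation time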
Wiener-Khinchin theorem

The power spectrum $S(f)$ is simply the Fourier transform of the autocorrelation function (sometimes the autocovariance function):

$$S(f) = \int d\tau\, e^{-2\pi i f \tau} R_X(\tau).$$

As such it (as well as the ACF) is an ensemble-average quantity. With finite measurements of realization(s) of a process, the best we can do is to estimate the power spectrum.

Properties of $S(f)$ are:

1. $S(f) \ge 0$.
2. $S(f)$ is real, since $R(\tau)$ is hermitian.
3. It is the distribution of the second moment (or variance) in frequency space.
4. It partakes of the analogy $S(f) : R(\tau) :: f_X(x) : \Phi_X(\omega)$.

In some contexts (e.g. maximum entropy spectral estimation), it is convenient to view the power spectrum as a probability distribution of frequency components. In some Bayesian treatments, the PDF of the frequency is explicitly calculated.
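A numerical sketch of the theorem for a discrete-time example: the Fourier transform of an estimated ACF approximates the true spectrum. The AR(1) model, the lag truncation, and the frequency grid are all assumptions made for illustration:

import numpy as np

rng = np.random.default_rng(5)
n, a = 4096, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + rng.normal()       # AR(1) driven by unit-variance white noise

m = 200                                      # truncate the ACF estimate at m lags
R_hat = np.array([np.mean(x[:n - k] * x[k:]) for k in range(m)])

freqs = np.linspace(0.0, 0.5, 256)           # cycles per sample
# S(f) = sum_tau R(tau) exp(-2 pi i f tau); for real x, R(-tau) = R(tau).
S_est = np.array([R_hat[0] + 2.0 * np.sum(R_hat[1:] * np.cos(2 * np.pi * f * np.arange(1, m)))
                  for f in freqs])
S_true = 1.0 / (1.0 - 2 * a * np.cos(2 * np.pi * freqs) + a**2)   # exact AR(1) spectrum
# S_est tracks S_true up to estimation error from the finite realization and lag truncation.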
Correlation Functions and Power Spectra

Recall from Fourier transform theorems that for deterministic functions we have the relationships:

$$f(t) \quad \overset{\rm FT}{\longleftrightarrow} \quad F(f)$$
$$\big\downarrow \text{irreversible} \qquad\qquad \big\downarrow \text{irreversible}$$
$$\int dt\, f(t) f^*(t + \tau) \quad \overset{\rm FT}{\longleftrightarrow} \quad |F(f)|^2$$

For stochastic processes the situation is different. We need to distinguish the power spectrum of a realization from the ensemble-average (true) power spectrum:

$$x(t) \quad \overset{\rm FT}{\longleftrightarrow} \quad X(f)$$
$$\big\downarrow \text{irreversible} \qquad\qquad \big\downarrow \text{irreversible}$$
$$\langle x(t) x^*(t + \tau) \rangle \quad \overset{\rm FT}{\longleftrightarrow} \quad S(f) = \langle |X(f)|^2 \rangle$$
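The practical consequence is that the squared FT of a single realization (the periodogram) is only a noisy estimate of $S(f)$; averaging over realizations (or segments) brings it toward the ensemble value. A sketch, with the white-noise model and sizes assumed for illustration:

import numpy as np

rng = np.random.default_rng(6)
n_real, n = 1000, 1024
x = rng.normal(size=(n_real, n))                       # unit-variance white noise: true S(f) = 1

periodograms = np.abs(np.fft.rfft(x, axis=1))**2 / n   # |X(f)|^2 / n, one per realization
single = periodograms[0]                               # fluctuates by ~100% about S(f)
ensemble_mean = periodograms.mean(axis=0)              # ~ 1 at all f, i.e. ~ the true spectrum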
Cross-correlation Functions

Suppose we have two random processes $X(t)$ and $Y(t)$ and we wish to test whether they are statistically related, e.g.

X(t) = sunspot number, Y(t) = number of airline accidents
X(t) = pressure, Y(t) = temperature
X(t) = seismic activity, Y(t) = animal behavior

A useful statistic is the cross-correlation function (CCF), as is the cross-covariance function (CCV):

$$R_{XY}(t_1, t_2) \equiv \langle X(t_1) Y^*(t_2) \rangle$$
$$C_{XY}(t_1, t_2) = \langle [X(t_1) - \langle X(t_1) \rangle][Y(t_2) - \langle Y(t_2) \rangle]^* \rangle$$

Two random processes are uncorrelated if $C_{XY}(t_1, t_2) = 0$ for all $t_1, t_2$. Two random processes are orthogonal if $R_{XY}(t_1, t_2) = 0$ for all $t_1, t_2$.
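A minimal sketch of a CCF estimate: one series is an assumed delayed, noisy copy of the other, and the lag of the CCF peak recovers the delay (the model, the delay, and the circular-shift estimator are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(7)
n, delay = 5000, 30
x = rng.normal(size=n)
y = np.roll(x, delay) + 0.5 * rng.normal(size=n)     # y(t) ~ x(t - delay) + noise

x0, y0 = x - x.mean(), y - y.mean()
lags = np.arange(-100, 101)
ccf = np.array([np.mean(x0 * np.roll(y0, -k)) for k in lags])   # estimate of <X(t) Y(t + k)>
best_lag = lags[np.argmax(ccf)]                      # ~ delay = 30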