ECE 636: Systems identification

Lectures 3-4
Random variables/signals (continued)
Random/stochastic vectors
Random signals and linear systems
Random signals in the frequency domain

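[Block diagram: input x drives system S, which produces z; noise is added to give the measured output y; υ and ε denote the noise terms]
Experimental I/O data > Selection of model type > Selection of criterion > Calculation of model parameters > Model validation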

Selection of model type: two main model categories
Nonparametric (black box):
  Linear systems: impulse response / frequency response
    $y(n) = \sum_{m=0}^{M} h(m)\, x(n-m)$
    $Y(e^{j\omega}) = H(e^{j\omega})\, X(e^{j\omega})$
  Nonlinear systems: Volterra series
Parametric (grey box):
  Linear systems: linear differential/difference equations
    $a_2 \frac{d^2 y(t)}{dt^2} + a_1 \frac{dy(t)}{dt} + y(t) = b_1 \frac{du(t)}{dt} + b_0 u(t)$
    $a_2 y(n-2) + a_1 y(n-1) + y(n) = b_1 u(n-1) + b_0 u(n)$
  Nonlinear systems: nonlinear difference equations, block models

Curve fitting
Model complexity, amount of data, criterion of fit: important!
$\hat{w}_N = \arg\min_{w} E_N(w)$
$E_N(w) = \frac{1}{N} \sum_{k=1}^{N} \left( t_k - g(x_k, w) \right)^2$
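As a minimal sketch of the curve fitting idea, the snippet below fits a straight line to noisy data by minimizing a mean squared error criterion. The model form g(x, w) = w0 + w1*x, the squared-error criterion, the data and all names are assumptions chosen for illustration, not taken from the slides.

```python
import random


def fit_line(xs, ts):
    """Least-squares fit of the assumed model g(x, w) = w0 + w1 * x.

    Uses the closed-form solution of the normal equations for a line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    w1 = sxt / sxx
    w0 = mean_t - w1 * mean_x
    return w0, w1


def mean_squared_error(xs, ts, w0, w1):
    """Criterion of fit: average squared difference between targets and model."""
    return sum((t - (w0 + w1 * x)) ** 2 for x, t in zip(xs, ts)) / len(xs)


random.seed(0)
# Synthetic data: true parameters (1.0, 2.0) plus Gaussian noise (illustrative values).
xs = [i / 50 for i in range(200)]
ts = [1.0 + 2.0 * x + random.gauss(0.0, 0.1) for x in xs]

w0, w1 = fit_line(xs, ts)
print(f"estimated w0 = {w0:.3f}, w1 = {w1:.3f}")
print(f"criterion at estimate: {mean_squared_error(xs, ts, w0, w1):.4f}")
print(f"criterion at true w:   {mean_squared_error(xs, ts, 1.0, 2.0):.4f}")
```

The estimate depends on the particular noise realization, so repeating the fit with a different seed gives slightly different parameters.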

Review of random variable basics
Gaussian random variables, central limit theorem
Random signals
Correlation and covariance functions
Stationarity (weak/wide sense and strict), ergodicity
Note: Only stationary signals can be ergodic

Independent/Uncorrelated random variables
Independent random variables: $p(x, y) = p(x)\, p(y)$
In the general case, for N random variables: $p(x_1, \ldots, x_N) = \prod_{i=1}^{N} p(x_i)$
Uncorrelated random variables: $E\{XY\} = E\{X\} E\{Y\}$
If X, Y are uncorrelated, the covariance between them is zero:
$\mathrm{Cov}(X, Y) = E\{(X - E\{X\})(Y - E\{Y\})\} = E\{XY\} - E\{X\} E\{Y\} = 0$
It also holds that: $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$
The normalized quantity $\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$ is termed the correlation coefficient between the random variables X, Y.
It follows that for two uncorrelated r.v.s the correlation coefficient is zero. The correlation coefficient lies between -1 and 1.

Vector random variables
Last time we reviewed some basic probabilistic measures used for one random variable and for random processes/signals.
Often we need to describe probabilistically the properties of a set of random variables: random/stochastic vectors.
Example: We will treat many systems identification problems as a linear regression problem
[Block diagram: input u drives system S; noise ε is added to the system output to give y]
where c is a set of parameters that describes the system, e.g. the values of its impulse response h[n], or the coefficients $a_i$, $b_i$ of an autoregressive model with exogenous input, i.e.:
$y(t) + a_1 y(t-1) + \ldots + a_n y(t-n) = b_1 u(t-1) + \ldots + b_m u(t-m)$
Due to the randomness in the noise term ε these parameters are also random, therefore we will treat them as a random vector and describe them accordingly.
In other words, if we repeat the estimation procedure with different data records (and consequently different noise samples), we will not get the same result for c!

Vector random variables
For a random vector X the probability distribution function is defined as:
$P_X(x) = \mathrm{Prob}\{X_1 \le x_1, \ldots, X_n \le x_n\} = \mathrm{Prob}\{X \le x\}$, with $P_X(\infty) = 1$, $P_X(-\infty) = 0$
The corresponding probability density function is defined as:
$p_X(x) = \frac{\partial^n P_X(x)}{\partial x_1 \cdots \partial x_n}$
$p_X(x)\, dx = \mathrm{Prob}\{x_1 \le X_1 < x_1 + dx_1, \ldots, x_n \le X_n < x_n + dx_n\}$
and:
$P_X(x) = \int_{-\infty}^{x} p_X(x')\, dx' = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_X(x')\, dx'_1 \cdots dx'_n$
The marginal probability density function for the element $x_i$ is defined as:
$p_{X_i}(x_i) = \int \cdots \int p_X(x_1, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n$
The joint probability distribution function between two random vectors X and Y is defined as:
$P_{XY}(x, y) = \mathrm{Prob}\{X \le x, Y \le y\}$
Similarly, the joint probability density function $p_{XY}$ between X and Y is defined through:
$P_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} p_{XY}(x', y')\, dx'\, dy' = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_m} p_{XY}(x', y')\, dx'_1 \cdots dx'_n\, dy'_1 \cdots dy'_m$

Vector random variables
The expected value of a random vector X is defined as:
$\mu = E\{X\}$, $\mu_i = \int \cdots \int x_i\, p_X(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$
or equivalently, using the marginal pdf for the element $x_i$:
$\mu_i = \int x_i\, p_{X_i}(x_i)\, dx_i$
The covariance matrix (dim: n×n) of a random vector X (dim: n×1) is defined as:
$\mathrm{Cov}(X) = \Sigma = E\{(X - \mu)(X - \mu)^T\}$
Diagonal terms $\Sigma_{ii}$: equal to the variance of each vector element, $\sigma_i^2 = E\{(X_i - \mu_i)^2\}$
Non diagonal terms $\Sigma_{ij}$: covariance between the random vector elements $x_i$ and $x_j$
The covariance matrix is symmetric and positive semidefinite.
If the elements of the random vector are uncorrelated, the covariance matrix is diagonal (why?)

Vector random variables
Similarly, the autocorrelation matrix of a random vector X is defined as:
$R = E\{X X^T\} = \Sigma + \mu_X \mu_X^T$
The covariance matrix (dim: n×m) between two random vectors X (dim: n×1), Y (dim: m×1) is also defined as:
$\mathrm{Cov}\{X, Y\} = E\{(X - \mu_X)(Y - \mu_Y)^T\}$, which reduces to $E\{X Y^T\}$ for zero mean vectors
Analogously to the case of random variables, two random vectors are termed independent if:
$p_{XY}(x, y) = p_X(x)\, p_Y(y)$
Two random vectors are termed uncorrelated if:
$E\{X Y^T\} = E\{X\}\, E\{Y^T\}$
Two random vectors (of the same dimension n) are termed orthogonal if:
$E\{X^T Y\} = \sum_{i=1}^{n} E\{X_i Y_i\} = 0$

The multidimensional normal distribution
A random vector X of dimension N×1 is said to follow a multidimensional normal (Gaussian) distribution if the corresponding joint pdf of its elements is:
$p_X(x_1, \ldots, x_N) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$, written $X \sim N(\mu, \Sigma)$
Mean value μ
Covariance matrix Σ: symmetric and positive semidefinite
Diagonal elements: variance $\sigma_i^2$ of each element $x_i$
Non diagonal elements: covariance between $x_i$ and $x_j$, i.e. $E\{(x_i - \mu_i)(x_j - \mu_j)\}$
Ellipsoid with principal axes determined by the eigenvalues and eigenvectors of Σ
If the elements of X are independent, Σ is diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$, and:
$p_X(x_1, \ldots, x_N) = \prod_{i=1}^{N} p_{X_i}(x_i) = \frac{1}{(2\pi)^{N/2} \sigma_1 \cdots \sigma_N} \exp\left( -\frac{1}{2} \sum_{i=1}^{N} \frac{(x_i - \mu_i)^2}{\sigma_i^2} \right)$
Multidimensional central limit theorem: the vector sum of a large number of mutually independent N dimensional r.v.s approaches an N dimensional normal distribution.

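The two dimensional normal distribution
For N=2, if $\rho = \mathrm{cov}\{x_1, x_2\} / (\sigma_1 \sigma_2)$ (correlation coefficient, $|\rho| < 1$):
$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho (x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right] \right)$
For $x_1$, $x_2$ uncorrelated (ρ=0), the principal axes are parallel to the $x_1$, $x_2$ axes. In this case $p(x_1, x_2) = p(x_1)\, p(x_2)$. Therefore, Gaussian (normally) distributed random variables that are uncorrelated are also independent. This is not true in general for other distributions.
For ρ=±1, the distribution is reduced to a one dimensional distribution.
For $\Sigma = \sigma^2 I$, the contours are circular.
Any linear transformation of X follows a normal distribution as well, i.e. if $X \sim N(\mu, \Sigma)$ then for $Y = A^T X$ we have $Y \sim N(A^T \mu, A^T \Sigma A)$.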
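The linear transformation property can be checked numerically. The sketch below draws a two dimensional Gaussian vector with independent elements, applies Y = A^T X, and compares the sample covariance of Y with A^T Σ A. The matrix A, the standard deviations and the sample size are arbitrary illustrative choices.

```python
import random


def sample_cov(u, v):
    """Unbiased sample covariance between two equally long sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


random.seed(1)
N = 20000
SIGMA_1, SIGMA_2 = 1.0, 2.0          # standard deviations of x1, x2 (assumed)
x1 = [random.gauss(0.0, SIGMA_1) for _ in range(N)]
x2 = [random.gauss(0.0, SIGMA_2) for _ in range(N)]

# Hypothetical transformation matrix A; Y = A^T X means y_j = sum_i A[i][j] * x_i.
A = [[1.0, 0.5],
     [1.0, -0.5]]
y1 = [A[0][0] * a + A[1][0] * b for a, b in zip(x1, x2)]
y2 = [A[0][1] * a + A[1][1] * b for a, b in zip(x1, x2)]

# Theoretical covariance A^T Sigma A for the diagonal Sigma = diag(sigma_1^2, sigma_2^2).
variances = [SIGMA_1 ** 2, SIGMA_2 ** 2]


def theory_cov(j, k):
    return sum(A[i][j] * variances[i] * A[i][k] for i in range(2))


cov_y = [[sample_cov(y1, y1), sample_cov(y1, y2)],
         [sample_cov(y2, y1), sample_cov(y2, y2)]]
for j in range(2):
    for k in range(2):
        print(f"Cov(y{j + 1}, y{k + 1}): sample {cov_y[j][k]:.3f}, theory {theory_cov(j, k):.3f}")
```

Although x1 and x2 are independent, the transformed elements y1 and y2 are correlated, which corresponds to tilted elliptical contours.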

The two dimensional normal distribution
[Figure: the two dimensional normal distribution for increasing ρ]

Sample statistics
In practice, the probability distributions of random variables/vectors or random signals are not available. Therefore we cannot use the definitions (e.g. $\mu_X = \int x\, p_X(x)\, dx$) to compute statistical quantities such as the mean or variance.
If we have N samples $\{x_i\}_{i=1,\ldots,N}$ of a random variable X, we can estimate the mean and variance as the sample mean and sample variance:
$\bar{x} = \hat{\mu}_X = \frac{1}{N} \sum_{i=1}^{N} x_i$
$\hat{\sigma}_X^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
These are not the only ways to obtain the estimates. How do we judge if an estimator is good or bad?
Estimator properties
An estimator $\hat{\theta}$ of a parameter θ is termed unbiased if its expected value is equal to the true value of the parameter that is estimated, i.e.: $E\{\hat{\theta}\} = \theta$
An estimator is termed consistent if it converges to the true value for N → ∞, e.g. in the mean-square sense: $\lim_{N \to \infty} E\{(\hat{\theta}_N - \theta)^2\} = 0$
For the estimate of the mean, if the samples are independent identically distributed (i.i.d.):
$E\{\bar{x}\} = \mu_X$ (unbiased)
$E\{(\bar{x} - \mu_X)^2\} = \sigma_X^2 / N$ (consistent)
For the estimate of the variance:
$E\{\hat{\sigma}_X^2\} = \frac{N-1}{N} \sigma_X^2 \ne \sigma_X^2$ (biased). Therefore, an unbiased estimate is:
$\hat{\sigma}_X^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
It can also be shown that this estimate is consistent.
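The bias of the sample variance can be illustrated with a small Monte Carlo experiment. The sketch below averages the divide-by-N and divide-by-(N-1) estimates over many short i.i.d. Gaussian records. The record length, number of trials and function names are illustrative assumptions.

```python
import random


def sample_mean(x):
    return sum(x) / len(x)


def sample_variance(x, unbiased=False):
    """Sample variance: divides by N (biased) or by N - 1 (unbiased)."""
    n = len(x)
    m = sample_mean(x)
    sum_sq = sum((xi - m) ** 2 for xi in x)
    return sum_sq / (n - 1) if unbiased else sum_sq / n


random.seed(0)
N = 5            # samples per record (kept small so the bias is visible)
TRIALS = 20000   # number of independent records
TRUE_VAR = 1.0   # variance of the simulated random variable

biased_avg = 0.0
unbiased_avg = 0.0
for _ in range(TRIALS):
    record = [random.gauss(0.0, 1.0) for _ in range(N)]
    biased_avg += sample_variance(record) / TRIALS
    unbiased_avg += sample_variance(record, unbiased=True) / TRIALS

print(f"average of 1/N estimate:     {biased_avg:.3f} (theory {(N - 1) / N * TRUE_VAR:.3f})")
print(f"average of 1/(N-1) estimate: {unbiased_avg:.3f} (theory {TRUE_VAR:.3f})")
```

Averaging over records approximates the expectation of each estimator, so the first average should settle near (N-1)/N times the true variance and the second near the true variance.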

Stationary random signals
We saw that for stationary random signals, their statistical properties (mean and variance, correlation/covariance functions) are independent of the time t and depend only on the time lag τ, i.e.:
$\mu_x$ : mean
$\phi_{xx}(\tau) = E\{x(t)\, x(t+\tau)\}$ : autocorrelation function
$\phi_{xy}(\tau) = E\{x(t)\, y(t+\tau)\}$ : cross correlation function
$\gamma_{xx}(\tau) = E\{(x(t) - \mu_x)(x(t+\tau) - \mu_x)\} = \phi_{xx}(\tau) - \mu_x^2$ : autocovariance function
$\gamma_{xy}(\tau) = E\{(x(t) - \mu_x)(y(t+\tau) - \mu_y)\} = \phi_{xy}(\tau) - \mu_x \mu_y$ : cross covariance function
For two uncorrelated random processes the cross covariance function is zero: $\gamma_{xy}(\tau) = 0$
The autocorrelation coefficient function is defined as:
$\rho_{xx}(\tau) = \frac{\gamma_{xx}(\tau)}{\gamma_{xx}(0)} = \frac{\gamma_{xx}(\tau)}{\sigma_x^2}$ (for zero mean signals $\gamma_{xx} = \phi_{xx}$)
Similarly, the cross correlation coefficient function is:
$\rho_{xy}(\tau) = \frac{\gamma_{xy}(\tau)}{\sqrt{\gamma_{xx}(0)\, \gamma_{yy}(0)}} = \frac{\gamma_{xy}(\tau)}{\sigma_x \sigma_y}$
These functions lie between -1 and 1.
For ergodic random signals we saw that statistical properties may be calculated as time averages. Therefore:
$\mu_x = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, dt$
$\phi_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, x(t+\tau)\, dt$
$\phi_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T x(t)\, y(t+\tau)\, dt$

Stationary random signals
These quantities can be estimated from finite samples in a similar way as before, i.e.:
$\hat{\mu}_x = \frac{1}{N} \sum_{t=1}^{N} x(t)$
$\hat{\phi}_{xx}(\tau) = \frac{1}{N} \sum_{n=1}^{N} x(n)\, x(n+\tau)$
$\hat{\phi}_{xy}(\tau) = \frac{1}{N} \sum_{n=1}^{N} x(n)\, y(n+\tau)$
It can be shown that these estimates are unbiased and consistent for most random processes, e.g. Gaussian random processes, which are defined as random processes for which the random samples x(t) follow a multidimensional normal (Gaussian) distribution.
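A minimal sketch of the autocorrelation estimate from a single finite record follows. As a test signal it uses a first order autoregressive process x(n) = a x(n-1) + e(n) driven by unit variance Gaussian noise, for which the autocorrelation is a^|τ| / (1 - a^2). The test signal, its coefficient and the function names are assumptions for illustration. Because only N samples are available, the sum runs over the N - τ products that exist and is normalized by that count.

```python
import random


def autocorr_estimate(x, lag):
    """Estimate phi_xx(lag) = E{x(n) x(n + lag)} from one record (lag >= 0).

    Averages the len(x) - lag products available in the finite record.
    """
    n_terms = len(x) - lag
    return sum(x[n] * x[n + lag] for n in range(n_terms)) / n_terms


random.seed(0)
A = 0.5        # AR(1) coefficient (illustrative)
N = 50000      # record length

x = []
previous = 0.0
for _ in range(N):
    previous = A * previous + random.gauss(0.0, 1.0)
    x.append(previous)

for lag in range(4):
    theory = A ** lag / (1.0 - A ** 2)
    print(f"lag {lag}: estimate {autocorr_estimate(x, lag):.3f}, theory {theory:.3f}")
```

The same time average is what a routine such as Matlab's xcorr computes, up to the choice of normalization.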

Examples Autocorrelation (Matlab xcorr)

Examples Cross correlation

Applications of cross correlation: Estimation of pure time delay
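A pure time delay can be estimated as the lag at which the cross correlation between input and output peaks. The sketch below simulates an output that is a delayed copy of a white input plus measurement noise and recovers the delay. The delay value, noise level and names are illustrative assumptions.

```python
import random


def crosscorr_estimate(x, y, lag):
    """Estimate phi_xy(lag) = E{x(n) y(n + lag)} from finite records (lag >= 0)."""
    n_terms = len(x) - lag
    return sum(x[n] * y[n + lag] for n in range(n_terms)) / n_terms


random.seed(0)
N = 2000
TRUE_DELAY = 7     # delay in samples (assumed for the simulation)
MAX_LAG = 20       # largest lag that is searched

x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Output: input delayed by TRUE_DELAY samples plus independent measurement noise.
y = [(x[n - TRUE_DELAY] if n >= TRUE_DELAY else 0.0) + random.gauss(0.0, 0.5)
     for n in range(N)]

phi_xy = [crosscorr_estimate(x, y, lag) for lag in range(MAX_LAG + 1)]
estimated_delay = max(range(MAX_LAG + 1), key=lambda lag: phi_xy[lag])
print("estimated delay:", estimated_delay)
```

The peak is sharp here because the input is white; for a strongly correlated input the cross correlation is smeared by the input autocorrelation and the peak is broader.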

Stochastic signals and LTI systems
Let the input of a DT LTI system with impulse response h[n] be a wide sense stationary (WSS) signal x(t). The output is:
$y(t) = \sum_{\tau=-\infty}^{\infty} h(\tau)\, x(t-\tau)$
[Block diagram: x(t) -> h(τ) -> y(t)]
The input signal is characterized by its mean value and its autocorrelation function. What are the corresponding quantities for the output?
Since x(t) is WSS: $E\{x(t)\} = \mu_x$ for all t
The mean value of the output is:
$\mu_y = E\{y(t)\} = \sum_{\tau=-\infty}^{\infty} h(\tau)\, E\{x(t-\tau)\} = \mu_x \sum_{\tau=-\infty}^{\infty} h(\tau) = \mu_x H(e^{j0})$
where $H(e^{j0})$ is the frequency response of the system (i.e. the Discrete Time Fourier Transform, DTFT, of its impulse response) evaluated at ω=0:
$H(e^{j\omega}) = \sum_{\tau=-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}$
Therefore, the mean value of y is also independent of t. The value of the frequency response at ω=0 is termed the DC gain.

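Stochastic signals and LTI systems
The autocorrelation function of the output is:
$\phi_{yy}(t, t+\tau) = E\{y(t)\, y(t+\tau)\} = \sum_{k} \sum_{r} h(k)\, h(r)\, E\{x(t-k)\, x(t+\tau-r)\}$
But x(t) is WSS, i.e.: $E\{x(t-k)\, x(t+\tau-r)\} = \phi_{xx}(\tau + k - r)$
therefore: $\phi_{yy}(t, t+\tau) = \sum_{k} \sum_{r} h(k)\, h(r)\, \phi_{xx}(\tau + k - r) = \phi_{yy}(\tau)$
The output of an LTI system to a WSS random signal is also WSS.
By substituting $l = r - k$:
$\phi_{yy}(\tau) = \sum_{l} \phi_{xx}(\tau - l)\, C_{hh}(l)$
where $C_{hh}(l) = \sum_{k} h(k)\, h(k+l)$ is the autocorrelation sequence of the deterministic signal h(τ) and is equal to $h(l) * h(-l)$.
Therefore, the autocorrelation function of the output is the convolution between the autocorrelation function of the input and the deterministic autocorrelation of h[n].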

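Stochastic signals in the frequency domain
How do we describe a stochastic signal in the frequency domain?
Fourier transform of the autocorrelation function: power spectrum
$\Phi_{xx}(e^{j\omega}) = \sum_{\tau=-\infty}^{\infty} \phi_{xx}(\tau)\, e^{-j\omega\tau}$
[Block diagram: x(t) -> h(τ) -> y(t)]
Similarly, we define $\Phi_{yy}(e^{j\omega})$, $C_{hh}(e^{j\omega})$ as the DTFTs of $\phi_{yy}(\tau)$, $C_{hh}(\tau)$ respectively.
For zero mean random signals the autocorrelation and autocovariance functions coincide: $\phi_{xx}(\tau) = \gamma_{xx}(\tau)$
Why power spectrum? From the definition:
$\phi_{xx}(0) = E\{x^2(t)\} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_{xx}(e^{j\omega})\, d\omega$
we can see that the area under the spectrum $\Phi_{xx}$ from -π to π is proportional to the mean power of the signal.
For simplicity, we often write $\Phi_{xx}(\omega)$, where $\Phi_{xx}(\omega)$ is the power spectrum or the power spectral density of the signal.
For real random signals $\phi_{xx}(\tau) = \phi_{xx}(-\tau)$, therefore the power spectrum is a real and even function of ω, i.e.: $\Phi_{xx}(\omega) = \Phi_{xx}(-\omega)$
Note: In CT the power spectrum is defined as $\Phi_{xx}(\omega) = \int_{-\infty}^{\infty} \phi_{xx}(\tau)\, e^{-j\omega\tau}\, d\tau$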

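Stochastic signals in the frequency domain
Similarly, the cross power density spectrum between two random signals is defined as the DTFT of the cross correlation function between them:
$\Phi_{xy}(\omega) = \sum_{\tau=-\infty}^{\infty} \phi_{xy}(\tau)\, e^{-j\omega\tau}$
[Block diagram: x(t) -> h(τ) -> y(t)]
The cross power density spectrum is generally complex valued and, for real signals: $\Phi_{xy}(\omega) = \Phi_{yx}^{*}(\omega)$
We saw that: $\phi_{yy}(\tau) = \phi_{xx}(\tau) * C_{hh}(\tau)$
Therefore in the frequency domain (convolution property of the DTFT): $\Phi_{yy}(\omega) = C_{hh}(\omega)\, \Phi_{xx}(\omega)$
but $C_{hh}(\omega) = H(\omega)\, H^{*}(\omega) = |H(\omega)|^2$
Therefore: $\Phi_{yy}(\omega) = |H(\omega)|^2\, \Phi_{xx}(\omega)$
The output power spectrum is equal to the input power spectrum multiplied by the squared magnitude of the frequency response of the system.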

White noise signals
White noise signal: defined as a random signal with zero mean, for which any two samples are independent.
The autocorrelation function of a white noise signal is a delta function (Dirac delta in continuous time, unit impulse in discrete time):
$\phi_{xx}(\tau) = E\{x(t+\tau)\, x(t)\} = \sigma_x^2\, \delta(\tau)$
The spectrum of a white noise signal is flat and contains all frequencies (from -∞ to ∞), in analogy with white light: $\Phi_{xx}(\omega) = \sigma_x^2$
The mean power of a white noise signal is $\phi_{xx}(0)$: in discrete time this equals $\sigma_x^2$, while in continuous time, where δ(τ) is the Dirac delta, it is infinite.
White noise is an ideal signal and it exhibits very desirable properties for systems identification (more to follow).
If in addition the samples of the white noise signal follow a normal distribution: Gaussian white noise (Matlab: randn)
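The flat spectrum of discrete time white noise can be observed with an averaged periodogram. The sketch below generates Gaussian white noise, splits it into short segments, and averages |DFT|^2 / L over the segments; each frequency bin should settle near the noise variance. The segment length, number of segments and the use of a direct DFT are illustrative choices, not a recommended spectral estimator.

```python
import cmath
import math
import random


def periodogram(segment):
    """Return |DFT(segment)|^2 / L for every frequency bin.

    Uses a direct O(L^2) DFT, which is adequate for short segments.
    """
    length = len(segment)
    values = []
    for k in range(length):
        x_k = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / length)
                  for n in range(length))
        values.append(abs(x_k) ** 2 / length)
    return values


random.seed(0)
SIGMA = 1.0       # standard deviation of the white noise (assumed)
SEG_LEN = 32      # samples per segment
SEGMENTS = 400    # number of segments that are averaged

avg_spectrum = [0.0] * SEG_LEN
for _ in range(SEGMENTS):
    seg = [random.gauss(0.0, SIGMA) for _ in range(SEG_LEN)]
    for k, value in enumerate(periodogram(seg)):
        avg_spectrum[k] += value / SEGMENTS

print("min over bins:", round(min(avg_spectrum), 3))
print("max over bins:", round(max(avg_spectrum), 3))
print("theory (sigma^2):", SIGMA ** 2)
```

A single periodogram fluctuates strongly around the true spectrum; the averaging over segments is what reveals the flat shape.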

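Stochastic signals in the frequency domain
Example: If H is an ideal bandpass filter:
[Block diagram: x(t) -> H(ω) -> y(t)]
What happens if the input signal is white noise with autocorrelation function $\phi_{xx}(\tau) = \sigma_x^2\, \delta(\tau)$?
Example: Let the input of an LTI system with frequency response H(ω) be a zero mean white noise signal with $\phi_{xx}(\tau) = \sigma_x^2\, \delta(\tau)$. The output spectrum is:
$\Phi_{yy}(\omega) = |H(\omega)|^2\, \Phi_{xx}(\omega) = \sigma_x^2\, |H(\omega)|^2$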

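Stochastic signals and LTI systems
The cross correlation between the input and output is:
$\phi_{xy}(\tau) = E\{x(t)\, y(t+\tau)\} = \sum_{k} h(k)\, \phi_{xx}(\tau - k) = h(\tau) * \phi_{xx}(\tau)$
[Block diagram: x(t) -> h(τ) -> y(t)]
i.e. the convolution between the input autocorrelation and the impulse response of the system.
Direct consequence: the input/output cross spectrum is given by:
$\Phi_{xy}(\omega) = H(\omega)\, \Phi_{xx}(\omega)$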
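For a white noise input the input autocorrelation is a scaled unit impulse, so the input/output cross correlation is a scaled copy of the impulse response. The sketch below uses this to recover a short FIR impulse response from simulated data. The filter taps, record length and names are hypothetical values chosen for illustration.

```python
import random


def crosscorr_estimate(x, y, lag):
    """Estimate phi_xy(lag) = E{x(n) y(n + lag)} from finite records (lag >= 0)."""
    n_terms = len(x) - lag
    return sum(x[n] * y[n + lag] for n in range(n_terms)) / n_terms


random.seed(0)
N = 50000
SIGMA = 1.0                    # white noise standard deviation (assumed)
h = [1.0, 0.6, 0.3, 0.1]       # hypothetical FIR impulse response

x = [random.gauss(0.0, SIGMA) for _ in range(N)]
# Output of the LTI system: convolution sum y(n) = sum_k h(k) x(n - k).
y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(N)]

# phi_xy(tau) = sigma^2 h(tau) for white input, so divide by sigma^2.
h_est = [crosscorr_estimate(x, y, lag) / SIGMA ** 2 for lag in range(6)]
for lag, value in enumerate(h_est):
    true_value = h[lag] if lag < len(h) else 0.0
    print(f"lag {lag}: estimated {value:.3f}, true {true_value:.3f}")
```

With a non-white input the cross correlation would instead have to be deconvolved from the input autocorrelation, which is one reason white noise is a convenient identification input.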

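Stochastic signals and LTI systems
Example: Let a white noise signal with variance $\sigma_x^2$ be the input to an ideal lowpass filter (unit passband gain) with cutoff frequency $\omega_c$
[Block diagram: x(t) -> H(ω) -> y(t)]
Output power spectrum: $\Phi_{yy}(\omega) = \sigma_x^2$ for $|\omega| \le \omega_c$, and 0 for $\omega_c < |\omega| \le \pi$
Output autocorrelation: $\phi_{yy}(\tau) = \frac{1}{2\pi} \int_{-\omega_c}^{\omega_c} \sigma_x^2\, e^{j\omega\tau}\, d\omega = \sigma_x^2\, \frac{\sin(\omega_c \tau)}{\pi \tau}$
Mean power of the output: $\phi_{yy}(0) = \sigma_x^2\, \frac{\omega_c}{\pi}$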

Stochastic signals and LTI systems
Overall, we have the following relations for an LTI system being driven by a random input:

Time domain                                                                                         Frequency domain
$\phi_{xy}(\tau) = h(\tau) * \phi_{xx}(\tau)$                                                       $\Phi_{xy}(\omega) = H(\omega)\, \Phi_{xx}(\omega)$
$\phi_{yy}(\tau) = h(-\tau) * \phi_{xx}(\tau) * h(\tau) = \phi_{xx}(\tau) * C_{hh}(\tau)$           $\Phi_{yy}(\omega) = |H(\omega)|^2\, \Phi_{xx}(\omega)$

Mean, power, variance:
$\mu_y = \mu_x H(0)$
$E\{(y(t))^2\} = \phi_{yy}(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(\omega)|^2\, \Phi_{xx}(\omega)\, d\omega$
$\sigma_y^2 = \phi_{yy}(0) - \mu_y^2$

Note: All these relations are also valid for continuous time systems (the proofs are analogous, with the convolution sum replaced by the convolution integral).
We will use these relations for nonparametric systems identification in the time and frequency domain.
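The mean and variance relations can be checked by simulation. The sketch below filters a white input with nonzero mean through a short FIR system and compares the output mean with μ_x H(0), where H(0) is the sum of the taps, and the output variance with σ_x^2 times the sum of squared taps (the value of the variance relation for a white fluctuation, by Parseval). The taps, input statistics and names are illustrative assumptions.

```python
import random

random.seed(0)
N = 50000
MU_X, SIGMA_X = 2.0, 1.0        # input mean and standard deviation (assumed)
h = [0.5, 0.3, 0.2, -0.1]       # hypothetical FIR impulse response

# Input: constant mean plus Gaussian white noise.
x = [MU_X + random.gauss(0.0, SIGMA_X) for _ in range(N)]

# Output via the convolution sum, skipping the start-up transient
# where not all taps have input samples available.
start = len(h) - 1
y = [sum(h[k] * x[n - k] for k in range(len(h))) for n in range(start, N)]

mean_y = sum(y) / len(y)
var_y = sum((value - mean_y) ** 2 for value in y) / len(y)

dc_gain = sum(h)                                  # H(0) = sum of h(tau)
mean_y_theory = MU_X * dc_gain                    # mu_y = mu_x H(0)
var_y_theory = SIGMA_X ** 2 * sum(tap ** 2 for tap in h)

print(f"output mean:     {mean_y:.3f} (theory {mean_y_theory:.3f})")
print(f"output variance: {var_y:.3f} (theory {var_y_theory:.3f})")
```

The output mean is scaled by the DC gain only, while the variance depends on the whole impulse response through the sum of squared taps.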