System Identification

System Identification
Arun K. Tangirala
Department of Chemical Engineering, IIT Madras
July 27, 2013
Module 3, Lecture 1

Objectives of this Module
The objective of this module is to provide foundations on:
- Random variables and statistical descriptions
- Covariance-based measures
- Random processes, stationarity and ergodicity
- Time-domain models for stationary processes
- Frequency-domain (spectral) representations of random processes

Lectures in this module
This module contains eight (8) lectures:
Lecture 1: Introduction to Random Variables and Statistical Descriptions
Lecture 2: Covariance and Correlation
Lecture 3: Introduction to Random Processes
Lecture 4: Auto-correlation & Cross-correlation Functions
Lecture 5: Moving-Average Models
Lecture 6: Auto-regressive Models
Lecture 7: Models for Non-stationary Processes: ARIMA Models
Lecture 8: Spectral Representations of Stationary Random Processes

Learning Objectives
Through a study of this lecture, the student will learn the theoretical aspects of:
- Random variables
- Probability theory (basic concepts)
- Statistical properties of random variables
- The expectation operator

Introduction
Earlier (in Lecture 0 of Module 1), we learnt that the output measurement is made up of two parts: (i) a deterministic component due to the effects of known variables, i.e., inputs, and (ii) a stochastic (random) component comprising the effects of measurement noise and/or unmeasured disturbances. The overall model is therefore a composite model consisting of a deterministic as well as a stochastic model. In Module 2 we learnt how to mathematically describe deterministic processes. Now we turn our attention to the description of random processes. The cornerstone of the theory of random processes is the concept of a random variable and the associated theory. Therefore, this module begins with a review of the theory of random variables and the associated statistics in the context of identification.

Random Variable
Definition: A random variable (RV) is one whose value set contains at least two elements, i.e., it draws one value from multiple possibilities. The space of possible values is known as the outcome space or sample space.
Examples: toss of a coin, roll of a die, outcome of a game.
- In the study of random variables, the time dimension does not come into the picture. Stated otherwise, a random variable is analysed only in the outcome space.
- Naturally, when the set of possibilities contains a single element, the randomness vanishes, giving rise to a deterministic variable.
Two classes of random variables can be found:
- Discrete-valued RV: a discrete RV is one which can take one of the values from a discrete set of possibilities (e.g., roll of a die).
- Continuous-valued RV: a continuous RV is one which can take any value from a continuous outcome space (e.g., ambient temperature).

Random Variable... contd.
Definition (Random Variable, Priestley (1981)): A random variable X is a mapping (or point function) from the sample space Ω onto the real line such that to each element ω ∈ Ω there corresponds a unique real number.
Note: A random variable is denoted by an uppercase letter, as in X, while the values that it takes are denoted by a lowercase letter, x.
- In constructing a random variable, we effectively replace our original (abstract) sample space by a new (concrete) sample space. E.g., the outcomes of a game, or the head and tail of a toss, are mapped to {1, 0}.
- If the experiment itself yields some physical quantity that is real-valued, then no further mapping is required.
- In practice, the term random variable is restricted to measurable functions only, i.e., those for which the probabilities in the sample space are defined.

Do random variables actually exist?
The tag of randomness is given to any variable or signal which is not accurately predictable, i.e., the outcome of the associated event is not predictable with zero error. In reality, there is no reason to believe that the true process behaves in a random manner. It is merely because we are unable to predict its course, i.e., due to a lack of sufficient understanding or knowledge, that a process becomes random.
Randomness is, therefore, not a characteristic of a process, but is rather a reflection of our (lack of) knowledge and understanding of that process.

Probability Distribution
The natural recourse to dealing with uncertainties is to list all possible outcomes and assign a chance to each of those outcomes.
Examples:
- Rainfall in a region: Ω = {0, 1}, P = {0.3, 0.7}
- Face value from the roll of a die: Ω = {1, 2, ..., 6}, P(ω) = 1/6 ∀ ω ∈ Ω
The specification of the outcomes and the associated probabilities completely characterizes the random variable.

Probability Distribution Function
In general, any random variable X is characterized by what is known as the probability distribution function F(x):
F(x) = Pr(X ≤ x)
Probability distributions are either known upfront OR determined through experiments. For example, out of 10000 coin tosses, find the fraction of tosses in which head appears; this is (an estimate of) the probability of head occurring in any toss (assuming the coin has not undergone any physical damage to alter the probability distribution).
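The coin-toss experiment above is easy to emulate numerically. The following is a minimal Python sketch (not from the lecture; the simulation setup and helper name are illustrative assumptions) that estimates Pr(head) as a relative frequency and evaluates an empirical distribution function at a point:

    import numpy as np

    rng = np.random.default_rng(0)

    # Estimate Pr(head) as the fraction of heads in 10000 simulated fair tosses
    tosses = rng.integers(0, 2, size=10_000)   # 1 = head, 0 = tail
    print(tosses.mean())                       # close to 0.5

    # Empirical distribution function: F_hat(x) = fraction of samples <= x
    def ecdf(sample, x):
        return np.mean(sample <= x)

    sample = rng.standard_normal(10_000)
    print(ecdf(sample, 0.0))   # close to F(0) = 0.5 for a standard Gaussian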

Probability Distribution Functions: Examples
Probability distribution functions are of several types. Shown in the figure are the Gaussian, Chi-square and Binomial distributions, both as distribution functions F(x) and as density/mass functions (to be defined shortly). The type of distribution for a particular random event depends on the nature of the event or process.
[Figure: distribution functions F(x) and the corresponding density/mass functions f(x) for the Gaussian, Chi-square (df = 10) and Binomial (n = 10, p = 0.5) distributions]

Properties of Probability Distribution Functions
For any function to serve as a probability distribution function, it has to satisfy the following properties (implications provided in parentheses):
1. 0 ≤ F(x) ≤ 1, ∀x (probability has to be between 0 and 1)
2. lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1
3. F(x) is a non-decreasing function, in the sense that for any h ≥ 0 and all x, F(x + h) ≥ F(x) (probability cannot be negative)
4. F(x) is right-continuous for all x, i.e., lim_{h→0+} F(x + h) = F(x) (F(x) can have jumps)

Types of distributions
Distributions can be classified into two categories, depending on the nature of the RV, i.e., whether it is continuous-valued or discrete-valued.
Continuous distributions: F(x) is continuous and differentiable for almost all x (e.g., Gaussian). The random variable in this case is a continuous quantity (e.g., temperature, pressure, voltage). For these distributions, a density function (like in mechanics) exists.
Discrete (step-type) distributions: F(x) is a simple step function with jumps at points (e.g., Binomial). The random variable is then a purely discrete one. No density function exists for this case; instead, a probability mass function that determines Pr(X = x) is defined (e.g., count of the number of heads in coin tosses, face value on a die). The probability of taking on a value between jump points is zero.
Mixed distributions: F(x) has both continuous and step-type parts.

Lebesgue's Decomposition Theorem
Any distribution function can be decomposed into the sum of three terms, namely, a continuous, a discrete and a singular type.
Theorem (Lebesgue's Decomposition of F(x)): Any distribution function F(x) may be written in the form
F(x) = a_1 F_1(x) + a_2 F_2(x) + a_3 F_3(x), where a_i ≥ 0, i = 1, 2, 3, and a_1 + a_2 + a_3 = 1
Here F_1(x) is absolutely continuous (continuous everywhere and differentiable for almost all x), F_2(x) is a step function with a finite or countably infinite number of jumps, and F_3(x) is a singular function, that is, continuous with zero derivative almost everywhere.
The third term arises only in pathological or hypothetical situations. Therefore, it is sufficient to consider only the first two terms.

Probability Density Function
For the subject under study, continuous distributions (continuous-valued RVs) are of interest. Given that continuous probability distributions possess a density function, it is most convenient to work with densities (again, as in mechanics).
The density function f(x) can be defined in two different ways:
1. The density function is such that the area under the curve gives the probability:
Pr(a < X < b) = ∫_a^b f(x) dx, with ∫_{−∞}^{∞} f(x) dx = 1    (1)
2. The density function is the derivative (w.r.t. x) of the distribution function:
f(x) = dF(x)/dx    (2)
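The equivalence of the two definitions can be sanity-checked numerically. A minimal Python sketch, assuming a standard Gaussian (the function names F and f are illustrative): the finite-difference derivative of F should reproduce f, and the area under f over an interval should match the corresponding CDF difference.

    import math

    def F(x):   # standard Gaussian CDF via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def f(x):   # standard Gaussian density
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    # Definition 2: f(x) = dF(x)/dx, checked by a central difference at x = 1
    h = 1e-5
    print((F(1 + h) - F(1 - h)) / (2 * h), f(1))   # both ~0.2420

    # Definition 1: Pr(a < X < b) = area under f from a to b (trapezoidal rule)
    a, b, n = 1.0, 2.0, 10_000
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    area = sum((f(xs[k]) + f(xs[k + 1])) * (b - a) / (2 * n) for k in range(n))
    print(area, F(b) - F(a))   # both ~0.1359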

Gaussian Density Function
One of the most frequently encountered and assumed distributions for a RV is the Gaussian (Normal) distribution, with probability density function (p.d.f.)
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))    (3)
Remarks:
- The density is completely characterized by the two parameters µ and σ.
- The parameters µ and σ are related to the first two moments of the p.d.f., which can be easily estimated in practice.
- The density function is symmetric, so the odd (higher-order) central moments are zero.
- In addition, the Central Limit Theorem renders wide usage of the Gaussian p.d.f.
[Figure: Gaussian density function; shaded region: Pr(1 ≤ X ≤ 2)]
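The remark on symmetry can be checked empirically. A minimal Monte Carlo sketch in Python (the parameter values, seed and sample size are arbitrary choices): odd central moments of Gaussian draws hover near zero, while the second central moment approaches σ².

    import numpy as np

    rng = np.random.default_rng(42)
    mu, sigma = 2.0, 1.5
    x = rng.normal(mu, sigma, size=1_000_000)

    c = x - x.mean()                      # deviations from the sample mean
    print(np.mean(c**2), sigma**2)        # ~2.25 (the variance)
    print(np.mean(c**3))                  # ~0   (odd central moment)
    print(np.mean(c**5))                  # ~0   (odd central moment)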

Uniform Distribution
A widely encountered distribution is the uniform distribution in the interval [a, b]:
f(x) = 1/(b − a), a ≤ x ≤ b    (4)
Remarks:
- The density is completely characterized by the two parameters a and b.
- The uniform distribution has the simplest of all pdfs and is usually the starting point for generating samples following a Gaussian distribution.
- Unlike the Gaussian distribution, the higher-order (beyond second) cumulants are not zero.
[Figure: Uniform density function; shaded region: Pr(1 ≤ X ≤ 2)]
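The remark about the uniform distribution being the starting point for Gaussian sampling is commonly realized through the Box-Muller transform (the lecture does not name a specific method; this is one standard choice). A minimal Python sketch:

    import math
    import random

    def box_muller():
        # Two independent U(0,1) draws -> one standard Gaussian draw
        u1 = random.random()
        u2 = random.random()
        return math.sqrt(-2.0 * math.log(1.0 - u1)) * math.cos(2.0 * math.pi * u2)

    samples = [box_muller() for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, var)   # ~0 and ~1, as expected for N(0, 1)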

Chi-square Density Function
Another popularly encountered distribution (of non-negative RVs) is the χ² p.d.f. with n degrees of freedom:
f_n(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2), x ≥ 0    (5)
Remarks:
- An alternative definition exists: when X = Σ_{i=1}^n Z_i², where the Z_i are independent standard Gaussian random variables, X is said to possess a Chi-square distribution with n degrees of freedom.
- Estimates of variance and other non-negative quantities are known to possess chi-square distributions.
- The mean and variance are n and 2n, respectively.
[Figure: Chi-square (df = 10) density function; shaded region: Pr(6 ≤ X ≤ 8)]
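The alternative definition and the stated moments are easy to verify by simulation. A minimal Python sketch (seed and sample size are arbitrary), assuming n = 10 degrees of freedom:

    import numpy as np

    rng = np.random.default_rng(7)
    n, N = 10, 200_000

    # X = sum of n squared independent standard Gaussians -> chi-square(n)
    z = rng.standard_normal((N, n))
    x = np.sum(z**2, axis=1)

    print(x.mean(), n)        # sample mean ~ n = 10
    print(x.var(), 2 * n)     # sample variance ~ 2n = 20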

Practical Aspects
The pdf of a RV allows us to compute the probability of X taking on values in an infinitesimal interval, i.e.,
Pr(x ≤ X ≤ x + dx) ≈ f(x) dx
Note: Just as the density encountered in mechanics cannot be interpreted as the mass of the body at a point, the probability density should never be interpreted as the probability at a point. In fact, for continuous-valued RVs, Pr(X = x) = 0.
In practice, knowing the pdf of a random variable theoretically is seldom possible. One has to conduct experiments and then try to fit a known pdf that best explains the behaviour of the RV. It may not be necessary to know the pdf in practice! What is of interest in practice is (i) the most likely value and/or the expected outcome (mean) and (ii) how far the outcomes are spread (variance).

Practical Aspects: Moments of a pdf
The useful statistical properties, namely, mean, variance and covariance, are related to the pdf f(x) through its first-order and second-order moments (similar to the moments of inertia). The n-th moment of a pdf is defined as
M_n(X) = ∫_{−∞}^{∞} x^n f(x) dx    (6)
It turns out that for linear processes, predictions of random signals and estimation of model parameters only require the knowledge of mean, variance and covariance (to be introduced shortly), i.e., it is sufficient to know the first- and second-order moments of the pdf. In fact, for a Gaussian distributed RV, recall that knowing the mean and variance implies complete knowledge of the uncertainty description.

First Moment of a pdf: Mean
The most important property of interest for a RV is the center of outcomes, which is perhaps the most used statistic in all spheres of data analysis: average salary, average rainfall, average rating, etc. The mean is defined as the first moment of the pdf (analogous to the center of mass). It is also the expected value (outcome) of the RV.
Mean: The mean of a RV, also the expectation of the RV, is defined as
E(X) = µ_X = ∫_{−∞}^{∞} x f(x) dx    (7)

Examples
Problem: Determine the mean of a RV that follows a Gaussian distribution.
Solution: The Gaussian distributed RV has the pdf
f(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
Therefore,
µ_X = E(X) = ∫_{−∞}^{∞} x (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) dx = µ
Problem: Determine the mean of a RV that follows a uniform distribution in [a, b].
Solution: The uniformly distributed random variable has the pdf f(x) = 1/(b − a), a ≤ x ≤ b. Therefore,
µ_X = E(X) = ∫_a^b x/(b − a) dx = (b + a)/2

Remarks on Mean
- The integration in equation (7) is across the outcome space and NOT across any time space.
- The symbol E is the expectation operator. Applying the operator E to a random variable produces its average or expected value.
- There are other measures of the center of outcomes, the popular alternative being the median.
- Prediction perspective: the mean is the best prediction of the random variable in the minimum mean square error sense, i.e.,
µ_X = arg min_c E((X − X̂)²) s.t. X̂ = c
where X̂ denotes the prediction of X.
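The prediction perspective can be illustrated numerically: over a grid of constant predictors c, the mean squared error E((X − c)²) is smallest at c ≈ µ_X. A minimal Python sketch (the data, seed and grid are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(3.0, 2.0, size=100_000)   # RV with mean 3

    cs = np.linspace(0.0, 6.0, 601)          # candidate constant predictors
    mse = [np.mean((x - c) ** 2) for c in cs]
    c_best = cs[int(np.argmin(mse))]
    print(c_best, x.mean())                  # both ~3: the mean minimizes MSE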

Expectation Operator
The expectation operator is the most powerful and useful operator in statistics. It is important to study its properties.
- The expectation operator operates across the space of outcomes.
- For any constant c, E(c) = c.
- The expectation of a function of X is given by
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx    (8)
- It is a linear operator:
E(Σ_{i=1}^k c_i g_i(X)) = Σ_{i=1}^k c_i E(g_i(X))    (9)
Remark: The expected value of X (or g(X)) is the weighted average of all possible values of X (or g(X)), each value being weighted by the corresponding probability.

Examples: Computing expectations
Problem: Find the expectation of a random variable y[k] = sin(ωk + φ), where φ is uniformly distributed in [−π, π].
Solution:
E(y[k]) = E(sin(ωk + φ)) = (1/(2π)) ∫_{−π}^{π} sin(ωk + φ) dφ
= (1/(2π)) (−cos(ωk + φ)) |_{−π}^{π} = (1/(2π)) (cos(ωk − π) − cos(ωk + π)) = 0
Problem: Current fluctuation in a constant-resistance wire is known to follow a uniform distribution in [a, b]. Determine the average power dissipated by the wire.
Solution:
E(P) = E(i²R) = R ∫_a^b i² (1/(b − a)) di = (R/(b − a)) (i³/3) |_a^b = R(b² + a² + ab)/3
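The average-power result lends itself to a quick Monte Carlo cross-check. A minimal Python sketch (R, a, b, seed and sample size are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(3)
    R, a, b = 10.0, 1.0, 2.0

    i = rng.uniform(a, b, size=1_000_000)     # uniformly distributed current
    print(np.mean(R * i**2))                  # Monte Carlo estimate of E(P)
    print(R * (b**2 + a**2 + a*b) / 3)        # closed-form result: ~23.333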

Variance
An important statistic useful in decision making, error analysis of parameter estimation, input design and several other prime stages of data analysis is the variance.
Variance: The variance of a random variable, denoted by σ_X², is the average spread of outcomes around its mean:
σ_X² = E((X − µ_X)²) = ∫_{−∞}^{∞} (x − µ_X)² f(x) dx    (10)

Points to note
Remarks:
- As (10) suggests, σ_X² is the second central moment of f(x). However, it can be rewritten as
σ_X² = E(X²) − µ_X²    (11)
- The variance definition is in the space of outcomes. It should not be confused with the widely used variance definition for a series or a signal (time samples).
- A large variance indicates a far spread of outcomes around the statistical center. Naturally, in the limit σ_X² → 0, X becomes a deterministic variable.
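Identity (11) is easy to confirm numerically. A minimal Python sketch (the distribution, seed and sample size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.exponential(scale=2.0, size=1_000_000)   # mean 2, variance 4

    mu = x.mean()
    print(np.mean((x - mu) ** 2))        # E((X - mu)^2)
    print(np.mean(x**2) - mu**2)         # E(X^2) - mu^2: same value (~4)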

Examples
Problem: Determine the variance of a RV that follows a Gaussian distribution.
Solution: The variance is found using (10):
σ_X² = E((X − µ_X)²) = ∫_{−∞}^{∞} (x − µ_X)² (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) dx = σ²
Problem: Determine the variance of a RV that follows a Laplace distribution, with pdf given by
f(x) = (1/(2b)) exp(−|x − µ|/b)
Solution:
σ_X² = E((X − µ_X)²) = ∫_{−∞}^{∞} (x − µ_X)² (1/(2b)) exp(−|x − µ|/b) dx = 2b²

Mean and Variance of Scaled RVs
In statistical data analysis, including estimation, identification and prediction, we encounter scaled random variables. It is useful to know how the properties of the scaled variables are related to those of the original ones.
- Adding a constant to a RV simply shifts its mean by the same amount. The variance remains unchanged (since the addition merely shifts the mean, and variance is a measure of spread around the center).
- Scaling: for Y = αX + β, α, β ∈ R,
µ_Y = αµ_X + β    (12)
σ_Y² = α²σ_X²    (13)
- Non-linear operations on a RV can change the properties, depending on the non-linearity involved.
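Relations (12) and (13) can be checked directly on samples. A minimal Python sketch with illustrative values α = 2, β = 5 and a Gaussian X (any distribution would do):

    import numpy as np

    rng = np.random.default_rng(11)
    x = rng.normal(1.0, 3.0, size=1_000_000)   # mu_X = 1, sigma_X^2 = 9

    alpha, beta = 2.0, 5.0
    y = alpha * x + beta

    print(y.mean(), alpha * 1.0 + beta)        # ~7  (eq. 12)
    print(y.var(), alpha**2 * 9.0)             # ~36 (eq. 13)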

Properties of Normally Distributed Variables
The normal distribution is one of the most widely assumed and studied distributions for two important reasons:
- It is completely characterized by the mean and variance.
- The Central Limit Theorem.
If x_1, x_2, ..., x_n are uncorrelated normal variables, then y = a_1 x_1 + a_2 x_2 + ... + a_n x_n is also a normally distributed variable, with mean and variance
µ_y = a_1 µ_1 + a_2 µ_2 + ... + a_n µ_n
σ_y² = a_1² σ_1² + a_2² σ_2² + ... + a_n² σ_n²

Central Limit Theorem
The central limit theorem is one of the classical results in statistics. It is widely used to support the assumption of a Gaussian distribution for many random phenomena, and is also used to derive distributions of parameter estimates.
Central Limit Theorem: Let X_1, X_2, ..., X_N be a sequence of independent, identically distributed random variables, each having finite mean µ and finite variance σ². Let
Y_N = Σ_{i=1}^N X_i, N = 1, 2, ...
Then, as N → ∞, the distribution of
(Y_N − Nµ)/(σ√N) → N(0, 1)
One of the popular applications of the CLT is in deriving the distribution of the sample mean, which is simply the average of series data.
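A minimal Python sketch of the theorem in action (all settings are illustrative): standardized sums of i.i.d. uniform variables, which are far from Gaussian individually, have sample mean ≈ 0, variance ≈ 1 and near-Gaussian tail behaviour.

    import numpy as np

    rng = np.random.default_rng(2)
    N, reps = 50, 100_000

    # i.i.d. U(0,1): mu = 0.5, sigma^2 = 1/12
    x = rng.random((reps, N))
    y = x.sum(axis=1)
    z = (y - N * 0.5) / (np.sqrt(N) * np.sqrt(1.0 / 12.0))   # standardized sums

    print(z.mean(), z.var())                 # ~0 and ~1
    print(np.mean(np.abs(z) < 1.96))         # ~0.95, matching N(0, 1)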

Summary
In this lecture, we:
- Learnt the concepts of random variables and probability distributions
- Studied density functions (which exist only for continuous distributions)
- Understood the concept of moments of a pdf, particularly the two most important moments, namely, the mean and variance
- Observed that the Gaussian distribution is completely characterized by its first two moments only
- Learnt in detail the Gaussian distribution and its properties
- Encountered an important result: the expectation of a RV is its best prediction in the minimum mean square sense
- Learnt the Central Limit Theorem, which provides strong support for the assumption of a Gaussian distribution