System Identification, Lecture 4
Kristiaan Pelckmans (IT/UU, 2338)
Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI
Uppsala University, Information Technology
13 April 2016
Lecture 4
Stochastic Setup. Interpretation. Maximum Likelihood. Least Squares Revisited. Instrumental Variables.
Stochastic Setup
Why: analysis (abstract away the details), and as a constructive tool.
Basics:
- Sample $\omega$
- Sample space $\Omega = \{\omega\}$
- Event $A \subseteq \Omega$
Rules of probability
$\Pr : 2^\Omega \to [0, 1]$ such that
1. $\Pr(\Omega) = 1$
2. $\Pr(\emptyset) = 0$
3. $\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2)$ for $A_1 \cap A_2 = \emptyset$
Random variables
A random variable is a function $X : \Omega \to \mathbb{R}$. Examples:
- Urn. Sample = ball. Random variable = color of the ball.
- Sample = image. Random variable = color/black-and-white.
- Sample = speech. Random variable = words.
- Sample = text. Random variable = length of the optimal compression.
- Sample = weather. Random variable = temperature.
- Sample = external force. Random variable = influence.
$\Pr(\{\omega : X(\omega) = x\}) := P(X = x)$.
CDF: $P(x) := \Pr(X \le x)$.
[Figure: empirical CDF $F_n(x)$, rising from 0 to 1 over $x \in [-3, 3]$.]
PDF: $p(x) := \frac{dP(x)}{dx}$.
[Figure: empirical PDF $f_n(x)$ over $x \in [-3, 3]$.]
Realizations $x$ vs. random variables $X$.
Expectation:
$E[X] := \int_\Omega x \,\Pr(dx) = \int x \, dP(x) = \int x\, p(x)\, dx$
Expectation of $g : \mathbb{R} \to \mathbb{R}$:
$E[g(X)] = \int g(x)\, dP(x) = \int g(x)\, p(x)\, dx$
Independence: $E[X_1 X_2] = E[X_1]\, E[X_2]$.
Law of large numbers: if $\{X_i\}_{i=1}^n$ are i.i.d., then
$\frac{1}{n}\sum_{i=1}^n X_i \to E[X].$
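A minimal numpy sketch of the law of large numbers; the exponential distribution, its scale and the seed are arbitrary choices for the demonstration.

import numpy as np

# Law of large numbers: the sample mean of i.i.d. draws approaches E[X].
rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=2.0, size=n)  # E[X] = 2
    print(n, x.mean())  # tends to 2 as n grows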
Gaussian PDF
The Gaussian PDF is determined by its first two moments, mean $\mu$ and variance $\sigma^2$ (or standard deviation $\sigma$):
$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \mathcal{N}(x; \mu, \sigma)$
Closed under convolution, conditioning and products.
Multivariate normal PDF for $x \in \mathbb{R}^2$:
$p(x) = \mathcal{N}(x; \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
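As a sanity check (not on the original slide), the closed-form bivariate density above can be compared against a library implementation; the values of mu, Sigma and x below are arbitrary.

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = np.array([0.3, -0.4])

# Closed-form expression from the slide ...
d = x - mu
p_manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
    / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
# ... versus scipy's implementation.
p_lib = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(p_manual, p_lib)  # should agree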
[Figure: surface plot of the bivariate Gaussian CDF$(x_1, x_2)$ for $x_1, x_2 \in [-4, 4]$.]
[Figure: surface plot of the bivariate Gaussian PDF$(x_1, x_2)$ for $x_1, x_2 \in [-4, 4]$.]
Let a bivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ be given as
$p\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \mathcal{N}\left(x;\, \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$
Then the conditional is again Gaussian: $p(x_1 \mid X_2 = x_2) = \mathcal{N}(\bar\mu, \bar\Sigma)$, where
$\bar\mu = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \bar\Sigma = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
Central Limit Theorem (CLT): if $\{X_i\}$ are i.i.d. with mean $E[X]$ and standard deviation $\sigma$, then approximately
$\frac{1}{n}\sum_{i=1}^n X_i \sim \mathcal{N}\left(E[X], \frac{\sigma}{\sqrt{n}}\right)$
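The conditioning formulas translate directly into code; this sketch uses arbitrary illustrative numbers for mu, Sigma and the observed x2.

import numpy as np

# Conditional of a bivariate Gaussian given X2 = x2:
#   mu_bar    = mu1 + Sigma12 / Sigma22 * (x2 - mu2)
#   Sigma_bar = Sigma11 - Sigma12 / Sigma22 * Sigma21
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2 = 0.5

mu_bar = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
Sigma_bar = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
print(mu_bar, Sigma_bar)

Note that $\bar\Sigma$ does not depend on the observed value $x_2$, a special property of the Gaussian.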
Interpretations
The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven and the last two into the cool oven. The statistician, horrified, explained how he should randomize in order to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two into the hot oven, the next two into the medium oven and the last two into the cool oven. "Obviously, we can't do that," said the metallurgist. "On the contrary, you have to do that," said the statistician.
Stochastic Processes
- Sequence of random variables indexed by time.
- Joint distribution function.
- Conditional PDF.
- Stationarity.
- Quasi-stationarity.
- Ergodicity: averages over different realizations vs. time averages.
Maximum Likelihood
If the true PDF of $X$ is $p$, then the probability of occurrence of a realization $x$ of $X$ is $p(x)$. Consider a class of such PDFs $\{p_\theta : \theta \in \Theta\}$.
Likelihood function: this PDF viewed as a function of the unknown parameter $\theta$:
$L(\theta; x) = p_\theta(x)$
Idea: given an observation $y_i$, find the model under which this sample was most likely to occur:
$\hat\theta_n = \arg\max_{\theta \in \Theta} L(\theta; y_i)$
Equivalently,
$\hat\theta_n = \arg\max_{\theta \in \Theta} \log L(\theta; y_i)$
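A minimal numerical sketch of the idea: maximize the log-likelihood for a Gaussian with known unit variance. The true mean 3.0, the sample size and the seed are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=1.0, size=200)  # data from p_theta0, theta0 = 3

def neg_log_likelihood(theta):
    # -log L(theta; y) up to additive constants, for sigma = 1
    return 0.5 * np.sum((y - theta) ** 2)

res = minimize_scalar(neg_log_likelihood)
print(res.x, y.mean())  # here the ML estimate equals the sample mean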
Properties of $\hat\theta_n$
Assume that $x_n$ contains $n$ independent samples. Then, under regularity conditions, the ML estimator is:
- Consistent: $\hat\theta_n \to \theta_0$, with variance decaying at rate $1/n$.
- Asymptotically normal: for $n$ large, $\sqrt{n}(\hat\theta_n - \theta_0)$ is approximately distributed as $\mathcal{N}(0, I^{-1})$.
- Sufficient.
- Efficient.
Regularity conditions include identifiability: for all $\theta \ne \theta'$ there exists $x$ such that $L(x; \theta) \ne L(x; \theta')$.
Least Squares Revisited
Observations (realizations) $\{y_i\}_{i=1}^n$ of $\{Y_i\}_{i=1}^n$. Model:
$Y_i = x_i^T\theta + V_i, \quad V_i \sim \mathcal{N}(0, \sigma)$
and observations (realizations)
$y_i = x_i^T\theta + e_i$
with $\{e_i\}$ realizations of the i.i.d. $\{V_i\}$. Here $\{x_i\}_{i=1}^n$ and $\theta$ are deterministic.
Equivalently, $Y_i \sim \mathcal{N}(x_i^T\theta, \sigma)$, or
$p(y_i) = \mathcal{N}(y_i; x_i^T\theta, \sigma)$
Since the $\{V_i\}$ are independent,
$(Y_1, \ldots, Y_n) \sim \prod_{i=1}^n \mathcal{N}(x_i^T\theta, \sigma)$
or
$p(y_1, \ldots, y_n) = \prod_{i=1}^n \mathcal{N}(y_i; x_i^T\theta, \sigma)$
Maximum Likelihood
$\hat\theta_n = \arg\max_\theta \log L(\theta; y_1, \ldots, y_n) = \arg\max_\theta \log \prod_{i=1}^n \mathcal{N}(y_i; x_i^T\theta, \sigma) = \arg\min_\theta \sum_{i=1}^n \left(y_i - x_i^T\theta\right)^2$
i.e. Ordinary Least Squares (OLS). OLS is also optimal for zero-mean, uncorrelated errors with equal, bounded variance (Gauss-Markov).
If the noise is not independent, but
$(V_1, \ldots, V_n)^T \sim \mathcal{N}(0_n, \Sigma)$
then ML leads to the BLUE (Best Linear Unbiased Estimator)
$\hat\theta_n = \arg\min_\theta e^T\Sigma^{-1}e = \arg\min_\theta (y - X\theta)^T\Sigma^{-1}(y - X\theta)$
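A minimal numpy sketch of the weighted criterion above, solving $\hat\theta = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y$; the AR(1)-style noise covariance, the dimensions, parameter values and seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.normal(size=(n, d))
theta0 = np.array([1.0, -2.0])

# Assumed correlated-noise covariance with AR(1)-type decay.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = X @ theta0 + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

# BLUE / generalized least squares.
Si = np.linalg.inv(Sigma)
theta_blue = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(theta_blue)  # close to theta0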
Analysis OLS
Model with $\{V_i\}$ zero-mean white noise with $E[V_i^2] = \lambda^2$, and $\theta_0, x_1, \ldots, x_n \in \mathbb{R}^d$:
$Y_i = x_i^T\theta_0 + V_i$
- OLS: $\hat\theta_n = \arg\min_\theta \|y - X\theta\|_2^2$.
- Normal equations: $(X^TX)\theta = X^Ty$.
- Unbiased: $E[\hat\theta_n] = \theta_0$.
- Covariance:
$E\left[(\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)^T\right] = \lambda^2(X^TX)^{-1} = \frac{\lambda^2}{n}\left(\frac{1}{n}\sum_{i=1}^n x_i x_i^T\right)^{-1}$
- Objective: $E\|y - X\hat\theta_n\|_2^2 = \lambda^2(n - d)$.
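These properties are easy to verify empirically; a Monte Carlo sketch under assumed values for n, d, lambda and the seed.

import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 50, 2, 0.5
X = rng.normal(size=(n, d))
theta0 = np.array([2.0, -1.0])

est = []
for _ in range(5000):
    y = X @ theta0 + lam * rng.normal(size=n)
    est.append(np.linalg.lstsq(X, y, rcond=None)[0])
est = np.asarray(est)

print(est.mean(axis=0))                 # approx theta0: unbiased
print(np.cov(est.T))                    # empirical covariance ...
print(lam**2 * np.linalg.inv(X.T @ X))  # ... matches lambda^2 (X^T X)^{-1}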
Efficient: $\sqrt{n}(\hat\theta_n - \theta_0) \to \mathcal{N}(0, I^{-1})$ for $n \to \infty$.
Cramér-Rao Bound
A lower bound on the variance of any unbiased estimator $\hat\theta_n$ of $\theta$.
Discrete setting: $D$ is the set of samples, with $D_n$ denoting the observed dataset. Given PDFs $p_\theta > 0$ over all possible datasets $D_n$ such that
$\sum_{D_n} p_\theta(D_n) = 1$
Given an estimator $\hat\theta_n(D_n)$ of $\theta \in \Theta$ such that, for all $\theta \in \Theta$,
$E[\hat\theta_n(D_n)] = \sum_{D_n} \hat\theta_n(D_n)\, p_\theta(D_n) = \theta$
Taking the derivative w.r.t. $\theta$ gives
$\sum_{D_n} \frac{dp_\theta(D_n)}{d\theta} = 0, \qquad \sum_{D_n} \hat\theta_n(D_n)\,\frac{dp_\theta(D_n)}{d\theta} = 1$
and hence
$1 = \sum_{D_n} \left(\hat\theta_n(D_n) - \theta\right) \frac{dp_\theta(D_n)}{d\theta}$
Combining:
$1 = \sum_{D_n} \left(\left(\hat\theta_n(D_n) - \theta\right)\sqrt{p_\theta(D_n)}\right)\left(\frac{\sqrt{p_\theta(D_n)}}{p_\theta(D_n)}\,\frac{dp_\theta(D_n)}{d\theta}\right)$
Cauchy-Schwarz ($a^Tb \le \|a\|_2\|b\|_2$) gives
$1 \le \sum_{D_n} \left(\hat\theta_n(D_n) - \theta\right)^2 p_\theta(D_n) \;\cdot\; \sum_{D_n} \left(\frac{\sqrt{p_\theta(D_n)}}{p_\theta(D_n)}\,\frac{dp_\theta(D_n)}{d\theta}\right)^2$
Or, with
$I(\theta) = E_\theta\left[\left(\frac{dp_\theta(D_n)}{d\theta}\,\frac{1}{p_\theta(D_n)}\right)^2\right]$
this reads
$E_\theta\left[\theta - \hat\theta_n(D_n)\right]^2 \ge \frac{1}{I(\theta)}$
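As a worked instance of the bound (not on the original slide): let $D_n = (X_1, \ldots, X_n)$ be i.i.d. $\mathcal{N}(\theta, \sigma)$ with $\sigma$ known. Using $\frac{1}{p_\theta}\frac{dp_\theta}{d\theta} = \frac{d\log p_\theta}{d\theta}$,
$\frac{d\log p_\theta(D_n)}{d\theta} = \frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \theta), \qquad I(\theta) = E_\theta\left[\left(\frac{d\log p_\theta(D_n)}{d\theta}\right)^2\right] = \frac{n}{\sigma^2}$
so that $E_\theta[\theta - \hat\theta_n(D_n)]^2 \ge \sigma^2/n$. The sample mean has exactly this variance, hence it attains the Cramér-Rao bound.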
Dynamical System
Describe the data as coming from an ARX(1,1) model
$Y_t + aY_{t-1} = bu_{t-1} + V_t, \quad \forall t,$
where $\{V_t\}$ is zero-mean, unit-variance white noise. Application of OLS then gives the normal equations for $\theta = (-a, b)^T$:
$\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1}u_{t-1} \\ \sum_{t=1}^n u_{t-1}Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = \begin{bmatrix} \sum_{t=1}^n Y_{t-1}Y_t \\ \sum_{t=1}^n u_{t-1}Y_t \end{bmatrix}$
Taking expectations:
$E\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1}u_{t-1} \\ \sum_{t=1}^n u_{t-1}Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = E\begin{bmatrix} \sum_{t=1}^n Y_{t-1}Y_t \\ \sum_{t=1}^n u_{t-1}Y_t \end{bmatrix}$
Approximately,
$\theta \approx \begin{bmatrix} E[Y_{t-1}^2] & E[Y_{t-1}u_{t-1}] \\ E[u_{t-1}Y_{t-1}] & E[u_{t-1}^2] \end{bmatrix}^{-1} \begin{bmatrix} E[Y_{t-1}Y_t] \\ E[u_{t-1}Y_t] \end{bmatrix} = R^{-1}r.$
Beware of ill-conditioning of $R$.
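A simulation sketch of this scheme: generate data from the ARX(1,1) model and solve $R\theta = r$ from sample averages. The parameter values, the white-noise input and the seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
a, b, n = -0.7, 1.5, 2000
u = rng.normal(size=n)  # assumed white-noise input
v = rng.normal(size=n)  # unit-variance white noise V_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]  # Y_t + a Y_{t-1} = b u_{t-1} + V_t

Phi = np.column_stack([y[:-1], u[:-1]])  # regressors (Y_{t-1}, u_{t-1})
R = Phi.T @ Phi / n
r = Phi.T @ y[1:] / n
print(np.linalg.solve(R, r))  # approx theta = (-a, b)
print(np.linalg.cond(R))      # large values signal ill-conditioning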
Instrumental Variables
LS: $\min_\theta \sum_{i=1}^n (Y_i - x_i^T\theta)^2$, with normal equations
$\left(\sum_{i=1}^n x_i x_i^T\right)\theta = \sum_{i=1}^n x_i Y_i$
or
$\sum_{i=1}^n x_i\left(Y_i - x_i^T\theta\right) = 0_d$
Interpretation as an orthogonal projection.
Idea: replace $x_i$ by an instrument $Z_i$:
$\sum_{i=1}^n Z_i\left(Y_i - x_i^T\theta\right) = 0_d$
Choose $\{Z_t\}$ taking values in $\mathbb{R}^d$ such that
- $\{Z_t\}$ is independent of the noise $\{V_t\}$;
- $R$ is full rank.
Example: past input signals.
Figure 1: Instrumental Variable as Modified Projection.
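A minimal sketch of the instrumental-variable estimate $\hat\theta = (Z^TX)^{-1}Z^Ty$ with past inputs as instruments. The colored MA(1) noise, which would bias plain OLS here, and all parameter values and seeds are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
a, b, n = -0.7, 1.5, 5000
u = rng.normal(size=n)
e = rng.normal(size=n)
v = e + 0.9 * np.concatenate([[0.0], e[:-1]])  # MA(1) noise: correlates with Y_{t-1}
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]

X = np.column_stack([y[1:-1], u[1:-1]])  # regressors at time t: (Y_{t-1}, u_{t-1})
Z = np.column_stack([u[:-2], u[1:-1]])   # instruments: past inputs (u_{t-2}, u_{t-1})
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y[2:])
print(theta_iv)  # approx (-a, b), where plain OLS would be biased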
Conclusions
- Stochastic (theory) and statistical (data).
- Maximum Likelihood.
- Least Squares.
- Instrumental Variables.