System Identification, Lecture 4
Kristiaan Pelckmans (IT/UU, 2338)
Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI
Uppsala University, Information Technology
13 April 2016
Lecture 4
Stochastic Setup. Interpretation. Maximum Likelihood. Least Squares Revisited. Instrumental Variables.
Stochastic Setup
Why: analysis (abstract away the details), and as a constructive tool.
Basics:
- Sample $\omega$
- Sample space $\Omega = \{\omega\}$
- Event $A \subseteq \Omega$
Rules of probability
$\Pr : 2^\Omega \to [0, 1]$ such that
1. $\Pr(\Omega) = 1$
2. $\Pr(\emptyset) = 0$
3. $\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2)$ for $A_1 \cap A_2 = \emptyset$
Random variables
A random variable is a function $X : \Omega \to \mathbb{R}$. Examples:
- Urn. Sample = ball. Random variable = color of the ball.
- Sample = image. Random variable = color/black-and-white.
- Sample = speech. Random variable = words.
- Sample = text. Random variable = length of the optimal compression.
- Sample = weather. Random variable = temperature.
- Sample = external force. Random variable = influence.
$\Pr(\{\omega : X(\omega) = x\}) := P(X = x)$.
CDF: $P(x) := \Pr(X \le x)$.
[Figure: empirical CDF $F_n(x)$, rising from 0 to 1 over $x \in [-3, 3]$.]
PDF: $p(x) := \frac{dP(x)}{dx}$.
[Figure: empirical PDF $f_n(x)$ over $x \in [-3, 3]$.]
Realizations $x$ vs. random variables $X$.
Expectation:
$E[X] := \int_\Omega x \,\Pr(dx) = \int x \, dP(x) = \int x\, p(x)\, dx$
Expectation of $g : \mathbb{R} \to \mathbb{R}$:
$E[g(X)] = \int g(x)\, dP(x) = \int g(x)\, p(x)\, dx$
Independence: $E[X_1 X_2] = E[X_1]\, E[X_2]$.
Law of large numbers: if $\{X_i\}_{i=1}^n$ are i.i.d., then
$\frac{1}{n}\sum_{i=1}^n X_i \to E[X].$
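A minimal numpy sketch of the law of large numbers; the exponential distribution, its scale and the seed are arbitrary choices for the demonstration.

import numpy as np

# Law of large numbers: the sample mean of i.i.d. draws approaches E[X].
rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=2.0, size=n)  # E[X] = 2
    print(n, x.mean())  # tends to 2 as n grows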
Gaussian PDF
The Gaussian PDF is determined by its first two moments, mean $\mu$ and variance $\sigma^2$ (or standard deviation $\sigma$):
$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \mathcal{N}(x; \mu, \sigma)$
Closed under convolution, conditioning and products.
Multivariate normal PDF for $x \in \mathbb{R}^2$:
$p(x) = \mathcal{N}(x; \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
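As a sanity check (not on the original slide), the closed-form bivariate density above can be compared against a library implementation; the values of mu, Sigma and x below are arbitrary.

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = np.array([0.3, -0.4])

# Closed-form expression from the slide ...
d = x - mu
p_manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
    / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
# ... versus scipy's implementation.
p_lib = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(p_manual, p_lib)  # should agree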
[Figure: surface plot of the bivariate Gaussian CDF$(x_1, x_2)$ for $x_1, x_2 \in [-4, 4]$.]
[Figure: surface plot of the bivariate Gaussian PDF$(x_1, x_2)$ for $x_1, x_2 \in [-4, 4]$.]
Let a bivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ be given as
$p\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \mathcal{N}\left(x;\, \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$
Then the conditional is again Gaussian: $p(x_1 \mid X_2 = x_2) = \mathcal{N}(\bar\mu, \bar\Sigma)$, where
$\bar\mu = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \bar\Sigma = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
Central Limit Theorem (CLT): if $\{X_i\}$ are i.i.d. with mean $E[X]$ and standard deviation $\sigma$, then approximately
$\frac{1}{n}\sum_{i=1}^n X_i \sim \mathcal{N}\left(E[X], \frac{\sigma}{\sqrt{n}}\right)$
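The conditioning formulas translate directly into code; this sketch uses arbitrary illustrative numbers for mu, Sigma and the observed x2.

import numpy as np

# Conditional of a bivariate Gaussian given X2 = x2:
#   mu_bar    = mu1 + Sigma12 / Sigma22 * (x2 - mu2)
#   Sigma_bar = Sigma11 - Sigma12 / Sigma22 * Sigma21
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2 = 0.5

mu_bar = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
Sigma_bar = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
print(mu_bar, Sigma_bar)

Note that $\bar\Sigma$ does not depend on the observed value $x_2$, a special property of the Gaussian.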
Interpretations
The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven and the last two into the cool oven. The statistician, horrified, explained how he should randomize in order to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two into the hot oven, the next two into the medium oven and the last two into the cool oven. "Obviously, we can't do that," said the metallurgist. "On the contrary, you have to do that," said the statistician.
Stochastic Processes
- Sequence of random variables indexed by time.
- Joint distribution function.
- Conditional PDF.
- Stationarity.
- Quasi-stationarity.
- Ergodicity: averages over different realizations vs. time averages.
Maximum Likelihood
If the true PDF of $X$ is $p$, then the probability of occurrence of a realization $x$ of $X$ is $p(x)$. Consider a class of such PDFs $\{p_\theta : \theta \in \Theta\}$.
Likelihood function: this PDF viewed as a function of the unknown parameter $\theta$:
$L(\theta; x) = p_\theta(x)$
Idea: given an observation $y_i$, find the model under which this sample was most likely to occur:
$\hat\theta_n = \arg\max_{\theta \in \Theta} L(\theta; y_i)$
Equivalently,
$\hat\theta_n = \arg\max_{\theta \in \Theta} \log L(\theta; y_i)$
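A minimal numerical sketch of the idea: maximize the log-likelihood for a Gaussian with known unit variance. The true mean 3.0, the sample size and the seed are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=1.0, size=200)  # data from p_theta0, theta0 = 3

def neg_log_likelihood(theta):
    # -log L(theta; y) up to additive constants, for sigma = 1
    return 0.5 * np.sum((y - theta) ** 2)

res = minimize_scalar(neg_log_likelihood)
print(res.x, y.mean())  # here the ML estimate equals the sample mean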
Properties of $\hat\theta_n$
Assume that $x_n$ contains $n$ independent samples. Then, under regularity conditions, the ML estimator is:
- Consistent: $\hat\theta_n \to \theta_0$, with variance decaying at rate $1/n$.
- Asymptotically normal: for $n$ large, $\sqrt{n}(\hat\theta_n - \theta_0)$ is approximately distributed as $\mathcal{N}(0, I^{-1})$.
- Sufficient.
- Efficient.
Regularity conditions include identifiability: for all $\theta \ne \theta'$ there exists $x$ such that $L(x; \theta) \ne L(x; \theta')$.
Least Squares Revisited
Observations (realizations) $\{y_i\}_{i=1}^n$ of $\{Y_i\}_{i=1}^n$. Model:
$Y_i = x_i^T\theta + V_i, \quad V_i \sim \mathcal{N}(0, \sigma)$
and observations (realizations)
$y_i = x_i^T\theta + e_i$
with $\{e_i\}$ realizations of the i.i.d. $\{V_i\}$. Here $\{x_i\}_{i=1}^n$ and $\theta$ are deterministic.
Equivalently, $Y_i \sim \mathcal{N}(x_i^T\theta, \sigma)$, or
$p(y_i) = \mathcal{N}(y_i; x_i^T\theta, \sigma)$
Since the $\{V_i\}$ are independent,
$(Y_1, \ldots, Y_n) \sim \prod_{i=1}^n \mathcal{N}(x_i^T\theta, \sigma)$
or
$p(y_1, \ldots, y_n) = \prod_{i=1}^n \mathcal{N}(y_i; x_i^T\theta, \sigma)$
Maximum Likelihood
$\hat\theta_n = \arg\max_\theta \log L(\theta; y_1, \ldots, y_n) = \arg\max_\theta \log \prod_{i=1}^n \mathcal{N}(y_i; x_i^T\theta, \sigma) = \arg\min_\theta \sum_{i=1}^n \left(y_i - x_i^T\theta\right)^2$
i.e. Ordinary Least Squares (OLS). OLS is also optimal for zero-mean, uncorrelated errors with equal, bounded variance (Gauss-Markov).
If the noise is not independent, but
$(V_1, \ldots, V_n)^T \sim \mathcal{N}(0_n, \Sigma)$
then ML leads to the BLUE (Best Linear Unbiased Estimator)
$\hat\theta_n = \arg\min_\theta e^T\Sigma^{-1}e = \arg\min_\theta (y - X\theta)^T\Sigma^{-1}(y - X\theta)$
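A minimal numpy sketch of the weighted criterion above, solving $\hat\theta = (X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}y$; the AR(1)-style noise covariance, the dimensions, parameter values and seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.normal(size=(n, d))
theta0 = np.array([1.0, -2.0])

# Assumed correlated-noise covariance with AR(1)-type decay.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = X @ theta0 + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

# BLUE / generalized least squares.
Si = np.linalg.inv(Sigma)
theta_blue = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(theta_blue)  # close to theta0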
Analysis OLS
Model with $\{V_i\}$ zero-mean white noise with $E[V_i^2] = \lambda^2$, and $\theta_0, x_1, \ldots, x_n \in \mathbb{R}^d$:
$Y_i = x_i^T\theta_0 + V_i$
- OLS: $\hat\theta_n = \arg\min_\theta \|y - X\theta\|_2^2$.
- Normal equations: $(X^TX)\theta = X^Ty$.
- Unbiased: $E[\hat\theta_n] = \theta_0$.
- Covariance:
$E\left[(\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)^T\right] = \lambda^2(X^TX)^{-1} = \frac{\lambda^2}{n}\left(\frac{1}{n}\sum_{i=1}^n x_i x_i^T\right)^{-1}$
- Objective: $E\|y - X\hat\theta_n\|_2^2 = \lambda^2(n - d)$.
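These properties are easy to verify empirically; a Monte Carlo sketch under assumed values for n, d, lambda and the seed.

import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 50, 2, 0.5
X = rng.normal(size=(n, d))
theta0 = np.array([2.0, -1.0])

est = []
for _ in range(5000):
    y = X @ theta0 + lam * rng.normal(size=n)
    est.append(np.linalg.lstsq(X, y, rcond=None)[0])
est = np.asarray(est)

print(est.mean(axis=0))                 # approx theta0: unbiased
print(np.cov(est.T))                    # empirical covariance ...
print(lam**2 * np.linalg.inv(X.T @ X))  # ... matches lambda^2 (X^T X)^{-1}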
Efficient: $\sqrt{n}(\hat\theta_n - \theta_0) \to \mathcal{N}(0, I^{-1})$ for $n \to \infty$.
Cramér-Rao Bound
A lower bound on the variance of any unbiased estimator $\hat\theta_n$ of $\theta$.
Discrete setting: $D$ is the set of samples, with $D_n$ denoting the observed dataset. Given PDFs $p_\theta > 0$ over all possible datasets $D_n$ such that
$\sum_{D_n} p_\theta(D_n) = 1$
Given an estimator $\hat\theta_n(D_n)$ of $\theta \in \Theta$ such that, for all $\theta \in \Theta$,
$E[\hat\theta_n(D_n)] = \sum_{D_n} \hat\theta_n(D_n)\, p_\theta(D_n) = \theta$
Taking the derivative w.r.t. $\theta$ gives
$\sum_{D_n} \frac{dp_\theta(D_n)}{d\theta} = 0, \qquad \sum_{D_n} \hat\theta_n(D_n)\,\frac{dp_\theta(D_n)}{d\theta} = 1$
and hence
$1 = \sum_{D_n} \left(\hat\theta_n(D_n) - \theta\right) \frac{dp_\theta(D_n)}{d\theta}$
Combining:
$1 = \sum_{D_n} \left(\left(\hat\theta_n(D_n) - \theta\right)\sqrt{p_\theta(D_n)}\right)\left(\frac{\sqrt{p_\theta(D_n)}}{p_\theta(D_n)}\,\frac{dp_\theta(D_n)}{d\theta}\right)$
Cauchy-Schwarz ($a^Tb \le \|a\|_2\|b\|_2$) gives
$1 \le \sum_{D_n} \left(\hat\theta_n(D_n) - \theta\right)^2 p_\theta(D_n) \;\cdot\; \sum_{D_n} \left(\frac{\sqrt{p_\theta(D_n)}}{p_\theta(D_n)}\,\frac{dp_\theta(D_n)}{d\theta}\right)^2$
Or, with
$I(\theta) = E_\theta\left[\left(\frac{dp_\theta(D_n)}{d\theta}\,\frac{1}{p_\theta(D_n)}\right)^2\right]$
this reads
$E_\theta\left[\theta - \hat\theta_n(D_n)\right]^2 \ge \frac{1}{I(\theta)}$
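As a worked instance of the bound (not on the original slide): let $D_n = (X_1, \ldots, X_n)$ be i.i.d. $\mathcal{N}(\theta, \sigma)$ with $\sigma$ known. Using $\frac{1}{p_\theta}\frac{dp_\theta}{d\theta} = \frac{d\log p_\theta}{d\theta}$,
$\frac{d\log p_\theta(D_n)}{d\theta} = \frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \theta), \qquad I(\theta) = E_\theta\left[\left(\frac{d\log p_\theta(D_n)}{d\theta}\right)^2\right] = \frac{n}{\sigma^2}$
so that $E_\theta[\theta - \hat\theta_n(D_n)]^2 \ge \sigma^2/n$. The sample mean has exactly this variance, hence it attains the Cramér-Rao bound.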
Dynamical System
Describe the data as coming from an ARX(1,1) model
$Y_t + aY_{t-1} = bu_{t-1} + V_t, \quad \forall t,$
where $\{V_t\}$ is zero-mean, unit-variance white noise. Application of OLS then gives the normal equations for $\theta = (-a, b)^T$:
$\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1}u_{t-1} \\ \sum_{t=1}^n u_{t-1}Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = \begin{bmatrix} \sum_{t=1}^n Y_{t-1}Y_t \\ \sum_{t=1}^n u_{t-1}Y_t \end{bmatrix}$
Taking expectations:
$E\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1}u_{t-1} \\ \sum_{t=1}^n u_{t-1}Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = E\begin{bmatrix} \sum_{t=1}^n Y_{t-1}Y_t \\ \sum_{t=1}^n u_{t-1}Y_t \end{bmatrix}$
Approximately,
$\theta \approx \begin{bmatrix} E[Y_{t-1}^2] & E[Y_{t-1}u_{t-1}] \\ E[u_{t-1}Y_{t-1}] & E[u_{t-1}^2] \end{bmatrix}^{-1} \begin{bmatrix} E[Y_{t-1}Y_t] \\ E[u_{t-1}Y_t] \end{bmatrix} = R^{-1}r.$
Beware of ill-conditioning of $R$.
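A simulation sketch of this scheme: generate data from the ARX(1,1) model and solve $R\theta = r$ from sample averages. The parameter values, the white-noise input and the seed are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
a, b, n = -0.7, 1.5, 2000
u = rng.normal(size=n)  # assumed white-noise input
v = rng.normal(size=n)  # unit-variance white noise V_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]  # Y_t + a Y_{t-1} = b u_{t-1} + V_t

Phi = np.column_stack([y[:-1], u[:-1]])  # regressors (Y_{t-1}, u_{t-1})
R = Phi.T @ Phi / n
r = Phi.T @ y[1:] / n
print(np.linalg.solve(R, r))  # approx theta = (-a, b)
print(np.linalg.cond(R))      # large values signal ill-conditioning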
Instrumental Variables
LS: $\min_\theta \sum_{i=1}^n (Y_i - x_i^T\theta)^2$, with normal equations
$\left(\sum_{i=1}^n x_i x_i^T\right)\theta = \sum_{i=1}^n x_i Y_i$
or
$\sum_{i=1}^n x_i\left(Y_i - x_i^T\theta\right) = 0_d$
Interpretation as an orthogonal projection.
Idea: replace $x_i$ by an instrument $Z_i$:
$\sum_{i=1}^n Z_i\left(Y_i - x_i^T\theta\right) = 0_d$
Choose $\{Z_t\}$ taking values in $\mathbb{R}^d$ such that
- $\{Z_t\}$ is independent of the noise $\{V_t\}$;
- $R$ is full rank.
Example: past input signals.
Figure 1: Instrumental Variable as Modified Projection.
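A minimal sketch of the instrumental-variable estimate $\hat\theta = (Z^TX)^{-1}Z^Ty$ with past inputs as instruments. The colored MA(1) noise, which would bias plain OLS here, and all parameter values and seeds are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
a, b, n = -0.7, 1.5, 5000
u = rng.normal(size=n)
e = rng.normal(size=n)
v = e + 0.9 * np.concatenate([[0.0], e[:-1]])  # MA(1) noise: correlates with Y_{t-1}
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]

X = np.column_stack([y[1:-1], u[1:-1]])  # regressors at time t: (Y_{t-1}, u_{t-1})
Z = np.column_stack([u[:-2], u[1:-1]])   # instruments: past inputs (u_{t-2}, u_{t-1})
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y[2:])
print(theta_iv)  # approx (-a, b), where plain OLS would be biased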
Conclusions
- Stochastic (theory) and statistical (data).
- Maximum Likelihood.
- Least Squares.
- Instrumental Variables.