System Identification, Lecture 4
Kristiaan Pelckmans (IT/UU, 2338)
Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI
Uppsala University, Information Technology
30 January 2012
SI-2012 K. Pelckmans Jan.-March, 2012
Lecture 4
Stochastic Setup. Interpretation. Maximum Likelihood. Least Squares Revisited. Instrumental Variables.
Stochastic Setup
Why: analysis (abstract away). Constructive.
Basics: samples ω; sample space Ω = {ω}; events A ⊆ Ω.
Rules of probability
Pr : 2^Ω → [0, 1]
1. Pr(Ω) = 1,
2. Pr(∅) = 0,
3. Pr(A_1 ∪ A_2) = Pr(A_1) + Pr(A_2) whenever A_1 ∩ A_2 = ∅.
Random variables: a function X : Ω → R, with
Pr({ω : X(ω) = x}) =: P(X = x).
CDF: P(x) := Pr(X ≤ x). PDF: p(x) := dP(x)/dx.
Realizations vs. random variables.
Expectation:
E[X] := ∫_Ω x dP(x) = ∫_Ω x p(x) dx
Expectation of g : R → R:
E[g(X)] = ∫_Ω g(x) dP(x) = ∫_Ω g(x) p(x) dx
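As a side note (not part of the original slides), the definition of E[g(X)] suggests a simple numerical sketch: approximate the integral by a sample average (1/n) Σ_i g(x_i). The choice X ~ N(0, 1) and g(x) = x², for which E[g(X)] = 1, is an assumption made here for illustration; NumPy is assumed available.

```python
import numpy as np

# Approximate E[g(X)] by a sample average over draws of X.
# Here X ~ N(0, 1) and g(x) = x^2, so the true value is Var(X) = 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
estimate = np.mean(x ** 2)
print(estimate)  # close to 1.0
```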
Independence: E[X_1 X_2] = E[X_1] E[X_2].
[Figure: empirical PDF f_n and empirical CDF F_n of a sample, plotted over x ∈ [−3, 3].]
Examples:
- Urn. Sample = ball. Random variable = color of the ball.
- Sample = image. Random variable = color/black-white.
- Sample = speech. Random variable = words.
- Sample = text. Random variable = length of the optimal compression.
- Sample = weather. Random variable = temperature.
- Sample = external force. Random variable = influence.
Law of Large Numbers: if {X_i}_{i=1}^n are i.i.d., then
(1/n) Σ_{i=1}^n X_i → E[X].
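The Law of Large Numbers above can be illustrated with a short simulation (a sketch added here, not from the slides; the exponential distribution with mean 2.0 is an arbitrary choice, and NumPy is assumed):

```python
import numpy as np

# Running average of i.i.d. exponential samples with E[X] = 2.0:
# the average (1/n) * sum_i X_i approaches E[X] as n grows.
rng = np.random.default_rng(1)
samples = rng.exponential(scale=2.0, size=200_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
print(running_mean[999], running_mean[-1])  # the last value is much closer to 2.0
```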
Gaussian PDF: determined by its first two moments, the mean µ and the variance σ² (or standard deviation σ):
p(x) = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) = N(x; µ, σ)
Closed under convolution, conditioning and products.
Multivariate normal PDF for x ∈ R²:
p(x) = N(x; µ, Σ) = (1/(2π |Σ|^(1/2))) exp(−(1/2)(x − µ)^T Σ^(−1) (x − µ))
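As a quick sanity check of the density formula (an addition for illustration, assuming NumPy), the Gaussian PDF should integrate to 1; a trapezoidal rule on a wide grid confirms this for arbitrary µ and σ:

```python
import numpy as np

# Integrate N(x; mu, sigma) numerically over a wide grid around the mean;
# the area under the density should be 1.
mu, sigma = 1.0, 0.5
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 10_001)
p = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
area = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(x))  # trapezoidal rule
print(area)  # close to 1.0
```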
[Figure: surface plots of the PDF(x_1, x_2) and CDF(x_1, x_2) of a bivariate Gaussian over x_1, x_2 ∈ [−4, 4].]
Let a bivariate Gaussian with mean µ and covariance matrix Σ be given as
p([x_1; x_2]) = N(x; [µ_1; µ_2], [Σ_11, Σ_12; Σ_21, Σ_22])
Then the conditional PDF is again Gaussian, p(x_1 | X_2 = x_2) = N(µ̄, Σ̄), where
µ̄ = µ_1 + Σ_12 Σ_22^(−1) (x_2 − µ_2)
Σ̄ = Σ_11 − Σ_12 Σ_22^(−1) Σ_21
Central Limit Theorem (CLT): if {X_i} are i.i.d. with mean E[X] and standard deviation σ, then for n large
(1/n) Σ_{i=1}^n X_i ≈ N(E[X], σ/√n)
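The conditioning formulas above are easy to evaluate for a concrete bivariate Gaussian (the numbers below are an arbitrary example, not from the slides; NumPy assumed):

```python
import numpy as np

# Conditional of a bivariate Gaussian, given X2 = x2:
#   mu_bar    = mu1 + S12 * S22^{-1} * (x2 - mu2)
#   Sigma_bar = S11 - S12 * S22^{-1} * S21
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2 = 3.0
mu_bar = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
Sigma_bar = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
print(mu_bar, Sigma_bar)  # 1.8 and 1.36: the conditional mean shifts up,
                          # the conditional variance shrinks below S11 = 2.0
```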
Stochastic Processes
Sequence of random variables indexed by time. Joint distribution function. Conditional PDF. Stationarity. Quasi-stationarity. Ergodicity: averages over different realizations vs. time averages.
Maximum Likelihood
If the true PDF of X were p, then the probability of occurrence of a realization x of X would be p(x). Consider a class of such PDFs {p_θ : θ ∈ Θ}.
Likelihood function: this PDF viewed as a function of the unknown parameter θ:
L(θ; x) = p_θ(x)
Idea: given observations {y_i}, find the model under which this sample was most likely to occur:
θ_n = argmax_{θ ∈ Θ} L(θ; y_1, ..., y_n)
Equivalently,
θ_n = argmax_{θ ∈ Θ} log L(θ; y_1, ..., y_n)
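As a sketch of the idea (added for illustration, not from the slides): for i.i.d. Gaussian data with known σ, the log-likelihood in µ is, up to constants, −Σ_i (y_i − µ)²/(2σ²), and its maximizer is the sample mean. A grid search over µ confirms this (grid range and seed are arbitrary choices; NumPy assumed):

```python
import numpy as np

# Maximize the Gaussian log-likelihood over the mean mu by grid search;
# the maximizer should coincide with the sample mean y.mean().
rng = np.random.default_rng(2)
y = rng.normal(loc=1.5, scale=1.0, size=500)
grid = np.linspace(0.0, 3.0, 3001)
loglik = np.array([-0.5 * np.sum((y - m) ** 2) for m in grid])
mu_hat = grid[np.argmax(loglik)]
print(abs(mu_hat - y.mean()))  # within one grid step of the sample mean
```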
Properties of θ_n
Assume that x^n contains n independent samples.
- Consistent, i.e. θ_n → θ_0, with error of order 1/√n.
- Asymptotically normal: for n large,
√n (θ_n − θ_0) ≈ N(0, I^(−1))
- Sufficient. Efficient.
Regularity conditions, e.g. identifiability: for all θ ≠ θ' there is an x with L(x; θ) ≠ L(x; θ').
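The 1/√n rate can be checked by simulation (an illustrative sketch, not from the slides, using the sample mean as the ML estimate of a Gaussian mean; NumPy assumed): growing n by a factor 100 should shrink the spread of θ_n − θ_0 by a factor of about 10.

```python
import numpy as np

# Spread of the ML estimate (sample mean) for two sample sizes:
# std(theta_n - theta_0) is roughly sigma / sqrt(n).
rng = np.random.default_rng(3)
theta0, sigma = 2.0, 1.0
spread = {}
for n in (100, 10_000):
    estimates = rng.normal(theta0, sigma, size=(500, n)).mean(axis=1)
    spread[n] = np.std(estimates)
print(spread[100] / spread[10_000])  # close to sqrt(10_000 / 100) = 10
```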
Interpretations
The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven, and the last two into the cool oven. The statistician, horrified, explained how he should randomize in order to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two pieces into the hot oven, the next two into the medium oven, and the last two into the cool oven. "Obviously, we can't do that," said the metallurgist. "On the contrary, you have to do that," said the statistician.
Least Squares Revisited
Observations {y_i}_{i=1}^n. Model them as realizations of random variables Y_i with
Y_i = x_i^T θ + V_i,  V_i ~ N(0, σ²)
with {e_i} the realizations of the i.i.d. {V_i}, and {x_i}_{i=1}^n and θ deterministic. Equivalently,
p(Y_i) = N(Y_i; x_i^T θ, σ)
Since the {V_i} are independent,
p(Y_1, ..., Y_n) = Π_{i=1}^n N(Y_i; x_i^T θ, σ)
Maximum Likelihood
θ_n = argmax_θ log L(θ; Y_1, ..., Y_n) = argmax_θ log Π_{i=1}^n N(Y_i; x_i^T θ, σ)
which reduces to
θ_n = argmin_θ Σ_{i=1}^n (Y_i − x_i^T θ)²
= Ordinary Least Squares (OLS).
OLS is also optimal for zero-mean, uncorrelated errors with equal, bounded variance (Gauss-Markov).
If the noise is not independent, but
(e_1, ..., e_n)^T ~ N(0_n, Σ)
then ML leads to the BLUE
θ_n = argmin_θ e^T Σ^(−1) e = argmin_θ (y − Xθ)^T Σ^(−1) (y − Xθ)
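The correlated-noise criterion above has the closed-form solution θ_n = (X^T Σ^(−1) X)^(−1) X^T Σ^(−1) y, known as generalized least squares. A minimal sketch (all numbers and the diagonal Σ are assumptions for illustration; NumPy assumed):

```python
import numpy as np

# Generalized least squares with known (here diagonal) noise covariance:
#   theta = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y
rng = np.random.default_rng(4)
n, theta0 = 200, np.array([1.0, -2.0])
X = rng.standard_normal((n, 2))
variances = rng.uniform(0.1, 4.0, size=n)   # heteroscedastic noise
Sigma_inv = np.diag(1.0 / variances)
y = X @ theta0 + rng.standard_normal(n) * np.sqrt(variances)
theta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(theta_gls)  # close to theta0 = (1.0, -2.0)
```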
Analysis of OLS
Model, with {V_i} zero-mean white noise with E[V_i²] = λ², and {θ_0, x_1, ..., x_n} ⊂ R^d deterministic:
Y_i = x_i^T θ_0 + V_i.
OLS: θ_n = argmin_θ ||y − Xθ||²_2.
Normal equations: (X^T X) θ = X^T y.
Unbiased: E[θ_n] = θ_0.
Covariance:
E[(θ_n − θ_0)(θ_n − θ_0)^T] = λ² (X^T X)^(−1) = (λ²/n) ((1/n) Σ_{i=1}^n x_i x_i^T)^(−1)
Objective: E ||y − X θ_n||²_2 = λ² (n − d).
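A Monte Carlo sketch can confirm two of the claims above, E[θ_n] = θ_0 and E||y − Xθ_n||² = λ²(n − d) (the dimensions, λ, and θ_0 below are arbitrary choices for illustration; NumPy assumed):

```python
import numpy as np

# Repeatedly redraw the noise for a fixed design X, estimate theta by OLS,
# and average the estimates and the residual sums of squares.
rng = np.random.default_rng(5)
n, d, lam = 50, 3, 0.7
theta0 = np.array([1.0, -1.0, 0.5])
X = rng.standard_normal((n, d))
ests, rss = [], []
for _ in range(2000):
    y = X @ theta0 + lam * rng.standard_normal(n)
    th = np.linalg.lstsq(X, y, rcond=None)[0]
    ests.append(th)
    rss.append(np.sum((y - X @ th) ** 2))
print(np.mean(ests, axis=0))              # close to theta0 (unbiased)
print(np.mean(rss), lam ** 2 * (n - d))   # both close to lambda^2 (n - d)
```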
Efficient: √n (θ_n − θ_0) → N(0, I^(−1)) for n → ∞.
Cramér-Rao Bound
A lower bound on the variance of any unbiased estimator θ_n of θ.
Discrete setting: D is the set of samples, with D_n denoting the observed dataset. Given PDFs p_θ > 0 over all possible datasets D_n such that
Σ_{D_n} p_θ(D_n) = 1
and given an estimator θ_n(D_n) of θ ∈ Θ such that, for all θ ∈ Θ,
E[θ_n(D_n)] = Σ_{D_n} θ_n(D_n) p_θ(D_n) = θ
Taking the derivative w.r.t. θ gives
Σ_{D_n} dp_θ(D_n)/dθ = 0  and  Σ_{D_n} θ_n(D_n) dp_θ(D_n)/dθ = 1
and hence
1 = Σ_{D_n} (θ_n(D_n) − θ) dp_θ(D_n)/dθ.
Splitting each term into two factors,
1 = Σ_{D_n} (θ_n(D_n) − θ) √(p_θ(D_n)) · (1/√(p_θ(D_n))) dp_θ(D_n)/dθ
Cauchy-Schwarz (a^T b ≤ ||a||_2 ||b||_2) gives
1 ≤ Σ_{D_n} (θ_n(D_n) − θ)² p_θ(D_n) · Σ_{D_n} (1/p_θ(D_n)) (dp_θ(D_n)/dθ)²
Or, with
I(θ) = E_θ [(1/p_θ(D_n)) dp_θ(D_n)/dθ]²
this reads
E_θ [θ_n(D_n) − θ]² ≥ 1/I(θ).
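A concrete instance (added for illustration, not from the slides): for n i.i.d. samples from N(θ, σ²), the Fisher information is I(θ) = n/σ², so any unbiased estimator has variance at least σ²/n, and the sample mean attains this bound. A simulation sketch (NumPy assumed):

```python
import numpy as np

# Cramer-Rao bound for the mean of N(theta, sigma^2): var >= sigma^2 / n.
# The sample mean is efficient, so its variance sits at the bound.
rng = np.random.default_rng(6)
theta0, sigma, n = 0.0, 2.0, 100
means = rng.normal(theta0, sigma, size=(20_000, n)).mean(axis=1)
crb = sigma ** 2 / n
print(np.var(means), crb)  # both close to 0.04
```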
Dynamical Systems
Given the ARX(1,1) system
y_t + a y_{t−1} = b u_{t−1} + e_t,  for all t,
where {e_t} is zero-mean, unit-variance white noise, application of OLS gives the normal equations for θ = (−a, b)^T:
[Σ_{t=1}^n y_{t−1}²,  Σ_{t=1}^n y_{t−1} u_{t−1};  Σ_{t=1}^n u_{t−1} y_{t−1},  Σ_{t=1}^n u_{t−1}²] θ = [Σ_{t=1}^n y_{t−1} y_t;  Σ_{t=1}^n u_{t−1} y_t].
Taking expectations,
E[Σ_{t=1}^n y_{t−1}²,  Σ_{t=1}^n y_{t−1} u_{t−1};  Σ_{t=1}^n u_{t−1} y_{t−1},  Σ_{t=1}^n u_{t−1}²] θ = E[Σ_{t=1}^n y_{t−1} y_t;  Σ_{t=1}^n u_{t−1} y_t].
Approximately, by stationarity,
θ ≈ [E[y_t²], E[y_t u_t];  E[u_t y_t], E[u_t²]]^(−1) [E[y_t y_{t−1}];  E[y_t u_{t−1}]] = R^(−1) r.
Ill-conditioning.
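The ARX(1,1) normal equations above can be solved numerically on simulated data (a sketch with arbitrary a, b, and seed, assuming NumPy; with white equation noise, OLS is consistent here):

```python
import numpy as np

# Simulate y_t + a*y_{t-1} = b*u_{t-1} + e_t and solve the normal
# equations via OLS with regressors x_t = (y_{t-1}, u_{t-1}),
# so the estimate targets theta = (-a, b)^T.
rng = np.random.default_rng(7)
a, b, n = -0.5, 1.0, 20_000
u = rng.standard_normal(n)
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + e[t]
X = np.column_stack([y[:-1], u[:-1]])
theta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(theta)  # close to (-a, b) = (0.5, 1.0)
```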
Instrumental Variables
LS: min_θ Σ_{i=1}^n (y_i − x_i^T θ)².
Normal equations:
(Σ_{i=1}^n x_i x_i^T) θ = Σ_{i=1}^n x_i y_i
or
Σ_{i=1}^n x_i (y_i − x_i^T θ) = 0_d.
Interpretation as an orthogonal projection.
Idea: replace the x_i by instruments Z_i:
Σ_{i=1}^n Z_i (y_i − x_i^T θ) = 0_d
Choose {Z_t}, taking values in R^d, such that
- {Z_t} is independent of the noise {V_t},
- R, the correlation matrix between instruments and regressors, is full rank.
Example: past input signals.
Figure 1: Instrumental Variable as Modified Projection.
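The modified projection above solves θ = (Z^T X)^(−1) Z^T y. A simulation sketch (all model choices below, including the MA(1) colored noise that makes plain OLS biased, are assumptions for illustration; NumPy assumed) uses past inputs as instruments:

```python
import numpy as np

# IV estimate theta = (Z^T X)^{-1} Z^T y for y_t = a*y_{t-1} + b*u_{t-1} + e_t
# with colored noise e_t = w_t + 0.9*w_{t-1}; instruments are past inputs,
# which are correlated with the regressors but independent of the noise.
rng = np.random.default_rng(8)
n, a, b = 50_000, 0.7, 1.0
u = rng.standard_normal(n)
w = rng.standard_normal(n)
e = w.copy()
e[1:] += 0.9 * w[:-1]
y = np.zeros(n)
for t in range(1, n):
    y[t] = a * y[t - 1] + b * u[t - 1] + e[t]
X = np.column_stack([y[1:-1], u[1:-1]])   # x_t = (y_{t-1}, u_{t-1})
Z = np.column_stack([u[:-2], u[1:-1]])    # z_t = (u_{t-2}, u_{t-1})
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ y[2:])
print(theta_iv)  # close to (a, b) = (0.7, 1.0), despite the colored noise
```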
Conclusions
Stochastic (theory) and statistical (data). Maximum Likelihood. Least Squares. Instrumental Variables.