System Identification, Lecture 4

System Identification, Lecture 4
Kristiaan Pelckmans (IT/UU, 2338)
Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI
Uppsala University, Information Technology
13 April 2016

Lecture 4
- Stochastic Setup.
- Interpretation.
- Maximum Likelihood.
- Least Squares Revisited.
- Instrumental Variables.

Stochastic Setup
Why: analysis (abstract away); constructive.
Basics: samples $\omega$, sample space $\Omega = \{\omega\}$, events $A \subseteq \Omega$.

Rules of probability
A probability measure $\Pr : 2^\Omega \to [0, 1]$ satisfies
1. $\Pr(\Omega) = 1$,
2. $\Pr(\{\}) = 0$,
3. $\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2)$ for $A_1 \cap A_2 = \{\}$.

Random variables
A random variable is a function $X : \Omega \to \mathbb{R}$. Examples:
- Sample = ball drawn from an urn. Random variable = colour of the ball.
- Sample = image. Random variable = colour / black-and-white.
- Sample = speech. Random variable = words.
- Sample = text. Random variable = length of the optimal compression.
- Sample = weather. Random variable = temperature.
- Sample = external force. Random variable = influence.

$\Pr(\{\omega : X(\omega) = x\}) := P(X = x)$.
CDF: $P(x) := \Pr(X \leq x)$.
PDF: $p(x) := \frac{dP(x)}{dx}$.
[Figures: empirical CDF $F_n$ and empirical PDF $f_n$ plotted over $x \in [-3, 3]$.]
Realizations $x$ vs. random variables $X$.

Expectation:
$E[X] := \int_\Omega x \, \Pr(dx) = \int x \, dP(x) = \int x \, p(x) \, dx$
Expectation of $g : \mathbb{R} \to \mathbb{R}$:
$E[g(X)] = \int g(x) \, dP(x) = \int g(x) \, p(x) \, dx$
Independence: $E[X_1 X_2] = E[X_1] \, E[X_2]$.
Law of Large Numbers: if $\{X_i\}_{i=1}^n$ are i.i.d., then
$\frac{1}{n} \sum_{i=1}^n X_i \to E[X]$.
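To make the law of large numbers concrete, here is a minimal numerical sketch (my own illustration, not part of the lecture; the exponential distribution, the seed and the sample sizes are arbitrary choices): the sample mean of i.i.d. draws approaches $E[X]$ as $n$ grows.

```python
# A minimal sketch of the law of large numbers (illustrative choices throughout).
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.0

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=true_mean, size=n)   # i.i.d. samples with E[X] = 2
    print(f"n = {n:>9}: sample mean = {x.mean():.4f}   (E[X] = {true_mean})")
```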

Gaussian PDF
A Gaussian PDF is determined by two moments, the mean $\mu$ and the variance $\sigma^2$ (or standard deviation $\sigma$):
$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) = \mathcal{N}(x; \mu, \sigma)$
Closed under convolution, conditioning and products.
Multivariate normal PDF for $x \in \mathbb{R}^2$:
$p(x) = \mathcal{N}(x; \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

[Figure: CDF$(x_1, x_2)$ of a bivariate Gaussian.]

[Figure: PDF$(x_1, x_2)$ of a bivariate Gaussian.]

Let a bivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ be given as
$p\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \mathcal{N}\left(x; \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$
Then the conditional is again Gaussian, $p(x_1 \mid X_2 = x_2) = \mathcal{N}(\bar{\mu}, \bar{\Sigma})$, where
$\bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \bar{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
Central Limit Theorem (CLT): if $\{X_i\}$ are i.i.d. with mean $E[X]$ and standard deviation $\sigma$, then for large $n$
$\frac{1}{n} \sum_{i=1}^n X_i \approx \mathcal{N}\left(E[X], \frac{\sigma}{\sqrt{n}}\right)$
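The conditioning formulas above translate directly into a few lines of code. A minimal sketch (mine, not the lecture's; mu, Sigma and the observed value x2 are made-up numbers):

```python
# Conditional of a bivariate Gaussian: p(x1 | X2 = x2) = N(mu_bar, Sigma_bar).
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2 = 0.5                                              # observed value of X2

mu_bar = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
Sigma_bar = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
print(f"p(x1 | X2 = {x2}) = N({mu_bar:.3f}, {Sigma_bar:.3f})")
```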

Interpretations
The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven and the last two into the cool oven. The statistician, horrified, explained how he should randomize in order to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two into the hot oven, the next two into the medium oven and the last two into the cool oven. "Obviously, we can't do that," said the metallurgist. "On the contrary, you have to do that," said the statistician.

Stochastic Processes
- Sequence of random variables indexed by time.
- Joint distribution function.
- Conditional PDF.
- Stationarity. Quasi-stationarity.
- Ergodic: different realizations vs. time averages.

Maximum Likelihood
If the true PDF of $X$ is $p$, then the probability of occurrence of a realization $x$ of $X$ is $p(x)$. Consider a class of such PDFs $\{p_\theta : \theta \in \Theta\}$.
Likelihood function: this PDF viewed as a function of the unknown parameter $\theta$,
$L(\theta; x) = p_\theta(x)$
Idea: given observations $y_1, \dots, y_n$, find the model under which this sample was most likely to occur:
$\theta_n = \arg\max_{\theta \in \Theta} L(\theta; y_1, \dots, y_n)$
Equivalently,
$\theta_n = \arg\max_{\theta \in \Theta} \log L(\theta; y_1, \dots, y_n)$
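As a small worked example (not from the slides; the distribution, seed and grid are my own choices), the sketch below maximizes the log-likelihood of a Gaussian with known standard deviation over its mean and recovers the sample mean, which is the closed-form ML estimate.

```python
# Maximum likelihood for the mean of N(theta, sigma) with sigma known,
# by brute-force maximization of the log-likelihood over a grid.
import numpy as np

rng = np.random.default_rng(1)
theta_0, sigma = 3.0, 1.5
y = rng.normal(theta_0, sigma, size=200)              # observations y_1, ..., y_n

def log_likelihood(theta):
    # log L(theta; y) = sum_i log N(y_i; theta, sigma)
    return np.sum(-0.5 * ((y - theta) / sigma) ** 2
                  - np.log(sigma * np.sqrt(2.0 * np.pi)))

grid = np.linspace(0.0, 6.0, 6001)
theta_ml = grid[np.argmax([log_likelihood(t) for t in grid])]
print(f"ML estimate: {theta_ml:.3f}, sample mean: {y.mean():.3f}")   # they coincide
```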

Properties of $\theta_n$
Assume that $x^n$ contains $n$ independent samples.
- Consistent, i.e. $\theta_n \to \theta_0$, with rate $1/n$.
- Asymptotically normal: for $n$ large, the distribution of $\sqrt{n}(\theta_n - \theta_0)$ is approximately normal.
- Sufficient.
- Efficient.
Regularity conditions, e.g. identifiability: $\theta \neq \theta'$ implies $L(x; \theta) \neq L(x; \theta')$ for some $x$.

Least Squares Revisited
Observations (realizations) $\{y_i\}_{i=1}^n$ of $\{Y_i\}_{i=1}^n$. Model
$Y_i = x_i^T \theta + V_i, \quad V_i \sim \mathcal{N}(0, \sigma),$
and observations (realizations)
$y_i = x_i^T \theta + e_i$
with $\{e_i\}$ realizations of the i.i.d. $\{V_i\}$, and $\{x_i\}_{i=1}^n$ and $\theta$ deterministic. Equivalently
$Y_i \sim \mathcal{N}(x_i^T \theta, \sigma) \quad \text{or} \quad p(y_i) = \mathcal{N}(y_i; x_i^T \theta, \sigma)$

Since the $\{V_i\}$ are independent,
$Y_1, \dots, Y_n \sim \prod_{i=1}^n \mathcal{N}(x_i^T \theta, \sigma)$
or
$p(y_1, \dots, y_n) = \prod_{i=1}^n \mathcal{N}(y_i; x_i^T \theta, \sigma)$

Maximum Likelihood
$\theta_n = \arg\max_\theta \log L(\theta; y_1, \dots, y_n) = \arg\max_\theta \log \prod_{i=1}^n \mathcal{N}(y_i; x_i^T \theta, \sigma)$
which, for some constant $c > 0$, equals
$\theta_n = \arg\min_\theta c \sum_{i=1}^n \left(y_i - x_i^T \theta\right)^2$
i.e. Ordinary Least Squares (OLS). OLS is also optimal when the errors are zero-mean, uncorrelated and of equal, bounded variance (Gauss-Markov). If the noise is not independent but
$(V_1, \dots, V_n)^T \sim \mathcal{N}(0_n, \Sigma)$
then ML leads to the BLUE (Best Linear Unbiased Estimator)
$\theta_n = \arg\min_\theta e^T \Sigma^{-1} e = \arg\min_\theta (y - X\theta)^T \Sigma^{-1} (y - X\theta)$
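A minimal numerical sketch of the last point (illustrative only; the design matrix, the AR(1)-style noise covariance and the true parameters are made-up assumptions): with correlated noise, the $\Sigma$-weighted estimator typically comes closer to $\theta_0$ than plain OLS.

```python
# OLS vs. the Sigma-weighted (BLUE-type) estimator under correlated noise.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 2
X = rng.normal(size=(n, d))
theta_0 = np.array([1.0, -0.5])

rho = 0.9                                              # made-up noise correlation
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
v = np.linalg.cholesky(Sigma) @ rng.normal(size=n)     # V ~ N(0_n, Sigma)
y = X @ theta_0 + v

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
Si = np.linalg.inv(Sigma)
theta_blue = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print("OLS :", theta_ols)
print("BLUE:", theta_blue, "  true:", theta_0)
```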

Analysis OLS
Model with $\{V_i\}$ zero-mean white noise with $E[V_i^2] = \lambda^2$, and $\theta_0, x_1, \dots, x_n \in \mathbb{R}^d$ deterministic:
$Y_i = x_i^T \theta_0 + V_i$
OLS: $\theta_n = \arg\min_\theta \|y - X\theta\|_2^2$.
- Normal equations: $(X^T X)\theta = X^T y$.
- Unbiased: $E[\theta_n] = \theta_0$.
- Covariance: $E[(\theta_n - \theta_0)(\theta_n - \theta_0)^T] = \lambda^2 (X^T X)^{-1} = \frac{\lambda^2}{n} \left(\frac{1}{n} \sum_{i=1}^n x_i x_i^T\right)^{-1}$.
- Objective: $E\|y - X\theta_n\|_2^2 = \lambda^2 (n - d)$.
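These properties are easy to check empirically. A minimal Monte Carlo sketch (my own; n, d, lambda and the fixed design are arbitrary choices) verifies $E[\theta_n] = \theta_0$ and the covariance formula $\lambda^2 (X^T X)^{-1}$:

```python
# Monte Carlo check of unbiasedness and covariance of the OLS estimator.
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 50, 2, 0.7
X = rng.normal(size=(n, d))                            # fixed, deterministic design
theta_0 = np.array([2.0, -1.0])

estimates = []
for _ in range(20_000):
    y = X @ theta_0 + lam * rng.normal(size=n)         # white noise, E[V_i^2] = lam^2
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))
estimates = np.array(estimates)

print("mean of theta_n      :", estimates.mean(axis=0), " (theta_0 =", theta_0, ")")
print("empirical covariance :\n", np.cov(estimates.T))
print("lambda^2 (X'X)^{-1}  :\n", lam**2 * np.linalg.inv(X.T @ X))
```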

Efficient: $\sqrt{n}(\theta_n - \theta_0) \to \mathcal{N}(0, I^{-1})$ for $n \to \infty$.

Cramér-Rao Bound
A lower bound on the variance of any unbiased estimator $\theta_n$ of $\theta$.
Discrete setting: $D$ is the set of samples, with $D_n$ denoting the observed dataset. Given PDFs $p_\theta > 0$ over all possible datasets $D_n$ such that
$\sum_{D_n} p_\theta(D_n) = 1$
and given an estimator $\theta_n(D_n)$ of $\theta \in \Theta$ such that, for all $\theta \in \Theta$,
$E[\theta_n(D_n)] = \sum_{D_n} \theta_n(D_n)\, p_\theta(D_n) = \theta$

Taking the derivative w.r.t. $\theta$ gives
$\sum_{D_n} \frac{dp_\theta(D_n)}{d\theta} = 0, \qquad \sum_{D_n} \theta_n(D_n)\, \frac{dp_\theta(D_n)}{d\theta} = 1$
and hence
$1 = \sum_{D_n} \left(\theta_n(D_n) - \theta\right) \frac{dp_\theta(D_n)}{d\theta}$
Combining,
$1 = \sum_{D_n} \left(\theta_n(D_n) - \theta\right) \sqrt{p_\theta(D_n)} \cdot \frac{1}{\sqrt{p_\theta(D_n)}} \frac{dp_\theta(D_n)}{d\theta}$
and the Cauchy-Schwarz inequality ($a^T b \leq \|a\|_2 \|b\|_2$) gives
$1 \leq \sum_{D_n} \left(\theta_n(D_n) - \theta\right)^2 p_\theta(D_n) \;\cdot\; \sum_{D_n} \frac{1}{p_\theta(D_n)} \left(\frac{dp_\theta(D_n)}{d\theta}\right)^2$

Or, with the Fisher information
$I(\theta) = E_\theta\left[\left(\frac{dp_\theta(D_n)}{d\theta}\, \frac{1}{p_\theta(D_n)}\right)^2\right]$
this reads
$E_\theta\left[\left(\theta_n(D_n) - \theta\right)^2\right] \geq \frac{1}{I(\theta)}$
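For a concrete instance (my own example, not the lecture's): for $n$ i.i.d. samples from $\mathcal{N}(\theta, \sigma)$ the Fisher information is $I(\theta) = n/\sigma^2$, so the bound is $\sigma^2/n$, and the sample mean attains it. A quick Monte Carlo check:

```python
# Cramér-Rao bound sigma^2/n for the mean of n i.i.d. N(theta, sigma) samples,
# compared against the empirical variance of the sample mean.
import numpy as np

rng = np.random.default_rng(4)
theta, sigma, n = 1.0, 2.0, 25
crb = sigma**2 / n                         # 1 / I(theta), with I(theta) = n / sigma^2

means = np.array([rng.normal(theta, sigma, size=n).mean() for _ in range(100_000)])
print(f"CRB             : {crb:.4f}")
print(f"Var(sample mean): {means.var():.4f}")          # close to the bound
```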

Dynamical System
Describe the data as coming from an ARX(1,1) model
$Y_t + a Y_{t-1} = b u_{t-1} + V_t, \quad \forall t,$
where $\{V_t\}$ is zero-mean, unit-variance white noise. Application of OLS then gives the normal equations for $\theta = (-a, b)^T$:
$\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1} u_{t-1} \\ \sum_{t=1}^n u_{t-1} Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = \begin{bmatrix} \sum_{t=1}^n Y_{t-1} Y_t \\ \sum_{t=1}^n u_{t-1} Y_t \end{bmatrix}$
Taking expectations,
$E\begin{bmatrix} \sum_{t=1}^n Y_{t-1}^2 & \sum_{t=1}^n Y_{t-1} u_{t-1} \\ \sum_{t=1}^n u_{t-1} Y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = E\begin{bmatrix} \sum_{t=1}^n Y_{t-1} Y_t \\ \sum_{t=1}^n u_{t-1} Y_t \end{bmatrix}$

Approximately (dividing by $n$ and using stationarity),
$\theta \approx \begin{bmatrix} E[Y_t^2] & E[Y_t u_t] \\ E[u_t Y_t] & E[u_t^2] \end{bmatrix}^{-1} \begin{bmatrix} E[Y_t Y_{t-1}] \\ E[Y_t u_{t-1}] \end{bmatrix} = R^{-1} r$
Ill-conditioning of $R$ is a concern.
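A minimal simulation sketch (illustrative; the values of a and b, the input and the sample size are made up): generate data from the ARX(1,1) model with white noise and estimate $(a, b)$ from the normal equations.

```python
# Simulate an ARX(1,1) system and estimate (a, b) with OLS.
import numpy as np

rng = np.random.default_rng(5)
a, b, n = -0.7, 2.0, 5_000                             # made-up true parameters
u = rng.normal(size=n)                                 # known input
v = rng.normal(size=n)                                 # unit-variance white noise
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]

# Regressors x_t = (-Y_{t-1}, u_{t-1}) so that theta = (a, b)
X = np.column_stack([-y[:-1], u[:-1]])
theta_n = np.linalg.solve(X.T @ X, X.T @ y[1:])
print("OLS estimate of (a, b):", theta_n, "  true:", (a, b))
```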

Instrumental Variables
LS: $\min_\theta \sum_{i=1}^n (Y_i - x_i^T \theta)^2$, with normal equations
$\left(\sum_{i=1}^n x_i x_i^T\right) \theta = \sum_{i=1}^n x_i Y_i$
or
$\sum_{i=1}^n x_i \left(Y_i - x_i^T \theta\right) = 0_d$
Interpretation as an orthogonal projection.
Idea: replace the regressors $x_i$ by instruments $Z_i$ in this condition and solve
$\sum_{i=1}^n Z_i \left(Y_i - x_i^T \theta\right) = 0_d$

Choose $\{Z_t\}$ taking values in $\mathbb{R}^d$ such that $\{Z_t\}$ is independent from $\{V_t\}$ and $R$ is full rank. Example: past input signals.
Figure 1: Instrumental Variable as Modified Projection.
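A minimal sketch of the idea (my own construction, not from the slides; parameters and noise model are made up): for an ARX(1,1) model whose noise is not white, OLS is biased, while instruments built from past inputs recover $(a, b)$.

```python
# Instrumental variables with past inputs as instruments for an ARX(1,1) model
# driven by correlated (MA(1)) noise, where plain OLS is biased.
import numpy as np

rng = np.random.default_rng(6)
a, b, n = -0.7, 2.0, 20_000                            # made-up true parameters
u = rng.normal(size=n)
e = rng.normal(size=n)
v = e + 0.8 * np.roll(e, 1)                            # correlated noise, not white
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]

X = np.column_stack([-y[1:-1], u[1:-1]])               # regressors (-Y_{t-1}, u_{t-1})
Z = np.column_stack([u[:-2], u[1:-1]])                 # instruments: u_{t-2}, u_{t-1}
Y = y[2:]

theta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ Y)
print("OLS:", theta_ols, "  IV:", theta_iv, "  true (a, b):", (a, b))
```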

Conclusions
- Stochastic (theory) and statistical (data).
- Maximum Likelihood.
- Least Squares.
- Instrumental Variables.