System Identification, Lecture 4

System Identification, Lecture 4
Kristiaan Pelckmans (IT/UU, 2338)
Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI
Uppsala University, Information Technology
30 January 2012

Lecture 4: Stochastic Setup. Interpretation. Maximum Likelihood. Least Squares Revisited. Instrumental Variables.

Stochastic Setup. Why: analysis (abstract away the details); constructive. Basics: samples $\omega$, sample space $\Omega = \{\omega\}$, events $A \subseteq \Omega$.

Rules of probability: $\Pr$ maps events $A \subseteq \Omega$ into $[0, 1]$ and satisfies
1. $\Pr(\Omega) = 1$,
2. $\Pr(\{\}) = 0$,
3. $\Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2)$ for $A_1 \cap A_2 = \{\}$.

Random variables: a function $X : \Omega \to \mathbb{R}$, with $P(X = x) := \Pr(\{\omega : X(\omega) = x\})$. CDF: $P(x) := \Pr(X \le x)$. PDF: $p(x) := \frac{dP(x)}{dx}$. Realizations vs. random variables. Expectation: $E[X] := \int_\Omega x \, dP(x) = \int_\Omega x \, p(x) \, dx$. Expectation of $g : \mathbb{R} \to \mathbb{R}$: $E[g(X)] = \int_\Omega g(x) \, dP(x) = \int_\Omega g(x) \, p(x) \, dx$.

Independence: $E[X_1 X_2] = E[X_1] \, E[X_2]$.
[Figure: empirical PDF $f_n$ and empirical CDF $F_n$ plotted against $x$.]
Examples: Urn: sample = ball, random variable = color of the ball. Sample = image, random variable = color/black-and-white. Sample = speech, random variable = words. Sample = text, random variable = length of the optimal compression.

Sample = weather, random variable = temperature. Sample = external force, random variable = influence. Law of Large Numbers: if $\{X_i\}_{i=1}^n$ are i.i.d., then
$\frac{1}{n} \sum_{i=1}^n X_i \to E[X].$
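
A minimal numerical illustration of the Law of Large Numbers (my own sketch, not part of the slides; the exponential distribution and its mean of 2.0 are arbitrary choices):

```python
import numpy as np

# Sketch: sample means of i.i.d. exponential draws (true mean 2.0) approach
# E[X] as n grows, illustrating the Law of Large Numbers.
rng = np.random.default_rng(0)
true_mean = 2.0
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.exponential(scale=true_mean, size=n)
    print(f"n = {n:6d}   sample mean = {x.mean():.4f}   error = {abs(x.mean() - true_mean):.4f}")
```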

Gaussian PDF: determined by two moments, the mean $\mu$ and the variance $\sigma^2$ (or standard deviation $\sigma$):
$p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) = N(x; \mu, \sigma).$
Closed under convolution, conditioning and products. Multivariate normal PDF for $x \in \mathbb{R}^2$:
$p(x) = N(x; \mu, \Sigma) = \frac{1}{2\pi \, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right).$
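
A small sketch (my own, with arbitrary values for $\mu$, $\sigma$ and $\Sigma$) that evaluates both density formulas directly and checks that the univariate one integrates to one:

```python
import numpy as np

# Sketch: evaluate the univariate and bivariate Gaussian densities with the
# formulas above; mu, sigma and Sigma are arbitrary example values.
def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def mvn_pdf_2d(x, mu, Sigma):
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

xs = np.linspace(-10.0, 10.0, 20001)
print("integral of N(x; 1, 2):", np.sum(normal_pdf(xs, 1.0, 2.0)) * (xs[1] - xs[0]))  # ~ 1.0

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
print("N([0.5, 0.5]; mu, Sigma) =", mvn_pdf_2d(np.array([0.5, 0.5]), mu, Sigma))
```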

[Figure: surface plots of the bivariate Gaussian PDF$(x_1, x_2)$ and CDF$(x_1, x_2)$ over the $(x_1, x_2)$ plane.]

Let a bivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ be given as
$p\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = N\left( x; \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right).$
Then the conditional distribution is again Gaussian, $p(x_1 \mid X_2 = x_2) = N(\bar{\mu}, \bar{\Sigma})$, where
$\bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \bar{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$
Central Limit Theorem (CLT): for i.i.d. $\{X_i\}_{i=1}^n$ with mean $E[X]$ and standard deviation $\sigma$, for large $n$
$\frac{1}{n} \sum_{i=1}^n X_i \approx N\left( E[X], \frac{\sigma}{\sqrt{n}} \right).$
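
To illustrate the conditioning formula (a sketch with made-up numbers, not from the slides), the closed-form conditional mean and variance can be compared against a crude Monte Carlo estimate:

```python
import numpy as np

# Sketch: conditional distribution of x1 given X2 = x2 for a bivariate Gaussian,
# using mu_bar = mu1 + S12/S22*(x2 - mu2) and S_bar = S11 - S12*S21/S22.
mu = np.array([1.0, -2.0])            # [mu1, mu2] (made-up values)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # [[S11, S12], [S21, S22]]
x2 = 0.5                              # observed value of X2

mu_bar = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
S_bar = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
print(f"conditional mean / variance: {mu_bar:.3f} / {S_bar:.3f}")

# Cross-check by Monte Carlo: sample the joint, keep samples with X2 near x2.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=1_000_000)
near = samples[np.abs(samples[:, 1] - x2) < 0.05, 0]
print(f"Monte Carlo mean / variance: {near.mean():.3f} / {near.var():.3f}")
```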

Stochastic Processes: a sequence of random variables indexed by time. Joint distribution function. Conditional PDF. Stationarity. Quasi-stationarity. Ergodicity: averages over different realizations vs. time averages.

Maximum Likelihood. If the true PDF of $X$ were $p$, then the probability of occurrence of a realization $x$ of $X$ would be $p(x)$. Consider a class of such PDFs $\{p_\theta : \theta \in \Theta\}$. Likelihood function: this PDF viewed as a function of the unknown parameter $\theta$,
$L(\theta; x) = p_\theta(x).$
Idea: given an observation $y_i$, find the model under which this sample was most likely to occur:
$\hat{\theta}_n = \arg\max_{\theta \in \Theta} L(\theta; y_i),$
which is equivalent to
$\hat{\theta}_n = \arg\max_{\theta \in \Theta} \log L(\theta; y_i).$
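
A minimal sketch of the idea (not from the slides): maximize the Gaussian log-likelihood of a sample over the mean $\theta$ by a simple grid search and compare with the closed-form maximizer, the sample mean. The true value, noise level and grid are arbitrary choices:

```python
import numpy as np

# Sketch: maximum likelihood estimation of the mean of a Gaussian with known
# sigma = 1, by maximizing the log-likelihood over a grid of candidate theta.
rng = np.random.default_rng(0)
theta_true = 1.5
y = rng.normal(loc=theta_true, scale=1.0, size=200)      # observations

def log_likelihood(theta, y):
    # log L(theta; y) = sum_i log N(y_i; theta, 1), up to an additive constant
    return -0.5 * np.sum((y - theta) ** 2)

grid = np.linspace(0.0, 3.0, 3001)
theta_hat = grid[np.argmax([log_likelihood(t, y) for t in grid])]
print(f"ML estimate (grid search): {theta_hat:.3f}")
print(f"closed form (sample mean): {y.mean():.3f}")
```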

Properties of $\hat{\theta}_n$. Assume that $x_n$ contains $n$ independent samples. Then the ML estimator is:
Consistent, i.e. $\hat{\theta}_n \to \theta_0$, with rate $1/n$.
Asymptotically normal: for large $n$, $\sqrt{n}(\hat{\theta}_n - \theta_0)$ is approximately normally distributed.
Sufficient.
Efficient.
Regularity conditions, e.g. identifiability: $L(x, \theta) = L(x, \theta')$ for all $x$ implies $\theta = \theta'$.

Interpretations. The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven and the last two into the cool oven. The statistician, horrified, explained how he should randomize in order to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two into the hot oven, the next two into the medium oven and the last two into the cool oven. "Obviously, we can't do that," said the metallurgist. "On the contrary, you have to do that," said the statistician.

Least Squares Revisited. Observations $\{y_i\}_{i=1}^n$, modelled as realizations of random variables $Y_i$ with
$Y_i = x_i^T \theta + V_i, \qquad V_i \sim N(0, \sigma^2),$
with the realizations $\{e_i\}$ of $\{V_i\}$ i.i.d., and $\{x_i\}_{i=1}^n$ and $\theta$ deterministic. Equivalently,
$p(y_i) = N(y_i; x_i^T \theta, \sigma).$
Since the $\{V_i\}$ are independent,
$p(y_1, \dots, y_n) = \prod_{i=1}^n N(y_i; x_i^T \theta, \sigma).$

Maximum Likelihood:
$\hat{\theta}_n = \arg\max_\theta \log L(\theta; y_1, \dots, y_n) = \arg\max_\theta \log \prod_{i=1}^n N(y_i; x_i^T \theta, \sigma),$
which gives Ordinary Least Squares (OLS):
$\hat{\theta}_n = \arg\min_\theta c \sum_{i=1}^n (y_i - x_i^T \theta)^2$ for a constant $c > 0$.
OLS is also optimal for zero-mean, uncorrelated errors with equal, bounded variance (Gauss-Markov). If the noise is not independent but
$\begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} \sim N(0_n, \Sigma),$
then ML leads to the BLUE
$\hat{\theta}_n = \arg\min_\theta e^T \Sigma^{-1} e = \arg\min_\theta (y - X\theta)^T \Sigma^{-1} (y - X\theta).$
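
A small sketch of both estimators (my own simulated example, assuming the noise covariance $\Sigma$ is known): OLS from the normal equations, and the whitened least-squares estimate $(X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$ that the criterion above yields under correlated Gaussian noise:

```python
import numpy as np

# Sketch: OLS vs. whitened (generalized) least squares on simulated data with
# correlated Gaussian noise of known covariance Sigma (assumed known here).
rng = np.random.default_rng(0)
n, d = 200, 2
theta_true = np.array([1.0, -0.5])
X = rng.normal(size=(n, d))

# AR(1)-style noise covariance: Sigma[i, j] = 0.9 ** |i - j|
Sigma = 0.9 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
e = rng.multivariate_normal(np.zeros(n), Sigma)
y = X @ theta_true + e

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)              # (X^T X) theta = X^T y
Si = np.linalg.inv(Sigma)
theta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)    # minimizes (y - X theta)^T Sigma^-1 (y - X theta)
print("OLS:", np.round(theta_ols, 3), " GLS:", np.round(theta_gls, 3))
```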

Analysis of OLS. Model with $\{V_i\}$ zero-mean white noise with $E[V_i^2] = \lambda^2$, and $\theta, x_1, \dots, x_n \in \mathbb{R}^d$:
$Y_i = x_i^T \theta_0 + V_i.$
OLS: $\hat{\theta}_n = \arg\min_\theta \|y - X\theta\|_2^2$.
Normal equations: $(X^T X)\theta = X^T y$.
Unbiased: $E[\hat{\theta}_n] = \theta_0$.
Covariance: $E[(\hat{\theta}_n - \theta_0)(\hat{\theta}_n - \theta_0)^T] = \lambda^2 (X^T X)^{-1} = \frac{\lambda^2}{n} \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right)^{-1}.$
Objective: $E\|y - X\hat{\theta}_n\|_2^2 = \lambda^2 (n - d)$.
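
A quick Monte Carlo sanity check of the unbiasedness and covariance statements (an added illustration, with an arbitrary design $X$ and noise level $\lambda$):

```python
import numpy as np

# Sketch: Monte Carlo check that OLS is unbiased and that its covariance
# equals lambda^2 (X^T X)^{-1}, for a fixed design X and white noise V.
rng = np.random.default_rng(1)
n, d, lam = 50, 2, 0.5
theta0 = np.array([2.0, -1.0])
X = rng.normal(size=(n, d))                       # fixed, deterministic design

estimates = []
for _ in range(20_000):
    y = X @ theta0 + lam * rng.normal(size=n)
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))
estimates = np.array(estimates)

print("mean of estimates:", np.round(estimates.mean(axis=0), 3))        # ~ theta0
print("empirical covariance:\n", np.round(np.cov(estimates.T), 4))
print("lambda^2 (X^T X)^-1:\n", np.round(lam ** 2 * np.linalg.inv(X.T @ X), 4))
```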

Efficient: $\sqrt{n}(\hat{\theta}_n - \theta_0) \to N(0, I^{-1})$ for $n \to \infty$.

Cramér-Rao Bound: a lower bound on the variance of any unbiased estimator $\hat{\theta}_n$ of $\theta$. Discrete setting: $D$ is the set of samples, with $\{D_n\}$ denoting the observed datasets. Given PDFs $p_\theta > 0$ over all possible datasets $D_n$ such that
$\sum_{D_n} p_\theta(D_n) = 1,$
and given an estimator $\hat{\theta}_n(D_n)$ of $\theta \in \Theta$ such that for all $\theta \in \Theta$
$E[\hat{\theta}_n(D_n)] = \sum_{D_n} \hat{\theta}_n(D_n) \, p_\theta(D_n) = \theta.$

Taking the derivative w.r.t. $\theta$ gives
$\sum_{D_n} \frac{dp_\theta(D_n)}{d\theta} = 0, \qquad \sum_{D_n} \hat{\theta}_n(D_n) \frac{dp_\theta(D_n)}{d\theta} = 1,$
and hence
$1 = \sum_{D_n} \left( \hat{\theta}_n(D_n) - \theta \right) \frac{dp_\theta(D_n)}{d\theta}.$
Combining (multiplying and dividing by $\sqrt{p_\theta(D_n)}$):
$1 = \sum_{D_n} \left[ \left( \hat{\theta}_n(D_n) - \theta \right) \sqrt{p_\theta(D_n)} \right] \left[ \frac{1}{\sqrt{p_\theta(D_n)}} \frac{dp_\theta(D_n)}{d\theta} \right].$
Cauchy-Schwarz ($a^T b \le \|a\|_2 \|b\|_2$), after squaring, gives
$1 \le \left( \sum_{D_n} \left( \hat{\theta}_n(D_n) - \theta \right)^2 p_\theta(D_n) \right) \left( \sum_{D_n} \frac{1}{p_\theta(D_n)} \left( \frac{dp_\theta(D_n)}{d\theta} \right)^2 \right).$

Or, with the Fisher information
$I(\theta) = E_\theta \left[ \left( \frac{1}{p_\theta(D_n)} \frac{dp_\theta(D_n)}{d\theta} \right)^2 \right],$
this reads
$E_\theta \left[ \left( \theta - \hat{\theta}_n(D_n) \right)^2 \right] \ge \frac{1}{I(\theta)}.$
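
As a standard worked instance of this bound (my addition, not on the slide), take $D_n = \{y_1, \dots, y_n\}$ i.i.d. from $N(\mu, \sigma^2)$ with known $\sigma$ and $\theta = \mu$; note that $\frac{1}{p_\theta} \frac{dp_\theta}{d\theta} = \frac{d \log p_\theta}{d\theta}$:
$\log p_\mu(D_n) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2, \qquad \frac{d \log p_\mu(D_n)}{d\mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu),$
$I(\mu) = E_\mu \left[ \left( \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \mu) \right)^2 \right] = \frac{n}{\sigma^2}, \qquad E_\mu \left[ (\mu - \hat{\mu}_n)^2 \right] \ge \frac{\sigma^2}{n}.$
The sample mean $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^n y_i$ has variance exactly $\sigma^2/n$, so it attains the bound and is efficient.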

Dynamical System. Given the ARX(1,1) system
$y_t + a y_{t-1} = b u_{t-1} + e_t, \qquad \forall t,$
where $\{e_t\}$ is zero-mean, unit-variance white noise, application of OLS gives the normal equations for $\theta = (-a, b)^T$:
$\begin{bmatrix} \sum_{t=1}^n y_{t-1}^2 & \sum_{t=1}^n y_{t-1} u_{t-1} \\ \sum_{t=1}^n u_{t-1} y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = \begin{bmatrix} \sum_{t=1}^n y_{t-1} y_t \\ \sum_{t=1}^n u_{t-1} y_t \end{bmatrix}.$
Taking expectations:
$E \begin{bmatrix} \sum_{t=1}^n y_{t-1}^2 & \sum_{t=1}^n y_{t-1} u_{t-1} \\ \sum_{t=1}^n u_{t-1} y_{t-1} & \sum_{t=1}^n u_{t-1}^2 \end{bmatrix} \theta = E \begin{bmatrix} \sum_{t=1}^n y_{t-1} y_t \\ \sum_{t=1}^n u_{t-1} y_t \end{bmatrix}.$

Approximately,
$\theta \approx \begin{bmatrix} E[y_t^2] & E[y_t u_t] \\ E[u_t y_t] & E[u_t^2] \end{bmatrix}^{-1} \begin{bmatrix} E[y_t y_{t-1}] \\ E[y_t u_{t-1}] \end{bmatrix} = R^{-1} r.$
Ill-conditioning.

Instrumental Variables. LS: $\min_\theta \sum_{i=1}^n (y_i - x_i^T \theta)^2$, with normal equations
$\left( \sum_{i=1}^n x_i x_i^T \right) \theta = \sum_{i=1}^n x_i y_i,$
or
$\sum_{i=1}^n x_i \left( y_i - x_i^T \theta \right) = 0_d.$
Interpretation as an orthogonal projection. Idea:
$\sum_{i=1}^n Z_i \left( y_i - x_i^T \theta \right) = 0_d.$

Choose $\{Z_t\}$ taking values in $\mathbb{R}^d$ such that $\{Z_t\}$ is independent of $\{V_t\}$ and $R$ is full rank. Example: past input signals. Figure 1: Instrumental Variable as Modified Projection.
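
To illustrate the idea (my own simulated example, not from the slides): an ARX(1,1)-type system driven by colored noise, so that $y_{t-1}$ is correlated with the noise and OLS is biased, while delayed inputs $(u_{t-1}, u_{t-2})$ used as instruments recover $(-a, b)$:

```python
import numpy as np

# Sketch: OLS vs. an instrumental-variable (IV) estimate for
#   y_t = -a*y_{t-1} + b*u_{t-1} + e_t,   e_t = w_t + 0.8*w_{t-1} (colored noise),
# so y_{t-1} is correlated with e_t and OLS is biased. Past inputs
# (u_{t-1}, u_{t-2}) serve as instruments Z_t.
rng = np.random.default_rng(0)
n, a, b = 20_000, 0.7, 1.0
u = rng.normal(size=n)
w = rng.normal(size=n)
e = w + 0.8 * np.concatenate(([0.0], w[:-1]))
y = np.zeros(n)
for t in range(1, n):
    y[t] = -a * y[t - 1] + b * u[t - 1] + e[t]

X = np.column_stack([y[1:-1], u[1:-1]])       # regressors (y_{t-1}, u_{t-1})
Z = np.column_stack([u[1:-1], u[:-2]])        # instruments (u_{t-1}, u_{t-2})
yt = y[2:]

theta_ols = np.linalg.solve(X.T @ X, X.T @ yt)   # solves sum x_i (y_i - x_i^T theta) = 0
theta_iv = np.linalg.solve(Z.T @ X, Z.T @ yt)    # solves sum Z_i (y_i - x_i^T theta) = 0
print("true (-a, b):", (-a, b))
print("OLS:", np.round(theta_ols, 3), " IV:", np.round(theta_iv, 3))
```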

Conclusions: Stochastic (theory) and statistical (data). Maximum Likelihood. Least Squares. Instrumental Variables.