
1 The Problem

Identification of Linear and Nonlinear Dynamical Systems, Theme: Curve Fitting. Lennart Ljung, Division of Automatic Control, Linköping University, Sweden.

Data from Gripen
[Figure: flight data from Gripen: pitch rate, canard angle, elevator angle, leading-edge flap angle.]

Questions:
- How do the control surface angles affect the pitch rate?
- Aerodynamical derivatives?
- How to use the information in flight data?

Aircraft dynamics, from input to output: y(t) is the pitch rate at time t and u₁(t) the canard angle at time t. Try a fourth-order difference-equation model with sampling interval T:

y(t) = −a₁ y(t−T) − a₂ y(t−2T) − a₃ y(t−3T) − a₄ y(t−4T) + b₁ u₁(t−T) + b₂ u₁(t−2T) + b₃ u₁(t−3T) + b₄ u₁(t−4T)

Using all inputs (u₁ canard angle, u₂ elevator angle, u₃ leading-edge flap):

y(t) = −a₁ y(t−T) − … − a₄ y(t−4T) + Σ_{i=1}^{3} ( b_{i,1} uᵢ(t−T) + … + b_{i,4} uᵢ(t−4T) )

[Figure: dashed line: actual pitch rate; solid line: step-ahead predicted pitch rate, based on the fourth-order model from the canard angle only.]

System Identification: Issues
- Select a class of candidate models
- Select a member in this class using the observed data
- Evaluate the quality of the obtained model
- Design the experiment so that the model will be good

Course Outline: Themes
- The basic questions and (statistical) tools, illustrated for a simple curve-fitting problem
- Linear models: the model structures, special techniques for linear models, time- and frequency-domain data, a software session with hands-on experience
- Nonlinear models: parameterizations, problems and techniques
- Some practical issues in system identification: experiment design and data quality
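The model above is linear in its parameters, so it can be fitted by ordinary least squares. Below is a minimal Python sketch of that fit; the signals, system coefficients and noise level are simulated stand-ins (the actual Gripen flight data is not reproduced here), so everything in the simulation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)                # stand-in canard angle (assumed)
y = np.zeros(N)
for t in range(4, N):                     # stand-in 4th-order dynamics (assumed coefficients)
    y[t] = (1.5*y[t-1] - 0.7*y[t-2] + 0.1*y[t-3] - 0.05*y[t-4]
            + 0.2*u[t-1] + 0.1*u[t-2] + 0.01*rng.standard_normal())

# Regressor phi(t) = [-y(t-1), ..., -y(t-4), u(t-1), ..., u(t-4)]
Phi = np.array([np.concatenate((-y[t-4:t][::-1], u[t-4:t][::-1]))
                for t in range(4, N)])
Y = y[4:]

theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # theta = [a1..a4, b1..b4]
print("one-step-ahead fit (MSE):", np.mean((Y - Phi @ theta) ** 2))
```

With real flight data one would replace the simulation by the measured pitch rate and control-surface angles, and validate with the step-ahead prediction shown in the figure.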

2 Background

Theme outline: Background, Curve Fitting, Curve Fitting II: Several Regressors, Nonparametric Methods.

Background terms:
- Random (stochastic) variable, expectation, variance
- Independent random variables, random processes
- Law of Large Numbers, Central Limit Theorem

Curve Fitting
The most basic ideas from system identification, the choice of model structures and of model sizes, are brought out by considering the basic curve-fitting problem from elementary statistics. There is an unknown function g₀(x). For a sequence of x-values (regressors) {x₁, x₂, …, x_N} (that may or may not be chosen by the user), observe the corresponding function values with some noise:

y(k) = g₀(x_k) + e(k)

Construct an estimate ĝ_N(x) from {y(1), x₁, y(2), x₂, …, y(N), x_N}.

Curve Fitting II: Several Regressors ("surface fitting")
[Figure: the floor is formed by the regressors x, and the upright wall is the function value y = g₀(x).]

The Curve-fitting Problem
The error ĝ_N(x) − g₀(x) should be as small as possible. Approaches:
- Parametric: construct ĝ_N(x) by searching over a parameterized set of candidate functions.
- Non-parametric: construct ĝ_N(x) by smoothing over (carefully chosen subsets of) the y(k).

Parametric Approach
Search for the function g in a parameterized family of functions:

g(x, θ) = Σ_{k=1}^{n} α_k f_k(x, β_k), θ = {α_k, β_k, k = 1, …, n}

Grey box/black box; local/global basis functions. Examples:
- g(x, θ) = θ₀ + θ₁x + … + θ_n xⁿ (polynomial)
- g(x, θ) = (θ₀ + θ₁x + … + θ_n xⁿ) / (1 + θ_{n+1}x + … + θ_{n+m}x^m) (rational)
- g(x, θ) = θ₀ + Σ_{k=1}^{n} ( a_k cos(kπx) + b_k sin(kπx) ) (Fourier)
- g(x, θ) = Σ_{k=1}^{n} α_k U((x − γ_k)/β_k), with U(x) the unit pulse

The Choices in the Parametric Case
- Type of function family (basis functions f_k(x, β))
- Size of model (n, or dim θ)
- The parameter values
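As a concrete instance of the parametric approach, here is a short sketch assuming a made-up true function g₀ and the Fourier parameterization from the list above; since that family is linear in its coefficients, the fit reduces to linear least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
g0 = lambda x: np.abs(x - 0.4)              # assumed "unknown" true function
N, n = 100, 5                               # data size and model size (assumed)
x = rng.uniform(0, 1, N)                    # regressors x_1, ..., x_N
y = g0(x) + 0.1 * rng.standard_normal(N)    # y(k) = g0(x_k) + e(k)

def basis(x):
    """Fourier basis: [1, cos(pi x), sin(pi x), ..., cos(n pi x), sin(n pi x)]."""
    cols = [np.ones_like(x)]
    for k in range(1, n + 1):
        cols += [np.cos(k * np.pi * x), np.sin(k * np.pi * x)]
    return np.column_stack(cols)

theta, *_ = np.linalg.lstsq(basis(x), y, rcond=None)  # linear least squares
xg = np.linspace(0, 1, 200)
print("max |ghat - g0| on a grid:", np.max(np.abs(basis(xg) @ theta - g0(xg))))
```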

3 Weighted Least Squares

y(t) = g₀(x_t) + e(t). Weighted least squares:

θ̂_N = arg min_θ V_N(θ)
V_N(θ) = (1/N) Σ_{t=1}^{N} L(x_t) |y(t) − g(x_t, θ)|² / λ_t

Here λ_t is proportional to the reliability of the t-th measurement (λ_t = Ee²(t)). An extra weighting L(x_t) could also reflect the relevance of the point x_t ("focus in fit").

(Regularized) Least Squares

θ̂_N = arg min_θ V_N(θ) + δ‖θ‖²
V_N(θ) = (1/N) Σ_{t=1}^{N} |y(t) − g(x_t, θ)|²

The term δ‖θ‖² penalizes excessive model flexibility; the penalty could come in various forms.

Why the Least Squares Criterion? Gauss! Maximum likelihood:

θ̂ = arg min_θ Σ_{t=1}^{N} ℓ(y(t) − g(x_t, θ), t), ℓ(z, t) = −log p(z, t)

where p(z, t) is the probability density function (pdf) of e(t). A Gaussian distribution p(z, t) ∝ exp(−z²/2λ_t) gives a quadratic criterion! Other choices:
- min_θ max_t |y(t) − g(x_t, θ)| ("unknown-but-bounded")
- min_θ Σ |y(t) − g(x_t, θ)|_ε (ε-insensitive ℓ₁ norm, support vector machines)

Linear Least Squares
Note that if the parameterization g(x, θ) is linear in θ, the basic criterion becomes quadratic in θ, and the minimum can be found analytically:

g(x, θ) = φ(x)ᵀθ
V_N(θ) = Σ_{t=1}^{N} (y(t) − φ(x_t)ᵀθ)² = ‖Y − Φθ‖², Y = col y(t), Φ = col φ(x_t)ᵀ
θ̂_N = (ΦᵀΦ)⁻¹ΦᵀY
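A minimal numpy sketch of the three linear-in-parameters estimators above (plain, weighted and regularized least squares); the data shapes and the quadratic form of the penalty δ‖θ‖² are assumptions consistent with the formulas:

```python
import numpy as np

def ls(Phi, Y):
    """theta = argmin ||Y - Phi theta||^2  (= (Phi^T Phi)^-1 Phi^T Y)."""
    return np.linalg.lstsq(Phi, Y, rcond=None)[0]

def wls(Phi, Y, lam):
    """theta = argmin sum |y(t) - phi(t)^T theta|^2 / lam_t."""
    w = 1.0 / lam                                    # weight proportional to reliability
    return np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * Y))

def ridge(Phi, Y, delta):
    """theta = argmin ||Y - Phi theta||^2 + delta ||theta||^2."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + delta * np.eye(d), Phi.T @ Y)
```

All three are direct transcriptions of the formulas; for an ill-conditioned Φ, the regularized version is the numerically safest.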

4 Choices in Parametric Methods

So, the choice of parameter values within a parameterized model is not that difficult: fit to the observed data, by one criterion or another. The choice of model size and model parameterization is a more interesting issue.

Choice of Model Size: Example
[Figure: observed data with the true curve.] Fit polynomials of different orders to these data.
[Figure: model curves. Blue: true curve; green: 2nd-order polynomial; red and cyan: higher-order polynomials.]

The Model Fit
Plotted as functions of the polynomial order:
- the value of the criterion (on the estimation data),
- the fit between the true curve and the model curve,
- the value of the criterion evaluated on a fresh data set.
[Figure: fit versus polynomial order.]
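A sketch of this model-size experiment, with an assumed true curve and noise level: fit polynomials of increasing order and compare the criterion on the estimation data with the criterion on a fresh validation data set.

```python
import numpy as np

rng = np.random.default_rng(1)
g0 = lambda x: np.sin(2 * np.pi * x)          # assumed "true curve"
x_e, x_v = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
y_e = g0(x_e) + 0.2 * rng.standard_normal(50)  # estimation data
y_v = g0(x_v) + 0.2 * rng.standard_normal(50)  # fresh validation data

for order in range(1, 11):
    c = np.polyfit(x_e, y_e, order)            # least-squares polynomial fit
    fit_e = np.mean((y_e - np.polyval(c, x_e)) ** 2)  # empirical fit
    fit_v = np.mean((y_v - np.polyval(c, x_v)) ** 2)  # fit on fresh data
    print(f"order {order:2d}: estimation {fit_e:.4f}, validation {fit_v:.4f}")
```

The estimation-data fit decreases monotonically with the order, while the validation fit typically turns upward once the model starts fitting the noise.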

5 Choice of Size and Structure

This is a more difficult choice, and we need to understand how the model error ĝ_N(x) − g₀(x) depends on our choices. Players:
- The fit for a certain data set Z: V_N(θ, Z) = (1/N) Σ_{t=1}^{N} (y(t) − g(x_t, θ))²
- Estimation (training) data Z_e and validation (generalization) data Z_v
- The empirical fit: V_N(θ̂_N, Z_e) = min_θ V_N(θ, Z_e) (blue curve)
- The validation fit: V_N(θ̂_N, Z_v) (black curve)
- The curve fit: H(x, θ) = |g₀(x) − g(x, θ)|²; for a given x_t-sequence, H_N(θ) = (1/N) Σ_{t=1}^{N} H(x_t, θ). H_N(θ̂_N) was the red curve.

Testing the Model Parameterizations
Do not get fooled by the empirical fit V_N(θ̂_N, Z_e); we need to understand how the empirical fit relates to H_N(θ̂_N). The expected (typical) value of H_N(θ̂_N) would be a suitable goodness measure for the chosen parameterization. Two routes:
- Test by simulation: Monte Carlo.
- Compute by calculations: analysis, an "analytical Monte Carlo": assume certain properties of x_k and e(k), then compute (if possible) the error.

Basic Tools for Analysis
For a stationary stochastic process e(·), under mild conditions:
- Law of large numbers (LLN): (1/N) Σ_{t=1}^{N} e(t) → Ee(t) as N → ∞.
- Central limit theorem (CLT): if e(t) has zero mean, (1/√N) Σ_{t=1}^{N} e(t) converges in distribution to the normal (Gaussian) distribution with zero mean and variance λ = lim_{N→∞} (1/N) Σ_{t,s=1}^{N} Ee(t)e(s); in short, (1/√N) Σ e(t) → N(0, λ).

Asymptotic Analysis: Probabilistic Setup
"Analytical Monte Carlo experiment": for a given g₀(·) and a given sequence x_t, collect the data

y(t) = g₀(x_t) + e(t), Ee²(t) = λ

where the stochastic process e(·) obeys the LLN and the CLT. Use a parameterization g(x, θ) and form the estimate

θ̂_N = arg min_θ (1/N) Σ_{t=1}^{N} (y(t) − g(x_t, θ))², ĝ_N(x) = g(x, θ̂_N)

Then θ̂_N and ĝ_N(x) are random variables with properties inherited from e. What can be said about their distributions?

Asymptotic Analysis: Basic Facts (BIAS)
Except for very simple parameterizations g(x, θ), the distribution of θ̂_N cannot be calculated (mainly due to the "arg min"). However, its asymptotic distribution as N → ∞ can be established. Let Ē denote averaging over x_k: Ēf(x) = lim_{N→∞} (1/N) Σ f(x_k). Then

H̄(θ) = lim_{N→∞} H_N(θ) = Ē |g₀(x_t) − g(x_t, θ)|²

The best possible model in the parameterization is θ* = arg min_θ H̄(θ). If H̄(θ*) = 0 we have a perfect curve fit; otherwise there will be some bias in the curve fit. Main result: lim_{N→∞} θ̂_N = θ*.

Proof (formal calculations):

V_N(θ) = (1/N) Σ (g₀(x_t) + e(t) − g(x_t, θ))²
       = (1/N) Σ (g₀(x_t) − g(x_t, θ))² + (1/N) Σ e²(t) + (2/N) Σ (g₀(x_t) − g(x_t, θ)) e(t)

By the LLN the cross term (2/N) Σ (g₀(x_t) − g(x_t, θ)) e(t) → 0 (uniformly in θ!), so V_N(θ) → H̄(θ) + λ as N → ∞.

Asymptotic Analysis: Basic Facts (VARIANCE)
Suppose the limit model is correct, g(x, θ*) ≡ g₀(x), and e is white noise with variance λ. Then the asymptotic distribution of √N (θ̂_N − θ*) is normal with zero mean and covariance matrix

P = λ [Ē ψ(t)ψᵀ(t)]⁻¹, ψ(t) = (d/dθ) g(x_t, θ) at θ = θ*

so Cov θ̂_N ≈ (λ/N) [Ē ψ(t)ψᵀ(t)]⁻¹.
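A small "test by simulation" sketch, assuming a model that is linear in θ so the estimate can be computed exactly; it compares the Monte Carlo spread of θ̂_N with the asymptotic covariance P = λ[Ēψψᵀ]⁻¹ (all numbers in the setup are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, lam, theta_star = 200, 0.1, np.array([1.0, -0.5])
x = rng.uniform(-1, 1, N)
Psi = np.column_stack([x, x**2])            # psi(t) for g(x, theta) = th1*x + th2*x^2
P = lam * np.linalg.inv(Psi.T @ Psi / N)    # asymptotic cov of sqrt(N)(theta_hat - theta*)

est = []
for _ in range(2000):                       # Monte Carlo over noise realizations
    y = Psi @ theta_star + np.sqrt(lam) * rng.standard_normal(N)
    est.append(np.linalg.lstsq(Psi, y, rcond=None)[0])
cov_mc = np.cov(np.array(est).T) * N        # Monte Carlo estimate of N * Cov(theta_hat)
print("asymptotic P:\n", P.round(4), "\nMonte Carlo:\n", cov_mc.round(4))
```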

6 Proof: Formal Calculations (asymptotic distribution)

0 = V_N′(θ̂_N) ≈ V_N′(θ*) + V_N″(θ*)(θ̂_N − θ*), so (θ̂_N − θ*) ≈ −[V_N″(θ*)]⁻¹ V_N′(θ*)

V_N′(θ) = −(2/N) Σ (y(t) − g(x_t, θ)) g′(x_t, θ). With g₀(x) = g(x, θ*), V_N′(θ*) = −(2/N) Σ e(t)ψ(t), and

LLN: V_N″(θ*) = (2/N) Σ ψ(t)ψᵀ(t) − (2/N) Σ e(t) g″(x_t, θ*) → 2 Ēψψᵀ
CLT: (1/√N) Σ e(t)ψ(t) → N(0, λ Ēψψᵀ)

and hence √N (θ̂_N − θ*) → N(0, λ [Ēψψᵀ]⁻¹).

A Remarkable Observation
Recall the curve fit H(x, θ) = |g₀(x) − g(x, θ)|², H̄(θ) = lim (1/N) Σ H(x_t, θ) (for the x-sequence of the estimation data). H̄(θ̂_N) is a random variable, since the estimate depends on the e-sequence, and

E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

where d is the number of estimated parameters, independently of the parameterization! Proof (formal calculations), under the assumption g₀(x) = g(x, θ*):

H̄(θ̂_N) ≈ H̄(θ*) + H̄′(θ*)(θ̂_N − θ*) + ½ (θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)
H̄′(θ*) = 0 (θ* minimizes H̄(θ))
H̄′(θ) = −2 Ē (g₀(x_t) − g(x_t, θ)) g′(x_t, θ)ᵀ, so H̄″(θ*) = 2 Ē g′(x_t, θ*) g′(x_t, θ*)ᵀ = 2 Ē ψ(t)ψᵀ(t)

E H̄(θ̂_N) ≈ H̄(θ*) + ½ E (θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)
½ E tr[(θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)] = ½ E tr[H̄″(θ*)(θ̂_N − θ*)(θ̂_N − θ*)ᵀ]
= ½ tr[H̄″(θ*) Cov θ̂_N] = (λ/N) tr[(Ēψψᵀ)(Ēψψᵀ)⁻¹] = λ d / N

The Regularized Case
The variance is reduced by regularization, at the price of some bias. In the previous result, the number of parameters d is replaced by d_eff, the effective dimension of θ: the number of eigenvalues of the Hessian of V that are larger than δ (the regularization parameter). Note: d_eff ≤ d = dim θ.

Choice of Size: Basic Trade-off

H̄(θ) = lim (1/N) Σ |g₀(x_t) − g(x_t, θ)|², E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

- A good model size is one that minimizes this expression.
- H̄(θ*) is the best possible fit that can be achieved within the parameterization; a smaller value means less bias. Thus more parameters give a more flexible model parameterization and hence less bias.
- More parameters lead, however, to higher variance (the term λd/N).
- The model size is thus a bias-variance trade-off. Note that this balance is usually reached with a non-zero H̄(θ*), that is, it is normal to accept bias.
- A larger model can be used when more data are available (larger N).
- If a regularized criterion is used, the size of the regularization parameter δ can also be used to control the flexibility of the parameterization.

Choice of Type (basis functions)
Generally speaking, the parameterization should be such that useful flexibility is achieved with as few parameters as possible:
- Grey-box models
- Tunable or non-tunable basis functions: g(x, θ) = Σ_{k=1}^{n} α_k f_k(x, β_k)
- A more flexible structure means fewer parameters but more work to minimize (non-tunable basis functions give linear least squares)
- Use the number of parameters d, or the regularization parameter δ, as a size-tuning knob; when there is no natural ordering of the structures it is easier to use δ.
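A sketch of the effective-dimension idea, assuming a linear-in-parameters model so that the Hessian of V_N is simply 2ΦᵀΦ/N; counting its eigenvalues above δ shows how the regularization parameter acts as a size-tuning knob (the model and threshold values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100)
Phi = np.vander(x, 10)                  # 10-parameter polynomial model (assumed)
H = 2 * Phi.T @ Phi / len(x)            # Hessian of V_N(theta) = (1/N)||Y - Phi theta||^2
eigs = np.linalg.eigvalsh(H)            # eigenvalues of the Hessian

for delta in (1e-6, 1e-3, 1e-1, 1.0):
    d_eff = int(np.sum(eigs > delta))   # effective dimension at this delta
    print(f"delta = {delta:g}: d_eff = {d_eff} (d = {Phi.shape[1]})")
```

Increasing δ shrinks d_eff continuously from d toward 0, which is why δ is convenient when the candidate structures have no natural ordering.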

7 Non-parametric Methods

A simple idea is to locally smooth the noisy observations of the function values:

ĝ_N(x) = Σ_{k=1}^{N} C(x, x_k) y(k), with Σ_{k=1}^{N} C(x, x_k) = 1

Often C(x, x_k) = c(x − x_k)/λ_k (λ_k a normalization), with c(r) = 0 for |r| > β, where β is the bandwidth. These are known as kernel methods in statistics. If C(x, x_t) is chosen so that it is non-zero (= 1/k) only for the k observed values x_t closest to x, this is the k-nearest-neighbor method.

Example: C(x, x_k) = U((x − x_k)/β), with U(·) the unit pulse.
[Figure: noisy observations and the smoothed estimate; cyan dots: the estimate computed on a grid of x-values.]

Analysis of Non-parametric Methods

ε_N(x) = ĝ_N(x) − g₀(x) = Σ C(x, x_k) y(k) − g₀(x) = Σ C(x, x_k) ( e(k) + [g₀(x_k) − g₀(x)] )

E ε_N(x) = Σ C(x, x_k) [g₀(x_k) − g₀(x)]
Var ε_N(x) = Σ C²(x, x_k) λ (for white e with variance λ)

The Trade-off
Think of the unit-pulse choice, normalized over the bin:

C(x, x_k) = 1/k if |x − x_k| ≤ β, and 0 else, where k = number of x_k in the bin |x − x_k| ≤ β

MSE: H(x) = Σ C²(x, x_k) λ + ( Σ C(x, x_k)[g₀(x_k) − g₀(x)] )² ≈ λ/k + (variation of g₀(x) over |x − x_k| ≤ β)²

Trade-off: we want β to be small for small bias, and β to be large (many points in the bin, large k) for small variance. The best choice depends on the nature of g₀.

Parametric and Nonparametric Methods
Consider the parametric method using unit pulses U(x): g(x, θ) = Σ_{k=1}^{n} θ_k U((x − γ_k)/β), with β and the centers γ_k given so that the bins tile the x-axis. Then

Σ_{t=1}^{N} (y(t) − g(x_t, θ))² = Σ_{k=1}^{n} Σ_{t: |x_t − γ_k| < β} (y(t) − θ_k)²

which is minimized by the bin averages

θ̂_k = (1/N_k) Σ_{t: |x_t − γ_k| < β} y(t), N_k = number of x_t in bin k

This means that ĝ(γ_k) = θ̂_k. If we use a nonparametric method to estimate g₀ at x = γ_k with C(x, x_k) = (1/N_k) U((x − x_k)/β), we obtain the same estimate.

Summary of the Theme
We have used the simple case of curve fitting to illustrate basic issues, frameworks and techniques for linear and nonlinear system identification:
- Parametric methods: choice of model parameterization, model size and parameter values. The parameter values are the easy part: some version of a least-squares fit.
- Basic asymptotic properties: θ̂_N → θ*, the best possible approximation available in the parameterization (for the used x_t-sequence), and √N (θ̂_N − θ*) → N(0, P), P = λ [Ē ψ(t)ψᵀ(t)]⁻¹ (normal distribution).
- Choice of parametric model structure guided by the bias-variance trade-off (number of parameters).
- Choice of nonparametric method guided by the bias-variance trade-off (bandwidth of the kernel).
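A minimal sketch of the unit-pulse kernel smoother above, with an assumed true function and noise level; sweeping the bandwidth β makes the bias-variance trade-off visible as a U-shaped mean-squared error:

```python
import numpy as np

def kernel_smoother(x_grid, x_obs, y_obs, beta):
    """ghat(x) = sum_k C(x, x_k) y(k), C = 1/k inside the bin |x - x_k| <= beta."""
    ghat = np.full(len(x_grid), np.nan)
    for i, x in enumerate(x_grid):
        in_bin = np.abs(x_obs - x) <= beta   # unit-pulse kernel support
        if in_bin.any():
            ghat[i] = y_obs[in_bin].mean()   # equal weights 1/k within the bin
    return ghat

rng = np.random.default_rng(4)
x_obs = np.sort(rng.uniform(0, 1, 200))
y_obs = np.sin(2 * np.pi * x_obs) + 0.3 * rng.standard_normal(200)  # assumed g0 + noise
x_grid = np.linspace(0, 1, 50)
for beta in (0.02, 0.1, 0.3):                # small beta: low bias; large beta: low variance
    mse = np.nanmean((kernel_smoother(x_grid, x_obs, y_obs, beta)
                      - np.sin(2 * np.pi * x_grid)) ** 2)
    print(f"beta = {beta}: MSE = {mse:.4f}")
```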
