AUTOMATIC CONTROL, COMMUNICATION SYSTEMS, LINKÖPINGS UNIVERSITET
The Problem

Identification of Linear and Nonlinear Dynamical Systems. Theme 1: Curve Fitting. Division of Automatic Control, Linköping University, Sweden.

Data from Gripen

[Figure: recorded flight data: pitch rate, canard angle, elevator angle, leading-edge flap angle.]

Questions: How do the control surface angles affect the pitch rate? What are the aerodynamical derivatives? How can the information in flight data be used?

Aircraft Dynamics: From Input to Pitch Rate

Let y(t) be the pitch rate at time t and u1(t) the canard angle at time t. Try a fourth-order difference-equation model with sampling interval T:

  y(t) = -a1 y(t-T) - a2 y(t-2T) - a3 y(t-3T) - a4 y(t-4T)
         + b1 u1(t-T) + b2 u1(t-2T) + b3 u1(t-3T) + b4 u1(t-4T)

Using all inputs (u1 = canard angle; u2 = elevator angle; u3 = leading-edge flap):

  y(t) = -a1 y(t-T) - ... - a4 y(t-4T)
         + b1^1 u1(t-T) + ... + b4^1 u1(t-4T)
         + b1^2 u2(t-T) + ... + b4^2 u2(t-4T)
         + b1^3 u3(t-T) + ... + b4^3 u3(t-4T)

[Figure: dashed line: actual pitch rate; solid line: step-ahead predicted pitch rate, based on the fourth-order model from the canard angle only. (Lennart Ljung)]

System Identification: Issues

- Select a class of candidate models.
- Select a member in this class using the observed data.
- Evaluate the quality of the obtained model.
- Design the experiment so that the model will be good.

Course Outline: Themes

1. The basic questions and (statistical) tools, illustrated for a simple curve-fitting problem.
2. Linear models: the model structures; special techniques for linear models; time- and frequency-domain data; a software session with hands-on experience.
3. Nonlinear models: parameterizations, problems and techniques.
4. Some practical issues in system identification: experiment design and data quality.
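The difference-equation model above is linear in its parameters, so it can be estimated by ordinary least squares. Below is a minimal, hypothetical sketch of the idea on a simulated first-order system standing in for the flight data (which is not reproduced here); all names and numbers are our own.

```python
import random

random.seed(0)

# Simulate a first-order ARX system  y(t) = -a1*y(t-1) + b1*u(t-1) + e(t)
a1_true, b1_true = -0.8, 0.5          # so the AR coefficient -a1_true is 0.8
N = 2000
u = [random.gauss(0, 1) for _ in range(N)]
y = [0.0] * N
for t in range(1, N):
    y[t] = -a1_true * y[t - 1] + b1_true * u[t - 1] + random.gauss(0, 0.1)

# Least squares over theta = (a1, b1) with regressor phi(t) = (-y(t-1), u(t-1)).
# The 2x2 normal equations (Phi^T Phi) theta = Phi^T Y are solved analytically.
s11 = s12 = s22 = r1 = r2 = 0.0
for t in range(1, N):
    p1, p2 = -y[t - 1], u[t - 1]
    s11 += p1 * p1; s12 += p1 * p2; s22 += p2 * p2
    r1 += p1 * y[t]; r2 += p2 * y[t]
det = s11 * s22 - s12 * s12
a1_hat = (s22 * r1 - s12 * r2) / det
b1_hat = (s11 * r2 - s12 * r1) / det   # estimates close to (-0.8, 0.5)
```

The fourth-order, multi-input case works the same way, only with a longer regressor vector.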
Background Terms

Random (stochastic) variable, expectation, variance. Independent random variables, random processes. Law of Large Numbers, Central Limit Theorem.

Curve Fitting

The most basic ideas from system identification, the choice of model structures and model sizes, are brought out by considering the basic curve-fitting problem from elementary statistics.

Surface fitting: there is an unknown function g0(x). For a sequence of x-values (regressors) {x1, x2, ..., xN} (that may or may not be chosen by the user), observe the corresponding function values with some noise:

  y(k) = g0(x_k) + e(k)

Construct an estimate ĝ_N(x) from {y(1), x1, y(2), x2, ..., y(N), xN}.

[Figure: the floor is formed by the regressors x, and the upright wall is the function value y = g0(x).]

The Curve-fitting Problem

y(k) = g0(x_k) + e(k). Construct an estimate ĝ_N(x) from {y(1), x1, ..., y(N), xN}. The error ĝ_N(x) - g0(x) should be as small as possible. Approaches:

- Parametric: construct ĝ_N(x) by searching over a parameterized set of candidate functions.
- Nonparametric: construct ĝ_N(x) by smoothing over (carefully chosen subsets of) y(k).

Parametric Approach

Search for the function g in a parameterized family of functions:

  g(x, θ) = Σ_{k=1}^n α_k f_k(x, β_k),   θ = {α_k, β_k, k = 1, ..., n}

Grey box/black box; local/global basis functions. Examples:

- Polynomial: g(x, θ) = θ_0 + θ_1 x + ... + θ_n x^n
- Rational: g(x, θ) = (θ_0 + θ_1 x + ... + θ_n x^n) / (1 + θ_{n+1} x + ... + θ_{n+m} x^m)
- Fourier: g(x, θ) = θ_0 + Σ_{k=1}^n (θ_k^c cos(kπx) + θ_k^s sin(kπx))
- Unit pulses: g(x, θ) = Σ_{k=1}^n α_k U((x - γ_k)/β_k), with U(x) the unit pulse

The Choices in the Parametric Case

- Type of function family (basis functions f_k(x, β))
- Size of model (n, or dim θ)
- The parameter values θ
Weighted Least Squares

With data y(t) = g0(x_t) + e(t), the weighted least-squares estimate is

  θ̂_N = arg min_θ V_N(θ),   V_N(θ) = Σ_{t=1}^N L(x_t) |y(t) - g(x_t, θ)|² / λ_t

where λ_t is proportional to the reliability of the t:th measurement, Ee²(t). An extra weighting L(x_t) could also reflect the relevance of the point x_t ("focus in fit").

(Regularized) Least Squares

  θ̂_N = arg min_θ V_N(θ) + δ|θ|²,   V_N(θ) = Σ_{t=1}^N (y(t) - g(x_t, θ))²

The term δ|θ|² penalizes excessive model flexibility; the penalty could come in various forms.

Why the Least Squares Criterion?

Gauss! Maximum likelihood:

  θ̂_N = arg min_θ Σ_{t=1}^N l(y(t) - g(x_t, θ), t),   l(z, t) = -log p(z, t)

where p(z, t) is the probability density function (pdf) of e(t). A Gaussian distribution p(z, t) ∝ e^{-z²/λ_t} gives a quadratic criterion! Other choices:

- min_θ max_t |y(t) - g(x_t, θ)|  ("unknown but bounded")
- min_θ Σ |y(t) - g(x_t, θ)|_ε  (ε-insensitive l1 norm, support vector machines)

Linear Least Squares

Note that if the parameterization g(x, θ) is linear in θ, the basic criterion becomes quadratic in θ, and the minimum can be found analytically:

  g(x, θ) = φ(x)ᵀθ
  V_N(θ) = Σ_{t=1}^N (y(t) - φ(x_t)ᵀθ)² = |Y - Φθ|²,   Y = col y(t),  Φ = col φ(x_t)ᵀ
  θ̂_N = (ΦᵀΦ)⁻¹ΦᵀY
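The closed-form solution θ̂ = (ΦᵀΦ)⁻¹ΦᵀY can be written out directly for a small model. A sketch with the two-parameter model g(x, θ) = θ0 + θ1·x, solving the 2x2 normal equations by hand on simulated data (all names and numbers are ours):

```python
import random

random.seed(1)

# Linear-in-parameters model g(x, theta) = theta0 + theta1*x, i.e. phi(x) = (1, x)
N = 500
xs = [k / N for k in range(N)]
ys = [2.0 + 3.0 * x + random.gauss(0, 0.2) for x in xs]   # true theta = (2, 3)

# theta_hat = (Phi^T Phi)^{-1} Phi^T Y, written out for the 2x2 case
s00 = s01 = s11 = r0 = r1 = 0.0
for x, y in zip(xs, ys):
    s00 += 1.0; s01 += x; s11 += x * x
    r0 += y;    r1 += x * y
det = s00 * s11 - s01 * s01
theta0 = (s11 * r0 - s01 * r1) / det   # close to 2.0
theta1 = (s00 * r1 - s01 * r0) / det   # close to 3.0
```

For larger regressor vectors the same normal equations are solved numerically rather than by hand.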
Choices in Parametric Methods

So, the choice of parameter values within a parameterized model is not that difficult: fit to the observed data, by one criterion or another. The choice of model size and model parameterization is a more interesting issue.

Choice of Model Size: Example

Observed data with the true curve; fit polynomials of different orders.

[Figure: model curves. Blue: true curve; green: 2nd-order fit; red and cyan: higher-order fits.]

The Model Fit

[Figure: fit versus polynomial order, showing]
- the value of the criterion as a function of polynomial order,
- the fit between the true curve and the model curve,
- the value of the criterion evaluated on a fresh data set.
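The behavior in those curves can be reproduced with a small experiment: fit polynomials of increasing order to noisy samples of a known curve, and evaluate the criterion both on the estimation data and on a fresh validation set. This is an illustrative sketch under our own assumptions (the data and orders are not those of the slides):

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting (for the normal equations)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def polyfit_lsq(xs, ys, order):
    """Least-squares polynomial fit via (Phi^T Phi) theta = Phi^T Y."""
    p = order + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    return solve(A, b)

def mse(xs, ys, theta):
    return sum((y - sum(t * x ** i for i, t in enumerate(theta))) ** 2
               for x, y in zip(xs, ys)) / len(xs)

random.seed(2)
g0 = lambda x: 1.0 - 2.0 * x + 0.5 * x ** 2              # true curve: 2nd order
xe = [k / 50 for k in range(50)]                         # estimation data
xv = [(k + 0.5) / 50 for k in range(50)]                 # fresh validation data
ye = [g0(x) + random.gauss(0, 0.1) for x in xe]
yv = [g0(x) + random.gauss(0, 0.1) for x in xv]

train, val = [], []
for order in range(1, 7):
    th = polyfit_lsq(xe, ye, order)
    train.append(mse(xe, ye, th))    # empirical fit: keeps improving with order
    val.append(mse(xv, yv, th))      # validation fit: stops improving
```

The empirical fit `train` decreases monotonically with the order, while `val` levels off near the noise floor, which is exactly why the empirical fit alone is misleading.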
Choice of Size and Structure

This is a more difficult choice, and we need to understand how the model error ĝ_N(x) - g0(x) depends on our choices. Players:

- Estimation (training) data Z_e
- Validation (generalization) data Z_v
- The fit for a certain data set Z:  V_N(θ, Z) = Σ_{t=1}^N (y(t) - g(x_t, θ))²
- The empirical fit:  V_N(θ̂_N, Z_e) = min_θ V_N(θ, Z_e)  (the blue curve)
- The validation fit:  V_N(θ̂_N, Z_v)  (the black curve)
- The curve fit:  H(x, θ) = g0(x) - g(x, θ); for a given x_t-sequence,  H̄_N(θ) = (1/N) Σ_{t=1}^N H²(x_t, θ).  H̄_N(θ̂_N) was the red curve.

Testing the Model Parameterizations

Do not get fooled by the empirical fit V_N(θ̂_N, Z_e): we need to understand how the empirical fit relates to H̄_N(θ̂_N). Test by simulation (Monte Carlo), or compute by calculations (analysis). An "analytical Monte Carlo": assume certain properties of x_k and e(k), then compute (if possible) the error. The expected (typical) value of H̄_N(θ̂_N) would be a suitable goodness measure for the chosen parameterization.

Basic Tools for Analysis

For a stationary stochastic process e(·), under mild conditions:

- Law of large numbers (LLN):  lim_{N→∞} (1/N) Σ_{t=1}^N e(t) = Ee(t)
- Central limit theorem (CLT): if e(t) has zero mean, (1/√N) Σ_{t=1}^N e(t) converges in distribution to the normal (Gaussian) distribution with zero mean and variance λ = lim_{N→∞} (1/N) Σ_{t,s=1}^N Ee(t)e(s):

  (1/√N) Σ_{t=1}^N e(t) → N(0, λ)

Asymptotic Analysis: Probabilistic Setup

For a given g0(·) and a given sequence x_t, collect the data

  y(t) = g0(x_t) + e(t),   Ee²(t) = λ

where the stochastic process e(·) obeys the LLN and CLT and has variance λ. Use a parameterization g(x, θ) and form the estimate

  θ̂_N = arg min_θ Σ_{t=1}^N (y(t) - g(x_t, θ))²,   ĝ_N(x) = g(x, θ̂_N)

Then θ̂_N and ĝ_N(x) are random variables with properties inherited from e. What can be said about their distributions? Except for very simple parameterizations g(x, θ), the distribution of θ̂_N cannot be calculated (mainly due to the arg min). However, its asymptotic distribution as N → ∞ can be established.

Asymptotic Analysis: Basic Facts (Bias)

Let Ē denote averaging over x_k:  Ē f(x) = lim_{N→∞} (1/N) Σ f(x_k). Then

  H̄(θ) = lim_{N→∞} H̄_N(θ) = Ē |g0(x_t) - g(x_t, θ)|²

The best possible model in the parameterization is θ* = arg min_θ H̄(θ). If H̄(θ*) = 0 we have a perfect curve fit; otherwise there will be some bias in the curve fit. Main result:  lim_{N→∞} θ̂_N = θ*.

Proof: Formal Calculations

  V_N(θ) = (1/N) Σ (g0(x_t) + e(t) - g(x_t, θ))²
         = (1/N) Σ (g0(x_t) - g(x_t, θ))² + (1/N) Σ e²(t) + (2/N) Σ (g0(x_t) - g(x_t, θ)) e(t)

By the LLN, (1/N) Σ (g0(x_t) - g(x_t, θ)) e(t) → 0 (uniformly in θ!), so V_N(θ) → H̄(θ) + λ as N → ∞.

Asymptotic Analysis: Basic Facts (Variance)

Suppose the limit model is correct, g(x, θ*) ≡ g0(x), and e is white noise with variance λ. Then the asymptotic distribution of √N (θ̂_N - θ*) is normal with zero mean and covariance matrix

  P = λ [Ē ψ(t)ψᵀ(t)]⁻¹,   ψ(t) = (d/dθ) g(x_t, θ) at θ = θ*

so that  Cov θ̂_N ≈ (λ/N) [Ē ψ(t)ψᵀ(t)]⁻¹.
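The covariance result can be checked by a small "analytical Monte Carlo" of the kind mentioned above: repeat the estimation over many noise realizations and compare the empirical variance of θ̂_N with (λ/N)[Ē ψ²]⁻¹. A one-parameter sketch, with names and numbers of our own choosing:

```python
import random
import statistics

random.seed(3)

# Monte Carlo check of  Cov(theta_hat) ~ (lambda/N) * [E psi^2]^{-1}
# for the scalar linear model y(t) = theta0 * x_t + e(t), where psi(t) = x_t.
theta0, lam, N, runs = 2.0, 0.25, 200, 400
xs = [random.gauss(0, 1) for _ in range(N)]      # one fixed regressor sequence
Ex2 = sum(x * x for x in xs) / N                 # sample version of E-bar psi^2

ests = []
for _ in range(runs):
    ys = [theta0 * x + random.gauss(0, lam ** 0.5) for x in xs]
    # one-parameter least squares: theta_hat = sum(x*y) / sum(x*x)
    ests.append(sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs))

emp_var = statistics.pvariance(ests)
theory_var = lam / (N * Ex2)                     # the asymptotic formula
```

With these numbers `emp_var` lands within Monte-Carlo fluctuation of `theory_var`, and the mean of the estimates is close to θ0, as the main result predicts.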
Proof: Formal Calculations

Expanding the stationarity condition around θ*:

  0 = V_N'(θ̂_N) ≈ V_N'(θ*) + V_N''(θ*)(θ̂_N - θ*)   ⟹   (θ̂_N - θ*) ≈ -[V_N''(θ*)]⁻¹ V_N'(θ*)

with

  V_N'(θ) = -(2/N) Σ (y(t) - g(x_t, θ)) g'(x_t, θ),   so   V_N'(θ*) = -(2/N) Σ e(t) ψ(t)

LLN:  V_N''(θ*) = (2/N) Σ ψ(t)ψᵀ(t) - (2/N) Σ e(t) g''(x_t, θ*) → 2 Ē ψψᵀ
CLT:  (1/√N) Σ e(t)ψ(t) → N(0, λ Ē ψψᵀ)

Hence  √N (θ̂_N - θ*) → N(0, λ [Ē ψψᵀ]⁻¹).

A Remarkable Observation

Recall the curve fit H(x, θ) = g0(x) - g(x, θ) and H̄(θ) = lim (1/N) Σ H²(x_t, θ) (for the x-sequence of the estimation data). H̄(θ̂_N) is a random variable, since the estimate depends on the e-sequence. Then

  E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

where d is the number of estimated parameters, independently of the parameterization!

Proof: assume g0(x) = g(x, θ*). A second-order expansion gives

  H̄(θ̂) ≈ H̄(θ*) + H̄'(θ*)(θ̂ - θ*) + ½ (θ̂ - θ*)ᵀ H̄''(θ*)(θ̂ - θ*)

Here H̄'(θ*) = 0 (θ* minimizes H̄), and

  H̄''(θ*) = 2 Ē g'(x_t, θ*) g'(x_t, θ*)ᵀ = 2 Ē ψ(t)ψᵀ(t)

so

  E H̄(θ̂) = H̄(θ*) + ½ E (θ̂ - θ*)ᵀ H̄''(θ*)(θ̂ - θ*)
  ½ E tr[(θ̂ - θ*)ᵀ H̄''(θ*)(θ̂ - θ*)] = ½ tr[H̄''(θ*) Cov θ̂] = (λ/N) tr[(Ē ψψᵀ)(Ē ψψᵀ)⁻¹] = λ d / N

The Regularized Case

The variance is reduced by regularization, at the price of some bias. In the previous result, the number of parameters d is replaced by d_eff, the effective dimension of θ: the number of eigenvalues of the Hessian of V_N that are larger than δ (the regularization parameter). Note: d_eff ≤ d = dim θ.

Choice of Size: Basic Trade-off

  H̄(θ) = lim (1/N) Σ |g0(x_t) - g(x_t, θ)|²,   E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

A good model size is one that minimizes this expression. H̄(θ*) is the best possible fit that can be achieved within the parameterization; a smaller value means less bias. Thus more parameters give a more flexible model parameterization and hence less bias, but more parameters also lead to higher variance (the term λd/N). The model size is thus a bias-variance trade-off. Note that this balance is usually reached with a non-zero H̄(θ*): it is normal to accept bias. Also, a larger model can be used when more data are available (larger N). If a regularized criterion is used, the size of the regularization parameter δ can likewise be used to control the flexibility of the parameterization.

Choice of Type (Basis Functions)

Generally speaking, the parameterization should be such that useful flexibility is achieved with as few parameters as possible:

- Grey-box models
- Tunable or non-tunable basis functions, g(x, θ) = Σ_{k=1}^n α_k f_k(x, β_k): a more flexible structure needs fewer parameters but takes more work to minimize (non-tunable basis functions give a linear least-squares problem)
- Use the number of parameters d, or the regularization parameter δ, as a size-tuning knob; when there is no natural ordering of the structures, it is easier to use δ
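The bias-variance effect of regularization can be made concrete for a one-parameter model, where the regularized estimate has a closed form. The sketch below is our own construction, not from the slides: it shows the variance dropping by the factor (1 + δ/Σx²)² while the mean is pulled toward zero.

```python
import random
import statistics

random.seed(4)

# Regularized least squares for y(t) = theta0 * x_t + e(t):
#   theta_hat(delta) = sum(x*y) / (sum(x*x) + delta)
# Larger delta means lower variance, but a bias toward zero.
theta0, N, runs = 1.0, 100, 500
xs = [random.gauss(0, 1) for _ in range(N)]
Sxx = sum(x * x for x in xs)
delta = Sxx                      # deliberately heavy regularization, for illustration

ls, ridge = [], []
for _ in range(runs):
    ys = [theta0 * x + random.gauss(0, 1) for x in xs]
    Sxy = sum(x * y for x, y in zip(xs, ys))
    ls.append(Sxy / Sxx)                 # unregularized estimate
    ridge.append(Sxy / (Sxx + delta))    # regularized estimate

var_ls = statistics.pvariance(ls)
var_ridge = statistics.pvariance(ridge)  # exactly var_ls / 4 here, since delta = Sxx
mean_ridge = statistics.mean(ridge)      # near theta0 / 2: the price is bias
```

With δ = Σx², every regularized estimate is exactly half the unregularized one, so the variance falls by a factor of four while the estimate is biased toward θ0/2.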
Nonparametric Methods

A simple idea is to locally smooth the noisy observations of the function values:

  ĝ_N(x) = Σ_{k=1}^N C(x, x_k) y(k),   Σ_k C(x, x_k) = 1 for every x

Often C(x, x_k) = c(x - x_k), with c(r) = 0 for |r| > β, where β is the bandwidth. These are known as kernel methods in statistics. If C(x, x_t) is chosen so that it is non-zero (= 1/k) only for the k observed values x_t closest to x, this is the k-nearest-neighbor method.

Example

C(x, x_k) = U((x - x_k)/β), with U(·) the unit pulse. [Figure: cyan dots: the estimate ĝ_N(x) computed on a grid of x-values.]

Analysis of Nonparametric Methods

  ε_N(x) = ĝ_N(x) - g0(x) = Σ C(x, x_k) y(k) - g0(x) = Σ C(x, x_k) (e(k) + [g0(x_k) - g0(x)])

  E ε_N(x) = Σ C(x, x_k) [g0(x_k) - g0(x)]
  Var ε_N(x) = Σ C²(x, x_k) λ   (for white e with variance λ)

Think of the unit-pulse kernel

  C(x, x_k) = 1/k  if |x - x_k| ≤ β,  else 0,   k = number of x_k in the bin |x - x_k| ≤ β

Then the mean square error is

  MSE(x) = Σ C²(x, x_k) λ + (Σ C(x, x_k) [g0(x_k) - g0(x)])² ≈ λ/k + (variation of g0 over |x - x_k| ≤ β)²

The trade-off: we want β to be small for small bias, but large for small variance. The best choice depends on the nature of g0.

Parametric and Nonparametric Methods

Consider the parametric method using unit pulses U(x):

  g(x, θ) = Σ_{k=1}^n θ_k U((x - γ_k)/β),   β and γ_k given

With non-overlapping pulses, the criterion splits over the bins:

  Σ_{t=1}^N (y(t) - g(x_t, θ))² = Σ_{k=1}^n Σ_{t: |x_t - γ_k| < β} (y(t) - θ_k)²

which is minimized by the bin averages

  θ̂_k = (1/N_k) Σ_{t: |x_t - γ_k| < β} y(t),   N_k = number of x_t with |x_t - γ_k| < β

This means that ĝ(γ_k) = θ̂_k. If we use a nonparametric method to estimate g at x = γ_k with C(x, x_k) = (1/N_k) U((x - x_k)/β), we obtain the same estimate.

Summary, Theme 1

We have used the simple case of curve fitting to illustrate basic issues, frameworks and techniques for linear and nonlinear system identification:

- Parametric and nonparametric methods.
- Choice of model parameterization, model size and parameter values.
- The parameter values are the easy part: some version of a least-squares fit.
- Basic asymptotic properties: θ̂_N → θ*, the best possible approximation available in the parameterization (for the used x_t-sequence), and √N (θ̂_N - θ*) → N(0, P), P = λ [Ē ψ(t)ψᵀ(t)]⁻¹ (normal distribution).
- Choice of parametric model structure guided by the bias-variance trade-off (number of parameters).
- Choice of nonparametric method guided by the bias-variance trade-off (bandwidth of the kernel).
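The unit-pulse kernel estimator above amounts to a local average over a window of width 2β, which is easy to sketch; the true curve, bandwidths and evaluation point below are our own hypothetical choices.

```python
import random

random.seed(5)

def kernel_smooth(x, data, beta):
    """Unit-pulse kernel estimate: average y(k) over all x_k with |x - x_k| <= beta."""
    vals = [y for xk, y in data if abs(x - xk) <= beta]
    return sum(vals) / len(vals) if vals else float("nan")

g0 = lambda x: x * (1 - x)                      # hypothetical true curve
data = [(k / 400, g0(k / 400) + random.gauss(0, 0.1)) for k in range(400)]

# Small beta: few points averaged, low bias but high variance.
# Large beta: many points averaged, low variance but bias from the curvature of g0.
est_small = kernel_smooth(0.5, data, 0.02)
est_large = kernel_smooth(0.5, data, 0.3)       # true value g0(0.5) = 0.25
```

Sweeping β and comparing the two error terms reproduces the λ/k-versus-bias trade-off derived above.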
Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Bias and variance (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 49 Our plan today We saw in last lecture that model scoring methods seem to be trading off two different
More informationLecture 7: Kernels for Classification and Regression
Lecture 7: Kernels for Classification and Regression CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 15, 2011 Outline Outline A linear regression problem Linear auto-regressive
More informationECE 636: Systems identification
ECE 636: Systems identification Lectures 3 4 Random variables/signals (continued) Random/stochastic vectors Random signals and linear systems Random signals in the frequency domain υ ε x S z + y Experimental
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationSGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection
SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationPARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS. Yngve Selén and Erik G. Larsson
PARAMETER ESTIMATION AND ORDER SELECTION FOR LINEAR REGRESSION PROBLEMS Yngve Selén and Eri G Larsson Dept of Information Technology Uppsala University, PO Box 337 SE-71 Uppsala, Sweden email: yngveselen@ituuse
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationSystem Identification: An Inverse Problem in Control
System Identification: An Inverse Problem in Control Potentials and Possibilities of Regularization Lennart Ljung Linköping University, Sweden Inverse Problems, MAI, Linköping, April 4, 2013 System Identification
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationFor a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,
CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationEstimation of cumulative distribution function with spline functions
INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationA Novel Nonparametric Density Estimator
A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationSingle Equation Linear GMM with Serially Correlated Moment Conditions
Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )
More informationCh. 12 Linear Bayesian Estimators
Ch. 1 Linear Bayesian Estimators 1 In chapter 11 we saw: the MMSE estimator takes a simple form when and are jointly Gaussian it is linear and used only the 1 st and nd order moments (means and covariances).
More information