ECE 636: Systems identification
- Jesse Tucker
- 5 years ago
1 ECE 636: Systems identification, Lectures 9-10: Linear regression
2 Coherence

$$\gamma_{xy}^2(\omega) = \frac{|\Phi_{xy}(\omega)|^2}{\Phi_{xx}(\omega)\Phi_{yy}(\omega)}, \qquad 0 \le \gamma_{xy}^2(\omega) \le 1$$

No noise in the input, uncorrelated output noise ($x = u$, $y = z + e$): then $\Phi_{xy}(\omega) = \Phi_{uz}(\omega)$ and $\Phi_{yy}(\omega) = \Phi_{zz}(\omega) + \Phi_{ee}(\omega)$, so

$$\gamma_{xy}^2(\omega) = \frac{|\Phi_{uz}(\omega)|^2}{\Phi_{xx}(\omega)\left(\Phi_{zz}(\omega)+\Phi_{ee}(\omega)\right)} = \frac{1}{1+\Phi_{ee}(\omega)/\Phi_{zz}(\omega)}$$

Uncorrelated input and output noise ($x = u + m$, $y = z + e$):

$$\gamma_{xy}^2(\omega) = \frac{|\Phi_{uz}(\omega)|^2}{\left(\Phi_{uu}(\omega)+\Phi_{mm}(\omega)\right)\left(\Phi_{zz}(\omega)+\Phi_{ee}(\omega)\right)} = \frac{1}{\left(1+c_m(\omega)\right)\left(1+c_e(\omega)\right)} < 1$$

where $c_m(\omega)=\Phi_{mm}(\omega)/\Phi_{uu}(\omega)$ and $c_e(\omega)=\Phi_{ee}(\omega)/\Phi_{zz}(\omega)$.

Block diagram: $u(t)+m(t) \to x(t)$; $u(t) \to H(\omega) \to z(t)$; $z(t)+e(t) \to y(t)$.
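The effect of output noise on the coherence can be checked numerically. The sketch below uses Python with NumPy/SciPy (rather than the Matlab referenced later in the course); the filter coefficients, signal length and noise level are illustrative assumptions:

```python
import numpy as np
from scipy.signal import lfilter, coherence

rng = np.random.default_rng(0)
N = 8192
u = rng.standard_normal(N)                  # white input, no input noise
z = lfilter([1.0, 0.5], [1.0, -0.8], u)     # noise-free output of a toy LTI system
e = rng.standard_normal(N)                  # additive, uncorrelated output noise

# Welch estimates of the magnitude-squared coherence gamma^2_xy(w)
f, g2_noisy = coherence(u, z + e, fs=1.0, nperseg=256)
_, g2_clean = coherence(u, z, fs=1.0, nperseg=256)

mean_noisy = g2_noisy.mean()
mean_clean = g2_clean.mean()
```

As the slide predicts, the estimated coherence stays in $[0,1]$, is close to 1 for the noise-free linear system, and drops at frequencies where $\Phi_{ee}(\omega)/\Phi_{zz}(\omega)$ is large.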
3 Time domain

Impulse response analysis, step response analysis, correlation analysis (white noise input). Generally: nonparametric identification.

$$\hat\varphi_{uy}(\tau) = \hat g(\tau) * \hat\varphi_{uu}(\tau)$$

For a white noise input, $\hat\varphi_{uu}(\tau) = \sigma_u^2\,\delta(\tau)$, so

$$\hat g(\tau) = \frac{\hat\varphi_{uy}(\tau)}{\sigma_u^2} = \frac{1}{N\sigma_u^2}\sum_{n=\tau}^{N} y(n)\,u(n-\tau)$$

For a general input, solve the Toeplitz system:

$$\begin{bmatrix}\hat\varphi_{uy}(0)\\ \hat\varphi_{uy}(1)\\ \vdots\\ \hat\varphi_{uy}(M-1)\end{bmatrix} = \begin{bmatrix}\hat\varphi_{uu}(0) & \hat\varphi_{uu}(1) & \cdots & \hat\varphi_{uu}(M-1)\\ \hat\varphi_{uu}(1) & \hat\varphi_{uu}(0) & \cdots & \hat\varphi_{uu}(M-2)\\ \vdots & & & \vdots\\ \hat\varphi_{uu}(M-1) & \hat\varphi_{uu}(M-2) & \cdots & \hat\varphi_{uu}(0)\end{bmatrix} \begin{bmatrix}\hat g(0)\\ \hat g(1)\\ \vdots\\ \hat g(M-1)\end{bmatrix}$$

Least squares: stack the data as $y = U\hat g$, with rows of $U$ given by $[u(t)\ u(t-1)\ \dots\ u(t-M+1)]$:

$$\begin{bmatrix}y(1)\\ y(2)\\ \vdots\\ y(N)\end{bmatrix} = \begin{bmatrix}u(1) & 0 & \cdots & 0\\ u(2) & u(1) & \cdots & 0\\ \vdots & & & \vdots\\ u(N) & u(N-1) & \cdots & u(N-M+1)\end{bmatrix}\begin{bmatrix}\hat g(0)\\ \hat g(1)\\ \vdots\\ \hat g(M-1)\end{bmatrix}$$

$$\hat g = (U^TU)^{-1}U^Ty \quad \left(\text{equivalently } \hat g = \hat\Phi_{uu}^{-1}\hat\Phi_{uy}\right)$$

System: $u(t) \to g_0(\tau)$, plus additive noise $\upsilon(t)$, gives $y(t)$.
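The least squares impulse response estimate $\hat g = (U^TU)^{-1}U^Ty$ above can be sketched in a few lines of Python/NumPy (the FIR system and record length are toy assumptions; with a white input and no noise the estimate recovers the impulse response exactly):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 2000, 5
g_true = np.array([1.0, 0.5, 0.25, 0.0, 0.0])   # toy FIR system with memory M

u = rng.standard_normal(N)                       # white input
y = np.convolve(u, g_true)[:N]                   # noise-free output

# Lagged-input matrix U with rows [u(t) u(t-1) ... u(t-M+1)], u(t)=0 for t<0
U = np.zeros((N, M))
for j in range(M):
    U[j:, j] = u[:N - j]

# g_hat = (U^T U)^{-1} U^T y via a numerically stable solver
g_hat = np.linalg.lstsq(U, y, rcond=None)[0]
```

Adding output noise v(t) to y would leave the estimate unbiased but give it a nonzero variance, as discussed in the stochastic-context slides below.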
4 Frequency domain

Sine wave testing (nonparametric identification):

$$u(t) = \alpha\cos(\omega_0 t), \qquad y(t) = \alpha\,|G_0(\omega_0)|\cos\left(\omega_0 t + \angle G_0(\omega_0)\right) + \upsilon(t) + \text{transient}$$

Frequency response analysis: empirical transfer function estimate (ETFE)

$$\hat G(\omega) = \frac{Y(\omega)}{U(\omega)}$$

(large variance, which does not decrease as $N \to \infty$).

Smoothing/windowing over a band $[\omega_0-\Delta\omega,\ \omega_0+\Delta\omega]$:

$$\hat G(\omega_0) = \frac{\displaystyle\int_{\omega_0-\Delta\omega}^{\omega_0+\Delta\omega} W(\xi-\omega_0)\,|U(\xi)|^2\,\hat G(\xi)\,d\xi}{\displaystyle\int_{\omega_0-\Delta\omega}^{\omega_0+\Delta\omega} W(\xi-\omega_0)\,|U(\xi)|^2\,d\xi}$$

Windowed (Blackman-Tukey) spectral estimates:

$$\hat\Phi_{uu}(\omega) = \sum_{\tau} w(\tau)\,\hat\varphi_{uu}(\tau)\,e^{-i\tau\omega}, \qquad \hat\Phi_{yu}(\omega) = \sum_{\tau} w(\tau)\,\hat\varphi_{yu}(\tau)\,e^{-i\tau\omega}, \qquad \hat G(\omega) = \frac{\hat\Phi_{yu}(\omega)}{\hat\Phi_{uu}(\omega)}$$

System: $u(t) \to g_0(\tau)$, plus additive noise $\upsilon(t)$, gives $y(t)$.
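The smoothed frequency-domain estimate $\hat G(\omega) = \hat\Phi_{yu}(\omega)/\hat\Phi_{uu}(\omega)$ can be sketched with Welch-averaged auto- and cross-spectra in Python/SciPy (a toy FIR system; the segment length and signal length are illustrative choices):

```python
import numpy as np
from scipy.signal import welch, csd, freqz

rng = np.random.default_rng(2)
N = 16384
g0 = np.array([1.0, 0.5, 0.25])                  # toy FIR system
u = rng.standard_normal(N)
y = np.convolve(u, g0)[:N]

# Smoothed spectral estimates; G_hat(w) = Phi_yu(w) / Phi_uu(w)
f, Puu = welch(u, fs=1.0, nperseg=512, detrend=False)
_, Puy = csd(u, y, fs=1.0, nperseg=512, detrend=False)
G_hat = Puy / Puu

# Compare with the true frequency response of g0 at the same frequencies
_, G_true = freqz(g0, 1, worN=f, fs=1.0)
err = np.max(np.abs(np.abs(G_hat) - np.abs(G_true)))
```

With no output noise the only error is the small leakage bias of the segment averaging; with noise, the averaging is what keeps the variance bounded, unlike the raw ETFE.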
5 Linear regression

One of the most common problems in the quantitative sciences is predicting the values of a dependent variable $y$ based on the information given by a set of independent variables $\varphi_1,\dots,\varphi_d$. Linear regression models assume that the dependence of $y$ on $\varphi_1,\dots,\varphi_d$ is linear. They have been studied extensively in statistics and used in many other scientific fields (econometrics, human sciences, psychology, engineering, etc.). These models can be theoretically analyzed in detail and often yield satisfactory descriptions of reality. Additionally, in cases where the amount of experimental data is relatively low and/or we have considerable noise, linear regression models may yield better results than more complex nonlinear models.

Least squares method: Gauss (1809).

Generally, linear regression aims at calculating a function of the independent variables, $g(\varphi)$, based on observations of $\varphi$ and $y$, such that the difference $y - g(\varphi) = y - \hat y$ is small. We can treat linear regression both in a deterministic and in a stochastic context.
6 The general form of a linear regression model is

$$g(\varphi) = \theta_1\varphi_1 + \theta_2\varphi_2 + \dots + \theta_d\varphi_d = \varphi^T\theta$$

where $\varphi = [\varphi_1\ \varphi_2\ \dots\ \varphi_d]^T$ and $\theta = [\theta_1\ \theta_2\ \dots\ \theta_d]^T$.

Example: Curve fitting (Lecture ). In this case $\varphi = [1\ x\ \dots\ x^M]^T$, $\theta = [w_0\ w_1\ \dots\ w_M]^T$.

Example: Impulse response model for a finite memory LTI system (Lectures 7-8). In this case $\varphi = [u(t)\ u(t-1)\ \dots\ u(t-M+1)]^T$, $\theta = [h(0)\ h(1)\ \dots\ h(M-1)]^T$.

In general, note that we can have nonlinear transformation terms of some independent variable(s), such as logarithmic, polynomial, etc., e.g. $\varphi_2 = \varphi_1^2$, $\varphi_3 = \log\varphi_1$ (as in curve fitting), or even interaction terms such as $\varphi_4 = \varphi_2\varphi_3$. In the case of systems identification (but not in general), the vector $\varphi$ will be a function of time, i.e. $\varphi(t)$. Both linear and nonlinear systems identification may be formulated as linear regression problems!
7 Linear regression

Examples of models that are linear in the parameters:

$$g(x) = \theta_0 + \theta_1 x, \qquad g(x) = \theta_0 + \theta_1 x + \theta_2 x^2$$
8 Linear regression

The objective is to find an estimate of the parameter vector $\theta$. To this end, we obtain $N$ measurements of $\varphi$ and $y$, i.e. the set $\{(\varphi_1, y_1), (\varphi_2, y_2), \dots, (\varphi_N, y_N)\}$. We can write the following set of linear equations:

$$y_1 = \varphi_1^T\theta, \quad y_2 = \varphi_2^T\theta, \quad \dots, \quad y_N = \varphi_N^T\theta$$

or, in matrix form, $y = \Phi\theta$, where $y$ is an $N\times 1$ vector and $\Phi$ an $N\times d$ matrix:

$$y = \begin{bmatrix} y_1\\ \vdots\\ y_N\end{bmatrix}, \qquad \Phi = \begin{bmatrix}\varphi_1^T\\ \vdots\\ \varphi_N^T\end{bmatrix}$$

If $N=d$ we can invert $\Phi$ to obtain $\theta$; however, we typically have data contaminated by noise, and in this case we need $N \gg d$ in order to obtain good results (Lecture ). This yields an overdetermined system: in general we don't have an exact solution. How do we solve the above matrix equation? Define the model errors/residuals as $\varepsilon_i = y_i - \varphi_i^T\theta$ and their vector $\varepsilon = [\varepsilon_1\ \varepsilon_2\ \dots\ \varepsilon_N]^T$.
9 Linear regression

We may then define the least squares estimate of $\theta$ as the vector that minimizes the following cost function:

$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k^2 = \frac{1}{N}\sum_{k=1}^{N}\left[y_k - g(\varphi_k)\right]^2 = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \varphi_k^T\theta\right)^2 = \frac{1}{N}\varepsilon^T\varepsilon$$

i.e. we are looking for the value of $\theta$ such that $\hat\theta = \arg\min_\theta V_N(\theta)$. Set the derivative to zero:

$$V_N(\theta) = \frac{1}{N}(y-\Phi\theta)^T(y-\Phi\theta) = \frac{1}{N}\left[y^Ty + \theta^T\Phi^T\Phi\theta - 2\theta^T\Phi^Ty\right]$$

$$\frac{\partial V_N(\theta)}{\partial\theta} = \frac{2}{N}\left(-\Phi^Ty + \Phi^T\Phi\theta\right) = 0 \;\Rightarrow\; \hat\theta_N = (\Phi^T\Phi)^{-1}\Phi^Ty$$

*Note from matrix algebra: $\dfrac{\partial}{\partial\theta}\left(\theta^TA\theta\right) = (A + A^T)\theta$, $\dfrac{\partial}{\partial\theta}\left(a^T\theta\right) = \dfrac{\partial}{\partial\theta}\left(\theta^Ta\right) = a$.

If $\Phi$ is full rank, the matrix $\Phi^T\Phi$ is nonsingular and positive definite, and we have the above unique solution, which corresponds to a minimum since

$$\frac{\partial^2 V_N(\theta)}{\partial\theta\,\partial\theta^T} = \frac{2}{N}\Phi^T\Phi$$

Equivalently:

$$\hat\theta_N = \left[\sum_{k=1}^N \varphi_k\varphi_k^T\right]^{-1}\sum_{k=1}^N \varphi_k y_k$$

If $\Phi$ is not full rank, we have infinitely many solutions. The matrix $(\Phi^T\Phi)^{-1}\Phi^T$ is termed the pseudoinverse of $\Phi$.
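The normal-equation solution above, and the orthogonality of the residual to the regressors, can be verified directly in Python/NumPy (the regressor matrix and parameter values are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 100, 3
Phi = rng.standard_normal((N, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

# Normal equations: theta_hat = (Phi^T Phi)^{-1} Phi^T y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Reference solution via a least squares solver
theta_lstsq = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Residuals are orthogonal to the columns of Phi: Phi^T (y - Phi theta_hat) = 0
resid = y - Phi @ theta_hat
ortho = Phi.T @ resid
```

The zero gradient condition $\Phi^Ty = \Phi^T\Phi\theta$ is exactly the statement that $\Phi^T\hat\varepsilon = 0$, which is the geometric picture of the next slide.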
10 Linear regression: geometric interpretation

$\{y, \varphi^{(1)}, \varphi^{(2)}, \dots, \varphi^{(d)}\}$: vectors in $R^N$ (the $\varphi^{(j)}$ are the columns of $\Phi$). Equivalent problem: find the linear combination of $\{\varphi^{(1)}, \dots, \varphi^{(d)}\}$ which approximates the vector $y$ as well as possible. The vectors $\{\varphi^{(1)}, \dots, \varphi^{(d)}\}$ define a $d$-dimensional subspace of $R^N$ if $d<N$. If $y$ belongs to this subspace, we can express it exactly as a linear combination of $\{\varphi^{(1)}, \dots, \varphi^{(d)}\}$. If not? The best approximation that belongs to the subspace is the one with the smallest distance from the vector $y$, which is the orthogonal projection of $y$ onto the subspace:

$$(y - \hat y) \perp \varphi^{(i)}, \quad i = 1,2,\dots,d$$

Therefore $(y - \hat y)^T\varphi^{(i)} = 0$, and since $\hat y = \sum_{j=1}^d \hat\theta_j\varphi^{(j)}$:

$$y^T\varphi^{(i)} = \sum_{j=1}^d \hat\theta_j\,\varphi^{(j)T}\varphi^{(i)}, \quad i = 1,2,\dots,d$$

which in matrix form becomes $\Phi^T\Phi\,\hat\theta_N = \Phi^Ty \;\Rightarrow\; \hat\theta_N = (\Phi^T\Phi)^{-1}\Phi^Ty$. (Figure: example with $N = 3$, $d = 2$.)
11 Weighted least squares

Often our observations may not be equally reliable: we can weight them differently:

$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^N \alpha_k\left(y_k - \varphi_k^T\theta\right)^2$$

In matrix form:

$$V_N(\theta) = (y-\Phi\theta)^TQ(y-\Phi\theta), \qquad Q = \mathrm{diag}(\alpha_1, \alpha_2, \dots, \alpha_N)$$

and:

$$\hat\theta_N = (\Phi^TQ\Phi)^{-1}\Phi^TQy = \left[\sum_{k=1}^N \alpha_k\varphi_k\varphi_k^T\right]^{-1}\sum_{k=1}^N \alpha_k\varphi_k y_k$$

The model prediction errors/residuals are given by $\hat\varepsilon = y - \hat y = y - \Phi\hat\theta_N$. The percentage of the observations $y$ that is explained by the linear regression model may be quantified by the correlation coefficient and the normalized mean square error:

$$R_y^2 = \frac{\sum_{k=1}^N \hat y_k^2}{\sum_{k=1}^N y_k^2}, \qquad \mathrm{NMSE} = \frac{\sum_{k=1}^N \hat\varepsilon_k^2}{\sum_{k=1}^N y_k^2}$$
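A sketch of the weighted estimate $(\Phi^TQ\Phi)^{-1}\Phi^TQy$ in Python/NumPy, using an assumed heteroscedastic noise pattern (second half of the record ten times noisier) and weights $\alpha_k = 1/\sigma_k^2$; with $Q = I$ it reduces to ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 200, 2
Phi = rng.standard_normal((N, d))
theta_true = np.array([2.0, -1.0])

# Assumed noise profile: second half of the observations is 10x noisier
sigma = np.r_[0.1 * np.ones(N // 2), np.ones(N // 2)]
y = Phi @ theta_true + sigma * rng.standard_normal(N)

# Weighted LS with Q = diag(alpha_1, ..., alpha_N), alpha_k = 1/sigma_k^2
Q = np.diag(1.0 / sigma**2)
theta_wls = np.linalg.solve(Phi.T @ Q @ Phi, Phi.T @ Q @ y)

# With Q = I the formula reduces to the ordinary least squares estimate
theta_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
theta_q_eye = np.linalg.solve(Phi.T @ np.eye(N) @ Phi, Phi.T @ np.eye(N) @ y)
```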
12 Linear regression in a stochastic context

Assume that our observations are a function of time and that they are given by:

$$y(t) = \varphi^T(t)\,\theta_0 + e(t), \qquad E\{e(t)\} = 0, \quad E\{e(t)e(s)\} = r_{ts}$$

Assume also that $\varphi(t)$ is deterministic. In the simplest case, $e(t)$ is white noise with covariance matrix $E\{ee^T\} = \lambda_0 I$.

Properties of the least squares estimate: the quantity

$$\hat\theta_{LS} = (\Phi^T\Phi)^{-1}\Phi^Ty$$

is an unbiased estimate of $\theta_0$, i.e. $E\{\hat\theta_{LS}\} = \theta_0$. The covariance matrix of the least squares estimate is:

$$E\left\{\left(\hat\theta_{LS} - E\{\hat\theta_{LS}\}\right)\left(\hat\theta_{LS} - E\{\hat\theta_{LS}\}\right)^T\right\} = \lambda_0\,(\Phi^T\Phi)^{-1}$$

Therefore, the covariance matrix depends on the noise variance and the input characteristics. Specifically, it is desirable that the input is such that the elements of the inverse matrix above are small. The value of $\lambda_0$ is not known: how can we estimate it? The following noise variance estimate is unbiased:

$$\hat\lambda = \frac{1}{N-d}\sum_{t=1}^N\left(y(t) - \varphi^T(t)\hat\theta_N\right)^2$$
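The unbiasedness of $\hat\theta_{LS}$ and of $\hat\lambda$, and the covariance formula $\lambda_0(\Phi^T\Phi)^{-1}$, can be checked by Monte Carlo simulation (fixed toy regressors, assumed $\theta_0$ and $\lambda_0$; the tolerances reflect the finite number of trials):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d, lam0 = 50, 2, 0.25
Phi = rng.standard_normal((N, d))               # fixed, deterministic regressors
theta0 = np.array([1.0, 2.0])
P = np.linalg.inv(Phi.T @ Phi)

trials = 4000
ests = np.empty((trials, d))
lams = np.empty(trials)
for i in range(trials):
    e = np.sqrt(lam0) * rng.standard_normal(N)  # white noise, variance lam0
    y = Phi @ theta0 + e
    th = P @ Phi.T @ y                          # least squares estimate
    ests[i] = th
    lams[i] = np.sum((y - Phi @ th) ** 2) / (N - d)

mean_est = ests.mean(axis=0)                    # should be close to theta0
emp_cov = np.cov(ests.T)                        # should be close to lam0 * P
theo_cov = lam0 * P
mean_lam = lams.mean()                          # should be close to lam0
```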
13 Best linear unbiased estimate (BLUE)

We have seen that the weighted least squares estimate is:

$$\hat\theta_N = (\Phi^TQ\Phi)^{-1}\Phi^TQy$$

Question: for which matrix $Q$ is the variance of the estimates minimized? In the general case where $E\{ee^T\} = R$:

$$\mathrm{Cov}\{\hat\theta_{WLS}\} = (\Phi^TQ\Phi)^{-1}\Phi^TQRQ\Phi\,(\Phi^TQ\Phi)^{-1}$$

The matrix $Q$ which minimizes the above is $Q = R^{-1}$. The resulting estimate for this choice of $Q$,

$$\hat\theta_{BLUE} = (\Phi^TR^{-1}\Phi)^{-1}\Phi^TR^{-1}y$$

is called the best linear unbiased estimate (BLUE). What happens when $e$ is white noise? If $E\{ee^T\} = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$, the weights are $\alpha_k = 1/\lambda_k$; for constant noise variance,

$$\hat\theta_{BLUE} = (\Phi^T\Phi)^{-1}\Phi^Ty = \hat\theta_{LS}$$

i.e. the standard least squares estimate is the best linear unbiased estimate. If the noise is not white, there may be another linear unbiased estimate with lower variance.
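A sketch of the BLUE with a known diagonal noise covariance $R$, again in Python/NumPy with toy data. It checks two consequences of the slide: for $R = cI$ the BLUE coincides with ordinary LS, and the theoretical BLUE covariance $(\Phi^TR^{-1}\Phi)^{-1}$ is no larger (in trace) than the OLS covariance under the same noise:

```python
import numpy as np

rng = np.random.default_rng(6)
N, d = 100, 2
Phi = rng.standard_normal((N, d))
theta0 = np.array([1.0, -1.0])

# Assumed known heteroscedastic noise covariance R = diag(lam)
lam = np.linspace(0.1, 2.0, N)
R = np.diag(lam)
y = Phi @ theta0 + np.sqrt(lam) * rng.standard_normal(N)

Rinv = np.diag(1.0 / lam)
theta_blue = np.linalg.solve(Phi.T @ Rinv @ Phi, Phi.T @ Rinv @ y)

# Sanity check: with R = c*I the BLUE reduces to ordinary least squares
c = 2.0
Rinv_iso = np.eye(N) / c
theta_iso = np.linalg.solve(Phi.T @ Rinv_iso @ Phi, Phi.T @ Rinv_iso @ y)
theta_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Theoretical covariances under noise covariance R
cov_blue = np.linalg.inv(Phi.T @ Rinv @ Phi)
P = np.linalg.inv(Phi.T @ Phi)
cov_ols = P @ Phi.T @ R @ Phi @ P               # OLS covariance when noise is not white
```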
14 Distribution of the estimates

We have seen that the estimated parameters $\hat\theta$ are random. What is their distribution? Assume that $e(t)$ is Gaussian white noise, i.e. its distribution is $N(0,\lambda)$. The output observations then follow a multivariate Gaussian distribution:

$$y \sim N(\Phi\theta_0,\ \lambda I)$$

The coefficient estimates also follow a multivariate Gaussian distribution:

$$\hat\theta_{LS} \sim N\left(\theta_0,\ \lambda(\Phi^T\Phi)^{-1}\right)$$

In the general case, where the noise is not white (samples are not independent), the observations and the estimated coefficients still follow multivariate normal distributions:

$$y \sim N(\Phi\theta_0,\ R), \qquad \hat\theta_{LS} \sim N\left(\theta_0,\ (\Phi^T\Phi)^{-1}\Phi^TR\Phi(\Phi^T\Phi)^{-1}\right)$$

Even if the observations are not normally distributed, the distribution of $\hat\theta$ may approach a normal distribution for a large number of observations $N$ (central limit theorem). The estimate of the noise variance,

$$\hat\lambda = \frac{1}{N-d}\sum_{t=1}^N\left(y(t) - \varphi^T(t)\hat\theta\right)^2$$

follows a (scaled) $\chi^2$ distribution with $N-d$ degrees of freedom; specifically,

$$\frac{(N-d)\,\hat\lambda}{\lambda} \sim \chi^2_{N-d}$$
15 Statistical testing

We have seen (Lectures 5-6) that we can use the sampling distribution of an estimator in order to perform statistical hypothesis testing. This procedure can be applied to linear regression in order to examine whether the value of an estimated regression coefficient is significantly different from zero, in other words whether this coefficient corresponds to a regressor that should be included in the model -> model order selection!

We have seen that for white noise:

$$\hat\theta_{LS} \sim N\left(\theta_0,\ \lambda(\Phi^T\Phi)^{-1}\right)$$

In order to examine whether the estimate $\hat\theta_j$ of a coefficient is significantly different from zero, we consider the null hypothesis that the real value of $\theta_j$ is equal to zero, and therefore that $\hat\theta_j$ follows the distribution $N(0,\ \lambda r_j)$, where $r_j$ are the diagonal elements of $(\Phi^T\Phi)^{-1}$. We create the random variable:

$$z_j = \frac{\hat\theta_j}{\sqrt{\lambda r_j}}$$

Considering $\lambda$ known, $z_j$ follows a standard normal distribution $N(0,1)$ (Lectures 5-6). Considering $\lambda$ unknown and random (more realistic), $z_j$ is a ratio of a normal r.v. over the square root of a $\chi^2$ r.v.; therefore $z_j$ follows a t distribution with $N-d$ degrees of freedom (Lectures 5-6). Therefore, in order to decide whether the estimated value of $z_j$ is significantly different from zero, we can compare this value to the tail value $t_{N-d,\,\alpha/2}$, where $\alpha$ is the level of significance.
16 Statistical testing

Matlab: tcdf(x,v), tpdf(x,v), tinv(p,v)

For large values of $N$, the t distribution approximates the standard normal distribution $N(0,1)$, and we can compare the statistics $z_j$ to the tail values of the $N(0,1)$ distribution.

Matlab: P = normcdf(X,MU,SIGMA), Y = normpdf(X,MU,SIGMA), X = norminv(P,MU,SIGMA) (MU=0, SIGMA=1)
17 Statistical testing

Similarly, we can examine whether the estimated value of a coefficient $\hat\theta_j$ is significantly different from a given value $\theta_{0,j}$ by creating the r.v.:

$$z_j = \frac{\hat\theta_j - \theta_{0,j}}{\sqrt{\lambda r_j}}$$

which follows a t or $N(0,1)$ distribution as before. The corresponding confidence interval, which quantifies the uncertainty for each estimate, may be obtained from:

$$\left(\hat\theta_j - t_{N-d,\,\alpha/2}\sqrt{\hat\lambda r_j},\ \ \hat\theta_j + t_{N-d,\,\alpha/2}\sqrt{\hat\lambda r_j}\right)$$

We can also examine the significance of a group of coefficients simultaneously (e.g. a group that may correspond to a specific independent variable) by computing the value of the following quantity:

$$F = \frac{(\mathrm{MSE}_1 - \mathrm{MSE}_2)/(d_2 - d_1)}{\mathrm{MSE}_2/(N - d_2)}$$

For Gaussian white noise, this quantity follows the $F_{d_2-d_1,\,N-d_2}$ distribution (Lectures 5-6), where $d_2$ and $d_1$ are the number of coefficients in the more complex and the simpler regression model, respectively. For large $N$, this distribution (scaled by $d_2-d_1$) approximates the $\chi^2_{d_2-d_1}$ distribution.
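The t test and confidence interval above can be sketched with SciPy (toy data; the second regressor is deliberately irrelevant, so its statistic should be small while the first coefficient is clearly significant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N, d = 100, 2
Phi = rng.standard_normal((N, d))
theta_true = np.array([2.0, 0.0])               # second regressor is irrelevant
y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

P = np.linalg.inv(Phi.T @ Phi)
theta_hat = P @ Phi.T @ y
lam_hat = np.sum((y - Phi @ theta_hat) ** 2) / (N - d)   # unbiased noise variance
r = np.diag(P)                                  # r_j: diagonal of (Phi^T Phi)^{-1}

z = theta_hat / np.sqrt(lam_hat * r)            # t statistics for H0: theta_j = 0
p = 2 * stats.t.sf(np.abs(z), df=N - d)         # two-sided p-values

# 95% confidence intervals for each coefficient
tcrit = stats.t.ppf(1 - 0.025, df=N - d)
ci_lo = theta_hat - tcrit * np.sqrt(lam_hat * r)
ci_hi = theta_hat + tcrit * np.sqrt(lam_hat * r)
```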
18 Statistical testing: model order selection

How can we use this result to select the regression model order? In the ideal case (no noise, true system of the same type), the error falls to zero once we increase $d$ adequately. Realistically: gradual decrease of $V(\theta)$ as we increase $d$. When should we stop increasing $d$? For two models $M_1$ and $M_2$, where $M_2$ is more complex (i.e. it includes more regressors), we should decide whether the reduction in the cost function $\Delta V = V_1 - V_2$ between $M_1$ and $M_2$ is significant. We can examine the normalized quantity $(V_1 - V_2)/V_2$. Moreover, when $N$ tends to infinity, and if the true system can be perfectly described by the model $M_1$, then $\Delta V$ should tend to zero. An appropriate test quantity is $N(V_1 - V_2)/V_2$.
19 Statistical testing: model order selection

Therefore, if we have

$$y(t) = \varphi^T(t)\,\theta_0 + e(t), \qquad \{e(t)\}\ \text{i.i.d.},\ e(t) \sim N(0,\lambda)$$

we have seen that the quantity

$$F = \frac{(V_1 - V_2)/(d_2 - d_1)}{V_2/(N - d_2)}$$

follows an $F_{d_2-d_1,\,N-d_2}$ distribution, which (scaled by $d_2-d_1$) approximates the $\chi^2_{d_2-d_1}$ distribution for large $N$. Therefore, in order to compare the performance of two models $M_1$ and $M_2$:

- We compute the mean square errors and the corresponding F quantity
- We determine the level of significance $\alpha$
- We compare this quantity to the tail value $F_{d_2-d_1,\,N-d_2,\,\alpha}$ or $\chi^2_{d_2-d_1,\,\alpha}$
- If $F < \chi^2_{d_2-d_1,\,\alpha}$: accept model $M_1$
- If $F > \chi^2_{d_2-d_1,\,\alpha}$: accept model $M_2$

Matlab: Y = chi2pdf(X,V), P = chi2cdf(X,V), X = chi2inv(P,V)
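The F test for nested models can be sketched as follows (Python/SciPy; the regressors, coefficients and noise level are toy assumptions; an irrelevant extra regressor should give a small F, a truly present one a very large F):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
N = 200
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)                     # candidate extra regressor
y = 1.0 + 2.0 * x1 + 0.2 * rng.standard_normal(N)

def sse(Phi, y):
    """Residual sum of squares of the least squares fit."""
    th = np.linalg.lstsq(Phi, y, rcond=None)[0]
    r = y - Phi @ th
    return r @ r

Phi1 = np.c_[np.ones(N), x1]                    # model M1, d1 = 2
Phi2 = np.c_[np.ones(N), x1, x2]                # model M2, d2 = 3

d1, d2 = Phi1.shape[1], Phi2.shape[1]
V1, V2 = sse(Phi1, y), sse(Phi2, y)
F = ((V1 - V2) / (d2 - d1)) / (V2 / (N - d2))   # x2 irrelevant here
p = stats.f.sf(F, d2 - d1, N - d2)

# Same test when the extra regressor really IS in the system
y_b = y + 3.0 * x2
V1b, V2b = sse(Phi1, y_b), sse(Phi2, y_b)
F_b = ((V1b - V2b) / (d2 - d1)) / (V2b / (N - d2))
Fcrit = stats.f.ppf(0.99, d2 - d1, N - d2)
```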
20 Computational issues

Eigenvalue decomposition of the matrix $\Phi^T\Phi$ (Hessian): $\Phi^T\Phi = U\Lambda U^T$. We can get an idea about possible numerical errors by calculating the ratio of the (absolutely) largest to the smallest eigenvalue, termed the condition number of the matrix (Matlab: cond, rcond). The larger this number is, the closer the determinant of the matrix $\Phi^T\Phi$ is to zero: greater sensitivity to small changes in the data, which may produce large changes in the estimated coefficients. This in turn depends on the input characteristics (next lecture). We may do the same by computing the singular value decomposition of the non-square matrix $\Phi$ and examining its singular values.
21 Computational issues

Estimation of the coefficients

$$\hat\theta_N = (\Phi^T\Phi)^{-1}\Phi^Ty$$

requires inverting the matrix $\Phi^T\Phi$. As discussed in the previous slide, this inversion may lead to problems, particularly for large matrices that are close to being singular (determinant close to zero) or sparse. Small changes in the observations -> large changes in the estimated coefficients!

Possible solutions: QR decomposition. There exists an orthogonal matrix $Q$ ($Q^TQ = I$) such that for any non-square matrix $\Phi$ ($N > d$) we can write $\Phi = QR$, where $R$ is an upper triangular matrix. Multiplying the relation $\Phi\theta = y$ by $Q^T$ we get:

$$Q^Ty = Q^T\Phi\theta = R\theta$$

We can therefore solve the equivalent problem $R\theta = Q^Ty$, which is easier to solve ($R$ is triangular) and less sensitive to errors (the condition number of $R$ is equal to the square root of the condition number of the initial matrix $\Phi^T\Phi$). Matlab: qr
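The QR route above, and the claimed condition-number relation, can be checked in Python/NumPy with a toy regressor matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
N, d = 50, 3
Phi = rng.standard_normal((N, d))
y = rng.standard_normal(N)

# Thin QR: Phi = Q R with Q^T Q = I (Q: Nxd, R: dxd upper triangular)
Q, R = np.linalg.qr(Phi)
theta_qr = np.linalg.solve(R, Q.T @ y)          # solve R theta = Q^T y

# Normal-equation solution for comparison
theta_ne = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# cond(R) equals the square root of cond(Phi^T Phi)
cond_R = np.linalg.cond(R)
cond_ne = np.linalg.cond(Phi.T @ Phi)
```

This is why QR (or SVD) is preferred over forming $\Phi^T\Phi$ explicitly: squaring the matrix squares its condition number.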
22 Computational issues: singular value decomposition

$$\Phi = U\Sigma V^T$$

$\Phi$: $N\times d$; $U$, $V$ orthogonal ($U$: $N\times N$, $V$: $d\times d$); $\Sigma$ diagonal. We can keep only the (absolutely) largest singular values of $\Phi$, i.e.

$$\Phi = U\Sigma V^T = \begin{bmatrix}U_1 & U_2\end{bmatrix}\begin{bmatrix}\Sigma_1 & 0\\ 0 & \Sigma_2\end{bmatrix}\begin{bmatrix}V_1^T\\ V_2^T\end{bmatrix} \approx U_1\Sigma_1V_1^T$$

and calculate the reduced-rank pseudoinverse, i.e.:

$$\Phi^+ = V_1\Sigma_1^{-1}U_1^T$$

In other words, we reject the coefficients corresponding to small singular values and solve a problem of reduced order. Matlab: [U,S,V] = svd(X)
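The reduced-rank pseudoinverse $\Phi^+ = V_1\Sigma_1^{-1}U_1^T$ can be sketched in Python/NumPy; here $\Phi$ is deliberately made nearly rank-deficient (the last column almost duplicates the first), and the truncation threshold is an assumed choice:

```python
import numpy as np

rng = np.random.default_rng(10)
N, d = 60, 4
Phi = rng.standard_normal((N, d))
# Make Phi nearly rank-deficient: last column almost a copy of the first
Phi[:, -1] = Phi[:, 0] + 1e-8 * rng.standard_normal(N)

U, s, Vt = np.linalg.svd(Phi, full_matrices=False)

# Keep only singular values above a relative threshold (assumed: 1e-6 * s_max)
r = int(np.sum(s > 1e-6 * s[0]))
Phi_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

y = rng.standard_normal(N)
theta_trunc = Phi_pinv @ y                      # reduced-order solution
```

The same truncation is what `np.linalg.pinv` performs when given the matching cutoff, so the two agree.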
23 Regularization

Regularization: similarly, when our problem is ill-conditioned we can modify the cost function (see also Lecture ) as follows:

$$W(\theta) = V(\theta) + \lambda V_R(\theta)$$

For example, we can use a quadratic regularization term, i.e.:

$$W(\theta) = \sum_{k=1}^N\left[y_k - g(\varphi_k)\right]^2 + \frac{\lambda}{2}\,\theta^T\theta$$

In this case we can obtain an analytic solution:

$$\hat\theta_{reg} = (\Phi^T\Phi + \lambda I)^{-1}\Phi^Ty$$

Addition of the term $\lambda I$ to the matrix $\Phi^T\Phi$: improvement of the condition number. Increasing $\lambda$: we bring the estimates closer to zero, i.e. we induce bias in the estimates, but we improve the variance, reduce computational problems and avoid overfitting (bias/variance tradeoff). In general, we can use cost functions of the form:

$$W(\theta) = \sum_{k=1}^N\left[y_k - g(\varphi_k)\right]^2 + \lambda\sum_{j=1}^d|\theta_j|^q$$
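A sketch of the ridge solution $(\Phi^T\Phi + \lambda I)^{-1}\Phi^Ty$ in Python/NumPy with toy data, illustrating two points from the slide: $\lambda = 0$ recovers ordinary least squares, and increasing $\lambda$ shrinks the estimates and improves the condition number:

```python
import numpy as np

rng = np.random.default_rng(11)
N, d = 40, 5
Phi = rng.standard_normal((N, d))
y = Phi @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def ridge(Phi, y, lam):
    """theta_reg = (Phi^T Phi + lam*I)^{-1} Phi^T y"""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

theta_0 = ridge(Phi, y, 0.0)                    # lam = 0: ordinary least squares
theta_r = ridge(Phi, y, 10.0)                   # shrunk estimates

theta_ols = np.linalg.lstsq(Phi, y, rcond=None)[0]
cond_raw = np.linalg.cond(Phi.T @ Phi)
cond_reg = np.linalg.cond(Phi.T @ Phi + 10.0 * np.eye(d))
```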
24 Regularization

$q = 1$: Lasso regularization.
25 Regularization

Minimization of

$$W(\theta) = \sum_{k=1}^N\left[y_k - g(\varphi_k)\right]^2 + \lambda\sum_{j=1}^d|\theta_j|^q$$

is equivalent to least squares minimization under a constraint on the coefficients, i.e.:

$$\sum_{j=1}^d|\theta_j|^q \le \eta$$

Regularization with small $q$ (e.g. Lasso, $q=1$) leads to sparser solutions: some weights are driven exactly to zero, and therefore only the significant terms are selected.
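The sparsity-inducing behavior of the $q=1$ (Lasso) penalty can be illustrated with a small proximal-gradient (ISTA) sketch, which is one standard way to minimize this non-differentiable cost; the design (orthonormal columns, noise-free data, choice of $\lambda$) is a toy assumption chosen so the solution is easy to reason about:

```python
import numpy as np

rng = np.random.default_rng(12)
N, d = 50, 2
Phi, _ = np.linalg.qr(rng.standard_normal((N, d)))   # orthonormal columns
theta_true = np.array([2.0, 0.0])                     # second weight truly zero
y = Phi @ theta_true                                  # noise-free for clarity

lam = 0.5
# Step size 1/L for the smooth part ||y - Phi theta||^2 (L = 2*sigma_max(Phi)^2)
step = 0.5 / np.linalg.norm(Phi, 2) ** 2

def soft(v, t):
    """Soft-thresholding: proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

theta = np.zeros(d)
for _ in range(500):
    grad = -2.0 * Phi.T @ (y - Phi @ theta)           # gradient of the LS term
    theta = soft(theta - step * grad, step * lam)
```

With orthonormal regressors, the Lasso solution is the soft-thresholded least squares estimate: the irrelevant weight is driven exactly to zero, while the significant one survives (shrunk by $\lambda/2$).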
More informationSystem Identification & Parameter Estimation
System Identification & Parameter Estimation Wb3: SIPE lecture Correlation functions in time & frequency domain Alfred C. Schouten, Dept. of Biomechanical Engineering (BMechE), Fac. 3mE // Delft University
More information2. Review of Linear Algebra
2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear
More informationNumerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??
Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement
More informationEstimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators
Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let
More informationSIGNAL AND IMAGE RESTORATION: SOLVING
1 / 55 SIGNAL AND IMAGE RESTORATION: SOLVING ILL-POSED INVERSE PROBLEMS - ESTIMATING PARAMETERS Rosemary Renaut http://math.asu.edu/ rosie CORNELL MAY 10, 2013 2 / 55 Outline Background Parameter Estimation
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationRegression Analysis. y t = β 1 x t1 + β 2 x t2 + β k x tk + ϵ t, t = 1,..., T,
Regression Analysis The multiple linear regression model with k explanatory variables assumes that the tth observation of the dependent or endogenous variable y t is described by the linear relationship
More informationCointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56
Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The
More informationStatistical inference
Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More informationAdvanced Econometrics
Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate
More informationSupplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017
Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationInference in Regression Analysis
Inference in Regression Analysis Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1 Today: Normal Error Regression Model Y i = β 0 + β 1 X i + ǫ i Y i value
More informationLinear Least-Squares Data Fitting
CHAPTER 6 Linear Least-Squares Data Fitting 61 Introduction Recall that in chapter 3 we were discussing linear systems of equations, written in shorthand in the form Ax = b In chapter 3, we just considered
More informationProf. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides
Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering Stochastic Processes and Linear Algebra Recap Slides Stochastic processes and variables XX tt 0 = XX xx nn (tt) xx 2 (tt) XX tt XX
More informationNumerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture
More informationLecture 5 Least-squares
EE263 Autumn 2008-09 Stephen Boyd Lecture 5 Least-squares least-squares (approximate) solution of overdetermined equations projection and orthogonality principle least-squares estimation BLUE property
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationStat 206: Linear algebra
Stat 206: Linear algebra James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Vectors We have already been working with vectors, but let s review a few more concepts. The inner product of two
More informationMATH 350: Introduction to Computational Mathematics
MATH 350: Introduction to Computational Mathematics Chapter V: Least Squares Problems Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Spring 2011 fasshauer@iit.edu MATH
More informationApplied Numerical Linear Algebra. Lecture 8
Applied Numerical Linear Algebra. Lecture 8 1/ 45 Perturbation Theory for the Least Squares Problem When A is not square, we define its condition number with respect to the 2-norm to be k 2 (A) σ max (A)/σ
More informationModelling Non-linear and Non-stationary Time Series
Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationSingular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model
International Journal of Statistics and Applications 04, 4(): 4-33 DOI: 0.593/j.statistics.04040.07 Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model
More informationESTIMATION THEORY. Chapter Estimation of Random Variables
Chapter ESTIMATION THEORY. Estimation of Random Variables Suppose X,Y,Y 2,...,Y n are random variables defined on the same probability space (Ω, S,P). We consider Y,...,Y n to be the observed random variables
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationIn the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)
RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made
More informationOutline lecture 2 2(30)
Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control
More informationAppendix A: The time series behavior of employment growth
Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationstatistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:
Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility
More informationSGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection
SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More information