Time Series Analysis
hm@imm.dtu.dk
Informatics and Mathematical Modelling
Technical University of Denmark
DK-2800 Kgs. Lyngby
Outline of the lecture

Regression based methods, 1st part:
- Introduction (Sec. 3.1)
- The General Linear Model, including OLS-, WLS-, and ML-estimates (Sec. 3.2)
- Prediction in the General Linear Model (Sec. 3.3)
- Examples
General form of the regression model

Y_t = f(x_t, t; θ) + ε_t

where:
- Y_t is the output we aim to model
- X_t = (X_{1t}, ..., X_{pt})^T indicates the p independent variables
- t is the time index
- θ = (θ_1, ..., θ_m)^T indicates the m unknown parameters
- ε_t is a sequence of random variables with mean zero, variance σ_t², and Cov[ε_{t_i}, ε_{t_j}] = σ²Σ_{ij}

We restrict the discussion to the case where X_t is non-random, and we then write x_t.
Y_t = θ_1 / (1 + exp(−θ_2 x_t + θ_3)) + ε_t

[Figure: E[Y] as a function of x for four parameter settings: 1/(1+exp(−x)), 2/(1+exp(−x)), 2/(1+exp(−0.5x)), and 2/(1+exp(−0.5x+2)).]
Least squares estimates

Observations: (y_1, x_1), (y_2, x_2), ..., (y_n, x_n)

The Ordinary Least Squares (unweighted) estimate is found from

θ̂ = arg min_θ S(θ),  where S(θ) = Σ_{t=1}^n [y_t − f(x_t; θ)]² = Σ_{t=1}^n ε_t²(θ)

The unweighted method assumes that the errors all have the same variance and are mutually uncorrelated.
Variance of error and estimates

If the model errors ε_t are i.i.d., the variance of the model errors is estimated as

σ̂² = S(θ̂) / (n − p)

and the variance-covariance matrix of the estimates is

V[θ̂] = 2 σ̂² [ ∂²S(θ)/∂θ∂θ^T |_{θ=θ̂} ]^{-1}
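As an illustration, here is a minimal Python sketch of how such a non-linear least-squares fit and the associated variance estimates could be computed, here for the logistic curve shown earlier. The synthetic data, the starting values, and the use of scipy.optimize.least_squares are assumptions made for the sake of the example, not part of the lecture material.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data from the logistic model (assumed for illustration)
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 50)
y = 2.0 / (1.0 + np.exp(-0.5 * x + 2.0)) + rng.normal(0.0, 0.1, x.size)

def residuals(theta, x, y):
    """epsilon_t(theta) = y_t - f(x_t; theta) for the logistic curve."""
    th1, th2, th3 = theta
    return y - th1 / (1.0 + np.exp(-th2 * x + th3))

# Minimize S(theta) = sum of squared residuals, starting from a rough guess
fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y))

n, p = x.size, 3
sigma2_hat = np.sum(fit.fun**2) / (n - p)   # sigma^2 = S(theta_hat) / (n - p)

# Gauss-Newton approximation of the Hessian of S is 2 J^T J, so
# V[theta_hat] = 2 sigma^2 (2 J^T J)^{-1} = sigma^2 (J^T J)^{-1}
J = fit.jac
V_theta = sigma2_hat * np.linalg.inv(J.T @ J)
print(fit.x, sigma2_hat)
```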
The General Linear Model

Y_t = x_t^T θ + ε_t

Note that the quadratic model Y_t = θ_0 + θ_1 z_t + θ_2 z_t² + ε_t can be written

y_t = (1  z_t  z_t²) (θ_0, θ_1, θ_2)^T + ε_t

and hence it is a general linear model.
General Linear Models

Some examples in the book:
- (Multiple) regression analysis, e.g. Y = α + βx + ε
- Analysis of variance, e.g. Y = α_i + ε (i indexes the treatment)
- Analysis of covariance, e.g. Y = α_i + βx + ε

For ANOVA and ANCOVA the treatments must be coded into a number of x-variables, as sketched below.
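A minimal sketch of how such treatment coding could look in Python; the three-treatment setup and the data values are assumptions chosen purely for illustration:

```python
import numpy as np

# Treatment label for each of n = 6 observations (assumed data)
treatment = np.array([0, 0, 1, 1, 2, 2])
covariate = np.array([0.4, 1.2, 2.3, 3.4, 4.3, 5.1])

# ANOVA design matrix: one indicator (dummy) column per treatment,
# so x_t^T theta = alpha_i when observation t received treatment i
X_anova = (treatment[:, None] == np.arange(3)).astype(float)

# ANCOVA design matrix: treatment indicators plus the covariate column,
# giving x_t^T theta = alpha_i + beta * x
X_ancova = np.column_stack([X_anova, covariate])
print(X_ancova)
```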
OLS solution

Non-linear regression: numerical optimization is required; see the book for a simple example (Newton-Raphson).

For the general linear model a closed-form solution exists. For all observations the model equations are written as

[Y_1]   [x_1^T]       [ε_1]
[ ⋮ ] = [  ⋮  ] θ  +  [ ⋮ ]      or      Y = xθ + ε
[Y_n]   [x_n^T]       [ε_n]

i.e. we want to minimize ε^T ε. The solution is (if x has full rank)

θ̂ = (x^T x)^{-1} x^T Y

with σ̂² = ε̂^T ε̂ / (n − p) and V[θ̂] = σ̂² (x^T x)^{-1}.
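A minimal Python sketch of the closed-form solution (the function name and the use of a linear solve on the normal equations are my own choices, not from the book):

```python
import numpy as np

def ols_fit(x, Y):
    """OLS for Y = x theta + eps: returns theta_hat, sigma2_hat, V[theta_hat].
    Illustrative sketch; assumes x has full column rank."""
    n, p = x.shape
    # Solve the normal equations x^T x theta = x^T Y
    theta_hat = np.linalg.solve(x.T @ x, x.T @ Y)
    resid = Y - x @ theta_hat
    sigma2_hat = resid @ resid / (n - p)       # sigma^2 = eps^T eps / (n - p)
    V_theta = sigma2_hat * np.linalg.inv(x.T @ x)
    return theta_hat, sigma2_hat, V_theta
```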
Example

Data:

t    y    x
1    0.2  0.4
2    1.2  1.2
3    1.9  2.3
4    2.3  3.4
5    1.9  4.3

Model: Y_t = θ_0 + θ_1 x_t + θ_2 x_t² + ε_t, i.e. Y = xθ + ε:

[0.2]   [1  0.4   0.16]           [ε_1]
[1.2]   [1  1.2   1.44]  [θ_0]    [ε_2]
[1.9] = [1  2.3   5.29]  [θ_1]  + [ε_3]
[2.3]   [1  3.4  11.56]  [θ_2]    [ε_4]
[1.9]   [1  4.3  18.49]           [ε_5]

θ̂ = (x^T x)^{-1} x^T Y = (−0.40, 1.61, −0.25)^T
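Applying the closed-form solution to this data in Python reproduces the estimate (a quick check, not part of the original slides; the snippet is kept self-contained):

```python
import numpy as np

x_raw = np.array([0.4, 1.2, 2.3, 3.4, 4.3])
Y = np.array([0.2, 1.2, 1.9, 2.3, 1.9])

# Design matrix with columns 1, x, x^2
x = np.column_stack([np.ones_like(x_raw), x_raw, x_raw**2])

theta_hat = np.linalg.solve(x.T @ x, x.T @ Y)
print(np.round(theta_hat, 2))   # [-0.4   1.61 -0.25]
```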
Properties

- θ̂ is a linear function of the observations Y (and Ŷ is a linear function of the observations)
- It is unbiased, i.e. E[θ̂] = θ
- V[θ̂] = E[(θ̂ − θ)(θ̂ − θ)^T] = σ² (x^T x)^{-1}
- θ̂ is BLUE (Best Linear Unbiased Estimator), which means that it has the smallest variance among all estimators which are linear functions of the observations
WLS-estimates

Equations for all observations: Y = xθ + ε with E[ε] = 0 and V[ε] = E[εε^T] = σ²Σ, where Σ is known.

We want to minimize (Y − xθ)^T Σ^{-1} (Y − xθ) (why? Because weighting by Σ^{-1} down-weights observations with large variance and accounts for their correlation.)

The solution is (if x^T Σ^{-1} x is invertible)

θ̂ = (x^T Σ^{-1} x)^{-1} x^T Σ^{-1} Y

An unbiased estimate of σ² is

σ̂² = (1/(n − p)) (Y − xθ̂)^T Σ^{-1} (Y − xθ̂)
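A minimal Python sketch of the WLS solution, assuming Σ is given as input (the function name and structure are my own choices for illustration):

```python
import numpy as np

def wls_fit(x, Y, Sigma):
    """WLS for Y = x theta + eps with V[eps] = sigma^2 Sigma (Sigma known).
    Illustrative sketch; assumes x^T Sigma^-1 x is invertible."""
    n, p = x.shape
    Sigma_inv = np.linalg.inv(Sigma)
    A = x.T @ Sigma_inv @ x
    theta_hat = np.linalg.solve(A, x.T @ Sigma_inv @ Y)
    resid = Y - x @ theta_hat
    sigma2_hat = resid @ Sigma_inv @ resid / (n - p)
    V_theta = sigma2_hat * np.linalg.inv(A)   # sigma^2 (x^T Sigma^-1 x)^-1
    return theta_hat, sigma2_hat, V_theta
```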
Example WLS/OLS

H. Madsen & P. Thyregod (1988). Modelling the Time Correlation in Hourly Observations of Direct Radiation in Clear Skies. Energy and Buildings, 11, 201–211.

See the examples in the book.
ML-estimates

We now assume that the observations are Gaussian:

Y ∼ N_n(xθ, σ²Σ),  with Σ assumed known

The ML-estimator of θ is the same as the WLS-estimator:

θ̂ = (x^T Σ^{-1} x)^{-1} x^T Σ^{-1} Y

The ML-estimator of σ² is

σ̂² = (1/n) (Y − xθ̂)^T Σ^{-1} (Y − xθ̂)
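To see why the WLS- and ML-estimators of θ coincide, write out the Gaussian log-likelihood (a standard argument, added here as a reasoning step):

log L(θ, σ²) = −(n/2) log(2πσ²) − (1/2) log|Σ| − (1/(2σ²)) (Y − xθ)^T Σ^{-1} (Y − xθ)

θ enters only through the quadratic form, so maximizing over θ amounts to minimizing (Y − xθ)^T Σ^{-1} (Y − xθ), which is exactly the WLS criterion. Setting ∂ log L/∂σ² = 0 then gives σ̂² with divisor n, which explains why the ML-estimator of σ² differs from the unbiased estimate with divisor n − p.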
Properties of the ML-estimator

- It is a linear function of the observations, which now implies that it is normally distributed
- It is unbiased, i.e. E[θ̂] = θ
- The variance is V[θ̂] = E[(θ̂ − θ)(θ̂ − θ)^T] = σ² (x^T Σ^{-1} x)^{-1}
- It is an efficient estimator
Unknown Σ

Relaxation algorithm:
a) Select a value for Σ (e.g. Σ = I).
b) Find the estimates for this value of Σ, e.g. by solving the normal equations.
c) Consider the residuals {ε̂_t} and calculate the correlation and variance structure of the residuals. Then select a new value for Σ which reflects that correlation and variance structure.
d) Stop on convergence; otherwise go to b).

See (Goodwin and Payne, 1977) for details. A sketch of one possible implementation follows below.
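A minimal Python sketch of the relaxation idea. Everything about the residual model is an assumption made for illustration: the residual structure is summarized by a single lag-one autocorrelation and Σ is rebuilt as an AR(1)-type correlation matrix; the book and (Goodwin and Payne, 1977) should be consulted for a proper treatment.

```python
import numpy as np

def relaxation_wls(x, Y, n_iter=10, tol=1e-6):
    """Iterate WLS, re-estimating Sigma from an assumed AR(1) residual model."""
    n, p = x.shape
    Sigma = np.eye(n)                      # step a): start with Sigma = I
    theta_hat = np.zeros(p)
    for _ in range(n_iter):
        # step b): WLS estimate for the current Sigma
        Si = np.linalg.inv(Sigma)
        theta_new = np.linalg.solve(x.T @ Si @ x, x.T @ Si @ Y)
        # step c): lag-one autocorrelation of the residuals, then a new
        # AR(1)-type Sigma with entries rho^|i-j| (illustrative assumption)
        resid = Y - x @ theta_new
        rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
        Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        # step d): stop on convergence
        if np.max(np.abs(theta_new - theta_hat)) < tol:
            break
        theta_hat = theta_new
    return theta_new, Sigma
```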
Prediction

If the expected value of the squared prediction error is to be minimized, then we must use the conditional mean E[Y | X = x] as the prediction.
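Why the conditional mean? For any predictor g(x), a standard decomposition (added here for completeness) gives

E[(Y − g(x))² | X = x] = V[Y | X = x] + (E[Y | X = x] − g(x))²

Both terms are non-negative and only the second depends on g, so the expected squared prediction error is minimized by choosing g(x) = E[Y | X = x].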
Prediction in the general linear model

Known parameters:

Ŷ_{t+ℓ} = E[Y_{t+ℓ} | X_{t+ℓ} = x_{t+ℓ}] = x_{t+ℓ}^T θ
V[Y_{t+ℓ} − Ŷ_{t+ℓ}] = V[ε_{t+ℓ}] = σ²

Estimated parameters:

Ŷ_{t+ℓ} = x_{t+ℓ}^T θ̂
V[Y_{t+ℓ} − Ŷ_{t+ℓ}] = σ² [1 + x_{t+ℓ}^T (x^T x)^{-1} x_{t+ℓ}]
Prediction in the general linear model, continued

We must use an estimate of σ, and therefore a 100(1 − α)% prediction interval for a future value is calculated as

Ŷ_{t+ℓ} ± t_{α/2}(n − p) σ̂ √(1 + x_{t+ℓ}^T (x^T x)^{-1} x_{t+ℓ})

where t_{α/2}(n − p) is the critical value of the t-distribution with n − p degrees of freedom.
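A minimal Python sketch computing such a prediction interval, reusing the quadratic example from earlier; the prediction point x = 2.0 and the 95% level are assumptions made for illustration:

```python
import numpy as np
from scipy.stats import t as t_dist

x_raw = np.array([0.4, 1.2, 2.3, 3.4, 4.3])
Y = np.array([0.2, 1.2, 1.9, 2.3, 1.9])
x = np.column_stack([np.ones_like(x_raw), x_raw, x_raw**2])
n, p = x.shape

theta_hat = np.linalg.solve(x.T @ x, x.T @ Y)
resid = Y - x @ theta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p))

# Prediction at a new (assumed) point x_new = (1, 2.0, 2.0^2)
x_new = np.array([1.0, 2.0, 4.0])
y_hat = x_new @ theta_hat

# 95% prediction interval: y_hat +/- t * sigma * sqrt(1 + x^T (X^T X)^-1 x)
half_width = (t_dist.ppf(1 - 0.05 / 2, df=n - p) * sigma_hat
              * np.sqrt(1 + x_new @ np.linalg.solve(x.T @ x, x_new)))
print(y_hat - half_width, y_hat + half_width)
```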