The project

Five project topics are described below. You should choose one of the projects. A maximum of two people per project is allowed. If two people are working on a topic, they are expected to do double the work compared to one person working alone. You must work continuously on this project over the course of four weeks. Each week, on Monday, you should post on a wiki page (which I will create) the work you have done on it so far. Everyone in the class can view your work. The wiki link (you need to use a VPN to access this page from outside the university). By the end of the semester you are expected to submit a final report on the topic and present a talk on it. The literature I give is only a starting point. You are expected to find more papers and/or books related to the subject and use them in your report and talk. Provisionally, the talks will be presented on the reading day, Thursday 6th December (if there are no objections).
Topic 1: Integer-Valued Autoregressive Processes INAR(p)

Literature

An Introduction to Discrete-Valued Time Series, Christian Weiss. Here is the online book: (Weiss). You may have to be within the university domain to access it.

Review paper by Eddie McKenzie (2000), Discrete Variate Time Series (McKenzie: Here).

Typically we assume that a time series is continuous-valued. But in many cases the time series will be positive and integer valued. If the highest integer value is large, this can often be approximated as a continuous-valued time series. However, this may not always hold. The plot in Figure 1 is an example from Weiss (2018).

[Figure 1: Real data.]

Several authors, including E. McKenzie and C. Weiss, have proposed methods for modelling such data. We describe a GLM-type approach in Topic 5. But here we discuss an approach which appears to be a close generalisation of the classical autoregressive process. To do this,
we first define the thinning operator. For $0 \leq a \leq 1$, the thinning operator $a \circ Z_t$ is defined as
$$a \circ Z_t = \sum_{j=1}^{Z_t} B_{t,j} \sim \text{Bin}(a, Z_t),$$
where $\{B_{t,j}\}$ are iid Bernoulli random variables with $P(B_{t,j} = 1) = a$. Clearly $0 \circ Z_t = 0$ and $1 \circ Z_t = Z_t$; thus in some respects $a \circ Z_t$ resembles the usual $a Z_t$.

The integer-valued AR process (INAR) of order 1, INAR(1), is defined as
$$X_t = a \circ X_{t-1} + \varepsilon_t,$$
where the $\varepsilon_t$ are iid discrete random variables, independent of $X_{t-1}$. Thus
$$X_t = \sum_{j=1}^{X_{t-1}} B_{t,j} + \varepsilon_t.$$
We observe that
$$E(X_t \mid X_{t-1}) = \underbrace{a X_{t-1}}_{\text{mean of Bin}(a, X_{t-1})} + E[\varepsilon_t],$$
which resembles the conditional expectation of the regular AR(1), namely $E[X_t \mid X_{t-1}] = a X_{t-1}$. Note that the marginal distribution of $X_t$ will not be binomial. The moments of and covariances between the $X_t$ have interesting properties, which are worth investigating.

Objectives

Properties of the INAR model.
Estimation methods for the INAR model.
Summarize the sampling properties of the INAR model.
Simulations: are the sampling properties reliable?
Extensions of the model.
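As a starting point for the simulation objective, the INAR(1) recursion above can be simulated directly from the thinning representation. The sketch below assumes Poisson($\lambda$) innovations, which is a common but not the only choice; the function name and parameter values are illustrative, not taken from the project description.

```python
import numpy as np

def simulate_inar1(n, a, lam, seed=0):
    """Simulate X_t = a o X_{t-1} + eps_t, where o is binomial thinning
    and (as an illustrative choice) eps_t ~ Poisson(lam)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - a))  # start near the stationary mean
    for t in range(1, n):
        # a o X_{t-1}: a sum of X_{t-1} iid Bernoulli(a) variables
        thinned = rng.binomial(x[t - 1], a)
        x[t] = thinned + rng.poisson(lam)  # add the innovation eps_t
    return x

x = simulate_inar1(5000, a=0.5, lam=2.0)
# With Poisson innovations the stationary mean is lam / (1 - a) = 4
print(x.mean())
```

Taking expectations of the recursion gives $E[X_t] = a E[X_{t-1}] + \lambda$, so the stationary mean $\lambda/(1-a)$ provides a quick sanity check on the simulator.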
Topic 2: Granger Causality and Vector AR Processes

Literature

Graphical interaction models for multivariate time series, Metrika (2000), by Rainer Dahlhaus.

Application to neuroscience (in particular, looking at particular wave bands): Here. I know that Hernando Ombao also worked on this, I just cannot find his papers!

A paper set in high dimensions: High dimension

Consider the stationary multivariate time series $X_t = (X_{1,t}, \ldots, X_{d,t})'$. The notion of partial correlation (defined in Section 4.3 of the class notes) is generalized to the multivariate set-up. Let $X_{a,t}$ and $X_{b,t}$ correspond to two elements in the vector $X_t$, and let $Y_{t,-(a,b)} = (X_{i,t};\ i \neq a, b)$ (the vector after removing elements $a$ and $b$). The best linear predictors of $X_{a,t}$ and $X_{b,t}$ given $\{Y_{\tau,-(a,b)}\}_{\tau=-\infty}^{\infty}$ are
$$\widehat{X}_{a,t} = \sum_{\tau=-\infty}^{\infty} d_a(\tau)' Y_{t-\tau,-(a,b)}, \qquad \widehat{X}_{b,t} = \sum_{\tau=-\infty}^{\infty} d_b(\tau)' Y_{t-\tau,-(a,b)}.$$
The partial correlation between $X_{a,t}$ and $X_{b,\tau}$ after removing this dependence is
$$\text{cov}_{-(a,b)}(X_{a,t}, X_{b,\tau}) = \text{cov}\big(X_{a,t} - \widehat{X}_{a,t},\ X_{b,\tau} - \widehat{X}_{b,\tau}\big).$$
The above can be considered as a measure of dependence between $X_{a,t}$ and $X_{b,\tau}$ after removing the information from the other components in the time series. If this is zero for all $t$ and $\tau$, then there is no dependence (if Gaussian) between the two time series after removing the other time series. However, calculating this can be difficult. Dahlhaus therefore reformulates the problem in terms of spectral density matrices (recall that for univariate time series these are like the
eigenvalues we discussed in class). We recall that for the univariate ARMA(1, 1) the spectral density is
$$\frac{\sigma^2 |1 + \theta \exp(i\omega)|^2}{|1 - \phi \exp(i\omega)|^2} = \sum_{\tau=-\infty}^{\infty} c(\tau) \exp(i\tau\omega).$$
This can be generalized to multivariate time series. Define the spectral matrix
$$\Sigma(\omega) = \sum_{\tau=-\infty}^{\infty} \text{cov}(X_0, X_\tau) \exp(i\tau\omega), \qquad \omega \in [0, 2\pi].$$
The following result holds. Let $\Sigma(\omega)^{(a,b)}$ denote the $(a,b)$th element of $\Sigma(\omega)^{-1}$ (a form of precision matrix). If $\Sigma(\omega)^{(a,b)} = 0$ for all $\omega \in [0, 2\pi]$, then $\text{cov}_{-(a,b)}(X_{a,t}, X_{b,\tau}) = 0$ for all $t$ and $\tau$. This result is analogous to the result for partial correlation given in Section 4.3.¹

Applications

Suppose we are modelling $X_t$ with a vector AR(p),
$$X_t = \sum_{j=1}^{p} \Phi_j X_{t-j} + \varepsilon_t,$$
where $\{\varepsilon_t\}_t$ are iid random vectors with variance $\sigma^2 I_d$. The non-zero components in $\Phi_j$ can be interpreted as telling us which time series directly impact other time series. If $\Phi_j(a,b) \neq 0$, then $X_{b,t-j}$ has a direct impact on $X_{a,t}$. Therefore, the dependence between $X_{a,t}$ and $X_{b,t-j}$ remains even after removing the dependence on the other variables, and $\Sigma(\omega)^{(a,b)}$ will be non-zero for at least some frequency $\omega$. Conversely, if $\Sigma(\omega)^{(a,b)} = 0$ for all $\omega$, then $\Phi_j(a,b) = \Phi_j(b,a) = 0$ for $1 \leq j \leq p$. This means there is only an indirect dependence between the two time series. This can be represented in terms of an undirected graph. If $\Sigma(\omega)^{(a,b)} \neq 0$ and the VAR(p) is the correct model, then for some $j$ either $\Phi_j(a,b) \neq 0$ or $\Phi_j(b,a) \neq 0$. In other words, one variable has an actual influence on the other (directly, not through the other variables).

¹ To understand why this is true, note that $\Sigma(\omega_k)$ is approximately the variance of the DFT of the vector time series, $J_n(\omega_k) = n^{-1/2} \sum_{t=1}^{n} X_t \exp(it\omega_k)$. Under certain conditions $J_n(\omega_k)$ is approximately (up to order $O(n^{-1})$) uncorrelated at different frequencies. Thus if $\Sigma(\omega_k)^{(a,b)} = 0$, it tells us that $J_a(\omega_k)$ and $J_b(\omega_k)$ (the $a$th and $b$th elements of $J_n(\omega_k)$) are uncorrelated after removing the linear dependence on the other time series. This, together with the near uncorrelatedness of $J_n(\omega_k)$ at different frequencies, gives the result.
Looking at regions of $[0, \pi]$ (the $\alpha$, $\beta$ waves etc. discussed in Section 1.4.1) where $\Sigma(\omega)^{(a,b)} = 0$ is considered in the neuroscience paper described above.

Directed graphs are also important. This is when we are only interested in how a time series in the past may influence a time series in the future. For example, if $X_{b,\tau}$ for $\tau < t$ has a direct influence on $X_{a,t}$, this direct influence is modelled with a VAR
$$X_t = \sum_{j=1}^{p} \Phi_j X_{t-j} + \varepsilon_t.$$
But this time there is a direct influence if $\Phi_j(a,b) \neq 0$ for some $1 \leq j \leq p$ (notice there is no symmetry here; we do not also look for $\Phi_j(b,a) \neq 0$). This is an example of a directed graph.

Objectives

Understand the above through directed and undirected graphs.
How does one test for partial correlation in a time series?
Extensions to high dimensions?
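To see the undirected-graph result concretely, one can compute the inverse spectral matrix of a small VAR(1) in closed form. The sketch below assumes identity innovation variance, for which $\Sigma(\omega)^{-1}$ is proportional to $A(\omega)^* A(\omega)$ with $A(\omega) = I - \Phi e^{-i\omega}$; the chain-structured $\Phi$ is a made-up example, not one taken from the papers above.

```python
import numpy as np

# Hypothetical 3-dimensional VAR(1), X_t = Phi X_{t-1} + eps_t with
# var(eps_t) = I_3. Series 1 is driven by series 2 and series 2 by
# series 3; there is no direct link between series 1 and 3, so the
# (1,3) entry of the inverse spectral matrix should vanish at every
# frequency.
Phi = np.array([[0.3, 0.5, 0.0],
                [0.0, 0.3, 0.5],
                [0.0, 0.0, 0.3]])

def inverse_spectral_matrix(Phi, omega):
    """For a VAR(1) with identity innovation variance, Sigma(omega) is
    proportional to A^{-1} A^{-*} with A = I - Phi e^{-i omega}, so the
    inverse spectral matrix is proportional to A^* A."""
    d = Phi.shape[0]
    A = np.eye(d) - Phi * np.exp(-1j * omega)
    return A.conj().T @ A

for omega in np.linspace(0, 2 * np.pi, 64):
    G = inverse_spectral_matrix(Phi, omega)
    assert abs(G[0, 2]) < 1e-12  # no direct 1-3 edge in the undirected graph
    assert abs(G[0, 1]) > 0.1    # the direct 1<-2 edge is visible at all omega
```

The vanishing $(1,3)$ entry at every frequency is exactly the zero-partial-correlation condition of Dahlhaus, while the non-zero $(1,2)$ entry reflects the direct edge $\Phi_1(1,2) \neq 0$.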
Topic 3: Estimation of burst-type signals with application to modelling ECG data

Literature

Statistical Signal Processing, by Kundu and Nandi: Here (Book)

We focus on the burst-type model
$$Y_t = \sum_{j=1}^{p} A_j \exp\big(b_j (1 - \cos(\alpha t + c_j))\big) \cos(\theta_j t + \phi_j) + \varepsilon_t.$$
Two plots of the burst model with different parameters are given in Figure 2 (no noise was added).

[Figure 2: Plots in the case p = 1 with (a) A = 0.05, b = 3.5, α = 0.132 and (b) A = 0.05, b = 3.5, α = 0.25.]

Observe that the only change between the two models is the slight change in the α parameter. In Figure 3 we plot the same burst signal but with white and coloured noise added. The parameters of the model can be estimated by minimising the non-linear least squares
criterion
$$L(A, b, \alpha, c, \theta, \phi) = \sum_{t=1}^{n} \Big( Y_t - \sum_{j=1}^{p} A_j \exp\big(b_j (1 - \cos(\alpha t + c_j))\big) \cos(\theta_j t + \phi_j) \Big)^2.$$

[Figure 3: Plots in the case p = 1 and A = 0.05, b = 3.5; (a) with white noise, (b) with coloured noise.]

Here are some suggestions for the case where the errors are dependent. We assume that the errors follow an AR(1) process, $\varepsilon_t = \phi \varepsilon_{t-1} + \eta_t$, where $\{\eta_t\}$ are iid random variables, and that the model has order $p = 1$. In other words,
$$Y_t = A \exp\big(b (1 - \cos(\alpha t + c))\big) \cos(\theta t + \phi) + \varepsilon_t \quad \text{and} \quad \varepsilon_t = \phi \varepsilon_{t-1} + \eta_t.$$
To simplify notation, we define the function
$$G(t, \theta) = A \exp\big(b (1 - \cos(\alpha t + c))\big) \cos(\theta t + \phi),$$
where $\theta = (A, b, \alpha, c, \phi)$; thus $Y_t = G(t, \theta) + \varepsilon_t$. Below we describe two possible approaches for estimating the parameters.
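Before estimating anything, it helps to be able to generate data from the model. A minimal simulation of $Y_t = G(t, \theta) + \varepsilon_t$ with AR(1) errors is sketched below; the frequency $\theta = 1.0$ and the noise parameters are illustrative guesses, since the figure captions above do not fix them.

```python
import numpy as np

def G(t, A, b, alpha, c, theta, phi):
    """The burst signal A exp(b(1 - cos(alpha t + c))) cos(theta t + phi)."""
    return A * np.exp(b * (1 - np.cos(alpha * t + c))) * np.cos(theta * t + phi)

def simulate_burst(n, A=0.05, b=3.5, alpha=0.132, c=0.0, theta=1.0, phi=0.0,
                   ar=0.5, sigma=0.2, seed=0):
    """Burst signal plus AR(1) noise eps_t = ar * eps_{t-1} + eta_t,
    with eta_t iid N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    signal = G(t, A, b, alpha, c, theta, phi)
    eps = np.zeros(n)
    for i in range(1, n):
        eps[i] = ar * eps[i - 1] + rng.normal(0.0, sigma)
    return t, signal, signal + eps

t, signal, y = simulate_burst(500)  # y is the observed noisy series
```

Note that the envelope $\exp(b(1 - \cos(\alpha t + c)))$ is bounded by $\exp(2b)$, which explains the sharp periodic bursts visible in the figures.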
Approach 1

Since $\eta_t = \varepsilon_t - \phi \varepsilon_{t-1}$, where
$$\varepsilon_t = Y_t - G(t, \theta), \qquad \varepsilon_{t-1} = Y_{t-1} - G(t-1, \theta),$$
the white noise $\eta_t$ can be written as
$$\eta_t = \varepsilon_t - \phi \varepsilon_{t-1} = (Y_t - G(t, \theta)) - \phi (Y_{t-1} - G(t-1, \theta)). \qquad (1)$$
Since the $\eta_t$ are iid random variables, we can simultaneously estimate $\phi$ and $\theta$ using the least squares criterion
$$L(\theta, \phi) = \sum_{t=2}^{n} \big( Y_t - G(t, \theta) - \phi Y_{t-1} + \phi G(t-1, \theta) \big)^2,$$
and then find the parameters $\theta, \phi$ which minimise the above.

Approach 2

This is a three-stage approach, which may be easier to implement, especially when the order of the AR(p) error process is higher than 1. The idea is that we first estimate $\theta$ using regular least squares, and use this to estimate the residuals $\varepsilon_t$, which are in turn used to estimate the autoregressive parameter. We then use this to re-estimate $\theta$, in the hope that one obtains a more efficient estimator (an estimator with a smaller variance) the second time round.

Step 1 Find the $\theta$ which minimises
$$L_1(\theta) = \sum_{t=1}^{n} (Y_t - G(t, \theta))^2.$$
Let $\widehat{\theta}_1 = \arg\min L_1(\theta)$. Evaluate the residuals $\widehat{\varepsilon}_t = Y_t - G(t, \widehat{\theta}_1)$ and make an ACF plot of the estimated residuals. If there appears to be correlation, go on to the next step.
Step 2 Find the $\phi$ which minimises
$$L_2(\phi) = \sum_{t=2}^{n} (\widehat{\varepsilon}_t - \phi \widehat{\varepsilon}_{t-1})^2.$$
Let $\widehat{\phi} = \arg\min L_2(\phi)$.

Step 3 We substitute $\widehat{\phi}$ into (1) to give
$$L_3(\theta) = \sum_{t=2}^{n} \big( Y_t - G(t, \theta) - \widehat{\phi} Y_{t-1} + \widehat{\phi} G(t-1, \theta) \big)^2.$$
Observe that the only unknown in the above criterion is $\theta$. Let $\widehat{\theta}_2 = \arg\min L_3(\theta)$.

Step 4 Our final estimator of $\theta$ is $\widehat{\theta}_2$. Evaluate the residuals
$$\widehat{\eta}_t = \big( Y_t - G(t, \widehat{\theta}_2) \big) - \widehat{\phi} \big( Y_{t-1} - G(t-1, \widehat{\theta}_2) \big)$$
and make an ACF plot of these residuals.
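The steps above can be sketched as follows, using scipy's `least_squares` for the nonlinear fits. To sidestep the hard problem of finding good starting values, this illustrative example initialises at the true parameters; in practice Step 1 would need a careful search over $\alpha$ and $\theta$. The parameter values are guesses for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares

def G(t, p):
    """Burst signal with parameter vector p = (A, b, alpha, c, theta, phi)."""
    A, b, alpha, c, theta, phi = p
    return A * np.exp(b * (1 - np.cos(alpha * t + c))) * np.cos(theta * t + phi)

def three_stage_fit(t, y, p0):
    # Step 1: ordinary nonlinear least squares, ignoring error dependence.
    fit1 = least_squares(lambda p: y - G(t, p), p0)
    eps = y - G(t, fit1.x)  # estimated residuals
    # Step 2: AR(1) coefficient for the residuals by least squares.
    phi_hat = np.sum(eps[1:] * eps[:-1]) / np.sum(eps[:-1] ** 2)
    # Step 3: re-estimate theta with the AR(1) structure removed (criterion L3).
    def resid(p):
        g = G(t, p)
        return (y[1:] - g[1:]) - phi_hat * (y[:-1] - g[:-1])
    fit2 = least_squares(resid, fit1.x)
    return fit2.x, phi_hat

# Simulated example: truth = (A, b, alpha, c, theta, phi), AR(1) noise ar = 0.5.
rng = np.random.default_rng(1)
n = 400
t = np.arange(n)
truth = np.array([0.05, 3.5, 0.132, 0.0, 1.0, 0.0])
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.5 * eps[i - 1] + rng.normal(0.0, 1.0)
y = G(t, truth) + eps
p_hat, phi_hat = three_stage_fit(t, y, p0=truth)  # start at truth (see note)
```

Comparing `p_hat` across many replications against the Step 1 estimator is exactly the efficiency comparison the topic asks for.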
Objectives:

Investigate the parameter estimators using least squares and the methods described above, for both white and correlated noise. Run many simulations (say a thousand) for the parameter estimators. Evaluate the average squared error of the estimators and plot the histogram/QQ-plot of the centralized estimators. Do the plots look close to normal?
What does the periodogram of the model look like? What information does it convey? How do peaks in the periodogram correspond to the different parameters? Is there any information about the period in the ECG?
Fit the model to real ECG data.
Topic 4: Multivariate GARCH

Literature

A simple review: Here (paper)

The review paper, Multivariate GARCH models, by Silvennoinen and Terasvirta (2009).

Let $y_t$ be the price of a stock at time $t$ and let $r_t = \log y_t - \log y_{t-1}$. The univariate GARCH(p, q) model models $r_t$ as $r_t = \sigma_t Z_t$, where $\{Z_t\}$ are iid random variables with $E[Z_t] = 0$ and $\text{var}(Z_t) = 1$, and
$$\text{var}(r_t \mid r_{t-1}, r_{t-2}, \ldots) = \sigma_t^2 = a_0 + \sum_{i=1}^{p} a_i r_{t-i}^2 + \sum_{j=1}^{q} b_j \sigma_{t-j}^2.$$
But this is just how one stock is modelled. Suppose we observe the prices of $d$ stocks. The log returns $r_t = (r_{1,t}, \ldots, r_{d,t})'$ can be modelled as $r_t = \Sigma_t^{1/2} Z_t$, where the $Z_t$ are iid $d$-dimensional random vectors with variance/covariance matrix $I_d$. But how should we model the conditional variance matrix $\Sigma_t$? The simplest model is
$$\Sigma_t = A_0 + \sum_{i=1}^{p} A_i r_{t-i} r_{t-i}' + \sum_{j=1}^{q} B_j \Sigma_{t-j},$$
where, crucially, all of $A_0$, $A_i$ and $B_j$ are diagonal matrices. By expanding this out, one sees that this is exactly the same as modelling each element of $r_t$ as a univariate GARCH(p, q) process, and it does not allow for dependencies between the different returns. Clearly, in many situations this is a highly unrealistic modelling assumption: it does not allow for the influence of one return on another.
To model all the possible dependencies we need to remember that the conditional variance $\Sigma_t$ is a non-negative definite matrix and not a vector, so we cannot simply generalize to the vector set-up as was done for the VAR(p) and VARMA(p, q) models. Since $\Sigma_t$ is a $d \times d$ variance/covariance matrix, and hence symmetric, it has a maximum of $d(d+1)/2$ distinct entries. Thus we rewrite $\Sigma_t$ as a $d(d+1)/2$-vector called the vech-transform, $\text{vech}(\Sigma_t)$. The most general model is then the VEC-GARCH(p, q) model
$$\text{vech}(\Sigma_t) = a_0 + \sum_{i=1}^{p} A_i\, \text{vech}(r_{t-i} r_{t-i}') + \sum_{j=1}^{q} B_j\, \text{vech}(\Sigma_{t-j}).$$
This is a very general and flexible model. However, there are two major considerations. The first is that $\text{vech}(\Sigma_t)$ should correspond to the entries of a non-negative definite matrix. This means finding conditions on the parameters which ensure this is true (and then maximising over the appropriate parameter space so that it remains true). The second is the large number of parameters in this model: the $A_i$ and $B_j$ are $d(d+1)/2 \times d(d+1)/2$ matrices, so there are $(p+q)(d(d+1)/2)^2$ unknown parameters, plus those in $a_0$. Clearly, it is almost infeasible to estimate so many parameters when $d$ is even a few dimensions, and the corresponding likelihood is also extremely difficult to maximise. Therefore several simpler models have been proposed; probably the most popular is the BEKK-GARCH model, which is a subset of VEC-GARCH.

Objectives

Review several different multivariate GARCH models, discussing their advantages and disadvantages.
Code the estimators and make an extensive simulation study.
Validate the models by assessing the residuals (see Chapter 7).
Fit the models to real data sets and compare the different predictors.
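As a warm-up for the simulation-study objective, the diagonal model above reduces to $d$ independent univariate GARCH(1,1) recursions, which are easy to simulate and to check against the unconditional variance $a_0/(1 - a_1 - b_1)$. The parameter values below are illustrative, not taken from the literature.

```python
import numpy as np

def simulate_diag_garch(n, a0, a1, b1, seed=0):
    """Diagonal multivariate GARCH(1,1): each coordinate follows its own
    univariate GARCH(1,1), sigma2_t = a0 + a1 r_{t-1}^2 + b1 sigma2_{t-1},
    with no dependence between the coordinates."""
    rng = np.random.default_rng(seed)
    d = len(a0)
    r = np.zeros((n, d))
    sigma2 = np.tile(a0 / (1 - a1 - b1), (n, 1))  # start at unconditional variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal(d)
    for t in range(1, n):
        sigma2[t] = a0 + a1 * r[t - 1] ** 2 + b1 * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal(d)
    return r

a0 = np.array([0.1, 0.2])
a1 = np.array([0.1, 0.15])
b1 = np.array([0.8, 0.7])
r = simulate_diag_garch(50_000, a0, a1, b1)
# Sample variances should be near a0 / (1 - a1 - b1) = [1.0, 1.333...]
print(r.var(axis=0))
```

Extending this simulator to a BEKK or full VEC specification, where the two coordinates genuinely interact, is a natural first coding task for the topic.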
Topic 5: Categorical time series within a GLM framework

Literature

The book Weiss (2018), Chapter 7, gives a nice introduction. Here (Weiss Book)

Parameter-driven models: Time series analysis of non-Gaussian observations based on state space models, from both a classical and Bayesian perspective (JRSSB), Durbin and Koopman (2000). Here

Observation-driven models: Here

The review paper, Discrete time series: A GLM perspective, covers both parameter-driven and observation-driven models. Here and Here

Suppose that $Y$ is a categorical or discrete random variable with probability distribution
$$P(Y = y; \theta) = \exp\big( y\theta - \kappa(\theta) + c(y) \big).$$
The above describes a general class of distributions called the exponential family. Thus the log-probability is
$$\log P(Y = y; \theta) = y\theta - \kappa(\theta) + c(y).$$
Famous examples which belong to the above class are the binary, binomial, multinomial (for
categorical data) and Poisson random variables:
$$\text{Binary:} \quad P(Y = 1; p) = p$$
$$\text{Binomial:} \quad P(Y = k; p) = \binom{n}{k} p^k (1-p)^{n-k}, \quad 0 \leq k \leq n$$
$$\text{Multinomial:} \quad P(Y = (k_1, \ldots, k_d); p_1, \ldots, p_{d-1}) = \binom{n}{k_1, k_2, \ldots, k_d} \prod_{i=1}^{d} p_i^{k_i}, \quad k_1 + \cdots + k_d = n$$
$$\text{Poisson:} \quad P(Y = k) = \frac{\exp(-\lambda) \lambda^k}{k!}$$
Often, explanatory variables $x_i = (x_{i,1}, \ldots, x_{i,p})'$ are observed together with $Y_i$. The GLM approach models the dependence of $Y_i$ on $x_i$ through a linear transform. Let $\eta' x_i = \sum_{j=1}^{p} \eta_j x_{i,j}$. Often this is done in the following way:
$$\text{Binary:} \quad P(Y_i = 1; x_i) = \frac{\exp(\eta' x_i)}{1 + \exp(\eta' x_i)}$$
$$\text{Binomial:} \quad P(Y_i = k; x_i) = \binom{n}{k} p^k (1-p)^{n-k}, \quad p = \frac{\exp(\eta' x_i)}{1 + \exp(\eta' x_i)}$$
$$\text{Multinomial:} \quad P(Y_i = (k_1, \ldots, k_d); x_i), \quad p_j = \frac{\exp(\eta_j' x_i)}{\sum_{s} \exp(\eta_s' x_i)}$$
$$\text{Poisson:} \quad P(Y_i = k; x_i) = \frac{\exp(-\lambda) \lambda^k}{k!}, \quad \lambda = \exp(\eta' x_i)$$
Let us return to the case where $\{Y_t\}$ is observed over time and the $Y_t$ are either categorical or discrete-valued random variables. Since $\{Y_t\}$ is observed over time, it is highly likely that there is dependence. There are various methods for inducing this dependence; we summarize two of the approaches below.

Parameter-driven state-space models

This can be considered as a state-space model for non-Gaussian data. We model the dependence through an unobserved latent variable, and by conditioning on the latent variable we obtain
a classical GLM. For example:
$$\text{Binary:} \quad P(Y_t = 1 \mid \alpha_t) = \frac{\exp(\alpha_t)}{1 + \exp(\alpha_t)}$$
$$\text{Multinomial:} \quad P(Y_t = (k_1, \ldots, k_d) \mid \alpha_t), \quad p_j = \frac{\exp(\beta_j' \alpha_t)}{\sum_{s} \exp(\beta_s' \alpha_t)}$$
$$\text{Poisson:} \quad P(Y_t = k \mid \alpha_t) = \frac{\exp(-\lambda) \lambda^k}{k!}, \quad \log \lambda = \alpha_t$$
We model the dependence of $Y_t$ over time through the latent variable using, say, the classical autoregressive process
$$\alpha_t = A \alpha_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}_t$ are iid random vectors. Since $\alpha_t$ is dependent over time, this induces the dependence in the time series $\{Y_t\}$.

Observation-driven models

The observation-driven models use a similar idea, but avoid the need to introduce a latent random variable to model the time dependence. We give two such approaches below for numerical discrete time series.

Model 1 (Poisson):
$$P(Y_t = k \mid Y_{t-1}) = \frac{\exp(-\lambda_t) \lambda_t^k}{k!}, \quad \log \lambda_t = \beta_0 + \beta_1 \frac{Y_{t-1} - \lambda_{t-1}}{\lambda_{t-1}^{\eta}}$$
for $\eta > 0$. Essentially, since $Y_{t-1}$ is discrete, by defining the residual $(Y_{t-1} - \lambda_{t-1})/\lambda_{t-1}^{\eta}$ one is making a transformation of the data to something closer to Gaussian. This approach is described in this paper.

An alternative model is the ARMA-type model (Poisson):
$$P(Y_t = k \mid Y_{t-1}, Y_{t-2}, \ldots) = \frac{\exp(-\lambda_t) \lambda_t^k}{k!}, \quad \log \lambda_t = \beta_0 + \sum_{j=1}^{p} \phi_j Y_{t-j} + \sum_{i=1}^{q} \theta_i \lambda_{t-i}.$$
This model is described in this paper.

Objectives

Review the different models that are available, describing their similarities and differences.
Describe and implement the different estimation methods.
Look for methods which check for correlation in the residuals.
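The first observation-driven model (Model 1) is straightforward to simulate, since $\lambda_t$ depends only on the previous observation and intensity. A minimal sketch, with illustrative parameter values that are not taken from the papers above:

```python
import numpy as np

def simulate_obs_driven_poisson(n, beta0=1.0, beta1=0.3, eta=0.5, seed=0):
    """Model 1: Y_t | Y_{t-1} ~ Poisson(lambda_t), with
    log lambda_t = beta0 + beta1 (Y_{t-1} - lambda_{t-1}) / lambda_{t-1}^eta."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=int)
    lam = np.zeros(n)
    lam[0] = np.exp(beta0)  # start at the intensity implied by beta0 alone
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        # Pearson-type residual of the previous observation (eta = 1/2 here)
        resid = (y[t - 1] - lam[t - 1]) / lam[t - 1] ** eta
        lam[t] = np.exp(beta0 + beta1 * resid)
        y[t] = rng.poisson(lam[t])
    return y, lam

y, lam = simulate_obs_driven_poisson(2000)
```

Because the residual has conditional mean zero, $\log \lambda_t$ fluctuates around $\beta_0$, which makes this model easy to validate by checking the residuals for leftover correlation, as the objectives suggest.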
Multivariate GARCH models. Financial market volatility moves together over time across assets and markets. Recognizing this commonality through a multivariate modeling framework leads to obvious gains
More informationECON 4160, Spring term Lecture 12
ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic
More informationVector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.
Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationUC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes
UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 6: Probability and Random Processes Problem Set 3 Spring 9 Self-Graded Scores Due: February 8, 9 Submit your self-graded scores
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationEstimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator
Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationRegression of Time Series
Mahlerʼs Guide to Regression of Time Series CAS Exam S prepared by Howard C. Mahler, FCAS Copyright 2016 by Howard C. Mahler. Study Aid 2016F-S-9Supplement Howard Mahler hmahler@mac.com www.howardmahler.com/teaching
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R
More informationProblem Set 1 Solution Sketches Time Series Analysis Spring 2010
Problem Set 1 Solution Sketches Time Series Analysis Spring 2010 1. Construct a martingale difference process that is not weakly stationary. Simplest e.g.: Let Y t be a sequence of independent, non-identically
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationTime Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY
Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference
More informationStochastic Processes: I. consider bowl of worms model for oscilloscope experiment:
Stochastic Processes: I consider bowl of worms model for oscilloscope experiment: SAPAscope 2.0 / 0 1 RESET SAPA2e 22, 23 II 1 stochastic process is: Stochastic Processes: II informally: bowl + drawing
More informationCh 6. Model Specification. Time Series Analysis
We start to build ARIMA(p,d,q) models. The subjects include: 1 how to determine p, d, q for a given series (Chapter 6); 2 how to estimate the parameters (φ s and θ s) of a specific ARIMA(p,d,q) model (Chapter
More informationNonlinear Time Series Modeling
Nonlinear Time Series Modeling Part II: Time Series Models in Finance Richard A. Davis Colorado State University (http://www.stat.colostate.edu/~rdavis/lectures) MaPhySto Workshop Copenhagen September
More informationDifference equations. Definitions: A difference equation takes the general form. x t f x t 1,,x t m.
Difference equations Definitions: A difference equation takes the general form x t fx t 1,x t 2, defining the current value of a variable x as a function of previously generated values. A finite order
More informationMultivariate Random Variable
Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate
More informationPart III Example Sheet 1 - Solutions YC/Lent 2015 Comments and corrections should be ed to
TIME SERIES Part III Example Sheet 1 - Solutions YC/Lent 2015 Comments and corrections should be emailed to Y.Chen@statslab.cam.ac.uk. 1. Let {X t } be a weakly stationary process with mean zero and let
More informationVector Autoregression
Vector Autoregression Jamie Monogan University of Georgia February 27, 2018 Jamie Monogan (UGA) Vector Autoregression February 27, 2018 1 / 17 Objectives By the end of these meetings, participants should
More information7. MULTIVARATE STATIONARY PROCESSES
7. MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalar-valued random variables on the same probability
More information1 Linear Difference Equations
ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with
More informationReview Session: Econometrics - CLEFIN (20192)
Review Session: Econometrics - CLEFIN (20192) Part II: Univariate time series analysis Daniele Bianchi March 20, 2013 Fundamentals Stationarity A time series is a sequence of random variables x t, t =
More informationLecture 4 - Spectral Estimation
Lecture 4 - Spectral Estimation The Discrete Fourier Transform The Discrete Fourier Transform (DFT) is the equivalent of the continuous Fourier Transform for signals known only at N instants separated
More informationNonlinear time series
Based on the book by Fan/Yao: Nonlinear Time Series Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 27, 2009 Outline Characteristics of
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationComment on Article by Scutari
Bayesian Analysis (2013) 8, Number 3, pp. 543 548 Comment on Article by Scutari Hao Wang Scutari s paper studies properties of the distribution of graphs ppgq. This is an interesting angle because it differs
More informationCh. 14 Stationary ARMA Process
Ch. 14 Stationary ARMA Process A general linear stochastic model is described that suppose a time series to be generated by a linear aggregation of random shock. For practical representation it is desirable
More informationNon-Gaussian Maximum Entropy Processes
Non-Gaussian Maximum Entropy Processes Georgi N. Boshnakov & Bisher Iqelan First version: 3 April 2007 Research Report No. 3, 2007, Probability and Statistics Group School of Mathematics, The University
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationSymmetric btw positive & negative prior returns. where c is referred to as risk premium, which is expected to be positive.
Advantages of GARCH model Simplicity Generates volatility clustering Heavy tails (high kurtosis) Weaknesses of GARCH model Symmetric btw positive & negative prior returns Restrictive Provides no explanation
More informationProblem Set 2: Box-Jenkins methodology
Problem Set : Box-Jenkins methodology 1) For an AR1) process we have: γ0) = σ ε 1 φ σ ε γ0) = 1 φ Hence, For a MA1) process, p lim R = φ γ0) = 1 + θ )σ ε σ ε 1 = γ0) 1 + θ Therefore, p lim R = 1 1 1 +
More informationTAMS39 Lecture 2 Multivariate normal distribution
TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationTime-Varying Vector Autoregressive Models with Structural Dynamic Factors
Time-Varying Vector Autoregressive Models with Structural Dynamic Factors Paolo Gorgi, Siem Jan Koopman, Julia Schaumburg http://sjkoopman.net Vrije Universiteit Amsterdam School of Business and Economics
More informationSTAT Financial Time Series
STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR
More informationTime Series Modeling of Financial Data. Prof. Daniel P. Palomar
Time Series Modeling of Financial Data Prof. Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) MAFS6010R- Portfolio Optimization with R MSc in Financial Mathematics Fall 2018-19,
More informationGeneralised AR and MA Models and Applications
Chapter 3 Generalised AR and MA Models and Applications 3.1 Generalised Autoregressive Processes Consider an AR1) process given by 1 αb)x t = Z t ; α < 1. In this case, the acf is, ρ k = α k for k 0 and
More informationStep-Stress Models and Associated Inference
Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More information