Next tool is the partial ACF; mathematical tools first.

The Multivariate Normal Distribution

Defn: $Z \in \mathbb{R}^1 \sim N(0,1)$ iff
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}.$$

Defn: $Z \in \mathbb{R}^p \sim MVN_p(0, I)$ if and only if $Z = (Z_1, \ldots, Z_p)^\top$ (a column vector for later use) with the $Z_i$ independent and each $Z_i \sim N(0,1)$. In this case,
$$f_Z(z_1, \ldots, z_p) = \prod_{i=1}^p \frac{1}{\sqrt{2\pi}} e^{-z_i^2/2} = (2\pi)^{-p/2} \exp\{-z^\top z / 2\};$$
the superscript $\top$ denotes matrix transpose.

Defn: $X \in \mathbb{R}^p$ has a multivariate normal distribution if it has the same distribution as $AZ + \mu$ for some $\mu \in \mathbb{R}^p$, some $p \times q$ matrix of constants $A$, and $Z \sim MVN_q(0, I)$.
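To make the definition concrete, here is a minimal numpy sketch (the helper name sample_mvn and the particular $A$ and $\mu$ are made up for illustration) that draws from the distribution of $AZ + \mu$:

```python
import numpy as np

def sample_mvn(mu, A, size, rng=None):
    """Draw `size` samples of X = A Z + mu with Z ~ MVN_q(0, I).

    mu : (p,) mean vector, A : (p, q) matrix of constants.
    Returns an array of shape (size, p).
    """
    rng = np.random.default_rng() if rng is None else rng
    p, q = A.shape
    Z = rng.standard_normal((size, q))   # iid N(0,1) entries
    return mu + Z @ A.T                  # each row is A z + mu

# Example with p = 2, q = 3, so Sigma = A A^T is 2 x 2
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
mu = np.array([1.0, -1.0])
X = sample_mvn(mu, A, size=100_000, rng=np.random.default_rng(0))
print(np.cov(X, rowvar=False))           # close to A @ A.T
```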
Lemma: The matrix $A$ can be taken to be square with no loss of generality.

Proof: The simplest proof involves multivariate characteristic functions:
$$\phi_X(t) = E(e^{i t^\top X}).$$
You check that for $Z \sim MVN_p(0, I)$ we have
$$\phi_Z(t) = e^{-t^\top t / 2}.$$
Then
$$\phi_{AZ+\mu}(u) = E(e^{i u^\top (AZ+\mu)}) = e^{i u^\top \mu} \phi_Z(A^\top u) = e^{i u^\top \mu - u^\top A A^\top u / 2}.$$
The result depends only on $\mu$ and $\Sigma = AA^\top$. So the distribution is the same for any two $A$ giving the same $AA^\top$. Given $A$ of dimension $p \times q$, use the Cholesky decomposition to find a square $B$ with $BB^\top = AA^\top$. I omit a proof.
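As a quick numerical illustration of that last step (the matrix $A$ below is made up for the example), numpy's Cholesky routine produces the square $B$:

```python
import numpy as np

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])        # p = 2, q = 3
Sigma = A @ A.T                        # the covariance A A^T
B = np.linalg.cholesky(Sigma)          # square 2 x 2 lower-triangular factor
print(np.allclose(B @ B.T, Sigma))     # True: B B^T = A A^T
```

(This call assumes $AA^\top$ is positive definite, i.e. $A$ has full row rank; otherwise a square root based on an eigendecomposition can be used instead.)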
$A$ singular: $X$ does not have a density. $A$ invertible: derive the MVN density by the change of variables
$$X = AZ + \mu \quad\Longleftrightarrow\quad Z = A^{-1}(X - \mu).$$
We get
$$f_X(x) = (2\pi)^{-p/2} \det(\Sigma)^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\}$$
where $\Sigma = AA^\top$.

Properties of the MVN distribution

1. If $X \sim MVN(\mu, \Sigma)$ then $MX + b \sim MVN(M\mu + b, M\Sigma M^\top)$; this follows easily from our definition.
2. All margins are multivariate normal: if
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$
then $X \sim MVN(\mu, \Sigma)$ implies that $X_1 \sim MVN(\mu_1, \Sigma_{11})$. (Special case of the previous property.)

3. All conditionals are normal: the conditional distribution of $X_1$ given $X_2 = x_2$ is MVN with
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$$
and
$$\mathrm{Var}(X_1 \mid X_2 = x_2) = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
Proof: algebra with the definition of the conditional density. Note: the formulas remain meaningful even if $\Sigma_{22}$ is singular (with a generalized inverse in place of $\Sigma_{22}^{-1}$).
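These two formulas translate directly into a few lines of linear algebra; the sketch below (mvn_conditional is my own name, not from the notes) computes the conditional mean and covariance from a partitioned covariance matrix, assuming $\Sigma_{22}$ is invertible:

```python
import numpy as np

def mvn_conditional(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X[idx1] given X[idx2] = x2, for X ~ MVN(mu, Sigma)."""
    mu, Sigma, x2 = np.asarray(mu), np.asarray(Sigma), np.asarray(x2)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    W = S12 @ np.linalg.inv(S22)              # Sigma_12 Sigma_22^{-1}
    cond_mean = mu[idx1] + W @ (x2 - mu[idx2])
    cond_cov = S11 - W @ S12.T                # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return cond_mean, cond_cov
```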
Partial Correlations

If $X$ is an AR($p$) process then
$$X_t = \sum_{j=1}^p a_j X_{t-j} + \epsilon_t.$$
If you hold $X_{t-1}, \ldots, X_{t-p}$ fixed then $X_t$ is $\epsilon_t$ plus a fixed constant. Since $\epsilon_t$ is independent of $X_s$ for all $s < t$, we see that given $X_{t-1}, \ldots, X_{t-p}$ the variables $X_t$ and $X_{t-r}$ for $r > p$ are independent. This would imply
$$\mathrm{Cov}(X_t, X_{t-q} \mid X_{t-1}, \ldots, X_{t-p}) = 0$$
for all $q > p$.

Gaussian $X$: the results above on conditional distributions guarantee that we can calculate this partial autocovariance from the variance-covariance matrix of $(X_t, X_{t-q}, X_{t-1}, \ldots, X_{t-p})$.
Partition this vector into a first piece with 2 entries and a second piece with $p$ entries. The conditional covariance of $(X_t, X_{t-q})$ given the others is
$$\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
In this formula $\Sigma_{11}$ is the $2 \times 2$ unconditional covariance matrix of $(X_t, X_{t-q})$, namely
$$\begin{bmatrix} C(0) & C(q) \\ C(q) & C(0) \end{bmatrix}.$$
The matrix $\Sigma_{22}$ is the $p \times p$ Toeplitz matrix with $C(0)$ down the diagonal, $C(1)$ down the first sub- and super-diagonals, and so on. The matrix $\Sigma_{12} = \Sigma_{21}^\top$ is the $2 \times p$ matrix
$$\begin{bmatrix} C(1) & \cdots & C(p) \\ C(q-1) & \cdots & C(q-p) \end{bmatrix}.$$
Calculate the resulting $2 \times 2$ matrix and pick out the off-diagonal element; this is the partial autocovariance. Even for non-Gaussian data we use the same arithmetic with the autocovariance to define the partial autocovariance.

Define the partial autocorrelation function:
$$\mathrm{PACF}(h) = \mathrm{Corr}(X_t, X_{t-h} \mid X_{t-1}, \ldots, X_{t-h+1}).$$
If $h = 1$ then there are no $X$s between $X_t$ and $X_{t-h}$, so there is nothing to condition on. This makes $\mathrm{PACF}(1) = \mathrm{ACF}(1)$.

Qualitative idea: plot the sample partial autocorrelation function (replace $C_X$ by $\hat{C}_X$ in the definition of the PACF). For an AR($p$) process this sample plot ought to drop suddenly to 0 for $h > p$.
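Putting the recipe of the last two slides into code: a sketch (the function pacf_from_acov is my own, and it assumes the autocovariances $C(0), \ldots, C(h)$ have already been computed and that we condition on all $h-1$ intermediate values):

```python
import numpy as np

def pacf_from_acov(C, h):
    """PACF(h) computed from autocovariances C = [C(0), C(1), ..., C(h)]."""
    C = np.asarray(C, dtype=float)
    if h == 1:
        return C[1] / C[0]                              # PACF(1) = ACF(1)
    p = h - 1                                           # number of intermediate lags
    S11 = np.array([[C[0], C[h]],
                    [C[h], C[0]]])                      # Cov of (X_t, X_{t-h})
    S22 = np.array([[C[abs(i - j)] for j in range(p)]
                    for i in range(p)])                 # Toeplitz Cov of intermediates
    row1 = C[1:h]                                       # Cov(X_t, X_{t-1}, ..., X_{t-p})
    S12 = np.vstack([row1, row1[::-1]])                 # second row: Cov(X_{t-h}, ...)
    cond = S11 - S12 @ np.linalg.solve(S22, S12.T)      # conditional covariance matrix
    return cond[0, 1] / np.sqrt(cond[0, 0] * cond[1, 1])
```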
AR(1) example

Calculate $\mathrm{PACF}(2)$ for a mean 0 AR(1) process. We have $C_X(h) = C_X(0) \rho^{|h|}$. Let
$$\begin{bmatrix} Y \\ Z \end{bmatrix} = \begin{bmatrix} X_t \\ X_{t-2} \\ X_{t-1} \end{bmatrix}$$
with $Y = (X_t, X_{t-2})^\top$ and $Z = X_{t-1}$. Then $(Y, Z)$ has a $MVN(0, \Sigma)$ distribution with
$$\Sigma = C_X(0) \begin{bmatrix} 1 & \rho^2 & \rho \\ \rho^2 & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix}.$$
Following the partitioning into $Y$ and $Z$ we find
$$\Sigma_{11} = C_X(0) \begin{bmatrix} 1 & \rho^2 \\ \rho^2 & 1 \end{bmatrix}, \qquad \Sigma_{12} = C_X(0) \begin{bmatrix} \rho \\ \rho \end{bmatrix}, \qquad \Sigma_{22} = C_X(0) [\, 1 \,].$$
Get Cov(X t,x t 2 X t 1 ) = Σ 11 Σ 12 Σ 1 22 Σ 21 [ [ 1 ρ 2 ρ = C X (0) ρ 2 C 1 X (0) 2 ρ 2 ρ 2 ρ 2 [ 1 ρ 2 0 = C X (0) 0 1 ρ 2 Now to compute the correlation we divide the conditional covariance by the product of the two conditional SDs, that is: PACF(2) = = = 0 Cov(X t, X t 2 X t 1 ) Var(X t X t 1 )Var(X t 2 X t 1 ) 0 (1 ρ 2 )(1 ρ 2 ) 71
Problem: predict $X_t$ knowing $X_{t-1}, \ldots, X_{t-k}$. Choose the predictor $\hat{X}_t = f(X_{t-1}, \ldots, X_{t-k})$ to minimize the mean squared prediction error
$$E[(X_t - \hat{X}_t)^2].$$
Solution: take
$$\hat{X}_t = E(X_t \mid X_{t-1}, \ldots, X_{t-k}).$$
For multivariate normal data this predictor is the usual regression solution:
$$E(X_t \mid X_{t-1}, \ldots, X_{t-k}) = \mu + A \begin{bmatrix} X_{t-1} - \mu \\ \vdots \\ X_{t-k} - \mu \end{bmatrix}$$
where the matrix $A$ is $\Sigma_{12} \Sigma_{22}^{-1}$ in the notation of the earlier notes. This can be written in the form
$$\hat{X}_t = \mu + \sum_{i=1}^{k} \alpha_i (X_{t-i} - \mu)$$
for suitable constants $\alpha_i$. Let $Y_t = X_t - \mu$.
The $\alpha_i$ minimize
$$E\Big[\Big(Y_t - \sum_i \alpha_i Y_{t-i}\Big)^2\Big] = E(Y_t^2) - 2 \sum_i \alpha_i E(Y_t Y_{t-i}) + \sum_{i,j} \alpha_i \alpha_j E(Y_{t-i} Y_{t-j}) = C(0) - 2 \sum_i \alpha_i C(i) + \sum_{i,j} \alpha_i \alpha_j C(i-j).$$
Take the derivative with respect to $\alpha_m$ and set the resulting $k$ quantities equal to 0, giving
$$C(m) = \sum_j \alpha_j C(m-j)$$
or, dividing by $C(0)$,
$$\rho(m) = \sum_j \alpha_j \rho(m-j).$$
(These are the Yule-Walker equations again!) It is a fact that the solution $\alpha_m$ is $\mathrm{Corr}(X_t, X_{t-m} \mid X_{t-1}, \ldots, X_{t-k}\ \text{but not}\ X_{t-m})$, so that $\alpha_k$ (the last $\alpha$) is $\mathrm{PACF}(k)$.
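In matrix form these equations are $R\alpha = r$ with $R_{mj} = \rho(m-j)$ and $r_m = \rho(m)$ for $m, j = 1, \ldots, k$. A sketch (the function name is mine) that solves them and reads off $\mathrm{PACF}(k)$ as the last coefficient:

```python
import numpy as np

def pacf_via_yule_walker(acf, k):
    """Solve the order-k Yule-Walker equations; return (alpha, PACF(k)).

    acf : autocorrelations [rho(0) = 1, rho(1), ..., rho(k)].
    """
    acf = np.asarray(acf, dtype=float)
    R = np.array([[acf[abs(m - j)] for j in range(k)] for m in range(k)])  # rho(m - j)
    r = acf[1:k + 1]                                                       # rho(1), ..., rho(k)
    alpha = np.linalg.solve(R, r)
    return alpha, alpha[-1]

# AR(1) check: rho(h) = rho^h gives alpha = (rho, 0) and PACF(2) = 0
print(pacf_via_yule_walker(0.7 ** np.arange(3), 2))
```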
Remark: This makes it look as if you would compute the first 40 values of the PACF by solving 40 different problems and picking out the last $\alpha$ in each one, but in fact there is an explicit recursive algorithm to compute these quantities (a sketch is given below).

Estimation of PACF

To estimate the PACF you can either estimate the ACF and do the arithmetic above with estimates in place of the theoretical values, or minimize a sample version of the mean squared prediction error:
The Mean Squared Prediction Error is
$$E\Big[\Big(X_t - \mu - \sum_{j=1}^{k} \alpha_j (X_{t-j} - \mu)\Big)^2\Big].$$
Estimate $\mu$ by $\bar{X}$ and replace the expected value with an average over the data. So: minimize
$$\frac{1}{T} \sum_{t=k}^{T-1} \Big(X_t - \bar{X} - \sum_{j=1}^{k} \alpha_j (X_{t-j} - \bar{X})\Big)^2.$$
This is the least squares problem of regressing the vector
$$Y = \begin{bmatrix} X_k - \bar{X} \\ \vdots \\ X_{T-1} - \bar{X} \end{bmatrix}$$
on the design matrix
$$Z = \begin{bmatrix} X_{k-1} - \bar{X} & \cdots & X_0 - \bar{X} \\ \vdots & & \vdots \\ X_{T-2} - \bar{X} & \cdots & X_{T-k-1} - \bar{X} \end{bmatrix}.$$
The estimate of $\mathrm{PACF}(k)$ is the $k$th entry in $\hat{\alpha} = (Z^\top Z)^{-1} Z^\top Y$.
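A sketch of this regression estimate (the helper name sample_pacf_regression and the simulated AR(1) check are my own additions; the series x is assumed to be a 1-D array indexed $X_0, \ldots, X_{T-1}$):

```python
import numpy as np

def sample_pacf_regression(x, k):
    """Estimate PACF(k) as the k-th coefficient of the lag-k least squares fit."""
    y = np.asarray(x, dtype=float)
    y = y - y.mean()                                   # centre by X-bar
    T = len(y)
    Y = y[k:]                                          # X_k - Xbar, ..., X_{T-1} - Xbar
    Z = np.column_stack([y[k - j:T - j]                # column j holds lag-j values
                         for j in range(1, k + 1)])
    alpha = np.linalg.lstsq(Z, Y, rcond=None)[0]       # (Z'Z)^{-1} Z'Y
    return alpha[-1]

# Sanity check on a simulated AR(1) with a = 0.7: expect roughly 0.7 and roughly 0
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(sample_pacf_regression(x, 1), sample_pacf_regression(x, 2))
```

As for the recursive algorithm mentioned in the remark above, the usual choice is the Durbin-Levinson recursion; the sketch below assumes that is the algorithm intended and works from the autocorrelations $\rho(0), \ldots, \rho(\text{max lag})$:

```python
import numpy as np

def pacf_durbin_levinson(acf, max_lag):
    """[PACF(1), ..., PACF(max_lag)] from autocorrelations acf = [1, rho(1), ...]."""
    acf = np.asarray(acf, dtype=float)
    pacf = np.zeros(max_lag)
    phi = np.zeros(max_lag)                     # prediction coefficients, updated in place
    pacf[0] = phi[0] = acf[1]
    v = 1.0 - acf[1] ** 2                       # prediction error variance / C(0)
    for k in range(2, max_lag + 1):
        phi_kk = (acf[k] - phi[:k - 1] @ acf[1:k][::-1]) / v
        phi[:k - 1] -= phi_kk * phi[:k - 1][::-1]   # update order-(k-1) coefficients
        phi[k - 1] = phi_kk
        v *= 1.0 - phi_kk ** 2
        pacf[k - 1] = phi_kk
    return pacf
```

Each step updates the order-$(k-1)$ prediction coefficients to order $k$ rather than solving a fresh Yule-Walker system, so 40 lags cost one pass rather than 40 separate solves.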