arxiv: v2 [stat.ot] 9 Apr 2014

Size: px

Start display at page:

Download "arxiv: v2 [stat.ot] 9 Apr 2014"

Kristina Chambers
5 years ago
Views:

1 Spectral Correlation Hub Screening of Multivariate Time Series Hamed Firouzi, Dennis Wei and Alfred O. Hero III arxiv:43.337v2 [stat.ot] 9 Apr 24 Electrical Engineering and Computer Science Department, University of Michigan, USA firouzi@umich.edu, dlwei@eecs.umich.edu, hero@eecs.umich.edu Summary. This chapter discusses correlation analysis of stationary multivariate Gaussian time series in the spectral or Fourier domain. The goal is to identify the hub time series, i.e., those that are highly correlated with a specified number of other time series. We show that Fourier components of the time series at different frequencies are asymptotically statistically independent. This property permits independent correlation analysis at each frequency, alleviating the computational and statistical challenges of high-dimensional time series. To detect correlation hubs at each frequency, an existing correlation screening method is extended to the complex numbers to accommodate complex-valued Fourier components. We characterize the number of hub discoveries at specified correlation and degree thresholds in the regime of increasing dimension and fixed sample size. The theory specifies appropriate thresholds to apply to sample correlation matrices to detect hubs and also allows statistical significance to be attributed to hub discoveries. Numerical results illustrate the accuracy of the theory and the usefulness of the proposed spectral framework. Key words: Complex-valued correlation screening, Spectral correlation analysis, Gaussian stationary processes, Hub screening, Correlation graph, Correlation network, Spatio-temporal analysis of multivariate time series, High dimensional data analysis Introduction Correlation analysis of multivariate time series is important in many applications such as wireless sensor networks, computer networks, neuroimaging, and finance [, 2, 3, 4, 5]. This chapter focuses on the problem of detecting hub time series, ones that have a high degree of interaction with other time series as measured by correlation or partial correlation. Detection of hubs can lead to reduced computational and/or sampling costs. For example in wireless sensor networks, the identification of hub nodes can be useful for reducing power usage and adding or removing sensors from the network [6, 7]. Hub detection can also give new insights about underlying structure in the dataset.

2 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III In neuroimaging for instance, studies have consistently shown the existence of highly connected hubs in brain graphs connectomes) [8]. In finance, a hub might indicate a vulnerable financial instrument or a sector whose collapse could have a major effect on the market [9]. Correlation analysis becomes challenging for multivariate time series when the dimension p of the time series, i.e. the number of scalar time series, and the number of time samples N are large [4]. A naive approach is to treat the time series as a set of independent samples of a p-dimensional random vector and estimate the associated covariance or correlation matrix, but this approach completely ignores temporal correlations as it only considers dependences at the same time instant and not between different time instants. The work in [] accounts for temporal correlations by quantifying their effect on convergence rates in covariance and precision matrix estimation; however, only correlations at the same time instant are estimated. A more general approach is to consider all correlations between any two time instants of any two series within a window of n N consecutive samples, where the previous case corresponds to n =. However, in general this would entail the estimation of an np np correlation matrix from a reduced sample of size m = N/n, which can be computationally costly as well as statistically problematic. In this chapter, we propose spectral correlation analysis as a method of overcoming the issues discussed above. As before, the time series are divided into m temporal segments of n consecutive samples, but instead of estimating temporal correlations directly, the method performs analysis on the Discrete Fourier Transforms DFT) of the time series. We prove in Theorem that for stationary, jointly Gaussian time series under the mild condition of absolute summability of the auto- and cross-correlation functions, different Fourier components frequencies) become asymptotically independent of each other as the DFT length n increases. This property of stationary Gaussian processes allows us to focus on the p p correlations at each frequency separately without having to consider correlations between different frequencies. Moreover, spectral analysis isolates correlations at specific frequencies or timescales, potentially leading to greater insight. To make aggregate inferences based on all frequencies, straightforward procedures for multiple inference can be used as described in Section 4. The spectral approach reduces the detection of hub time series to the independent detection of hubs at each frequency. However, in exchange for achieving spectral resolution, the sample size is reduced by the factor n, from N to m = N/n. To confidently detect hubs in this high-dimensional, lowsample regime large p, small m), as well as to accommodate complex-valued DFTs, we develop a method that we call complex-valued partial) correlation screening. This is a generalization of the correlation and partial correlation screening method of [, 9, 2] to complex-valued random variables. For each frequency, the method computes the sample partial) correlation matrix of the DFT components of the p time series. Highly correlated variables hubs) are then identified by thresholding the sample correlation matrix at a level

3 Spectral Correlation Hub Screening of Multivariate Time Series 3 ρ and screening for rows or columns) with a specified number δ of non-zero entries. We characterize the behavior of complex-valued correlation screening in the high-dimensional regime of large p and fixed sample size m. Specifically, Theorem 2 and Corollary 2 give asymptotic expressions in the limit p for the mean number of hubs detected at thresholds ρ, δ and the probability of discovering at least one such hub. Bounds on the rates of convergence are also provided. These results show that the number of hub discoveries undergoes a phase transition as ρ decreases from, from almost no discoveries to the maximum number, p. An expression 33) for the critical threshold ρ c,δ is derived to guide the selection of ρ under different settings of p, m, and δ. Furthermore, given a null hypothesis that the population correlation matrix is sufficiently sparse, the expressions in Corollary 2 become independent of the underlying probability distribution and can thus be easily evaluated. This allows the statistical significance of a hub discovery to be quantified, specifically in the form of a p-value under the null hypothesis. We note that our results on complex-valued correlation screening apply more generally than to spectral correlation analysis and thus may be of independent interest. The remainder of the chapter is organized as follows. Section 2 presents notation and definitions for multivariate time series and establishes the asymptotic independence of spectral components. Section 3 describes complexvalued correlation screening and characterizes its properties in terms of numbers of hub discoveries and phase transitions. Section 4 discusses the application of complex-valued correlation screening to the spectra of multivariate time series. Finally, Sec. 5 illustrates the applicability of the proposed framework through simulation analysis.. Notation A triplet Ω, F, P) represents a probability space with sample space Ω, σ- algebra of events F, and probability measure P. For an event A F, PA) represents the probability of A. Scalar random variables and their realizations are denoted with upper case and lower case letters, respectively. Random vectors and their realizations are denoted with bold upper case and bold lower case letters. The expectation operator is denoted as E. For a random variable X, the cumulative probability distribution cdf) of X is defined as F X x) = PX x). For an absolutely continuous cdf F X.) the probability density function pdf) is defined as f X x) = df X x)/dx. The cdf and pdf are defined similarly for random vectors. Moreover, we follow the definitions in [3] for conditional probabilities, conditional expectations and conditional densities. For a complex number z = a + b C, Rz) = a and Iz) = b represent the real and imaginary parts of z, respectively. A complex-valued random variable is composed of two real-valued random variables as its real and imaginary parts. A complex-valued Gaussian variable has real and imaginary parts

4 4 Hamed Firouzi, Dennis Wei and Alfred O. Hero III that are Gaussian. A complex-valued Gaussian) random vector is a vector whose entries are complex-valued Gaussian) random variables. The covariance of a p-dimensional complex-valued random vector Y and a q-dimensional complex-valued random vector Z is a p q matrix defined as covy, Z) = E [ Y E[Y])Z E[Z]) H], where H denotes the Hermitian transpose. We write covy) for covy, Y) and vary ) = covy, Y ) for the variance of a scalar random variable Y. The correlation coefficient between random variables Y and Z is defined as cory, Z) = covy, Z) vary )varz). Matrices are also denoted by bold upper case letters. In most cases the distinction between matrices and random vectors will be clear from the context. For a matrix A we represent the i, j)th entry of A by a ij. Also D A represents the diagonal matrix that is obtained by zeroing out all but the diagonal entries of A. 2 Spectral Representation of Multivariate Time Series 2. Definitions Let Xk) = [X ) k), X 2) k), X p) k)], k Z, be a multivariate time series with time index k. We assume that the time series X ), X 2), X p) are second-order stationary random processes, i.e.: and E[X i) k)] = E[X i) k + )] ) cov[x i) k), X j) l)] = cov[x i) k + ), X j) l + )] 2) for any integer time shift. For i p, let X i) = [X i) k),, X i) k + n )] denote any vector of n consecutive samples of time series X i). The n-point Discrete Fourier Transform DFT) of X i) is denoted by Y i) = [Y i) ),, Y i) n )] and defined by Y i) = WX i), i p in which W is the DFT matrix: W = ω ω. n....., ω ω )2

5 Spectral Correlation Hub Screening of Multivariate Time Series 5 where ω = e 2π /n. We denote the n n population covariance matrix of X i) as C i,i) = [c i,i) kl ] k,l n and the n n population cross covariance matrix between X i) and X j) as C i,j) = [c i,j) kl ] k,l n for i j. The translation invariance properties ) and 2) imply that C i,i) and C i,j) are Toeplitz matrices. Therefore c i,i) kl and c i,j) kl depend on k and l only through the quantity k l. Representing the k, l)th entry of a Toeplitz matrix T by tk l), we write c i,i) kl = c i,i) k l) and c i,j) kl = c i,j) k l), where k l takes values from n to n. In addition, C i,i) is symmetric. 2.2 Asymptotic Independence of Spectral Components The following theorem states that for stationary time series, DFT components at different spectral indices i.e. frequencies) are asymptotically uncorrelated under the condition that the auto-covariance and cross-covariance functions are absolutely summable. This theorem follows directly from the spectral theory of large Toeplitz matrices, see, for example, [4] and [5]. However, for the benefit of the reader we give a self contained proof of the theorem. Theorem Assume lim n t= ci,j) t) = M i,j) < for all i, j p. Define err i,j) n) = M i,j) m = ci,j) m ) and avg i,j) n) = n m = erri,j) m ). Then for k l, we have: ) cor Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). In other words Y i) k) and Y j) l) are asymptotically uncorrelated as n. Proof. Without loss of generality we assume that the time series have zero mean i.e. E[X i) k)] =, i p, k n ). We first establish a representation of E[Z i) k)z j) l) ] for general linear functionals: Z i) k) = m = g k m )X i) m ), in which g k.) is an arbitrary complex sequence for k n. We have: E[Z i) k)z j) l) ] [ ) ) ] = E g k m )X i) m ) g l n )X j) n ) = = m = m = m = g k m ) g k m ) n = n = n = g l n ) E[X i) m )X j) n ) ] g l n ) c i,j) m n 3)

6 6 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Now for a Toeplitz matrix T, define the circulant matrix D T as: t) t ) + tn ) t n) + t) t) + t n) t) t2 n) + t2) D T = tn 2) + t 2) tn 3) + t 3) t ) + tn ) tn ) + t ) tn 2) + t 2) t) We can write: C i,j) = D C i,j) + E i,j) for some Toeplitz matrix E i,j). Thus c i,j) m n ) = d i,j) m n ) + e i,j) m n ) where d i,j) m n ) and e i,j) m n ) are the m, n ) entries of D C i,j) and E i,j), respectively. Therefore, 3) can be written as: m = g k m ) n = g l n ) d i,j) m n ) + The first term can be written as: m = g k m ) gl d i,j)) m ) = m = n = m = g k m )g l n ) e i,j) m n ) g k m )v i,j) l m ) where we have recognized v i,j) l m ) = gl di,j) as the circular convolution of gl.) and di,j).) [6]. Let G k.) and D i,j).) be the the DFT of g k.) and d i,j).), respectively. By Plancherel s theorem [7] we have: m = g k m )v i,j) l m ) = = = m = m = m = g k m ) v i,j) l m ) ) G k m ) G l m )D i,j) m ) ) G k m )G l m ) D i,j) m ). 4) Now let g k m ) = ω km / n for k, m n. For this choice of g k.) we have G k m ) = for all m n k and G k n k) =. Hence for k l the quantity 4) becomes. Therefore using the representation E i,j) = C i,j) D C i,j) we have:

7 Spectral Correlation Hub Screening of Multivariate Time Series 7 ) cov Y i) k), Y j) l) = E[Y i) k)y j) l) ] = n = 2 n m = n = m = n = m = g k m )g l n ) e i,j) m n ) e i,j) m n ) m c i,j) m ), 5) in which the last equation is due to the fact that c i,j) m ) = c i,j) m ). Now using 4) and 5) we obtain expressions for var Y i) k) ) and var Y j) l) ). Letting j = i and l = k in 4) and 5) gives: = var m = ) Y i) k) = cov ) Y i) k), Y i) k) G k m )G k m ) D i,i) m ) + = n.. D i,i) k) + n n = D i,i) k) + m = n = m = n = m = n = g k m )g k n ) e i,i) m n ) g k m )g k n ) e i,i) m n ) g k m )g k n ) e i,i) m n ), 6) in which the magnitude of the summation term is bounded as: Similarly: in which n = 2 n m = n = m = n = m = ) var Y j) l) = D j,j) l) + g k m )g k n ) e i,i) m n ) e i,i) m n ) m c i,i) m ). 7) m = n = g l m )g l n ) e j,j) m n ), 8)

8 8 Hamed Firouzi, Dennis Wei and Alfred O. Hero III 2 n m = n = m = g l m )g l n ) e j,j) m n ) m c j,j) m ). 9) To complete the proof the following lemma is needed. Lemma If {a m } m = is a sequence of non-negative numbers such that m = a m = M <. Define errn) = M m = a m and avgn) = n m = errm ). Then n m = m a m M/n + errn) + avgn). Proof. Let S = and for n define S n = m = a m. We have: Therefore: m = n ma m = n )S n S + S S ). m = m a m = n n S n m = S m. Since M M/n errn) n S M and M avgn) n M, using the triangle inequality the result follows. m = S m Now let a m = c i,j) m ). By assumption lim n m = a m = M i,j) <. Therefore, Lemma along with 5) concludes: ) cov Y i) k), Y j) l) = Omax{/n, err i,j) n), avg i,j) n)}). ) err i,j) n) is a decreasing decreasing function of n. Therefore avg i,j) n) err i,j) n), for n. Hence: ) cov Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). Similarly using Lemma along with 6), 7), 8) and 9) we obtain: ) var Y i) k) D i,i) k) = Omax{/n, avg i,i) n)}), ) and ) var Y j) l) D j,j) l) = Omax{/n, avg j,j) n)}). 2) Using the definition

9 Spectral Correlation Hub Screening of Multivariate Time Series 9 ) cor Y i) k), Y j) cov Y i) k), Y j) l) ) l) = var Y i) k) ) var Y j) l) ), and the fact that as n, D i,i) k) and D j,j) l) converge to constants C i,i) k) and C j,j) l), respectively, equations ), ) and 2) conclude: ) cor Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). As an example we apply Theorem to a scalar auto-regressive AR) process Xk) specified by Xk) = L ϕ l Xk l) + εk), l= in which ϕ l are real-valued coefficients and ε.) is a stationary process with no temporal correlation. The auto-covariance function of an AR process can be written as [8]: ct) = L l= α l r t l, in which r,..., r l are the roots of the polynomial βx) = x L L l= ϕ lx L l. It is known that for a stationary AR process, r l < for all l L [8]. Therefore, using the definition of err.) we have: L errn) = ct) = α l rl t = t=n t=n l= L r l n α l r l Cζn, l= L α l r l t in which C = L l= α l / r l ) and ζ = max l L r l <. Hence: avgn) = n m = Therefore, Theorem concludes: errm ) n m = l= Cζ m cor Y k), Y l)) = O/n), k l, t=n C n ζ). where Y.) represents the n-point DFT of the AR process X.). In the sequel, we assume that the time series X is multivariate Gaussian, i.e., X ),..., X p) are jointly Gaussian processes. It follows that the DFT components Y i) k) are jointly complex) Gaussian as linear functionals of X. Theorem then immediately implies asymptotic independence of DFT components through a well-known property of jointly Gaussian random variables.

10 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Corollary Assume that the time series X is multivariate Gaussian. Under the absolute summability conditions in Theorem, the DFT components Y i) k) and Y j) l) are asymptotically independent for k l and n. Corollary implies that for large n, correlation analysis of the time series X can be done independently on each frequency in the spectral domain. This reduces the problem of screening for hub time series to screening for hub variables among the p DFT components at a given frequency. A procedure for the latter problem and a corresponding theory are described next. 3 Complex-Valued Correlation Hub Screening This section discusses complex-valued correlation hub screening, a generalization of real-valued correlation screening in [, 9], for identifying highly correlated components of a complex-valued random vector from its sample values. The method is applied to multivariate time series in Section 4 to discover correlation hubs among the spectral components at each frequency. Sections 3. and 3.2 describe the underlying statistical model and the screening procedure. Sections 3.3 and 3.4 provide background on the U-score representation of correlation matrices and associated definitions and properties. Section 3.5 contains the main theoretical result characterizing the number of hub discoveries in the high-dimensional regime, while Section 3.6 elaborates on the phenomenon of phase transitions in the number of discoveries. 3. Statistical Model We use the generic notation Z = [Z, Z 2,, Z p ] T in this section to refer to a complex-valued random vector. The mean of Z is denoted as µ and its p p non-singular covariance matrix is denoted as Σ. We assume that the vector Z follows a complex elliptically contoured distribution with pdf f Z z) = g z µ) H Σ z µ) ), in which g : R R > is an integrable and strictly decreasing function [9]. This assumption generalizes the Gaussian assumption made in Section 2 as the Gaussian distribution is one example of an elliptically contoured distribution. In correlation hub screening, the quantities of interest are the correlation matrix and partial correlation matrix associated with Z. These are defined as Γ = D 2 Σ ΣD 2 Σ and Ω = D 2 Σ Σ D 2 Σ, respectively. Note that Γ and Ω are normalized matrices with unit diagonals. 3.2 Screening Procedure The goal of correlation hub screening is to identify highly correlated components of the random vector Z from its sample realizations. Assume that m samples z,..., z m R p of Z are available. To simplify the development of the

11 Spectral Correlation Hub Screening of Multivariate Time Series theory, the samples are assumed to be independent and identically distributed i.i.d.) although the theory also applies to dependent samples. We compute sample correlation and partial correlation matrices from the samples z,..., z m as surrogates for the unknown population correlation matrices Γ and Ω in Section 3.. First define the p p sample covariance matrix S as S = m m i= z i z)z i z) H, where z is the sample mean, the average of z,..., z m. The sample correlation and sample partial correlation matrices are then defined as R = D 2 S SD 2 S and P = D 2 R R D 2 R, respectively, where R is the Moore-Penrose pseudo-inverse of R. Correlation hubs are screened by applying thresholds to the sample partial) correlation matrix. A variable Z i is declared a hub screening discovery at degree level δ {, 2,...} and threshold level ρ [, ] if {j : j i, ψ ij ρ} δ, where Ψ = R for correlation screening and Ψ = P for partial correlation screening. We denote by N δ,ρ {,..., p} the total number of hub screening discoveries at levels δ, ρ. Correlation hub screening can also be interpreted in terms of the partial) correlation graph G ρ Ψ), depicted in Fig. and defined as follows. The vertices of G ρ Ψ) are v,, v p which correspond to Z,, Z p, respectively. For i, j p, v i and v j are connected by an edge in G ρ Ψ) if the magnitude of the sample partial) correlation coefficient between Z i and Z j is at least ρ. A vertex of G ρ Ψ) is called a δ-hub if its degree, the number of incident edges, is at least δ. Then the number of discoveries N δ,ρ defined earlier is the number of δ-hubs in the graph G ρ Ψ). v v 2 v p v 3 v j Fig.. Complex-valued partial) correlation hub screening thresholds the sample correlation or partial correlation matrix, denoted generically by the matrix Ψ, to find variables Z i that are highly correlated with other variables. This is equivalent to finding hubs in a graph G ρψ) with p vertices v,, v p. For i, j p, v i is connected to v j in G ρψ) if ψ ij ρ. v i

12 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III 3.3 U-score Representation of Correlation Matrices Our theory for complex-valued correlation screening is based on the U-score representation of the sample correlation and partial correlation matrices. Similarly to the real case [9], it can be shown that there exists an m ) p complex-valued matrix U R with unit-norm columns u i) R Cm such that the following representation holds: R = U H RU R. 3) Similar to Lemma in [9] it is straightforward to show that: R = U H RU R U H R) 2 U R. Hence by defining U P = U R U H R ) U R D 2 U H R U we have the representation: RU H R ) 2 U R P = U H P U P, 4) where the m ) p matrix U P has unit-norm columns u i) P Cm. 3.4 Properties of U-scores The U-score factorizations in 3) and 4) show that sample partial) correlation matrices can be represented in terms of unit vectors in C m. This subsection presents definitions and properties related to U-scores that will be used in Section 3.5. We denote the unit spheres in R m and C m as S m and T m, respectively. The surface areas of S m and T m are denoted as a m and b m respectively. Define the interleaving function h : R 2m 2 C m as below: h[x, x 2,, x 2m 2 ] T ) = [x + x 2, x3 + x 4,, x2m 3 + x 2m 2 ] T. Note that h.) is a one-to-one and onto function and it maps S 2m 2 to T m. For a fixed vector u T m and a threshold ρ define the spherical cap in T m : A ρ u) = {y : y T m, y H u ρ}. Also define P as the probability that a random point Y that is uniformly distributed on T m falls into A ρ u). Below we give a simple expression for P as a function of ρ and m. Lemma 2 Let Y be an m )-dimensional complex-valued random vector that is uniformly distributed over T m. We have P = P Y A ρ u)) = ρ 2 ) m 2.

13 Spectral Correlation Hub Screening of Multivariate Time Series 3 Proof. Without loss of generality we assume u = [,,, ] T. We have: P = P Y ρ) = PRY ) 2 + IY ) 2 ρ 2 ). Since Y is uniform on T m, we can write Y = X/ X 2, in which X is complex-valued random vector whose entries are i.i.d. complex-valued Gaussian variables with mean and variance. Thus: P = P RX ) 2 + IX 2 ) ) / X 2 2 ρ 2) = P ρ 2 ) ) RX ) 2 + IX ) 2) m ρ 2 RX k ) 2 + IX k ) 2. Define V = RX ) 2 + IX ) 2 and V 2 = m k=2 RX k) 2 + IX k ) 2. V and V 2 are independent and have chi-squared distributions with 2 and 2m 2) degrees of freedom, respectively [2]. Therefore, P = = = = = ρ 2 v 2/ ρ 2 ) χ 2 2m 2) v 2) k=2 χ 2 2v )χ 2 2m 2) v 2)dv dv 2 ρ 2 v 2/ ρ 2 ) 2 m 2 Γ m 2) vm 3 Γ m 2) ρ2 ) m 2 2 e v/2 dv dv 2 2 e v2/2 e ρ 2 2 ρ 2 ) v2 dv 2 x m 3 e x dx Γ m 2) ρ2 ) m 2 Γ m 2) = ρ 2 ) m 2, in which we have made a change of variable x = v2 2 ρ 2 ). Under the assumption that the joint pdf of Z exists, the p columns of the U-score matrix have joint pdf f U,...,U p u,..., u p ) on T p m = p i= T m. The following δ + )-fold average of the joint pdf will play a significant role in Section 3.5. This δ + )-fold average is defined as: f U,...,U δ+ u,..., u δ+ ) = 2π) δ+ p ) p i < <i δ p,i δ+ / {i,,i δ } 2π 2π δ 2π f Ui,...,U iδ,u iδ+ e θ u,..., e θ δ u δ, e θ u δ+ ) dθ dθ δ dθ. Also for a joint pdf f U,...,U δ+ u,..., u δ+ ) on Tm δ+ define Jf U,...,U δ+ ) = a δ 2m 2 f U,...,U δ+ hu),..., hu))du. S 2m 2

14 4 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Note that Jf U,...,U δ+ ) is proportional to the integral of f U,...,U δ+ over the manifold u =... = u δ+. The quantity Jf U,...,U δ+) ) is key in determining the asymptotic average number of hubs in a complex-valued correlation network. This will be described in more detail in Sec Let i = i, i,..., i δ ) be a set of distinct indices, i.e., i p, i <... < i δ p and i,..., i δ i. For a U-score matrix U define the dependency coefficient between the columns U i = {U i, U i,..., U iδ } and their complementary k-nn k-nearest neighbor) set A k i) defined in 29) and Fig. 2 as p,m,k,δ i) = f Ui U f Ak i) U i )/f Ui, where denotes the supremum norm. The average of these coefficients is defined as: p,m,k,δ = p ) p δ p i = i,...,i δ i i <...<i δ p p,m,k,δ i). 5) 3.5 Number of Hub Discoveries in the High-Dimensional Limit We now present the main theoretical result on complex-valued correlation screening. The following theorem gives asymptotic expressions for the mean number of δ-hubs and the probability of discovery of at least one δ-hub in the graph G ρ Ψ). It also gives bounds on the rates of convergence to these approximations as the dimension p increases and ρ. We use U = [U,, U p ] as a generic notation for the U-score representation of the sample partial) correlation matrix. The asymptotic expression for the mean E[N δ,ρ ] is denoted by Λ and is given by: ) p Λ = p P δ Jf U,...,U δ δ+) ). 6) Define η p,δ as: η p,δ = p /δ p )P = p /δ p ) ρ 2 ) m 2), 7) where the last equation is due to Lemma 2. The parameter k below represents an upper bound on the true hub degree, i.e. the number of non-zero entries in any row of the population covariance matrix Σ. Also let ϕδ) be the function that takes values ϕδ) = 2 for δ = and ϕδ) = for δ >. Theorem 2 Let U = [U,..., U p ] be a m ) p random matrix with U i T m where m > 2. Let δ be a fixed integer. Assume the joint pdf of any subset of the U i s is bounded and differentiable. Then, with Λ defined in 6),

15 Spectral Correlation Hub Screening of Multivariate Time Series 5 { E[N δ,ρ ] Λ O ηp,δ δ max η p,δ p /δ, ρ) /2}). 8) Furthermore, let Nδ,ρ be a Poisson distributed random variable with rate E[Nδ,ρ ] = Λ/ϕδ). If p )P, then PN δ,ρ > ) PNδ,ρ > ) }) ηp,δ {η δ max p,δ δ k/p)δ+, Q p,k,δ, p,m,k,δ, p /δ, ρ) /2 O, δ > }) O η p, max {η p, k/p) 2, p,m,k,, p, ρ) /2, δ =, with Q p,k,δ = η p,δ k/p /δ ) δ+ and p,m,k,δ defined in 5). Proof. The proof is similar to the proof of proposition in [9]. First we prove 8). Let φ i = Id i δ) be the indicator of the event that d i δ, in which d i represents the degree of the vertex v i in the graph G ρ Ψ). We have N δ,ρ = p i= φ i. With φ ij being the indicator of the presence of an edge in G ρ Ψ) between vertices v i and v j we have the relation: φ i = p l l=δ k C ip,l) j= φ ikj p q=l+ φ ikq ) 2) where we have defined the index vector k = k,..., k p ) and the set C i p, l) = {k : k <... < k l, k l+ <... < k p k j {,..., p} {i}, k j k j }. The inner summation in 2) simply sums over the set of distinct indices not equal to i that index all ) p l different types of products of the form: l j= φ p ik j q=l+ φ ik q ). Subtracting δ k C ip,δ) j= φ ik j from both sides of 2) φ i = δ k C ip,δ) j= p φ ikj l l=δ+ k C ip,l) j= + k C ip,l) q=δ+ p φ ikj ) q δ p q=l+ k δ+ <...<k q,{k δ+,...,k q } {k δ+,...,k p } j= φ ikq ) l φ ikj q s=δ+ 9) φ ik s 2)

16 6 Hamed Firouzi, Dennis Wei and Alfred O. Hero III in which we have used the expansion p q=δ+ φ ikq ) = + p q=δ+ ) q δ k δ+ <...<k q,{k δ+,...,k q } {k δ+,...,k p } s=δ+ The following simple asymptotic representation will be useful in the sequel. For any i,..., i k {,..., p}, i i k i, k {,..., p }, k E φ iij = h A ρv)) h A ρv)) j= S 2m 2 f Ui,...,U ik,u i hv ),, hv k ), hv)) dv dv k dv P k a k 2m 2M k 22) where P, A ρ u) and the function h.) are defined in Sec Moreover M k = fui max,...,u ik U ik+. i =i k+ The following simple generalization of 22) to arbitrary product indices φ ij will also be needed [ q ] E φ il j l P q aq 2m 2 M Q, 23) l= where Q =unique{i l, j l } q l= ) is the set of unique indices among the distinct pairs {i l, j l )} q l= and M Q is a bound on the joint pdf of U Q. Define the random variable ) p δ θ i = φ ikj. δ k C ip,δ) j= We show below that for sufficiently large p ) p E[φ i] E[θ i ] δ γ p,δp )P ) δ+, 24) where γ p,δ = max δ+ l<p {a l 2m 2M l } e δ l= l!) ) + δ!) and M l is a least upper bound on any l-dimensional joint pdf of the variables {U i } p j i conditioned on U i. To show inequality 24) take expectations of 2) and apply the bound 22) to obtain ) p E[φ i] E[θ i ] δ p ) ) p δ p p ) p δ P l l a l 2m 2M l + P δ+l a δ+l 2m 2 δ l M δ+l l=δ+ A + δ!) ), 25) l= q φ ik s.

17 where Spectral Correlation Hub Screening of Multivariate Time Series 7 A = p l=δ+ ) p p )P ) l a l l 2m 2M l. The line 25) follows from the identity ) p δ p ) l δ = p ) l+δ ) l+δ l and a change of index in the second summation on the previous line. Since p )P < A max δ+ l<p {al 2m 2M l } max δ+ l<p {al 2m 2M l } p ) p p )P ) l l l=δ+ ) δ e p )P ) δ+. l! Application of the mean value theorem to the integral representation 22) yields E[θi ] P δ Jf U i,...,u δ i,u i ) γp,δ p )P ) δ r, 26) where l= f U i,...,u δ i,u i u,..., u δ+ ) = 2π) δ ) p δ i < <i δ p i / {i,,i δ } 2π 2π f Ui,...,U iδ,u i e θ u,..., e θ δ u δ, u δ+ ) dθ dθ δ, r = 2 ρ), γ p,δ = 2a δ+ 2m 2Ṁδ+ /δ! and Ṁδ+ is a bound on the norm of the gradient ui,...,u iδ f U i,...,u δ i U i u i,..., u iδ u i ). Combining 24)-26) and the relation r = O ρ) /2 ), ) p E[φ i] P δ Jf U,...,U δ δ+) ) { O p )P ) δ max p )P, ρ) /2}). Summing over i and recalling the definitions 6) and 7) of Λ and η p,δ, E[N δ,ρ ] Λ O pp )P ) δ max {p )P, ρ) /2}) { = O ηp,δ δ max η p,δ p /δ, ρ) /2}). This establishes the bound 8).

18 8 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Next we prove the bound 9) by using the Chen-Stein method [2]. Define: Ñ δ,ρ = ϕδ) p i = i <...<i δ p j= δ φ ii j, 27) Where the second sum is over the indices i <... < i δ p such that i j i, j δ. For i def = i, i,..., i δ ) define the index set B i = B i,i,...,i δ = {j, j,..., j δ ) : j l N k i l ) {i l }, l =,..., δ} C < where C < = {j,..., j δ ) : j p, j < < j δ p, j l j, l δ}. These index the distinct sets of points U i = {U i, U i,..., U iδ } and their respective k- NN s. Note that B i k δ+. Identifying Ñδ,ρ = δ i C < l= φ i i l and Nδ,ρ a Poisson distributed random variable with rate E[Ñδ,ρ], the Chen-Stein bound [2, Theorem ] is 2 max A PÑδ,ρ A) PN δ,ρ A) b + b 2 + b 3, 28) where b = E i C < j B i b 2 = [ δ ] [ δ ] φ ii l E φ jj q, l= E i C < j B i {i} [ δ q= φ ii l l= q= ] δ φ jj q, and, for p i = E[ δ l= φ i i l ], b 3 = i C < E [ [ δ ]] E φ ii l p i φ j : j B i. l= Over the range of indices in the sum of b E[ δ l= φ ii l ] is of order OP δ ), by 23), and therefore b O p δ+ k δ+ P 2δ ) = O η 2δ p,δk/p) δ+), which follows from definition 7). More care is needed to bound b 2 due to the repetition of characteristic functions φ ij. Since i j, E[ δ l= φ δ i i l q= φ j j q ] is a multiplication of at least δ + different characteristic functions, hence by 23), δ δ E[ φ ii l φ jj q ] = O P δ+ ). l= q= Therefore, we conclude that b 2 O p δ+ k δ+ P δ+ ).

19 Spectral Correlation Hub Screening of Multivariate Time Series 9 i i Fig. 2. The complementary k-nn set A k i) illustrated for δ = and k = 5. Here we have i = i, i ). The vertices i, i and their k-nns are depicted in black and blue respectively. The complement of the union of {i, i } and its k-nns is the complementary k-nn set A k i) and is depicted in red. Next we bound the term b 3 in 28). The set A k i) = B c i {i} 29) indexes the complementary k-nn of U i see Fig. 2) so that, using the representation 23), b 3 = i C < E [ [ δ ]] E φ ii l p i U Ak i) l= = δ ) du Ak i) du i du il i C < S A k i) 2m 2 l= S 2m 2 Ar,u i ) fui U u ) Ak i u Ak i)) f Ui u i ) f Ui u i )f UAk i) f Ui u i ) u A k i)) O p δ+ P δ p,m,k,δ ) = O η δ p,δ p,m,k,δ ). Note that by definition of Ñδ,ρ we have Ñδ,ρ > if and only if N δ,ρ >. This yields:

20 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III PN δ,ρ > ) exp Λ)) PÑδ,ρ > ) PN δ,ρ > ) + exp E[ PÑδ,ρ > ) exp E[Ñδ,ρ])) + Ñ δ,ρ ]) exp Λ) ) E[ b + b 2 + b 3 + O Ñ δ,ρ ] Λ 3) Combining the above inequalities on b, b 2 and b 3 yields the first three terms in the argument of the max on the right side of 9). It remains to bound the term E[Ñδ,ρ] Λ. Application of the mean value theorem to the multiple integral 23) gives [ δ ] E φ iil P δ J l= f Ui,...,U iδ,u i ) O P δ r ). Applying relation 27) yields ) p E[Ñδ,ρ] p P δ J ) f U,...,U δ δ+) O p δ+ P δ r ) = O ηp,δr δ ). Combine this with 3) to obtain the bound 9). This completes the proof of Theorem 2. An immediate consequence of Theorem 2 is the following result, similar to Proposition 2 in [9], which provides asymptotic expressions for the mean number of δ-hubs and the probability of the event N δ,ρ > as p goes to and ρ converges to at a prescribed rate. Corollary 2 Let ρ p [, ] be a sequence converging to one as p such that η p,δ = p /δ p ) ρ 2 p) m 2) e m,δ, ). Then lim E[N δ,ρ p ] = Λ = e δ m,δ/δ! p lim p Jf U,...,U δ+) ). 3) Assume that k = op /δ ) and that for the weak dependency coefficient p,m,k,δ, defined via 5), we have lim p p,m,k,δ =. Then PN δ,ρp > ) exp Λ /ϕδ)). 32) Corollary 2 shows that in the limit p, the number of detected hubs depends on the true population correlations only through the quantity Jf U,...,U δ+) ). In some cases Jf U,...,U δ+) ) can be evaluated explicitly. Similar to the argument in [9], it can be shown that if the population covariance matrix Σ is sparse in the sense that its non-zero off-diagonal entries can be arranged into a k k submatrix by reordering rows and columns, then Jf U,...,U δ+) ) = + Ok/p).

Spectral Correlation Hub Screening of Multivariate Time Series 2 Hence, if k = op) as p, the quantity Jf U,...,U δ+) ) converges to. If Σ is diagonal, then Jf U,...,U δ+) ) = exactly.

21 Spectral Correlation Hub Screening of Multivariate Time Series 2 Hence, if k = op) as p, the quantity Jf U,...,U δ+) ) converges to. If Σ is diagonal, then Jf U,...,U δ+) ) = exactly. In such cases, the quantity Λ in Corollary 2 does not depend on the unknown underlying distribution of the U-scores. As a result, the expected number of δ-hubs in G ρ Ψ) and the probability of discovery of at least one δ-hub do not depend on the underlying distribution. We will see in Sec. 4 that this result is useful in assigning statistical significance levels to vertices of the graph G ρ Ψ). 3.6 Phase Transitions and Critical Threshold It can be seen from Theorem 2 and Corollary 2 that the number of δ-hub discoveries exhibits a phase transition in the high-dimensional regime where the number of variables p can be very large relative to the number of samples m. Specifically, assume that the population covariance matrix Σ is blocksparse as in Section 3.5. Then as the correlation threshold ρ is reduced, the number of δ-hub discoveries abruptly increases to the maximum, p. Conversely as ρ increases, the number of discoveries quickly approaches zero. Similarly, the family-wise error rate i.e. the probability of discovering at least one δ- hub in a graph with no true hubs) exhibits a phase transition as a function of ρ. Figure 3 shows the family-wise error rate obtained via expression 32) for δ = and p =, as a function of ρ and the number of samples m. It is seen that for a fixed value of m there is a sharp transition in the family-wise error rate as a function of ρ. Number of Observations m Applied Threshold ρ Fig. 3. Family-wise error rate as a function of correlation threshold ρ and number of samples m for p =, δ =. The phase transition phenomenon is clearly observable in the plot.

22 22 Hamed Firouzi, Dennis Wei and Alfred O. Hero III The phase transition phenomenon motivates the definition of a critical threshold ρ c,δ as the threshold ρ satisfying the following slope condition: E[N δ,ρ ]/ ρ = p. Using 6) the solution of the above equation can be approximated via the expression below: ρ c,δ = c m,δ p )) 2δ/δ2m 3) 2), 33) where c m,δ = b m δjf U,...,U δ+) ). The screening threshold ρ should be chosen greater than ρ c,δ to prevent excessively large numbers of false positives. Note that the critical threshold ρ c,δ also does not depend on the underlying distribution of the U-scores when the covariance matrix Σ is block-sparse. Expression 33) is similar to the expression obtained in [9] for the critical threshold in real-valued correlation screening. However, in the complex-valued case the coefficient c m,δ and the exponent of the term c m,δ p ) are different from the real case. This generally results in smaller values of ρ c,δ for fixed m and δ. Figure 4 shows the value of ρ c,δ obtained via 33) as a function of m for different values of δ and p. The critical threshold decreases as either the sample size m increases, the number of variables p decreases, or the vertex degree δ increases. Note that even for ten billion ) dimensions upper triplet of curves in the figure) only a relatively small number of samples are necessary for complex-valued correlation screening to be useful. For example, with m = 2 one can reliably discover connected vertices δ = in the figure) having correlation greater than ρ c,δ =.5. 4 Application to Spectral Screening of Multivariate Gaussian Time Series In this section, the complex-valued correlation hub screening method of Section 3 is applied to stationary multivariate Gaussian time series. Assume that the time series X ),, X p) defined in Section 2 satisfy the conditions of Corollary. Assume also that a total of N = n m time samples of X ),, X p) are available. We divide the N samples into m parts of n consecutive samples and we take the n-point DFT of each part. Therefore, for each time series, at each frequency f i = i )/n, i n, m samples are available. This allows us to construct a partial) correlation graph corresponding to each frequency. We denote the partial) correlation graph corresponding to frequency f i and correlation threshold ρ i as G fi,ρ i. G fi,ρ i has p vertices v, v 2,, v p corresponding to time series X ), X 2),, X p), respectively. Vertices v k and v l are connected if the magnitude of the sample partial) correlation between the DFTs of X k) and X l) at frequency f i i.e.

23 Spectral Correlation Hub Screening of Multivariate Time Series 23 Critical Threshold ρ c,δ Number of Observations m) p= p= p= 2 3 Fig. 4. The critical threshold ρ c,δ as a function of the sample size m for δ =, 2, 3 curve labels) and p =,, bottom to top triplets of curves). The figure shows that the critical threshold decreases as either m or δ increases. When the number of samples m is small the critical threshold is close to in which case reliable hub discovery is impossible. However a relatively small increment in m is sufficient to reduce the critical threshold significantly. For example for p =, only m = 2 samples are enough to bring ρ c, down to.5. the sample partial) correlation between Y k) i ) and Y l) i )) is at least ρ i. Consider a single frequency f i and the null hypothesis, H, that the correlations among the time series X ), X 2),, X p) at frequency f i are block sparse in the sense of Section 3.5. As discussed in Sec. 3.5, under H the expected number of δ-hubs and the probability of discovery of at least one δ-hub in graph G fi,ρ i are not functions of the unknown underlying distribution of the data. Therefore the results of Corollary 2 may be used to quantify the statistical significance of declaring vertices of G fi,ρ i to be δ-hubs. The statistical significance is represented by the p-value, defined in general as the probability of having a test statistic at least as extreme as the value actually observed assuming that the null hypothesis H is true. In the case of correlation hub screening, the p-value pv δ j) assigned to vertex v j for being a δ-hub is the maximal probability that v j maintains degree δ given the observed sample correlations, assuming that the block-sparse hypothesis H is true. The detailed procedure for assigning p-values is similar to the procedure in [9] for real-valued correlation screening and is illustrated in Fig. 5. Equation 33) helps in choosing the initial threshold ρ. Given Corollary, for i j the correlation graphs G fi,ρ i and G fj,ρ j and their associated inferences are approximately independent. Thus we can solve multiple inference problems by first performing correlation hub screening on

24 24 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Initialization:. Choose a degree threshold δ. 2. Choose an initial threshold ρ > ρ c,δ. 3. Calculate the degree d j of each vertex of graph G ρ Ψ). 4. Select a value of δ {,, max j p d j}. For each j =,, p find ρ jδ) as the δth greatest element of the jth row of the sample partial) correlation matrix. Approximate the p-value corresponding to vertex v j as pv δ j) exp E[N δ,ρj δ)]/ϕδ)), where E[N δ,ρj δ)] is approximated by the limiting expression 3) using Jf U,...,U δ+) ) =. Screen variables by thresholding the p-values pv δ j) at desired significance level. Fig. 5. Procedure for assigning p-values to the vertices of G ρ Ψ). each graph as discussed above and then aggregating the inferences at each frequency in a straightforward manner. Examples of aggregation procedures are described below. 4. Disjunctive Hubs One task that can be easily performed is finding the p-value for a given time series to be a hub in at least one of the graphs G f,ρ,, G fn,ρ n. More specifically, for each j =,..., p denote the p-values for vertex v j being a δ-hub in G f,ρ,, G fn,ρ n by pv f,ρ,δj),, pv fn,ρ n,δj) respectively. These p-values are obtained using the method of Fig. 5. Then pv δ j), the p-value for the vertex v j being a δ-hub in at least one of the frequency graphs G f,ρ,, G fn,ρ n can be approximated as: P i : d j,fi δ H ) pv ˆ δ j) = n pv fi,ρ i,δj)), i= in which d j,fi is the degree of v j in the graph G fi,ρ i. 4.2 Conjunctive Hubs Another property of interest is the existence of a hub at all frequencies for a particular time series. In this case we have: P i : d j,fi δ H ) pv ˇ δ j) = n pv fi,ρ i,δj). i=

25 Spectral Correlation Hub Screening of Multivariate Time Series General Persistent Hubs The general case is the event that at least K frequencies have hubs of degree at least δ at vertex v j. For this general case we have: P i,..., i K : d j,fi δ,... d j,fik δ H ) = n k =K i <...<i k,i k + <...<in {i,...,i n}={,...,n} pv fil,ρ il,δj) k l= n l =k + ) pv fil,ρi l,δ j). 5 Experimental Results 5. Phase Transition Phenomenon and Mean Number of Hubs We first performed numerical simulations to confirm Theorem 2 and Corollary 2 for complex-valued correlation screening. Samples were generated from p uncorrelated complex Gaussian random variables. Figure 6 shows the number of discovered -hubs for p = and several sample sizes m. The plots from left to right correspond to m = 2,, 5,, 5, 2,, 6 and 4, respectively. The phase transition phenomenon is clearly observed in the plot. Table shows the predicted value obtained from formula 33) for the critical threshold. As can be seen in Fig. 6, the empirical phase transition thresholds approximately match the predicted values of Table. Moreover, to confirm the accuracy of equation 3) in Corollary 2, we list the number of hubs for m = in Table 2. The left column shows the empirical average number of hubs of degree at least δ =, 2, 3, 4 in a network of i.i.d. complex Gaussian random variables. The numbers in this column are obtained by averaging independent experiments. The right column shows the predicted value of E[N δ,ρ ] obtained via formula 3) with Jf U,...,U δ+) ) = for the i.i.d. case. As we see the empirical and predicted values are close to each other. m ρ c,δ Table. The value of critical threshold ρ c,δ obtained from formula 33) for p = complex variables and δ =. The predicted ρ c,δ approximates the phase transition thresholds in Fig Asymptotic Independence of Spectral Components for AR) Model To illustrate the asymptotic independence property and convergence rate of Theorem, we considered the simple case of an AR) process,

26 26 Hamed Firouzi, Dennis Wei and Alfred O. Hero III ρ Fig. 6. Phase transition phenomenon: the number of -hubs in the sample correlation graph corresponding to uncorrelated complex Gaussian variables as a function of correlation threshold ρ. Here, p = and the plots from left to right correspond to m = 2,, 5,, 5, 2,, 6 and 4, respectively. degree threshold empirical E[N δ,ρ ]) predicted E[N δ,ρ ]) d i δ = d i δ = d i δ = d i δ = 4 Table 2. Empirical average number of discovered hubs vs. predicted average number of discovered hubs in an uncorrelated complex Gaussian network. Here p =, m =, ρ =.28. The empirical values are obtained by performing independent experiments. Xk) = ϕ Xk ) + εk), k, 34) in which X) =, ϕ =.9 and ε.) is a stationary Gaussian process with no temporal correlation and standard deviation. We performed Monte-Carlo simulations to compute the correlation between spectral components at different frequencies for window sizes n =, 2,..., 25. More specifically, we set k = and l = 2 and empirically estimated cor Y k), Y l)) using 5 Monte-Carlo trials for each value of window size n. Figure 7 shows the result of this experiment. It is observable that the magnitude of cor Y k), Y l)) is bounded above by the function /n. This observation is consistent with Theorem.

27 Spectral Correlation Hub Screening of Multivariate Time Series coryk),yl)) /n n Fig. 7. Correlation coefficient cor Y ), Y 2)) as a function of window size n, empirically estimated using 5 Monte-Carlo trials. Here Y.) is the DFT of the AR) process 34). The magnitude of the correlation for n =, 2,..., 25 is bounded above by the function /n. This observation is consistent with the convergence rate in Theorem. 5.3 Spectral Correlation Screening of a Band-Pass Multivariate Time Series Next we analyzed the performance of the proposed complex-valued correlation screening framework on a synthetic data set for which the expected results are known. We synthesized a multivariate stationary Gaussian time series using the the following procedure. Here we set p =, N = 2 and m = n =. The discrepancy between N and the product mn is explained below. Let Xk), k N be a sequence of i.i.d. zero-mean Gaussian random variables i.e. white Gaussian noise) with standard deviation of. The p time series X ) k),..., X p) k), k N are obtained from Xk) by bandpass filtering and adding independent white Gaussian noise. Specifically, X i) k) = h i k) Xk) + N i k), i p, k N, in which represents the convolution operator, h i.) is the impulse response of the ith band-pass filter and N i.) is an independent white Gaussian noise series whose standard deviation is.. Since stable filtering of a stationary series results in another stationary series, the obtained series X ) k),..., X p) k) are stationary and Gaussian. For i = l, l 5, h i k) is the impulse response of a band-pass filter with pass band f [4l )/4, 4l/4]. We

28 28 Hamed Firouzi, Dennis Wei and Alfred O. Hero III approximate the ideal band-pass filters with finite impulse response FIR) Chebyshev filters [6]. Also for i = 5 + l, l 5 we set h i k) = h i 5 k). For all of the other values of i i.e. i l) we set h i k) =, k N. Figure 8 shows the signal part of the time series i.e. h i k) Xk)) for i =, 2, 3, 4. It is seen that the first 2 samples of the signals reflect the transient response of the filters. These 2 samples are not included for the purpose of correlation screening. Hence the actual number of time samples considered is mn =. Figure 9 shows the magnitude of the DFTs of the signals, Y i) k), for i = 5,,..., 5. The band-pass structure of the signals is clearly observable in the figure..5 Signal Freq.) Signal 2 Freq.2) Signal 3 Freq.3) Signal 4 Freq.4) Fig. 8. Signal part of the band-pass time series X i) k) i.e. h ik) Xk)) for i =, 2, 3, 4. We first constructed a correlation matrix for the time series X ) k),..., X p) k) from their simultaneous time samples. Figure illustrates the structure of the thresholded sample correlation matrix and the corresponding correlation graph. Note that this is a real-valued correlation screening problem in the time domain. The correlation threshold used here is ρ =.2 which is well above the critical threshold ρ c, =.28 obtained via formula ) in [9] for p = and N =. To examine the spectral structure of the correlations in Fig., we then performed complex-valued correlation screening on the spectra of the time series X ) k),..., X p) k). Figure shows the constructed correlation graphs G f,ρ for f = [.,.2,.3,.4] and correlation threshold ρ =.9, which corre-

29 Spectral Correlation Hub Screening of Multivariate Time Series Magnitude db) Frequency Fig. 9. DFT magnitude of the band-pass signals h ik) Xk) i.e. 2 log Y i).) )) as a function of frequency for i = 5,,..., Fig.. Left) The structure of the thresholded sample correlation matrix in the time domain. Right) The correlation graph corresponding to the thresholded sample correlation matrix in the time domain. sponds to a δ = false positive rate PN δ,ρ > ) 65 using δ = in the asymptotic relation 32) with Λ = e δ m,δ /δ! as specified by 3)). Note that the value of the correlation threshold is set to be higher than the critical threshold ρ c =.24. It can be observed that performing complex-valued spectral correlation screening at each frequency correctly discovers the correlations between the time series which are active around that frequency. As an example, for f =.2 the discovered hubs for δ = ) are the time series X i) k) for i {2, 7}. These time series are the ones that are active at

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is