arxiv: v2 [stat.ot] 9 Apr 2014

Size: px
Start display at page:

Download "arxiv: v2 [stat.ot] 9 Apr 2014"

Transcription

1 Spectral Correlation Hub Screening of Multivariate Time Series Hamed Firouzi, Dennis Wei and Alfred O. Hero III arxiv:43.337v2 [stat.ot] 9 Apr 24 Electrical Engineering and Computer Science Department, University of Michigan, USA firouzi@umich.edu, dlwei@eecs.umich.edu, hero@eecs.umich.edu Summary. This chapter discusses correlation analysis of stationary multivariate Gaussian time series in the spectral or Fourier domain. The goal is to identify the hub time series, i.e., those that are highly correlated with a specified number of other time series. We show that Fourier components of the time series at different frequencies are asymptotically statistically independent. This property permits independent correlation analysis at each frequency, alleviating the computational and statistical challenges of high-dimensional time series. To detect correlation hubs at each frequency, an existing correlation screening method is extended to the complex numbers to accommodate complex-valued Fourier components. We characterize the number of hub discoveries at specified correlation and degree thresholds in the regime of increasing dimension and fixed sample size. The theory specifies appropriate thresholds to apply to sample correlation matrices to detect hubs and also allows statistical significance to be attributed to hub discoveries. Numerical results illustrate the accuracy of the theory and the usefulness of the proposed spectral framework. Key words: Complex-valued correlation screening, Spectral correlation analysis, Gaussian stationary processes, Hub screening, Correlation graph, Correlation network, Spatio-temporal analysis of multivariate time series, High dimensional data analysis Introduction Correlation analysis of multivariate time series is important in many applications such as wireless sensor networks, computer networks, neuroimaging, and finance [, 2, 3, 4, 5]. This chapter focuses on the problem of detecting hub time series, ones that have a high degree of interaction with other time series as measured by correlation or partial correlation. Detection of hubs can lead to reduced computational and/or sampling costs. For example in wireless sensor networks, the identification of hub nodes can be useful for reducing power usage and adding or removing sensors from the network [6, 7]. Hub detection can also give new insights about underlying structure in the dataset.

2 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III In neuroimaging for instance, studies have consistently shown the existence of highly connected hubs in brain graphs connectomes) [8]. In finance, a hub might indicate a vulnerable financial instrument or a sector whose collapse could have a major effect on the market [9]. Correlation analysis becomes challenging for multivariate time series when the dimension p of the time series, i.e. the number of scalar time series, and the number of time samples N are large [4]. A naive approach is to treat the time series as a set of independent samples of a p-dimensional random vector and estimate the associated covariance or correlation matrix, but this approach completely ignores temporal correlations as it only considers dependences at the same time instant and not between different time instants. The work in [] accounts for temporal correlations by quantifying their effect on convergence rates in covariance and precision matrix estimation; however, only correlations at the same time instant are estimated. A more general approach is to consider all correlations between any two time instants of any two series within a window of n N consecutive samples, where the previous case corresponds to n =. However, in general this would entail the estimation of an np np correlation matrix from a reduced sample of size m = N/n, which can be computationally costly as well as statistically problematic. In this chapter, we propose spectral correlation analysis as a method of overcoming the issues discussed above. As before, the time series are divided into m temporal segments of n consecutive samples, but instead of estimating temporal correlations directly, the method performs analysis on the Discrete Fourier Transforms DFT) of the time series. We prove in Theorem that for stationary, jointly Gaussian time series under the mild condition of absolute summability of the auto- and cross-correlation functions, different Fourier components frequencies) become asymptotically independent of each other as the DFT length n increases. This property of stationary Gaussian processes allows us to focus on the p p correlations at each frequency separately without having to consider correlations between different frequencies. Moreover, spectral analysis isolates correlations at specific frequencies or timescales, potentially leading to greater insight. To make aggregate inferences based on all frequencies, straightforward procedures for multiple inference can be used as described in Section 4. The spectral approach reduces the detection of hub time series to the independent detection of hubs at each frequency. However, in exchange for achieving spectral resolution, the sample size is reduced by the factor n, from N to m = N/n. To confidently detect hubs in this high-dimensional, lowsample regime large p, small m), as well as to accommodate complex-valued DFTs, we develop a method that we call complex-valued partial) correlation screening. This is a generalization of the correlation and partial correlation screening method of [, 9, 2] to complex-valued random variables. For each frequency, the method computes the sample partial) correlation matrix of the DFT components of the p time series. Highly correlated variables hubs) are then identified by thresholding the sample correlation matrix at a level

3 Spectral Correlation Hub Screening of Multivariate Time Series 3 ρ and screening for rows or columns) with a specified number δ of non-zero entries. We characterize the behavior of complex-valued correlation screening in the high-dimensional regime of large p and fixed sample size m. Specifically, Theorem 2 and Corollary 2 give asymptotic expressions in the limit p for the mean number of hubs detected at thresholds ρ, δ and the probability of discovering at least one such hub. Bounds on the rates of convergence are also provided. These results show that the number of hub discoveries undergoes a phase transition as ρ decreases from, from almost no discoveries to the maximum number, p. An expression 33) for the critical threshold ρ c,δ is derived to guide the selection of ρ under different settings of p, m, and δ. Furthermore, given a null hypothesis that the population correlation matrix is sufficiently sparse, the expressions in Corollary 2 become independent of the underlying probability distribution and can thus be easily evaluated. This allows the statistical significance of a hub discovery to be quantified, specifically in the form of a p-value under the null hypothesis. We note that our results on complex-valued correlation screening apply more generally than to spectral correlation analysis and thus may be of independent interest. The remainder of the chapter is organized as follows. Section 2 presents notation and definitions for multivariate time series and establishes the asymptotic independence of spectral components. Section 3 describes complexvalued correlation screening and characterizes its properties in terms of numbers of hub discoveries and phase transitions. Section 4 discusses the application of complex-valued correlation screening to the spectra of multivariate time series. Finally, Sec. 5 illustrates the applicability of the proposed framework through simulation analysis.. Notation A triplet Ω, F, P) represents a probability space with sample space Ω, σ- algebra of events F, and probability measure P. For an event A F, PA) represents the probability of A. Scalar random variables and their realizations are denoted with upper case and lower case letters, respectively. Random vectors and their realizations are denoted with bold upper case and bold lower case letters. The expectation operator is denoted as E. For a random variable X, the cumulative probability distribution cdf) of X is defined as F X x) = PX x). For an absolutely continuous cdf F X.) the probability density function pdf) is defined as f X x) = df X x)/dx. The cdf and pdf are defined similarly for random vectors. Moreover, we follow the definitions in [3] for conditional probabilities, conditional expectations and conditional densities. For a complex number z = a + b C, Rz) = a and Iz) = b represent the real and imaginary parts of z, respectively. A complex-valued random variable is composed of two real-valued random variables as its real and imaginary parts. A complex-valued Gaussian variable has real and imaginary parts

4 4 Hamed Firouzi, Dennis Wei and Alfred O. Hero III that are Gaussian. A complex-valued Gaussian) random vector is a vector whose entries are complex-valued Gaussian) random variables. The covariance of a p-dimensional complex-valued random vector Y and a q-dimensional complex-valued random vector Z is a p q matrix defined as covy, Z) = E [ Y E[Y])Z E[Z]) H], where H denotes the Hermitian transpose. We write covy) for covy, Y) and vary ) = covy, Y ) for the variance of a scalar random variable Y. The correlation coefficient between random variables Y and Z is defined as cory, Z) = covy, Z) vary )varz). Matrices are also denoted by bold upper case letters. In most cases the distinction between matrices and random vectors will be clear from the context. For a matrix A we represent the i, j)th entry of A by a ij. Also D A represents the diagonal matrix that is obtained by zeroing out all but the diagonal entries of A. 2 Spectral Representation of Multivariate Time Series 2. Definitions Let Xk) = [X ) k), X 2) k), X p) k)], k Z, be a multivariate time series with time index k. We assume that the time series X ), X 2), X p) are second-order stationary random processes, i.e.: and E[X i) k)] = E[X i) k + )] ) cov[x i) k), X j) l)] = cov[x i) k + ), X j) l + )] 2) for any integer time shift. For i p, let X i) = [X i) k),, X i) k + n )] denote any vector of n consecutive samples of time series X i). The n-point Discrete Fourier Transform DFT) of X i) is denoted by Y i) = [Y i) ),, Y i) n )] and defined by Y i) = WX i), i p in which W is the DFT matrix: W = ω ω. n....., ω ω )2

5 Spectral Correlation Hub Screening of Multivariate Time Series 5 where ω = e 2π /n. We denote the n n population covariance matrix of X i) as C i,i) = [c i,i) kl ] k,l n and the n n population cross covariance matrix between X i) and X j) as C i,j) = [c i,j) kl ] k,l n for i j. The translation invariance properties ) and 2) imply that C i,i) and C i,j) are Toeplitz matrices. Therefore c i,i) kl and c i,j) kl depend on k and l only through the quantity k l. Representing the k, l)th entry of a Toeplitz matrix T by tk l), we write c i,i) kl = c i,i) k l) and c i,j) kl = c i,j) k l), where k l takes values from n to n. In addition, C i,i) is symmetric. 2.2 Asymptotic Independence of Spectral Components The following theorem states that for stationary time series, DFT components at different spectral indices i.e. frequencies) are asymptotically uncorrelated under the condition that the auto-covariance and cross-covariance functions are absolutely summable. This theorem follows directly from the spectral theory of large Toeplitz matrices, see, for example, [4] and [5]. However, for the benefit of the reader we give a self contained proof of the theorem. Theorem Assume lim n t= ci,j) t) = M i,j) < for all i, j p. Define err i,j) n) = M i,j) m = ci,j) m ) and avg i,j) n) = n m = erri,j) m ). Then for k l, we have: ) cor Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). In other words Y i) k) and Y j) l) are asymptotically uncorrelated as n. Proof. Without loss of generality we assume that the time series have zero mean i.e. E[X i) k)] =, i p, k n ). We first establish a representation of E[Z i) k)z j) l) ] for general linear functionals: Z i) k) = m = g k m )X i) m ), in which g k.) is an arbitrary complex sequence for k n. We have: E[Z i) k)z j) l) ] [ ) ) ] = E g k m )X i) m ) g l n )X j) n ) = = m = m = m = g k m ) g k m ) n = n = n = g l n ) E[X i) m )X j) n ) ] g l n ) c i,j) m n 3)

6 6 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Now for a Toeplitz matrix T, define the circulant matrix D T as: t) t ) + tn ) t n) + t) t) + t n) t) t2 n) + t2) D T = tn 2) + t 2) tn 3) + t 3) t ) + tn ) tn ) + t ) tn 2) + t 2) t) We can write: C i,j) = D C i,j) + E i,j) for some Toeplitz matrix E i,j). Thus c i,j) m n ) = d i,j) m n ) + e i,j) m n ) where d i,j) m n ) and e i,j) m n ) are the m, n ) entries of D C i,j) and E i,j), respectively. Therefore, 3) can be written as: m = g k m ) n = g l n ) d i,j) m n ) + The first term can be written as: m = g k m ) gl d i,j)) m ) = m = n = m = g k m )g l n ) e i,j) m n ) g k m )v i,j) l m ) where we have recognized v i,j) l m ) = gl di,j) as the circular convolution of gl.) and di,j).) [6]. Let G k.) and D i,j).) be the the DFT of g k.) and d i,j).), respectively. By Plancherel s theorem [7] we have: m = g k m )v i,j) l m ) = = = m = m = m = g k m ) v i,j) l m ) ) G k m ) G l m )D i,j) m ) ) G k m )G l m ) D i,j) m ). 4) Now let g k m ) = ω km / n for k, m n. For this choice of g k.) we have G k m ) = for all m n k and G k n k) =. Hence for k l the quantity 4) becomes. Therefore using the representation E i,j) = C i,j) D C i,j) we have:

7 Spectral Correlation Hub Screening of Multivariate Time Series 7 ) cov Y i) k), Y j) l) = E[Y i) k)y j) l) ] = n = 2 n m = n = m = n = m = g k m )g l n ) e i,j) m n ) e i,j) m n ) m c i,j) m ), 5) in which the last equation is due to the fact that c i,j) m ) = c i,j) m ). Now using 4) and 5) we obtain expressions for var Y i) k) ) and var Y j) l) ). Letting j = i and l = k in 4) and 5) gives: = var m = ) Y i) k) = cov ) Y i) k), Y i) k) G k m )G k m ) D i,i) m ) + = n.. D i,i) k) + n n = D i,i) k) + m = n = m = n = m = n = g k m )g k n ) e i,i) m n ) g k m )g k n ) e i,i) m n ) g k m )g k n ) e i,i) m n ), 6) in which the magnitude of the summation term is bounded as: Similarly: in which n = 2 n m = n = m = n = m = ) var Y j) l) = D j,j) l) + g k m )g k n ) e i,i) m n ) e i,i) m n ) m c i,i) m ). 7) m = n = g l m )g l n ) e j,j) m n ), 8)

8 8 Hamed Firouzi, Dennis Wei and Alfred O. Hero III 2 n m = n = m = g l m )g l n ) e j,j) m n ) m c j,j) m ). 9) To complete the proof the following lemma is needed. Lemma If {a m } m = is a sequence of non-negative numbers such that m = a m = M <. Define errn) = M m = a m and avgn) = n m = errm ). Then n m = m a m M/n + errn) + avgn). Proof. Let S = and for n define S n = m = a m. We have: Therefore: m = n ma m = n )S n S + S S ). m = m a m = n n S n m = S m. Since M M/n errn) n S M and M avgn) n M, using the triangle inequality the result follows. m = S m Now let a m = c i,j) m ). By assumption lim n m = a m = M i,j) <. Therefore, Lemma along with 5) concludes: ) cov Y i) k), Y j) l) = Omax{/n, err i,j) n), avg i,j) n)}). ) err i,j) n) is a decreasing decreasing function of n. Therefore avg i,j) n) err i,j) n), for n. Hence: ) cov Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). Similarly using Lemma along with 6), 7), 8) and 9) we obtain: ) var Y i) k) D i,i) k) = Omax{/n, avg i,i) n)}), ) and ) var Y j) l) D j,j) l) = Omax{/n, avg j,j) n)}). 2) Using the definition

9 Spectral Correlation Hub Screening of Multivariate Time Series 9 ) cor Y i) k), Y j) cov Y i) k), Y j) l) ) l) = var Y i) k) ) var Y j) l) ), and the fact that as n, D i,i) k) and D j,j) l) converge to constants C i,i) k) and C j,j) l), respectively, equations ), ) and 2) conclude: ) cor Y i) k), Y j) l) = Omax{/n, avg i,j) n)}). As an example we apply Theorem to a scalar auto-regressive AR) process Xk) specified by Xk) = L ϕ l Xk l) + εk), l= in which ϕ l are real-valued coefficients and ε.) is a stationary process with no temporal correlation. The auto-covariance function of an AR process can be written as [8]: ct) = L l= α l r t l, in which r,..., r l are the roots of the polynomial βx) = x L L l= ϕ lx L l. It is known that for a stationary AR process, r l < for all l L [8]. Therefore, using the definition of err.) we have: L errn) = ct) = α l rl t = t=n t=n l= L r l n α l r l Cζn, l= L α l r l t in which C = L l= α l / r l ) and ζ = max l L r l <. Hence: avgn) = n m = Therefore, Theorem concludes: errm ) n m = l= Cζ m cor Y k), Y l)) = O/n), k l, t=n C n ζ). where Y.) represents the n-point DFT of the AR process X.). In the sequel, we assume that the time series X is multivariate Gaussian, i.e., X ),..., X p) are jointly Gaussian processes. It follows that the DFT components Y i) k) are jointly complex) Gaussian as linear functionals of X. Theorem then immediately implies asymptotic independence of DFT components through a well-known property of jointly Gaussian random variables.

10 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Corollary Assume that the time series X is multivariate Gaussian. Under the absolute summability conditions in Theorem, the DFT components Y i) k) and Y j) l) are asymptotically independent for k l and n. Corollary implies that for large n, correlation analysis of the time series X can be done independently on each frequency in the spectral domain. This reduces the problem of screening for hub time series to screening for hub variables among the p DFT components at a given frequency. A procedure for the latter problem and a corresponding theory are described next. 3 Complex-Valued Correlation Hub Screening This section discusses complex-valued correlation hub screening, a generalization of real-valued correlation screening in [, 9], for identifying highly correlated components of a complex-valued random vector from its sample values. The method is applied to multivariate time series in Section 4 to discover correlation hubs among the spectral components at each frequency. Sections 3. and 3.2 describe the underlying statistical model and the screening procedure. Sections 3.3 and 3.4 provide background on the U-score representation of correlation matrices and associated definitions and properties. Section 3.5 contains the main theoretical result characterizing the number of hub discoveries in the high-dimensional regime, while Section 3.6 elaborates on the phenomenon of phase transitions in the number of discoveries. 3. Statistical Model We use the generic notation Z = [Z, Z 2,, Z p ] T in this section to refer to a complex-valued random vector. The mean of Z is denoted as µ and its p p non-singular covariance matrix is denoted as Σ. We assume that the vector Z follows a complex elliptically contoured distribution with pdf f Z z) = g z µ) H Σ z µ) ), in which g : R R > is an integrable and strictly decreasing function [9]. This assumption generalizes the Gaussian assumption made in Section 2 as the Gaussian distribution is one example of an elliptically contoured distribution. In correlation hub screening, the quantities of interest are the correlation matrix and partial correlation matrix associated with Z. These are defined as Γ = D 2 Σ ΣD 2 Σ and Ω = D 2 Σ Σ D 2 Σ, respectively. Note that Γ and Ω are normalized matrices with unit diagonals. 3.2 Screening Procedure The goal of correlation hub screening is to identify highly correlated components of the random vector Z from its sample realizations. Assume that m samples z,..., z m R p of Z are available. To simplify the development of the

11 Spectral Correlation Hub Screening of Multivariate Time Series theory, the samples are assumed to be independent and identically distributed i.i.d.) although the theory also applies to dependent samples. We compute sample correlation and partial correlation matrices from the samples z,..., z m as surrogates for the unknown population correlation matrices Γ and Ω in Section 3.. First define the p p sample covariance matrix S as S = m m i= z i z)z i z) H, where z is the sample mean, the average of z,..., z m. The sample correlation and sample partial correlation matrices are then defined as R = D 2 S SD 2 S and P = D 2 R R D 2 R, respectively, where R is the Moore-Penrose pseudo-inverse of R. Correlation hubs are screened by applying thresholds to the sample partial) correlation matrix. A variable Z i is declared a hub screening discovery at degree level δ {, 2,...} and threshold level ρ [, ] if {j : j i, ψ ij ρ} δ, where Ψ = R for correlation screening and Ψ = P for partial correlation screening. We denote by N δ,ρ {,..., p} the total number of hub screening discoveries at levels δ, ρ. Correlation hub screening can also be interpreted in terms of the partial) correlation graph G ρ Ψ), depicted in Fig. and defined as follows. The vertices of G ρ Ψ) are v,, v p which correspond to Z,, Z p, respectively. For i, j p, v i and v j are connected by an edge in G ρ Ψ) if the magnitude of the sample partial) correlation coefficient between Z i and Z j is at least ρ. A vertex of G ρ Ψ) is called a δ-hub if its degree, the number of incident edges, is at least δ. Then the number of discoveries N δ,ρ defined earlier is the number of δ-hubs in the graph G ρ Ψ). v v 2 v p v 3 v j Fig.. Complex-valued partial) correlation hub screening thresholds the sample correlation or partial correlation matrix, denoted generically by the matrix Ψ, to find variables Z i that are highly correlated with other variables. This is equivalent to finding hubs in a graph G ρψ) with p vertices v,, v p. For i, j p, v i is connected to v j in G ρψ) if ψ ij ρ. v i

12 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III 3.3 U-score Representation of Correlation Matrices Our theory for complex-valued correlation screening is based on the U-score representation of the sample correlation and partial correlation matrices. Similarly to the real case [9], it can be shown that there exists an m ) p complex-valued matrix U R with unit-norm columns u i) R Cm such that the following representation holds: R = U H RU R. 3) Similar to Lemma in [9] it is straightforward to show that: R = U H RU R U H R) 2 U R. Hence by defining U P = U R U H R ) U R D 2 U H R U we have the representation: RU H R ) 2 U R P = U H P U P, 4) where the m ) p matrix U P has unit-norm columns u i) P Cm. 3.4 Properties of U-scores The U-score factorizations in 3) and 4) show that sample partial) correlation matrices can be represented in terms of unit vectors in C m. This subsection presents definitions and properties related to U-scores that will be used in Section 3.5. We denote the unit spheres in R m and C m as S m and T m, respectively. The surface areas of S m and T m are denoted as a m and b m respectively. Define the interleaving function h : R 2m 2 C m as below: h[x, x 2,, x 2m 2 ] T ) = [x + x 2, x3 + x 4,, x2m 3 + x 2m 2 ] T. Note that h.) is a one-to-one and onto function and it maps S 2m 2 to T m. For a fixed vector u T m and a threshold ρ define the spherical cap in T m : A ρ u) = {y : y T m, y H u ρ}. Also define P as the probability that a random point Y that is uniformly distributed on T m falls into A ρ u). Below we give a simple expression for P as a function of ρ and m. Lemma 2 Let Y be an m )-dimensional complex-valued random vector that is uniformly distributed over T m. We have P = P Y A ρ u)) = ρ 2 ) m 2.

13 Spectral Correlation Hub Screening of Multivariate Time Series 3 Proof. Without loss of generality we assume u = [,,, ] T. We have: P = P Y ρ) = PRY ) 2 + IY ) 2 ρ 2 ). Since Y is uniform on T m, we can write Y = X/ X 2, in which X is complex-valued random vector whose entries are i.i.d. complex-valued Gaussian variables with mean and variance. Thus: P = P RX ) 2 + IX 2 ) ) / X 2 2 ρ 2) = P ρ 2 ) ) RX ) 2 + IX ) 2) m ρ 2 RX k ) 2 + IX k ) 2. Define V = RX ) 2 + IX ) 2 and V 2 = m k=2 RX k) 2 + IX k ) 2. V and V 2 are independent and have chi-squared distributions with 2 and 2m 2) degrees of freedom, respectively [2]. Therefore, P = = = = = ρ 2 v 2/ ρ 2 ) χ 2 2m 2) v 2) k=2 χ 2 2v )χ 2 2m 2) v 2)dv dv 2 ρ 2 v 2/ ρ 2 ) 2 m 2 Γ m 2) vm 3 Γ m 2) ρ2 ) m 2 2 e v/2 dv dv 2 2 e v2/2 e ρ 2 2 ρ 2 ) v2 dv 2 x m 3 e x dx Γ m 2) ρ2 ) m 2 Γ m 2) = ρ 2 ) m 2, in which we have made a change of variable x = v2 2 ρ 2 ). Under the assumption that the joint pdf of Z exists, the p columns of the U-score matrix have joint pdf f U,...,U p u,..., u p ) on T p m = p i= T m. The following δ + )-fold average of the joint pdf will play a significant role in Section 3.5. This δ + )-fold average is defined as: f U,...,U δ+ u,..., u δ+ ) = 2π) δ+ p ) p i < <i δ p,i δ+ / {i,,i δ } 2π 2π δ 2π f Ui,...,U iδ,u iδ+ e θ u,..., e θ δ u δ, e θ u δ+ ) dθ dθ δ dθ. Also for a joint pdf f U,...,U δ+ u,..., u δ+ ) on Tm δ+ define Jf U,...,U δ+ ) = a δ 2m 2 f U,...,U δ+ hu),..., hu))du. S 2m 2

14 4 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Note that Jf U,...,U δ+ ) is proportional to the integral of f U,...,U δ+ over the manifold u =... = u δ+. The quantity Jf U,...,U δ+) ) is key in determining the asymptotic average number of hubs in a complex-valued correlation network. This will be described in more detail in Sec Let i = i, i,..., i δ ) be a set of distinct indices, i.e., i p, i <... < i δ p and i,..., i δ i. For a U-score matrix U define the dependency coefficient between the columns U i = {U i, U i,..., U iδ } and their complementary k-nn k-nearest neighbor) set A k i) defined in 29) and Fig. 2 as p,m,k,δ i) = f Ui U f Ak i) U i )/f Ui, where denotes the supremum norm. The average of these coefficients is defined as: p,m,k,δ = p ) p δ p i = i,...,i δ i i <...<i δ p p,m,k,δ i). 5) 3.5 Number of Hub Discoveries in the High-Dimensional Limit We now present the main theoretical result on complex-valued correlation screening. The following theorem gives asymptotic expressions for the mean number of δ-hubs and the probability of discovery of at least one δ-hub in the graph G ρ Ψ). It also gives bounds on the rates of convergence to these approximations as the dimension p increases and ρ. We use U = [U,, U p ] as a generic notation for the U-score representation of the sample partial) correlation matrix. The asymptotic expression for the mean E[N δ,ρ ] is denoted by Λ and is given by: ) p Λ = p P δ Jf U,...,U δ δ+) ). 6) Define η p,δ as: η p,δ = p /δ p )P = p /δ p ) ρ 2 ) m 2), 7) where the last equation is due to Lemma 2. The parameter k below represents an upper bound on the true hub degree, i.e. the number of non-zero entries in any row of the population covariance matrix Σ. Also let ϕδ) be the function that takes values ϕδ) = 2 for δ = and ϕδ) = for δ >. Theorem 2 Let U = [U,..., U p ] be a m ) p random matrix with U i T m where m > 2. Let δ be a fixed integer. Assume the joint pdf of any subset of the U i s is bounded and differentiable. Then, with Λ defined in 6),

15 Spectral Correlation Hub Screening of Multivariate Time Series 5 { E[N δ,ρ ] Λ O ηp,δ δ max η p,δ p /δ, ρ) /2}). 8) Furthermore, let Nδ,ρ be a Poisson distributed random variable with rate E[Nδ,ρ ] = Λ/ϕδ). If p )P, then PN δ,ρ > ) PNδ,ρ > ) }) ηp,δ {η δ max p,δ δ k/p)δ+, Q p,k,δ, p,m,k,δ, p /δ, ρ) /2 O, δ > }) O η p, max {η p, k/p) 2, p,m,k,, p, ρ) /2, δ =, with Q p,k,δ = η p,δ k/p /δ ) δ+ and p,m,k,δ defined in 5). Proof. The proof is similar to the proof of proposition in [9]. First we prove 8). Let φ i = Id i δ) be the indicator of the event that d i δ, in which d i represents the degree of the vertex v i in the graph G ρ Ψ). We have N δ,ρ = p i= φ i. With φ ij being the indicator of the presence of an edge in G ρ Ψ) between vertices v i and v j we have the relation: φ i = p l l=δ k C ip,l) j= φ ikj p q=l+ φ ikq ) 2) where we have defined the index vector k = k,..., k p ) and the set C i p, l) = {k : k <... < k l, k l+ <... < k p k j {,..., p} {i}, k j k j }. The inner summation in 2) simply sums over the set of distinct indices not equal to i that index all ) p l different types of products of the form: l j= φ p ik j q=l+ φ ik q ). Subtracting δ k C ip,δ) j= φ ik j from both sides of 2) φ i = δ k C ip,δ) j= p φ ikj l l=δ+ k C ip,l) j= + k C ip,l) q=δ+ p φ ikj ) q δ p q=l+ k δ+ <...<k q,{k δ+,...,k q } {k δ+,...,k p } j= φ ikq ) l φ ikj q s=δ+ 9) φ ik s 2)

16 6 Hamed Firouzi, Dennis Wei and Alfred O. Hero III in which we have used the expansion p q=δ+ φ ikq ) = + p q=δ+ ) q δ k δ+ <...<k q,{k δ+,...,k q } {k δ+,...,k p } s=δ+ The following simple asymptotic representation will be useful in the sequel. For any i,..., i k {,..., p}, i i k i, k {,..., p }, k E φ iij = h A ρv)) h A ρv)) j= S 2m 2 f Ui,...,U ik,u i hv ),, hv k ), hv)) dv dv k dv P k a k 2m 2M k 22) where P, A ρ u) and the function h.) are defined in Sec Moreover M k = fui max,...,u ik U ik+. i =i k+ The following simple generalization of 22) to arbitrary product indices φ ij will also be needed [ q ] E φ il j l P q aq 2m 2 M Q, 23) l= where Q =unique{i l, j l } q l= ) is the set of unique indices among the distinct pairs {i l, j l )} q l= and M Q is a bound on the joint pdf of U Q. Define the random variable ) p δ θ i = φ ikj. δ k C ip,δ) j= We show below that for sufficiently large p ) p E[φ i] E[θ i ] δ γ p,δp )P ) δ+, 24) where γ p,δ = max δ+ l<p {a l 2m 2M l } e δ l= l!) ) + δ!) and M l is a least upper bound on any l-dimensional joint pdf of the variables {U i } p j i conditioned on U i. To show inequality 24) take expectations of 2) and apply the bound 22) to obtain ) p E[φ i] E[θ i ] δ p ) ) p δ p p ) p δ P l l a l 2m 2M l + P δ+l a δ+l 2m 2 δ l M δ+l l=δ+ A + δ!) ), 25) l= q φ ik s.

17 where Spectral Correlation Hub Screening of Multivariate Time Series 7 A = p l=δ+ ) p p )P ) l a l l 2m 2M l. The line 25) follows from the identity ) p δ p ) l δ = p ) l+δ ) l+δ l and a change of index in the second summation on the previous line. Since p )P < A max δ+ l<p {al 2m 2M l } max δ+ l<p {al 2m 2M l } p ) p p )P ) l l l=δ+ ) δ e p )P ) δ+. l! Application of the mean value theorem to the integral representation 22) yields E[θi ] P δ Jf U i,...,u δ i,u i ) γp,δ p )P ) δ r, 26) where l= f U i,...,u δ i,u i u,..., u δ+ ) = 2π) δ ) p δ i < <i δ p i / {i,,i δ } 2π 2π f Ui,...,U iδ,u i e θ u,..., e θ δ u δ, u δ+ ) dθ dθ δ, r = 2 ρ), γ p,δ = 2a δ+ 2m 2Ṁδ+ /δ! and Ṁδ+ is a bound on the norm of the gradient ui,...,u iδ f U i,...,u δ i U i u i,..., u iδ u i ). Combining 24)-26) and the relation r = O ρ) /2 ), ) p E[φ i] P δ Jf U,...,U δ δ+) ) { O p )P ) δ max p )P, ρ) /2}). Summing over i and recalling the definitions 6) and 7) of Λ and η p,δ, E[N δ,ρ ] Λ O pp )P ) δ max {p )P, ρ) /2}) { = O ηp,δ δ max η p,δ p /δ, ρ) /2}). This establishes the bound 8).

18 8 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Next we prove the bound 9) by using the Chen-Stein method [2]. Define: Ñ δ,ρ = ϕδ) p i = i <...<i δ p j= δ φ ii j, 27) Where the second sum is over the indices i <... < i δ p such that i j i, j δ. For i def = i, i,..., i δ ) define the index set B i = B i,i,...,i δ = {j, j,..., j δ ) : j l N k i l ) {i l }, l =,..., δ} C < where C < = {j,..., j δ ) : j p, j < < j δ p, j l j, l δ}. These index the distinct sets of points U i = {U i, U i,..., U iδ } and their respective k- NN s. Note that B i k δ+. Identifying Ñδ,ρ = δ i C < l= φ i i l and Nδ,ρ a Poisson distributed random variable with rate E[Ñδ,ρ], the Chen-Stein bound [2, Theorem ] is 2 max A PÑδ,ρ A) PN δ,ρ A) b + b 2 + b 3, 28) where b = E i C < j B i b 2 = [ δ ] [ δ ] φ ii l E φ jj q, l= E i C < j B i {i} [ δ q= φ ii l l= q= ] δ φ jj q, and, for p i = E[ δ l= φ i i l ], b 3 = i C < E [ [ δ ]] E φ ii l p i φ j : j B i. l= Over the range of indices in the sum of b E[ δ l= φ ii l ] is of order OP δ ), by 23), and therefore b O p δ+ k δ+ P 2δ ) = O η 2δ p,δk/p) δ+), which follows from definition 7). More care is needed to bound b 2 due to the repetition of characteristic functions φ ij. Since i j, E[ δ l= φ δ i i l q= φ j j q ] is a multiplication of at least δ + different characteristic functions, hence by 23), δ δ E[ φ ii l φ jj q ] = O P δ+ ). l= q= Therefore, we conclude that b 2 O p δ+ k δ+ P δ+ ).

19 Spectral Correlation Hub Screening of Multivariate Time Series 9 i i Fig. 2. The complementary k-nn set A k i) illustrated for δ = and k = 5. Here we have i = i, i ). The vertices i, i and their k-nns are depicted in black and blue respectively. The complement of the union of {i, i } and its k-nns is the complementary k-nn set A k i) and is depicted in red. Next we bound the term b 3 in 28). The set A k i) = B c i {i} 29) indexes the complementary k-nn of U i see Fig. 2) so that, using the representation 23), b 3 = i C < E [ [ δ ]] E φ ii l p i U Ak i) l= = δ ) du Ak i) du i du il i C < S A k i) 2m 2 l= S 2m 2 Ar,u i ) fui U u ) Ak i u Ak i)) f Ui u i ) f Ui u i )f UAk i) f Ui u i ) u A k i)) O p δ+ P δ p,m,k,δ ) = O η δ p,δ p,m,k,δ ). Note that by definition of Ñδ,ρ we have Ñδ,ρ > if and only if N δ,ρ >. This yields:

20 2 Hamed Firouzi, Dennis Wei and Alfred O. Hero III PN δ,ρ > ) exp Λ)) PÑδ,ρ > ) PN δ,ρ > ) + exp E[ PÑδ,ρ > ) exp E[Ñδ,ρ])) + Ñ δ,ρ ]) exp Λ) ) E[ b + b 2 + b 3 + O Ñ δ,ρ ] Λ 3) Combining the above inequalities on b, b 2 and b 3 yields the first three terms in the argument of the max on the right side of 9). It remains to bound the term E[Ñδ,ρ] Λ. Application of the mean value theorem to the multiple integral 23) gives [ δ ] E φ iil P δ J l= f Ui,...,U iδ,u i ) O P δ r ). Applying relation 27) yields ) p E[Ñδ,ρ] p P δ J ) f U,...,U δ δ+) O p δ+ P δ r ) = O ηp,δr δ ). Combine this with 3) to obtain the bound 9). This completes the proof of Theorem 2. An immediate consequence of Theorem 2 is the following result, similar to Proposition 2 in [9], which provides asymptotic expressions for the mean number of δ-hubs and the probability of the event N δ,ρ > as p goes to and ρ converges to at a prescribed rate. Corollary 2 Let ρ p [, ] be a sequence converging to one as p such that η p,δ = p /δ p ) ρ 2 p) m 2) e m,δ, ). Then lim E[N δ,ρ p ] = Λ = e δ m,δ/δ! p lim p Jf U,...,U δ+) ). 3) Assume that k = op /δ ) and that for the weak dependency coefficient p,m,k,δ, defined via 5), we have lim p p,m,k,δ =. Then PN δ,ρp > ) exp Λ /ϕδ)). 32) Corollary 2 shows that in the limit p, the number of detected hubs depends on the true population correlations only through the quantity Jf U,...,U δ+) ). In some cases Jf U,...,U δ+) ) can be evaluated explicitly. Similar to the argument in [9], it can be shown that if the population covariance matrix Σ is sparse in the sense that its non-zero off-diagonal entries can be arranged into a k k submatrix by reordering rows and columns, then Jf U,...,U δ+) ) = + Ok/p).

21 Spectral Correlation Hub Screening of Multivariate Time Series 2 Hence, if k = op) as p, the quantity Jf U,...,U δ+) ) converges to. If Σ is diagonal, then Jf U,...,U δ+) ) = exactly. In such cases, the quantity Λ in Corollary 2 does not depend on the unknown underlying distribution of the U-scores. As a result, the expected number of δ-hubs in G ρ Ψ) and the probability of discovery of at least one δ-hub do not depend on the underlying distribution. We will see in Sec. 4 that this result is useful in assigning statistical significance levels to vertices of the graph G ρ Ψ). 3.6 Phase Transitions and Critical Threshold It can be seen from Theorem 2 and Corollary 2 that the number of δ-hub discoveries exhibits a phase transition in the high-dimensional regime where the number of variables p can be very large relative to the number of samples m. Specifically, assume that the population covariance matrix Σ is blocksparse as in Section 3.5. Then as the correlation threshold ρ is reduced, the number of δ-hub discoveries abruptly increases to the maximum, p. Conversely as ρ increases, the number of discoveries quickly approaches zero. Similarly, the family-wise error rate i.e. the probability of discovering at least one δ- hub in a graph with no true hubs) exhibits a phase transition as a function of ρ. Figure 3 shows the family-wise error rate obtained via expression 32) for δ = and p =, as a function of ρ and the number of samples m. It is seen that for a fixed value of m there is a sharp transition in the family-wise error rate as a function of ρ. Number of Observations m Applied Threshold ρ Fig. 3. Family-wise error rate as a function of correlation threshold ρ and number of samples m for p =, δ =. The phase transition phenomenon is clearly observable in the plot.

22 22 Hamed Firouzi, Dennis Wei and Alfred O. Hero III The phase transition phenomenon motivates the definition of a critical threshold ρ c,δ as the threshold ρ satisfying the following slope condition: E[N δ,ρ ]/ ρ = p. Using 6) the solution of the above equation can be approximated via the expression below: ρ c,δ = c m,δ p )) 2δ/δ2m 3) 2), 33) where c m,δ = b m δjf U,...,U δ+) ). The screening threshold ρ should be chosen greater than ρ c,δ to prevent excessively large numbers of false positives. Note that the critical threshold ρ c,δ also does not depend on the underlying distribution of the U-scores when the covariance matrix Σ is block-sparse. Expression 33) is similar to the expression obtained in [9] for the critical threshold in real-valued correlation screening. However, in the complex-valued case the coefficient c m,δ and the exponent of the term c m,δ p ) are different from the real case. This generally results in smaller values of ρ c,δ for fixed m and δ. Figure 4 shows the value of ρ c,δ obtained via 33) as a function of m for different values of δ and p. The critical threshold decreases as either the sample size m increases, the number of variables p decreases, or the vertex degree δ increases. Note that even for ten billion ) dimensions upper triplet of curves in the figure) only a relatively small number of samples are necessary for complex-valued correlation screening to be useful. For example, with m = 2 one can reliably discover connected vertices δ = in the figure) having correlation greater than ρ c,δ =.5. 4 Application to Spectral Screening of Multivariate Gaussian Time Series In this section, the complex-valued correlation hub screening method of Section 3 is applied to stationary multivariate Gaussian time series. Assume that the time series X ),, X p) defined in Section 2 satisfy the conditions of Corollary. Assume also that a total of N = n m time samples of X ),, X p) are available. We divide the N samples into m parts of n consecutive samples and we take the n-point DFT of each part. Therefore, for each time series, at each frequency f i = i )/n, i n, m samples are available. This allows us to construct a partial) correlation graph corresponding to each frequency. We denote the partial) correlation graph corresponding to frequency f i and correlation threshold ρ i as G fi,ρ i. G fi,ρ i has p vertices v, v 2,, v p corresponding to time series X ), X 2),, X p), respectively. Vertices v k and v l are connected if the magnitude of the sample partial) correlation between the DFTs of X k) and X l) at frequency f i i.e.

23 Spectral Correlation Hub Screening of Multivariate Time Series 23 Critical Threshold ρ c,δ Number of Observations m) p= p= p= 2 3 Fig. 4. The critical threshold ρ c,δ as a function of the sample size m for δ =, 2, 3 curve labels) and p =,, bottom to top triplets of curves). The figure shows that the critical threshold decreases as either m or δ increases. When the number of samples m is small the critical threshold is close to in which case reliable hub discovery is impossible. However a relatively small increment in m is sufficient to reduce the critical threshold significantly. For example for p =, only m = 2 samples are enough to bring ρ c, down to.5. the sample partial) correlation between Y k) i ) and Y l) i )) is at least ρ i. Consider a single frequency f i and the null hypothesis, H, that the correlations among the time series X ), X 2),, X p) at frequency f i are block sparse in the sense of Section 3.5. As discussed in Sec. 3.5, under H the expected number of δ-hubs and the probability of discovery of at least one δ-hub in graph G fi,ρ i are not functions of the unknown underlying distribution of the data. Therefore the results of Corollary 2 may be used to quantify the statistical significance of declaring vertices of G fi,ρ i to be δ-hubs. The statistical significance is represented by the p-value, defined in general as the probability of having a test statistic at least as extreme as the value actually observed assuming that the null hypothesis H is true. In the case of correlation hub screening, the p-value pv δ j) assigned to vertex v j for being a δ-hub is the maximal probability that v j maintains degree δ given the observed sample correlations, assuming that the block-sparse hypothesis H is true. The detailed procedure for assigning p-values is similar to the procedure in [9] for real-valued correlation screening and is illustrated in Fig. 5. Equation 33) helps in choosing the initial threshold ρ. Given Corollary, for i j the correlation graphs G fi,ρ i and G fj,ρ j and their associated inferences are approximately independent. Thus we can solve multiple inference problems by first performing correlation hub screening on

24 24 Hamed Firouzi, Dennis Wei and Alfred O. Hero III Initialization:. Choose a degree threshold δ. 2. Choose an initial threshold ρ > ρ c,δ. 3. Calculate the degree d j of each vertex of graph G ρ Ψ). 4. Select a value of δ {,, max j p d j}. For each j =,, p find ρ jδ) as the δth greatest element of the jth row of the sample partial) correlation matrix. Approximate the p-value corresponding to vertex v j as pv δ j) exp E[N δ,ρj δ)]/ϕδ)), where E[N δ,ρj δ)] is approximated by the limiting expression 3) using Jf U,...,U δ+) ) =. Screen variables by thresholding the p-values pv δ j) at desired significance level. Fig. 5. Procedure for assigning p-values to the vertices of G ρ Ψ). each graph as discussed above and then aggregating the inferences at each frequency in a straightforward manner. Examples of aggregation procedures are described below. 4. Disjunctive Hubs One task that can be easily performed is finding the p-value for a given time series to be a hub in at least one of the graphs G f,ρ,, G fn,ρ n. More specifically, for each j =,..., p denote the p-values for vertex v j being a δ-hub in G f,ρ,, G fn,ρ n by pv f,ρ,δj),, pv fn,ρ n,δj) respectively. These p-values are obtained using the method of Fig. 5. Then pv δ j), the p-value for the vertex v j being a δ-hub in at least one of the frequency graphs G f,ρ,, G fn,ρ n can be approximated as: P i : d j,fi δ H ) pv ˆ δ j) = n pv fi,ρ i,δj)), i= in which d j,fi is the degree of v j in the graph G fi,ρ i. 4.2 Conjunctive Hubs Another property of interest is the existence of a hub at all frequencies for a particular time series. In this case we have: P i : d j,fi δ H ) pv ˇ δ j) = n pv fi,ρ i,δj). i=

25 Spectral Correlation Hub Screening of Multivariate Time Series General Persistent Hubs The general case is the event that at least K frequencies have hubs of degree at least δ at vertex v j. For this general case we have: P i,..., i K : d j,fi δ,... d j,fik δ H ) = n k =K i <...<i k,i k + <...<in {i,...,i n}={,...,n} pv fil,ρ il,δj) k l= n l =k + ) pv fil,ρi l,δ j). 5 Experimental Results 5. Phase Transition Phenomenon and Mean Number of Hubs We first performed numerical simulations to confirm Theorem 2 and Corollary 2 for complex-valued correlation screening. Samples were generated from p uncorrelated complex Gaussian random variables. Figure 6 shows the number of discovered -hubs for p = and several sample sizes m. The plots from left to right correspond to m = 2,, 5,, 5, 2,, 6 and 4, respectively. The phase transition phenomenon is clearly observed in the plot. Table shows the predicted value obtained from formula 33) for the critical threshold. As can be seen in Fig. 6, the empirical phase transition thresholds approximately match the predicted values of Table. Moreover, to confirm the accuracy of equation 3) in Corollary 2, we list the number of hubs for m = in Table 2. The left column shows the empirical average number of hubs of degree at least δ =, 2, 3, 4 in a network of i.i.d. complex Gaussian random variables. The numbers in this column are obtained by averaging independent experiments. The right column shows the predicted value of E[N δ,ρ ] obtained via formula 3) with Jf U,...,U δ+) ) = for the i.i.d. case. As we see the empirical and predicted values are close to each other. m ρ c,δ Table. The value of critical threshold ρ c,δ obtained from formula 33) for p = complex variables and δ =. The predicted ρ c,δ approximates the phase transition thresholds in Fig Asymptotic Independence of Spectral Components for AR) Model To illustrate the asymptotic independence property and convergence rate of Theorem, we considered the simple case of an AR) process,

26 26 Hamed Firouzi, Dennis Wei and Alfred O. Hero III ρ Fig. 6. Phase transition phenomenon: the number of -hubs in the sample correlation graph corresponding to uncorrelated complex Gaussian variables as a function of correlation threshold ρ. Here, p = and the plots from left to right correspond to m = 2,, 5,, 5, 2,, 6 and 4, respectively. degree threshold empirical E[N δ,ρ ]) predicted E[N δ,ρ ]) d i δ = d i δ = d i δ = d i δ = 4 Table 2. Empirical average number of discovered hubs vs. predicted average number of discovered hubs in an uncorrelated complex Gaussian network. Here p =, m =, ρ =.28. The empirical values are obtained by performing independent experiments. Xk) = ϕ Xk ) + εk), k, 34) in which X) =, ϕ =.9 and ε.) is a stationary Gaussian process with no temporal correlation and standard deviation. We performed Monte-Carlo simulations to compute the correlation between spectral components at different frequencies for window sizes n =, 2,..., 25. More specifically, we set k = and l = 2 and empirically estimated cor Y k), Y l)) using 5 Monte-Carlo trials for each value of window size n. Figure 7 shows the result of this experiment. It is observable that the magnitude of cor Y k), Y l)) is bounded above by the function /n. This observation is consistent with Theorem.

27 Spectral Correlation Hub Screening of Multivariate Time Series coryk),yl)) /n n Fig. 7. Correlation coefficient cor Y ), Y 2)) as a function of window size n, empirically estimated using 5 Monte-Carlo trials. Here Y.) is the DFT of the AR) process 34). The magnitude of the correlation for n =, 2,..., 25 is bounded above by the function /n. This observation is consistent with the convergence rate in Theorem. 5.3 Spectral Correlation Screening of a Band-Pass Multivariate Time Series Next we analyzed the performance of the proposed complex-valued correlation screening framework on a synthetic data set for which the expected results are known. We synthesized a multivariate stationary Gaussian time series using the the following procedure. Here we set p =, N = 2 and m = n =. The discrepancy between N and the product mn is explained below. Let Xk), k N be a sequence of i.i.d. zero-mean Gaussian random variables i.e. white Gaussian noise) with standard deviation of. The p time series X ) k),..., X p) k), k N are obtained from Xk) by bandpass filtering and adding independent white Gaussian noise. Specifically, X i) k) = h i k) Xk) + N i k), i p, k N, in which represents the convolution operator, h i.) is the impulse response of the ith band-pass filter and N i.) is an independent white Gaussian noise series whose standard deviation is.. Since stable filtering of a stationary series results in another stationary series, the obtained series X ) k),..., X p) k) are stationary and Gaussian. For i = l, l 5, h i k) is the impulse response of a band-pass filter with pass band f [4l )/4, 4l/4]. We

28 28 Hamed Firouzi, Dennis Wei and Alfred O. Hero III approximate the ideal band-pass filters with finite impulse response FIR) Chebyshev filters [6]. Also for i = 5 + l, l 5 we set h i k) = h i 5 k). For all of the other values of i i.e. i l) we set h i k) =, k N. Figure 8 shows the signal part of the time series i.e. h i k) Xk)) for i =, 2, 3, 4. It is seen that the first 2 samples of the signals reflect the transient response of the filters. These 2 samples are not included for the purpose of correlation screening. Hence the actual number of time samples considered is mn =. Figure 9 shows the magnitude of the DFTs of the signals, Y i) k), for i = 5,,..., 5. The band-pass structure of the signals is clearly observable in the figure..5 Signal Freq.) Signal 2 Freq.2) Signal 3 Freq.3) Signal 4 Freq.4) Fig. 8. Signal part of the band-pass time series X i) k) i.e. h ik) Xk)) for i =, 2, 3, 4. We first constructed a correlation matrix for the time series X ) k),..., X p) k) from their simultaneous time samples. Figure illustrates the structure of the thresholded sample correlation matrix and the corresponding correlation graph. Note that this is a real-valued correlation screening problem in the time domain. The correlation threshold used here is ρ =.2 which is well above the critical threshold ρ c, =.28 obtained via formula ) in [9] for p = and N =. To examine the spectral structure of the correlations in Fig., we then performed complex-valued correlation screening on the spectra of the time series X ) k),..., X p) k). Figure shows the constructed correlation graphs G f,ρ for f = [.,.2,.3,.4] and correlation threshold ρ =.9, which corre-

29 Spectral Correlation Hub Screening of Multivariate Time Series Magnitude db) Frequency Fig. 9. DFT magnitude of the band-pass signals h ik) Xk) i.e. 2 log Y i).) )) as a function of frequency for i = 5,,..., Fig.. Left) The structure of the thresholded sample correlation matrix in the time domain. Right) The correlation graph corresponding to the thresholded sample correlation matrix in the time domain. sponds to a δ = false positive rate PN δ,ρ > ) 65 using δ = in the asymptotic relation 32) with Λ = e δ m,δ /δ! as specified by 3)). Note that the value of the correlation threshold is set to be higher than the critical threshold ρ c =.24. It can be observed that performing complex-valued spectral correlation screening at each frequency correctly discovers the correlations between the time series which are active around that frequency. As an example, for f =.2 the discovered hubs for δ = ) are the time series X i) k) for i {2, 7}. These time series are the ones that are active at

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

HUB DISCOVERY IN PARTIAL CORRELATION GRAPHS

HUB DISCOVERY IN PARTIAL CORRELATION GRAPHS HUB DISCOVERY IN PARTIAL CORRELATION GRAPHS ALFRED HERO AND BALA RAJARATNAM Abstract. One of the most important problems in large scale inference problems is the identification of variables that are highly

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 6: Probability and Random Processes Problem Set 3 Spring 9 Self-Graded Scores Due: February 8, 9 Submit your self-graded scores

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

where r n = dn+1 x(t)

where r n = dn+1 x(t) Random Variables Overview Probability Random variables Transforms of pdfs Moments and cumulants Useful distributions Random vectors Linear transformations of random vectors The multivariate normal distribution

More information

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan

SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka , Japan A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix SHOTA KATAYAMA AND YUTAKA KANO Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama,

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering Stochastic Processes and Linear Algebra Recap Slides Stochastic processes and variables XX tt 0 = XX xx nn (tt) xx 2 (tt) XX tt XX

More information

Overview of Spatial Statistics with Applications to fmri

Overview of Spatial Statistics with Applications to fmri with Applications to fmri School of Mathematics & Statistics Newcastle University April 8 th, 2016 Outline Why spatial statistics? Basic results Nonstationary models Inference for large data sets An example

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver Stochastic Signals Overview Definitions Second order statistics Stationarity and ergodicity Random signal variability Power spectral density Linear systems with stationary inputs Random signal memory Correlation

More information

1. Stochastic Processes and Stationarity

1. Stochastic Processes and Stationarity Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture Note 1 - Introduction This course provides the basic tools needed to analyze data that is observed

More information

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS

CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS EVA IV, CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS Jose Olmo Department of Economics City University, London (joint work with Jesús Gonzalo, Universidad Carlos III de Madrid) 4th Conference

More information

ENGR352 Problem Set 02

ENGR352 Problem Set 02 engr352/engr352p02 September 13, 2018) ENGR352 Problem Set 02 Transfer function of an estimator 1. Using Eq. (1.1.4-27) from the text, find the correct value of r ss (the result given in the text is incorrect).

More information

Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage?

Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage? Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage? Stefan Thurner www.complex-systems.meduniwien.ac.at www.santafe.edu London July 28 Foundations of theory of financial

More information

SEBASTIAN RASCHKA. Introduction to Artificial Neural Networks and Deep Learning. with Applications in Python

SEBASTIAN RASCHKA. Introduction to Artificial Neural Networks and Deep Learning. with Applications in Python SEBASTIAN RASCHKA Introduction to Artificial Neural Networks and Deep Learning with Applications in Python Introduction to Artificial Neural Networks with Applications in Python Sebastian Raschka Last

More information

Kaczmarz algorithm in Hilbert space

Kaczmarz algorithm in Hilbert space STUDIA MATHEMATICA 169 (2) (2005) Kaczmarz algorithm in Hilbert space by Rainis Haller (Tartu) and Ryszard Szwarc (Wrocław) Abstract The aim of the Kaczmarz algorithm is to reconstruct an element in a

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Stable Process. 2. Multivariate Stable Distributions. July, 2006

Stable Process. 2. Multivariate Stable Distributions. July, 2006 Stable Process 2. Multivariate Stable Distributions July, 2006 1. Stable random vectors. 2. Characteristic functions. 3. Strictly stable and symmetric stable random vectors. 4. Sub-Gaussian random vectors.

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Discrete Applied Mathematics

Discrete Applied Mathematics Discrete Applied Mathematics 194 (015) 37 59 Contents lists available at ScienceDirect Discrete Applied Mathematics journal homepage: wwwelseviercom/locate/dam Loopy, Hankel, and combinatorially skew-hankel

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Modeling and Stability Analysis of a Communication Network System

Modeling and Stability Analysis of a Communication Network System Modeling and Stability Analysis of a Communication Network System Zvi Retchkiman Königsberg Instituto Politecnico Nacional e-mail: mzvi@cic.ipn.mx Abstract In this work, the modeling and stability problem

More information

1 Exercises for lecture 1

1 Exercises for lecture 1 1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;

More information

1. Fundamental concepts

1. Fundamental concepts . Fundamental concepts A time series is a sequence of data points, measured typically at successive times spaced at uniform intervals. Time series are used in such fields as statistics, signal processing

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Random Eigenvalue Problems Revisited

Random Eigenvalue Problems Revisited Random Eigenvalue Problems Revisited S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. Email: S.Adhikari@bristol.ac.uk URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

Chapter One. The Calderón-Zygmund Theory I: Ellipticity

Chapter One. The Calderón-Zygmund Theory I: Ellipticity Chapter One The Calderón-Zygmund Theory I: Ellipticity Our story begins with a classical situation: convolution with homogeneous, Calderón- Zygmund ( kernels on R n. Let S n 1 R n denote the unit sphere

More information

Stochastic Processes: I. consider bowl of worms model for oscilloscope experiment:

Stochastic Processes: I. consider bowl of worms model for oscilloscope experiment: Stochastic Processes: I consider bowl of worms model for oscilloscope experiment: SAPAscope 2.0 / 0 1 RESET SAPA2e 22, 23 II 1 stochastic process is: Stochastic Processes: II informally: bowl + drawing

More information

Linear algebra for computational statistics

Linear algebra for computational statistics University of Seoul May 3, 2018 Vector and Matrix Notation Denote 2-dimensional data array (n p matrix) by X. Denote the element in the ith row and the jth column of X by x ij or (X) ij. Denote by X j

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

Chapter 6: Random Processes 1

Chapter 6: Random Processes 1 Chapter 6: Random Processes 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

ADDITIONAL MATHEMATICS

ADDITIONAL MATHEMATICS ADDITIONAL MATHEMATICS GCE Ordinary Level (Syllabus 4018) CONTENTS Page NOTES 1 GCE ORDINARY LEVEL ADDITIONAL MATHEMATICS 4018 2 MATHEMATICAL NOTATION 7 4018 ADDITIONAL MATHEMATICS O LEVEL (2009) NOTES

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information