arxiv: v2 [stat.ot] 9 Apr 2014
Spectral Correlation Hub Screening of Multivariate Time Series

Hamed Firouzi, Dennis Wei and Alfred O. Hero III

arxiv:43.337v2 [stat.ot] 9 Apr 2014

Electrical Engineering and Computer Science Department, University of Michigan, USA
firouzi@umich.edu, dlwei@eecs.umich.edu, hero@eecs.umich.edu

Summary. This chapter discusses correlation analysis of stationary multivariate Gaussian time series in the spectral or Fourier domain. The goal is to identify the hub time series, i.e., those that are highly correlated with a specified number of other time series. We show that Fourier components of the time series at different frequencies are asymptotically statistically independent. This property permits independent correlation analysis at each frequency, alleviating the computational and statistical challenges of high-dimensional time series. To detect correlation hubs at each frequency, an existing correlation screening method is extended to the complex numbers to accommodate complex-valued Fourier components. We characterize the number of hub discoveries at specified correlation and degree thresholds in the regime of increasing dimension and fixed sample size. The theory specifies appropriate thresholds to apply to sample correlation matrices to detect hubs and also allows statistical significance to be attributed to hub discoveries. Numerical results illustrate the accuracy of the theory and the usefulness of the proposed spectral framework.

Key words: Complex-valued correlation screening, Spectral correlation analysis, Gaussian stationary processes, Hub screening, Correlation graph, Correlation network, Spatio-temporal analysis of multivariate time series, High dimensional data analysis

1 Introduction

Correlation analysis of multivariate time series is important in many applications such as wireless sensor networks, computer networks, neuroimaging, and finance [1, 2, 3, 4, 5].
This chapter focuses on the problem of detecting hub time series, ones that have a high degree of interaction with other time series as measured by correlation or partial correlation. Detection of hubs can lead to reduced computational and/or sampling costs. For example in wireless sensor networks, the identification of hub nodes can be useful for reducing power usage and adding or removing sensors from the network [6, 7]. Hub detection can also give new insights about underlying structure in the dataset.
In neuroimaging, for instance, studies have consistently shown the existence of highly connected hubs in brain graphs (connectomes) [8]. In finance, a hub might indicate a vulnerable financial instrument or a sector whose collapse could have a major effect on the market [9].

Correlation analysis becomes challenging for multivariate time series when the dimension $p$ of the time series, i.e. the number of scalar time series, and the number of time samples $N$ are large [4]. A naive approach is to treat the time series as a set of independent samples of a $p$-dimensional random vector and estimate the associated covariance or correlation matrix, but this approach completely ignores temporal correlations as it only considers dependences at the same time instant and not between different time instants. The work in [] accounts for temporal correlations by quantifying their effect on convergence rates in covariance and precision matrix estimation; however, only correlations at the same time instant are estimated. A more general approach is to consider all correlations between any two time instants of any two series within a window of $n \le N$ consecutive samples, where the previous case corresponds to $n = 1$. However, in general this would entail the estimation of an $np \times np$ correlation matrix from a reduced sample of size $m = N/n$, which can be computationally costly as well as statistically problematic.

In this chapter, we propose spectral correlation analysis as a method of overcoming the issues discussed above. As before, the time series are divided into $m$ temporal segments of $n$ consecutive samples, but instead of estimating temporal correlations directly, the method performs analysis on the Discrete Fourier Transforms (DFT) of the time series.
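The segmentation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `segment_dft`, the `(p, N)` array layout, and the choice to discard trailing samples when $n$ does not divide $N$ are hypothetical and do not come from the chapter. The DFT is scaled by $1/\sqrt{n}$ so that it is unitary.

```python
import numpy as np

def segment_dft(x, n):
    """Split p series of length N into m = N // n segments and DFT each one.

    x : real or complex array of shape (p, N), one row per time series.
    Returns a complex array of shape (m, n, p) whose entry [s, k, i] is the
    DFT coefficient at frequency index k of segment s of series i.
    """
    p, N = x.shape
    m = N // n                                   # number of segments (samples)
    segs = x[:, : m * n].reshape(p, m, n)        # drop any incomplete tail
    y = np.fft.fft(segs, axis=2) / np.sqrt(n)    # unitary n-point DFT
    return np.transpose(y, (1, 2, 0))

# Hypothetical usage: p = 2 series, N = 12 samples, n = 4 -> m = 3 segments.
x = np.arange(24.0).reshape(2, 12)
y = segment_dft(x, 4)
```

For a fixed frequency index `k`, the slice `y[:, k, :]` is an $m \times p$ array that plays the role of the $m$ samples of a $p$-dimensional complex vector.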
We prove in Theorem 1 that for stationary, jointly Gaussian time series under the mild condition of absolute summability of the auto- and cross-correlation functions, different Fourier components (frequencies) become asymptotically independent of each other as the DFT length $n$ increases. This property of stationary Gaussian processes allows us to focus on the $p \times p$ correlations at each frequency separately without having to consider correlations between different frequencies. Moreover, spectral analysis isolates correlations at specific frequencies or timescales, potentially leading to greater insight. To make aggregate inferences based on all frequencies, straightforward procedures for multiple inference can be used as described in Section 4.

The spectral approach reduces the detection of hub time series to the independent detection of hubs at each frequency. However, in exchange for achieving spectral resolution, the sample size is reduced by the factor $n$, from $N$ to $m = N/n$. To confidently detect hubs in this high-dimensional, low-sample regime (large $p$, small $m$), as well as to accommodate complex-valued DFTs, we develop a method that we call complex-valued (partial) correlation screening. This is a generalization of the correlation and partial correlation screening method of [, 9, 2] to complex-valued random variables. For each frequency, the method computes the sample (partial) correlation matrix of the DFT components of the $p$ time series. Highly correlated variables (hubs) are then identified by thresholding the sample correlation matrix at a level
$\rho$ and screening for rows (or columns) with a specified number $\delta$ of non-zero entries.

We characterize the behavior of complex-valued correlation screening in the high-dimensional regime of large $p$ and fixed sample size $m$. Specifically, Theorem 2 and Corollary 2 give asymptotic expressions in the limit $p \to \infty$ for the mean number of hubs detected at thresholds $\rho, \delta$ and the probability of discovering at least one such hub. Bounds on the rates of convergence are also provided. These results show that the number of hub discoveries undergoes a phase transition as $\rho$ decreases from 1, from almost no discoveries to the maximum number, $p$. An expression (33) for the critical threshold $\rho_{c,\delta}$ is derived to guide the selection of $\rho$ under different settings of $p$, $m$, and $\delta$. Furthermore, given a null hypothesis that the population correlation matrix is sufficiently sparse, the expressions in Corollary 2 become independent of the underlying probability distribution and can thus be easily evaluated. This allows the statistical significance of a hub discovery to be quantified, specifically in the form of a p-value under the null hypothesis. We note that our results on complex-valued correlation screening apply more generally than to spectral correlation analysis and thus may be of independent interest.

The remainder of the chapter is organized as follows. Section 2 presents notation and definitions for multivariate time series and establishes the asymptotic independence of spectral components. Section 3 describes complex-valued correlation screening and characterizes its properties in terms of numbers of hub discoveries and phase transitions. Section 4 discusses the application of complex-valued correlation screening to the spectra of multivariate time series. Finally, Sec. 5 illustrates the applicability of the proposed framework through simulation analysis.
1.1 Notation

A triplet $(\Omega, \mathcal{F}, P)$ represents a probability space with sample space $\Omega$, $\sigma$-algebra of events $\mathcal{F}$, and probability measure $P$. For an event $A \in \mathcal{F}$, $P(A)$ represents the probability of $A$. Scalar random variables and their realizations are denoted with upper case and lower case letters, respectively. Random vectors and their realizations are denoted with bold upper case and bold lower case letters. The expectation operator is denoted as $E$. For a random variable $X$, the cumulative probability distribution (cdf) of $X$ is defined as $F_X(x) = P(X \le x)$. For an absolutely continuous cdf $F_X(\cdot)$ the probability density function (pdf) is defined as $f_X(x) = dF_X(x)/dx$. The cdf and pdf are defined similarly for random vectors. Moreover, we follow the definitions in [3] for conditional probabilities, conditional expectations and conditional densities.

For a complex number $z = a + b\sqrt{-1} \in \mathbb{C}$, $\Re(z) = a$ and $\Im(z) = b$ represent the real and imaginary parts of $z$, respectively. A complex-valued random variable is composed of two real-valued random variables as its real and imaginary parts. A complex-valued Gaussian variable has real and imaginary parts
that are Gaussian. A complex-valued (Gaussian) random vector is a vector whose entries are complex-valued (Gaussian) random variables. The covariance of a $p$-dimensional complex-valued random vector $\mathbf{Y}$ and a $q$-dimensional complex-valued random vector $\mathbf{Z}$ is a $p \times q$ matrix defined as
\[ \mathrm{cov}(\mathbf{Y}, \mathbf{Z}) = E\left[ (\mathbf{Y} - E[\mathbf{Y}]) (\mathbf{Z} - E[\mathbf{Z}])^H \right], \]
where $H$ denotes the Hermitian transpose. We write $\mathrm{cov}(\mathbf{Y})$ for $\mathrm{cov}(\mathbf{Y}, \mathbf{Y})$ and $\mathrm{var}(Y) = \mathrm{cov}(Y, Y)$ for the variance of a scalar random variable $Y$. The correlation coefficient between random variables $Y$ and $Z$ is defined as
\[ \mathrm{cor}(Y, Z) = \frac{\mathrm{cov}(Y, Z)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(Z)}}. \]
Matrices are also denoted by bold upper case letters. In most cases the distinction between matrices and random vectors will be clear from the context. For a matrix $\mathbf{A}$ we represent the $(i, j)$th entry of $\mathbf{A}$ by $a_{ij}$. Also $\mathbf{D}_{\mathbf{A}}$ represents the diagonal matrix that is obtained by zeroing out all but the diagonal entries of $\mathbf{A}$.

2 Spectral Representation of Multivariate Time Series

2.1 Definitions

Let $\mathbf{X}(k) = [X^{(1)}(k), X^{(2)}(k), \ldots, X^{(p)}(k)]$, $k \in \mathbb{Z}$, be a multivariate time series with time index $k$. We assume that the time series $X^{(1)}, X^{(2)}, \ldots, X^{(p)}$ are second-order stationary random processes, i.e.:
\[ E[X^{(i)}(k)] = E[X^{(i)}(k + \Delta)] \qquad (1) \]
and
\[ \mathrm{cov}[X^{(i)}(k), X^{(j)}(l)] = \mathrm{cov}[X^{(i)}(k + \Delta), X^{(j)}(l + \Delta)] \qquad (2) \]
for any integer time shift $\Delta$.

For $1 \le i \le p$, let $\mathbf{X}^{(i)} = [X^{(i)}(k), \ldots, X^{(i)}(k + n - 1)]$ denote any vector of $n$ consecutive samples of time series $X^{(i)}$. The $n$-point Discrete Fourier Transform (DFT) of $\mathbf{X}^{(i)}$ is denoted by $\mathbf{Y}^{(i)} = [Y^{(i)}(0), \ldots, Y^{(i)}(n - 1)]$ and defined by
\[ \mathbf{Y}^{(i)} = \mathbf{W} \mathbf{X}^{(i)}, \quad 1 \le i \le p, \]
in which $\mathbf{W}$ is the DFT matrix:
\[ \mathbf{W} = \frac{1}{\sqrt{n}} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & \omega & \cdots & \omega^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \cdots & \omega^{(n-1)^2} \end{bmatrix}, \]
where $\omega = e^{-2\pi\sqrt{-1}/n}$. We denote the $n \times n$ population covariance matrix of $\mathbf{X}^{(i)}$ as $\mathbf{C}^{(i,i)} = [c^{(i,i)}_{kl}]_{1 \le k,l \le n}$ and the $n \times n$ population cross covariance matrix between $\mathbf{X}^{(i)}$ and $\mathbf{X}^{(j)}$ as $\mathbf{C}^{(i,j)} = [c^{(i,j)}_{kl}]_{1 \le k,l \le n}$ for $i \ne j$. The translation invariance properties (1) and (2) imply that $\mathbf{C}^{(i,i)}$ and $\mathbf{C}^{(i,j)}$ are Toeplitz matrices. Therefore $c^{(i,i)}_{kl}$ and $c^{(i,j)}_{kl}$ depend on $k$ and $l$ only through the quantity $k - l$. Representing the $(k, l)$th entry of a Toeplitz matrix $\mathbf{T}$ by $t(k - l)$, we write $c^{(i,i)}_{kl} = c^{(i,i)}(k - l)$ and $c^{(i,j)}_{kl} = c^{(i,j)}(k - l)$, where $k - l$ takes values from $1 - n$ to $n - 1$. In addition, $\mathbf{C}^{(i,i)}$ is symmetric.

2.2 Asymptotic Independence of Spectral Components

The following theorem states that for stationary time series, DFT components at different spectral indices (i.e. frequencies) are asymptotically uncorrelated under the condition that the auto-covariance and cross-covariance functions are absolutely summable. This theorem follows directly from the spectral theory of large Toeplitz matrices, see, for example, [4] and [5]. However, for the benefit of the reader we give a self-contained proof of the theorem.

Theorem 1. Assume $\lim_{n \to \infty} \sum_{t=0}^{n-1} |c^{(i,j)}(t)| = M^{(i,j)} < \infty$ for all $1 \le i, j \le p$. Define $\mathrm{err}^{(i,j)}(n) = M^{(i,j)} - \sum_{m'=0}^{n-1} |c^{(i,j)}(m')|$ and $\mathrm{avg}^{(i,j)}(n) = \frac{1}{n} \sum_{m'=0}^{n-1} \mathrm{err}^{(i,j)}(m')$. Then for $k \ne l$, we have:
\[ \mathrm{cor}\left(Y^{(i)}(k), Y^{(j)}(l)\right) = O\left(\max\{1/n, \mathrm{avg}^{(i,j)}(n)\}\right). \]
In other words $Y^{(i)}(k)$ and $Y^{(j)}(l)$ are asymptotically uncorrelated as $n \to \infty$.

Proof. Without loss of generality we assume that the time series have zero mean (i.e. $E[X^{(i)}(k)] = 0$, $1 \le i \le p$, $0 \le k \le n - 1$). We first establish a representation of $E[Z^{(i)}(k) Z^{(j)}(l)^*]$ for general linear functionals:
\[ Z^{(i)}(k) = \sum_{m'=0}^{n-1} g_k(m') X^{(i)}(m'), \]
in which $g_k(\cdot)$ is an arbitrary complex sequence for $0 \le k \le n - 1$. We have:
\[
\begin{aligned}
E[Z^{(i)}(k) Z^{(j)}(l)^*]
&= E\left[ \left( \sum_{m'=0}^{n-1} g_k(m') X^{(i)}(m') \right) \left( \sum_{n'=0}^{n-1} g_l(n') X^{(j)}(n') \right)^* \right] \\
&= \sum_{m'=0}^{n-1} g_k(m') \sum_{n'=0}^{n-1} g_l(n')^* E[X^{(i)}(m') X^{(j)}(n')^*] \\
&= \sum_{m'=0}^{n-1} g_k(m') \sum_{n'=0}^{n-1} g_l(n')^* c^{(i,j)}(m' - n'). \qquad (3)
\end{aligned}
\]
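Now for a Toeplitz matrix $\mathbf{T}$, define the circulant matrix $\mathbf{D}_{\mathbf{T}}$ as:
\[
\mathbf{D}_{\mathbf{T}} =
\begin{bmatrix}
t(0) & t(-1) + t(n-1) & \cdots & t(1-n) + t(1) \\
t(1) + t(1-n) & t(0) & \cdots & t(2-n) + t(2) \\
\vdots & \vdots & \ddots & \vdots \\
t(n-2) + t(-2) & t(n-3) + t(-3) & \cdots & t(-1) + t(n-1) \\
t(n-1) + t(-1) & t(n-2) + t(-2) & \cdots & t(0)
\end{bmatrix}.
\]
We can write:
\[ \mathbf{C}^{(i,j)} = \mathbf{D}_{\mathbf{C}^{(i,j)}} + \mathbf{E}^{(i,j)} \]
for some Toeplitz matrix $\mathbf{E}^{(i,j)}$. Thus $c^{(i,j)}(m' - n') = d^{(i,j)}(m' - n') + e^{(i,j)}(m' - n')$ where $d^{(i,j)}(m' - n')$ and $e^{(i,j)}(m' - n')$ are the $(m', n')$ entries of $\mathbf{D}_{\mathbf{C}^{(i,j)}}$ and $\mathbf{E}^{(i,j)}$, respectively. Therefore, (3) can be written as:
\[ \sum_{m'=0}^{n-1} g_k(m') \sum_{n'=0}^{n-1} g_l(n')^* d^{(i,j)}(m' - n') + \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_k(m') g_l(n')^* e^{(i,j)}(m' - n'). \]
The first term can be written as:
\[ \sum_{m'=0}^{n-1} g_k(m') \left( g_l^* \circledast d^{(i,j)} \right)(m') = \sum_{m'=0}^{n-1} g_k(m') v_l^{(i,j)}(m'), \]
where we have recognized $v_l^{(i,j)}(m') = (g_l^* \circledast d^{(i,j)})(m')$ as the circular convolution of $g_l^*(\cdot)$ and $d^{(i,j)}(\cdot)$ [6]. Let $G_k(\cdot)$ and $D^{(i,j)}(\cdot)$ be the DFT of $g_k(\cdot)$ and $d^{(i,j)}(\cdot)$, respectively. By Plancherel's theorem [7] we have:
\[
\begin{aligned}
\sum_{m'=0}^{n-1} g_k(m') v_l^{(i,j)}(m')
&= \sum_{m'=0}^{n-1} g_k(m') \left( v_l^{(i,j)}(m')^* \right)^* \\
&= \sum_{m'=0}^{n-1} G_k(m') \left( G_l(m') D^{(i,j)}(m')^* \right)^* \\
&= \sum_{m'=0}^{n-1} G_k(m') G_l(m')^* D^{(i,j)}(m'). \qquad (4)
\end{aligned}
\]
Now let $g_k(m') = \omega^{km'}/\sqrt{n}$ for $0 \le k, m' \le n - 1$. For this choice of $g_k(\cdot)$ we have $G_k(m') = 0$ for all $m' \ne n - k$ and $G_k(n - k) = 1$. Hence for $k \ne l$ the quantity (4) becomes 0. Therefore using the representation $\mathbf{E}^{(i,j)} = \mathbf{C}^{(i,j)} - \mathbf{D}_{\mathbf{C}^{(i,j)}}$ we have: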
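\[
\begin{aligned}
\left| \mathrm{cov}\left(Y^{(i)}(k), Y^{(j)}(l)\right) \right|
&= \left| E[Y^{(i)}(k) Y^{(j)}(l)^*] \right|
= \left| \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_k(m') g_l(n')^* e^{(i,j)}(m' - n') \right| \\
&\le \frac{1}{n} \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} \left| e^{(i,j)}(m' - n') \right|
= \frac{2}{n} \sum_{m'=0}^{n-1} m' \left| c^{(i,j)}(m') \right|, \qquad (5)
\end{aligned}
\]
in which the last equation is due to the fact that $|c^{(i,j)}(m')| = |c^{(i,j)}(-m')|$. Now using (4) and (5) we obtain expressions for $\mathrm{var}(Y^{(i)}(k))$ and $\mathrm{var}(Y^{(j)}(l))$. Letting $j = i$ and $l = k$ in (4) and (5) gives:
\[
\begin{aligned}
\mathrm{var}\left(Y^{(i)}(k)\right) &= \mathrm{cov}\left(Y^{(i)}(k), Y^{(i)}(k)\right) \\
&= \sum_{m'=0}^{n-1} G_k(m') G_k(m')^* D^{(i,i)}(m') + \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_k(m') g_k(n')^* e^{(i,i)}(m' - n') \\
&= D^{(i,i)}(k) + \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_k(m') g_k(n')^* e^{(i,i)}(m' - n'), \qquad (6)
\end{aligned}
\]
in which the magnitude of the summation term is bounded as:
\[
\left| \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_k(m') g_k(n')^* e^{(i,i)}(m' - n') \right|
\le \frac{1}{n} \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} \left| e^{(i,i)}(m' - n') \right|
= \frac{2}{n} \sum_{m'=0}^{n-1} m' \left| c^{(i,i)}(m') \right|. \qquad (7)
\]
Similarly:
\[ \mathrm{var}\left(Y^{(j)}(l)\right) = D^{(j,j)}(l) + \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_l(m') g_l(n')^* e^{(j,j)}(m' - n'), \qquad (8) \]
in which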
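\[
\left| \sum_{m'=0}^{n-1} \sum_{n'=0}^{n-1} g_l(m') g_l(n')^* e^{(j,j)}(m' - n') \right|
\le \frac{2}{n} \sum_{m'=0}^{n-1} m' \left| c^{(j,j)}(m') \right|. \qquad (9)
\]
To complete the proof the following lemma is needed.

Lemma 1. Let $\{a_{m'}\}_{m'=0}^{\infty}$ be a sequence of non-negative numbers such that $\sum_{m'=0}^{\infty} a_{m'} = M < \infty$. Define $\mathrm{err}(n) = M - \sum_{m'=0}^{n-1} a_{m'}$ and $\mathrm{avg}(n) = \frac{1}{n} \sum_{m'=0}^{n-1} \mathrm{err}(m')$. Then
\[ \left| \frac{1}{n} \sum_{m'=0}^{n-1} m' a_{m'} \right| \le M/n + \mathrm{err}(n) + \mathrm{avg}(n). \]

Proof. Let $S_0 = 0$ and for $n \ge 1$ define $S_n = \sum_{m'=0}^{n-1} a_{m'}$. We have:
\[ \sum_{m'=0}^{n-1} m' a_{m'} = (n - 1) S_n - \sum_{m'=0}^{n-1} S_{m'}. \]
Therefore:
\[ \frac{1}{n} \sum_{m'=0}^{n-1} m' a_{m'} = \frac{n-1}{n} S_n - \frac{1}{n} \sum_{m'=0}^{n-1} S_{m'}. \]
Since $M - M/n - \mathrm{err}(n) \le \frac{n-1}{n} S_n \le M$ and $M - \mathrm{avg}(n) \le \frac{1}{n} \sum_{m'=0}^{n-1} S_{m'} \le M$, using the triangle inequality the result follows.

Now let $a_{m'} = |c^{(i,j)}(m')|$. By assumption $\lim_{n \to \infty} \sum_{m'=0}^{n-1} a_{m'} = M^{(i,j)} < \infty$. Therefore, Lemma 1 along with (5) concludes:
\[ \mathrm{cov}\left(Y^{(i)}(k), Y^{(j)}(l)\right) = O\left(\max\{1/n, \mathrm{err}^{(i,j)}(n), \mathrm{avg}^{(i,j)}(n)\}\right). \qquad (10) \]
$\mathrm{err}^{(i,j)}(n)$ is a decreasing function of $n$. Therefore $\mathrm{avg}^{(i,j)}(n) \ge \mathrm{err}^{(i,j)}(n)$ for $n \ge 1$. Hence:
\[ \mathrm{cov}\left(Y^{(i)}(k), Y^{(j)}(l)\right) = O\left(\max\{1/n, \mathrm{avg}^{(i,j)}(n)\}\right). \]
Similarly using Lemma 1 along with (6), (7), (8) and (9) we obtain:
\[ \left| \mathrm{var}\left(Y^{(i)}(k)\right) - D^{(i,i)}(k) \right| = O\left(\max\{1/n, \mathrm{avg}^{(i,i)}(n)\}\right), \qquad (11) \]
and
\[ \left| \mathrm{var}\left(Y^{(j)}(l)\right) - D^{(j,j)}(l) \right| = O\left(\max\{1/n, \mathrm{avg}^{(j,j)}(n)\}\right). \qquad (12) \]
Using the definition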
\[ \mathrm{cor}\left(Y^{(i)}(k), Y^{(j)}(l)\right) = \frac{\mathrm{cov}\left(Y^{(i)}(k), Y^{(j)}(l)\right)}{\sqrt{\mathrm{var}\left(Y^{(i)}(k)\right) \mathrm{var}\left(Y^{(j)}(l)\right)}}, \]
and the fact that as $n \to \infty$, $D^{(i,i)}(k)$ and $D^{(j,j)}(l)$ converge to constants $C^{(i,i)}(k)$ and $C^{(j,j)}(l)$, respectively, equations (10), (11) and (12) conclude:
\[ \mathrm{cor}\left(Y^{(i)}(k), Y^{(j)}(l)\right) = O\left(\max\{1/n, \mathrm{avg}^{(i,j)}(n)\}\right). \]

As an example we apply Theorem 1 to a scalar auto-regressive (AR) process $X(k)$ specified by
\[ X(k) = \sum_{l=1}^{L} \phi_l X(k - l) + \varepsilon(k), \]
in which $\phi_l$ are real-valued coefficients and $\varepsilon(\cdot)$ is a stationary process with no temporal correlation. The auto-covariance function of an AR process can be written as [8]:
\[ c(t) = \sum_{l=1}^{L} \alpha_l r_l^{|t|}, \]
in which $r_1, \ldots, r_L$ are the roots of the polynomial $\beta(x) = x^L - \sum_{l=1}^{L} \phi_l x^{L-l}$. It is known that for a stationary AR process, $|r_l| < 1$ for all $1 \le l \le L$ [8]. Therefore, using the definition of $\mathrm{err}(\cdot)$ we have:
\[
\mathrm{err}(n) = \sum_{t=n}^{\infty} |c(t)| = \sum_{t=n}^{\infty} \left| \sum_{l=1}^{L} \alpha_l r_l^{t} \right|
\le \sum_{l=1}^{L} |\alpha_l| \sum_{t=n}^{\infty} |r_l|^{t}
= \sum_{l=1}^{L} |\alpha_l| \frac{|r_l|^{n}}{1 - |r_l|} \le C \zeta^{n},
\]
in which $C = \sum_{l=1}^{L} |\alpha_l| / (1 - |r_l|)$ and $\zeta = \max_{1 \le l \le L} |r_l| < 1$. Hence:
\[ \mathrm{avg}(n) = \frac{1}{n} \sum_{m'=0}^{n-1} \mathrm{err}(m') \le \frac{1}{n} \sum_{m'=0}^{n-1} C \zeta^{m'} \le \frac{C}{n(1 - \zeta)}. \]
Therefore, Theorem 1 concludes:
\[ \mathrm{cor}\left(Y(k), Y(l)\right) = O(1/n), \quad k \ne l, \]
where $Y(\cdot)$ represents the $n$-point DFT of the AR process $X(\cdot)$.

In the sequel, we assume that the time series $\mathbf{X}$ is multivariate Gaussian, i.e., $X^{(1)}, \ldots, X^{(p)}$ are jointly Gaussian processes. It follows that the DFT components $Y^{(i)}(k)$ are jointly (complex) Gaussian as linear functionals of $\mathbf{X}$. Theorem 1 then immediately implies asymptotic independence of DFT components through a well-known property of jointly Gaussian random variables.
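The $O(1/n)$ decay in the AR example can be checked numerically without simulation, by forming the exact covariance of the DFT vector, $\mathbf{W}\mathbf{C}\mathbf{W}^H$, from the Toeplitz covariance matrix $\mathbf{C}$. The sketch below does this for an AR(1) process. The helper name `max_offdiag_dft_corr`, the coefficient 0.5, and the unit innovation variance are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def max_offdiag_dft_corr(n, r=0.5):
    """Largest |cor(Y(k), Y(l))|, k != l, for the n-point DFT of an AR(1) process.

    Assumes X(k) = r X(k-1) + eps(k) with unit-variance white eps, whose
    auto-covariance is c(t) = r^|t| / (1 - r^2).
    """
    t = np.arange(n)
    C = r ** np.abs(t[:, None] - t[None, :]) / (1.0 - r**2)  # Toeplitz covariance
    W = np.fft.fft(np.eye(n)) / np.sqrt(n)                   # unitary DFT matrix
    S = W @ C @ W.conj().T                                   # cov of Y = W X
    d = np.sqrt(np.real(np.diag(S)))
    corr = np.abs(S) / np.outer(d, d)
    np.fill_diagonal(corr, 0.0)                              # ignore k == l
    return corr.max()

# Hypothetical usage: the maximum off-diagonal correlation shrinks as n grows.
c_small, c_large = max_offdiag_dft_corr(64), max_offdiag_dft_corr(512)
```

Comparing `c_small` and `c_large` gives a quick sense of how large the DFT length must be before treating frequencies separately is reasonable for a given process.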
Corollary 1. Assume that the time series $\mathbf{X}$ is multivariate Gaussian. Under the absolute summability conditions in Theorem 1, the DFT components $Y^{(i)}(k)$ and $Y^{(j)}(l)$ are asymptotically independent for $k \ne l$ and $n \to \infty$.

Corollary 1 implies that for large $n$, correlation analysis of the time series $\mathbf{X}$ can be done independently on each frequency in the spectral domain. This reduces the problem of screening for hub time series to screening for hub variables among the $p$ DFT components at a given frequency. A procedure for the latter problem and a corresponding theory are described next.

3 Complex-Valued Correlation Hub Screening

This section discusses complex-valued correlation hub screening, a generalization of real-valued correlation screening in [, 9], for identifying highly correlated components of a complex-valued random vector from its sample values. The method is applied to multivariate time series in Section 4 to discover correlation hubs among the spectral components at each frequency. Sections 3.1 and 3.2 describe the underlying statistical model and the screening procedure. Sections 3.3 and 3.4 provide background on the U-score representation of correlation matrices and associated definitions and properties. Section 3.5 contains the main theoretical result characterizing the number of hub discoveries in the high-dimensional regime, while Section 3.6 elaborates on the phenomenon of phase transitions in the number of discoveries.

3.1 Statistical Model

We use the generic notation $\mathbf{Z} = [Z_1, Z_2, \ldots, Z_p]^T$ in this section to refer to a complex-valued random vector. The mean of $\mathbf{Z}$ is denoted as $\boldsymbol{\mu}$ and its $p \times p$ non-singular covariance matrix is denoted as $\boldsymbol{\Sigma}$. We assume that the vector $\mathbf{Z}$ follows a complex elliptically contoured distribution with pdf
\[ f_{\mathbf{Z}}(\mathbf{z}) = g\left( (\mathbf{z} - \boldsymbol{\mu})^H \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu}) \right), \]
in which $g: \mathbb{R}^{\ge 0} \to \mathbb{R}^{>0}$ is an integrable and strictly decreasing function [9].
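This assumption generalizes the Gaussian assumption made in Section 2 as the Gaussian distribution is one example of an elliptically contoured distribution. In correlation hub screening, the quantities of interest are the correlation matrix and partial correlation matrix associated with $\mathbf{Z}$. These are defined as $\boldsymbol{\Gamma} = \mathbf{D}_{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\Sigma}}^{-1/2}$ and $\boldsymbol{\Omega} = \mathbf{D}_{\boldsymbol{\Sigma}^{-1}}^{-1/2} \boldsymbol{\Sigma}^{-1} \mathbf{D}_{\boldsymbol{\Sigma}^{-1}}^{-1/2}$, respectively. Note that $\boldsymbol{\Gamma}$ and $\boldsymbol{\Omega}$ are normalized matrices with unit diagonals.

3.2 Screening Procedure

The goal of correlation hub screening is to identify highly correlated components of the random vector $\mathbf{Z}$ from its sample realizations. Assume that $m$ samples $\mathbf{z}_1, \ldots, \mathbf{z}_m \in \mathbb{C}^p$ of $\mathbf{Z}$ are available. To simplify the development of the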
theory, the samples are assumed to be independent and identically distributed (i.i.d.) although the theory also applies to dependent samples.

We compute sample correlation and partial correlation matrices from the samples $\mathbf{z}_1, \ldots, \mathbf{z}_m$ as surrogates for the unknown population correlation matrices $\boldsymbol{\Gamma}$ and $\boldsymbol{\Omega}$ in Section 3.1. First define the $p \times p$ sample covariance matrix $\mathbf{S}$ as
\[ \mathbf{S} = \frac{1}{m - 1} \sum_{i=1}^{m} (\mathbf{z}_i - \bar{\mathbf{z}})(\mathbf{z}_i - \bar{\mathbf{z}})^H, \]
where $\bar{\mathbf{z}}$ is the sample mean, the average of $\mathbf{z}_1, \ldots, \mathbf{z}_m$. The sample correlation and sample partial correlation matrices are then defined as $\mathbf{R} = \mathbf{D}_{\mathbf{S}}^{-1/2} \mathbf{S} \mathbf{D}_{\mathbf{S}}^{-1/2}$ and $\mathbf{P} = \mathbf{D}_{\mathbf{R}^{\dagger}}^{-1/2} \mathbf{R}^{\dagger} \mathbf{D}_{\mathbf{R}^{\dagger}}^{-1/2}$, respectively, where $\mathbf{R}^{\dagger}$ is the Moore-Penrose pseudo-inverse of $\mathbf{R}$.

Correlation hubs are screened by applying thresholds to the sample (partial) correlation matrix. A variable $Z_i$ is declared a hub screening discovery at degree level $\delta \in \{1, 2, \ldots\}$ and threshold level $\rho \in [0, 1]$ if
\[ \left| \{ j : j \ne i, |\psi_{ij}| \ge \rho \} \right| \ge \delta, \]
where $\boldsymbol{\Psi} = \mathbf{R}$ for correlation screening and $\boldsymbol{\Psi} = \mathbf{P}$ for partial correlation screening. We denote by $N_{\delta,\rho} \in \{0, \ldots, p\}$ the total number of hub screening discoveries at levels $\delta, \rho$.

Correlation hub screening can also be interpreted in terms of the (partial) correlation graph $G_\rho(\boldsymbol{\Psi})$, depicted in Fig. 1 and defined as follows. The vertices of $G_\rho(\boldsymbol{\Psi})$ are $v_1, \ldots, v_p$, which correspond to $Z_1, \ldots, Z_p$, respectively. For $1 \le i, j \le p$, $v_i$ and $v_j$ are connected by an edge in $G_\rho(\boldsymbol{\Psi})$ if the magnitude of the sample (partial) correlation coefficient between $Z_i$ and $Z_j$ is at least $\rho$. A vertex of $G_\rho(\boldsymbol{\Psi})$ is called a $\delta$-hub if its degree, the number of incident edges, is at least $\delta$. Then the number of discoveries $N_{\delta,\rho}$ defined earlier is the number of $\delta$-hubs in the graph $G_\rho(\boldsymbol{\Psi})$.

Fig. 1. Complex-valued (partial) correlation hub screening thresholds the sample correlation or partial correlation matrix, denoted generically by the matrix $\boldsymbol{\Psi}$, to find variables $Z_i$ that are highly correlated with other variables.
This is equivalent to finding hubs in a graph $G_\rho(\boldsymbol{\Psi})$ with $p$ vertices $v_1, \ldots, v_p$. For $1 \le i, j \le p$, $v_i$ is connected to $v_j$ in $G_\rho(\boldsymbol{\Psi})$ if $|\psi_{ij}| \ge \rho$.
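The screening procedure of Section 3.2 can be sketched as follows. This is a minimal illustration, assuming the $m$ samples are stored as the rows of a complex array; the function name `hub_screen` and its return convention are hypothetical and are not part of the chapter.

```python
import numpy as np

def hub_screen(z, rho, delta, partial=False):
    """Complex-valued (partial) correlation hub screening.

    z : complex array of shape (m, p), one sample per row.
    Returns (hubs, degree): indices of variables whose degree in the
    thresholded graph is at least delta, and the degree of every vertex.
    """
    m = z.shape[0]
    zc = z - z.mean(axis=0)                     # subtract the sample mean
    S = zc.T @ zc.conj() / (m - 1)              # sample covariance matrix
    d = 1.0 / np.sqrt(np.real(np.diag(S)))
    psi = S * np.outer(d, d)                    # sample correlation matrix R
    if partial:
        R_pinv = np.linalg.pinv(psi)            # Moore-Penrose pseudo-inverse
        dp = 1.0 / np.sqrt(np.real(np.diag(R_pinv)))
        psi = R_pinv * np.outer(dp, dp)         # sample partial correlation P
    adj = np.abs(psi) >= rho                    # edges of the graph G_rho(Psi)
    np.fill_diagonal(adj, False)                # exclude j == i
    degree = adj.sum(axis=1)
    return np.flatnonzero(degree >= delta), degree

# Hypothetical usage: variables 0..3 share a common signal, 4 and 5 do not.
rng = np.random.default_rng(0)
m, p = 20, 6
common = rng.standard_normal(m) + 1j * rng.standard_normal(m)
z = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
z[:, :4] = common[:, None] + 0.05 * z[:, :4]
hubs, degree = hub_screen(z, rho=0.9, delta=3)
```

Thresholding the magnitude `np.abs(psi)` rather than the real part is what distinguishes the complex-valued case from real-valued screening.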
3.3 U-score Representation of Correlation Matrices

Our theory for complex-valued correlation screening is based on the U-score representation of the sample correlation and partial correlation matrices. Similarly to the real case [9], it can be shown that there exists an $(m - 1) \times p$ complex-valued matrix $\mathbf{U}_R$ with unit-norm columns $\mathbf{u}_R^{(i)} \in \mathbb{C}^{m-1}$ such that the following representation holds:
\[ \mathbf{R} = \mathbf{U}_R^H \mathbf{U}_R. \qquad (13) \]
Similar to Lemma 1 in [9] it is straightforward to show that:
\[ \mathbf{R}^{\dagger} = \mathbf{U}_R^H (\mathbf{U}_R \mathbf{U}_R^H)^{-2} \mathbf{U}_R. \]
Hence by defining $\mathbf{U}_P = (\mathbf{U}_R \mathbf{U}_R^H)^{-1} \mathbf{U}_R \mathbf{D}^{-1/2}_{\mathbf{U}_R^H (\mathbf{U}_R \mathbf{U}_R^H)^{-2} \mathbf{U}_R}$ we have the representation:
\[ \mathbf{P} = \mathbf{U}_P^H \mathbf{U}_P, \qquad (14) \]
where the $(m - 1) \times p$ matrix $\mathbf{U}_P$ has unit-norm columns $\mathbf{u}_P^{(i)} \in \mathbb{C}^{m-1}$.

3.4 Properties of U-scores

The U-score factorizations in (13) and (14) show that sample (partial) correlation matrices can be represented in terms of unit vectors in $\mathbb{C}^{m-1}$. This subsection presents definitions and properties related to U-scores that will be used in Section 3.5.

We denote the unit spheres in $\mathbb{R}^{m-1}$ and $\mathbb{C}^{m-1}$ as $S_{m-1}$ and $T_{m-1}$, respectively. The surface areas of $S_{m-1}$ and $T_{m-1}$ are denoted as $a_{m-1}$ and $b_{m-1}$ respectively. Define the interleaving function $h: \mathbb{R}^{2m-2} \to \mathbb{C}^{m-1}$ as below:
\[ h([x_1, x_2, \ldots, x_{2m-2}]^T) = [x_1 + x_2\sqrt{-1},\; x_3 + x_4\sqrt{-1},\; \ldots,\; x_{2m-3} + x_{2m-2}\sqrt{-1}]^T. \]
Note that $h(\cdot)$ is a one-to-one and onto function and it maps $S_{2m-2}$ to $T_{m-1}$.

For a fixed vector $\mathbf{u} \in T_{m-1}$ and a threshold $0 \le \rho \le 1$ define the spherical cap in $T_{m-1}$:
\[ A_\rho(\mathbf{u}) = \{ \mathbf{y} : \mathbf{y} \in T_{m-1}, |\mathbf{y}^H \mathbf{u}| \ge \rho \}. \]
Also define $P_0$ as the probability that a random point $\mathbf{Y}$ that is uniformly distributed on $T_{m-1}$ falls into $A_\rho(\mathbf{u})$. Below we give a simple expression for $P_0$ as a function of $\rho$ and $m$.

Lemma 2. Let $\mathbf{Y}$ be an $(m - 1)$-dimensional complex-valued random vector that is uniformly distributed over $T_{m-1}$. We have
\[ P_0 = P(\mathbf{Y} \in A_\rho(\mathbf{u})) = (1 - \rho^2)^{m-2}. \]
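Lemma 2 is easy to sanity-check by Monte Carlo, using the same construction as its proof: a uniform point on the complex unit sphere is obtained by normalizing a vector of i.i.d. complex Gaussian entries. The function names, the number of draws, and the seed below are illustrative assumptions.

```python
import numpy as np

def cap_probability_exact(m, rho):
    """Closed form of Lemma 2: P_0 = (1 - rho^2)^(m - 2)."""
    return (1.0 - rho**2) ** (m - 2)

def cap_probability_mc(m, rho, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(|Y^H u| >= rho) with u = [1, 0, ..., 0]^T."""
    rng = np.random.default_rng(seed)
    d = m - 1                                    # complex dimension of T_{m-1}
    x = rng.standard_normal((n_draws, d)) + 1j * rng.standard_normal((n_draws, d))
    y = x / np.linalg.norm(x, axis=1, keepdims=True)   # uniform on the sphere
    return np.mean(np.abs(y[:, 0]) >= rho)

# Hypothetical usage: the two numbers should agree up to Monte Carlo error.
exact = cap_probability_exact(6, 0.5)
estimate = cap_probability_mc(6, 0.5)
```

The fixed choice of `u` loses no generality because the uniform distribution on the sphere is invariant under unitary transformations.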
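Proof. Without loss of generality we assume $\mathbf{u} = [1, 0, \ldots, 0]^T$. We have:
\[ P_0 = P(|Y_1| \ge \rho) = P(\Re(Y_1)^2 + \Im(Y_1)^2 \ge \rho^2). \]
Since $\mathbf{Y}$ is uniform on $T_{m-1}$, we can write $\mathbf{Y} = \mathbf{X}/\|\mathbf{X}\|_2$, in which $\mathbf{X}$ is a complex-valued random vector whose entries are i.i.d. complex-valued Gaussian variables with mean 0 and variance 1. Thus:
\[
\begin{aligned}
P_0 &= P\left( \left( \Re(X_1)^2 + \Im(X_1)^2 \right) / \|\mathbf{X}\|_2^2 \ge \rho^2 \right) \\
&= P\left( (1 - \rho^2) \left( \Re(X_1)^2 + \Im(X_1)^2 \right) \ge \rho^2 \sum_{k=2}^{m-1} \left( \Re(X_k)^2 + \Im(X_k)^2 \right) \right).
\end{aligned}
\]
Define $V_1 = \Re(X_1)^2 + \Im(X_1)^2$ and $V_2 = \sum_{k=2}^{m-1} \Re(X_k)^2 + \Im(X_k)^2$. $V_1$ and $V_2$ are independent and have chi-squared distributions with 2 and $2(m - 2)$ degrees of freedom, respectively [2]. Therefore,
\[
\begin{aligned}
P_0 &= \int_0^\infty \int_{\rho^2 v_2/(1 - \rho^2)}^\infty \chi_2^2(v_1)\, \chi_{2(m-2)}^2(v_2)\, dv_1\, dv_2 \\
&= \int_0^\infty \chi_{2(m-2)}^2(v_2) \int_{\rho^2 v_2/(1 - \rho^2)}^\infty \frac{1}{2} e^{-v_1/2}\, dv_1\, dv_2 \\
&= \int_0^\infty \frac{1}{2^{m-2}\, \Gamma(m - 2)}\, v_2^{m-3}\, e^{-v_2/2}\, e^{-\frac{\rho^2}{2(1 - \rho^2)} v_2}\, dv_2 \\
&= \frac{1}{\Gamma(m - 2)} (1 - \rho^2)^{m-2} \int_0^\infty x^{m-3} e^{-x}\, dx \\
&= \frac{1}{\Gamma(m - 2)} (1 - \rho^2)^{m-2}\, \Gamma(m - 2) = (1 - \rho^2)^{m-2},
\end{aligned}
\]
in which we have made a change of variable $x = \frac{v_2}{2(1 - \rho^2)}$.

Under the assumption that the joint pdf of $\mathbf{Z}$ exists, the $p$ columns of the U-score matrix have joint pdf $f_{U_1,\ldots,U_p}(\mathbf{u}_1, \ldots, \mathbf{u}_p)$ on $T_{m-1}^p = \times_{i=1}^p T_{m-1}$. The following $(\delta + 1)$-fold average of the joint pdf will play a significant role in Section 3.5. This $(\delta + 1)$-fold average is defined as:
\[
\begin{aligned}
\bar{f}_{U_{*1},\ldots,U_{*(\delta+1)}}(\mathbf{u}_1, \ldots, \mathbf{u}_{\delta+1})
= {} & (2\pi)^{-(\delta+1)} \frac{1}{p \binom{p-1}{\delta}} \sum_{\substack{1 \le i_1 < \cdots < i_\delta \le p \\ i_{\delta+1} \notin \{i_1, \ldots, i_\delta\}}} \int_0^{2\pi} \int_0^{2\pi} \cdots \int_0^{2\pi} \\
& f_{U_{i_1},\ldots,U_{i_\delta},U_{i_{\delta+1}}}\left( e^{\sqrt{-1}\theta_1} \mathbf{u}_1, \ldots, e^{\sqrt{-1}\theta_\delta} \mathbf{u}_\delta, e^{\sqrt{-1}\theta} \mathbf{u}_{\delta+1} \right) d\theta_1 \cdots d\theta_\delta\, d\theta.
\end{aligned}
\]
Also for a joint pdf $f_{U_1,\ldots,U_{\delta+1}}(\mathbf{u}_1, \ldots, \mathbf{u}_{\delta+1})$ on $T_{m-1}^{\delta+1}$ define
\[ J(f_{U_1,\ldots,U_{\delta+1}}) = a_{2m-2}^{\delta} \int_{S_{2m-2}} f_{U_1,\ldots,U_{\delta+1}}(h(\mathbf{u}), \ldots, h(\mathbf{u}))\, d\mathbf{u}. \]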
Note that $J(f_{U_1,\ldots,U_{\delta+1}})$ is proportional to the integral of $f_{U_1,\ldots,U_{\delta+1}}$ over the manifold $\mathbf{u}_1 = \cdots = \mathbf{u}_{\delta+1}$. The quantity $J(\bar{f}_{U_{*1},\ldots,U_{*(\delta+1)}})$ is key in determining the asymptotic average number of hubs in a complex-valued correlation network. This will be described in more detail in Sec. 3.5.

Let $\vec{i} = (i_0, i_1, \ldots, i_\delta)$ be a set of distinct indices, i.e., $1 \le i_0 \le p$, $1 \le i_1 < \cdots < i_\delta \le p$ and $i_1, \ldots, i_\delta \ne i_0$. For a U-score matrix $\mathbf{U}$ define the dependency coefficient between the columns $\mathbf{U}_{\vec{i}} = \{U_{i_0}, U_{i_1}, \ldots, U_{i_\delta}\}$ and their complementary k-NN (k-nearest neighbor) set $A_k(\vec{i})$, defined in (29) and Fig. 2, as
\[ \Delta_{p,m,k,\delta}(\vec{i}) = \left\| \left( f_{\mathbf{U}_{\vec{i}} | \mathbf{U}_{A_k(\vec{i})}} - f_{\mathbf{U}_{\vec{i}}} \right) / f_{\mathbf{U}_{\vec{i}}} \right\|_\infty, \]
where $\| \cdot \|_\infty$ denotes the supremum norm. The average of these coefficients is defined as:
\[ \|\Delta_{p,m,k,\delta}\|_1 = \left( p \binom{p-1}{\delta} \right)^{-1} \sum_{i_0=1}^{p} \sum_{\substack{i_1 < \cdots < i_\delta \\ i_1, \ldots, i_\delta \ne i_0}} \Delta_{p,m,k,\delta}(\vec{i}). \qquad (15) \]

3.5 Number of Hub Discoveries in the High-Dimensional Limit

We now present the main theoretical result on complex-valued correlation screening. The following theorem gives asymptotic expressions for the mean number of $\delta$-hubs and the probability of discovery of at least one $\delta$-hub in the graph $G_\rho(\boldsymbol{\Psi})$. It also gives bounds on the rates of convergence to these approximations as the dimension $p$ increases and $\rho \to 1$. We use $\mathbf{U} = [U_1, \ldots, U_p]$ as a generic notation for the U-score representation of the sample (partial) correlation matrix. The asymptotic expression for the mean $E[N_{\delta,\rho}]$ is denoted by $\Lambda$ and is given by:
\[ \Lambda = p \binom{p-1}{\delta} P_0^{\delta}\, J(\bar{f}_{U_{*1},\ldots,U_{*(\delta+1)}}). \qquad (16) \]
Define $\eta_{p,\delta}$ as:
\[ \eta_{p,\delta} = p^{1/\delta} (p - 1) P_0 = p^{1/\delta} (p - 1) (1 - \rho^2)^{m-2}, \qquad (17) \]
where the last equation is due to Lemma 2. The parameter $k$ below represents an upper bound on the true hub degree, i.e. the number of non-zero entries in any row of the population covariance matrix $\boldsymbol{\Sigma}$. Also let $\varphi(\delta)$ be the function that takes values $\varphi(\delta) = 2$ for $\delta = 1$ and $\varphi(\delta) = 1$ for $\delta > 1$.

Theorem 2. Let $\mathbf{U} = [U_1, \ldots, U_p]$ be a $(m - 1) \times p$ random matrix with $U_i \in T_{m-1}$ where $m > 2$. Let $\delta \ge 1$ be a fixed integer. Assume the joint pdf of any subset of the $U_i$'s is bounded and differentiable.
Then, with $\Lambda$ defined in (16),
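\[ \left| E[N_{\delta,\rho}] - \Lambda \right| \le O\left( \eta_{p,\delta}^{\delta} \max\left\{ \eta_{p,\delta}\, p^{-1/\delta}, (1 - \rho)^{1/2} \right\} \right). \qquad (18) \]
Furthermore, let $N^*_{\delta,\rho}$ be a Poisson distributed random variable with rate $E[N^*_{\delta,\rho}] = \Lambda/\varphi(\delta)$. If $(p - 1) P_0 \le 1$, then
\[
\left| P(N_{\delta,\rho} > 0) - P(N^*_{\delta,\rho} > 0) \right| \le
\begin{cases}
O\left( \eta_{p,\delta}^{\delta} \max\left\{ \eta_{p,\delta}^{\delta} (k/p)^{\delta+1}, Q_{p,k,\delta}, \|\Delta_{p,m,k,\delta}\|_1, p^{-1/\delta}, (1 - \rho)^{1/2} \right\} \right), & \delta > 1 \\
O\left( \eta_{p,1} \max\left\{ \eta_{p,1} (k/p)^{2}, \|\Delta_{p,m,k,1}\|_1, p^{-1}, (1 - \rho)^{1/2} \right\} \right), & \delta = 1,
\end{cases}
\qquad (19)
\]
with $Q_{p,k,\delta} = \eta_{p,\delta} \left( k/p^{1/\delta} \right)^{\delta+1}$ and $\|\Delta_{p,m,k,\delta}\|_1$ defined in (15).

Proof. The proof is similar to the proof of Proposition 1 in [9]. First we prove (18). Let $\phi_i = I(d_i \ge \delta)$ be the indicator of the event that $d_i \ge \delta$, in which $d_i$ represents the degree of the vertex $v_i$ in the graph $G_\rho(\boldsymbol{\Psi})$. We have $N_{\delta,\rho} = \sum_{i=1}^{p} \phi_i$. With $\phi_{ij}$ being the indicator of the presence of an edge in $G_\rho(\boldsymbol{\Psi})$ between vertices $v_i$ and $v_j$ we have the relation:
\[ \phi_i = \sum_{l=\delta}^{p-1} \sum_{\vec{k} \in C_i(p-1,l)} \prod_{j=1}^{l} \phi_{ik_j} \prod_{q=l+1}^{p-1} (1 - \phi_{ik_q}), \qquad (20) \]
where we have defined the index vector $\vec{k} = (k_1, \ldots, k_{p-1})$ and the set
\[ C_i(p-1, l) = \{ \vec{k} : k_1 < \cdots < k_l,\; k_{l+1} < \cdots < k_{p-1},\; k_j \in \{1, \ldots, p\} - \{i\},\; k_j \ne k_{j'} \}. \]
The inner summation in (20) simply sums over the set of distinct indices not equal to $i$ that index all $\binom{p-1}{l}$ different types of products of the form $\prod_{j=1}^{l} \phi_{ik_j} \prod_{q=l+1}^{p-1} (1 - \phi_{ik_q})$. Subtracting $\sum_{\vec{k} \in C_i(p-1,\delta)} \prod_{j=1}^{\delta} \phi_{ik_j}$ from both sides of (20):
\[
\begin{aligned}
\phi_i - \sum_{\vec{k} \in C_i(p-1,\delta)} \prod_{j=1}^{\delta} \phi_{ik_j}
= {} & \sum_{l=\delta+1}^{p-1} \sum_{\vec{k} \in C_i(p-1,l)} \prod_{j=1}^{l} \phi_{ik_j} \prod_{q=l+1}^{p-1} (1 - \phi_{ik_q}) \\
& + \sum_{\vec{k} \in C_i(p-1,\delta)} \sum_{q=\delta+1}^{p-1} (-1)^{q-\delta} \sum_{\substack{k'_{\delta+1} < \cdots < k'_q \\ \{k'_{\delta+1}, \ldots, k'_q\} \subset \{k_{\delta+1}, \ldots, k_{p-1}\}}} \prod_{j=1}^{\delta} \phi_{ik_j} \prod_{s=\delta+1}^{q} \phi_{ik'_s}, \qquad (21)
\end{aligned}
\]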
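in which we have used the expansion
\[ \prod_{q=\delta+1}^{p-1} (1 - \phi_{ik_q}) = 1 + \sum_{q=\delta+1}^{p-1} (-1)^{q-\delta} \sum_{\substack{k'_{\delta+1} < \cdots < k'_q \\ \{k'_{\delta+1}, \ldots, k'_q\} \subset \{k_{\delta+1}, \ldots, k_{p-1}\}}} \prod_{s=\delta+1}^{q} \phi_{ik'_s}. \]
The following simple asymptotic representation will be useful in the sequel. For any $i_1, \ldots, i_k \in \{1, \ldots, p\}$, $i_1 \ne \cdots \ne i_k \ne i$, $k \in \{1, \ldots, p - 1\}$,
\[
\begin{aligned}
E\left[ \prod_{j=1}^{k} \phi_{ii_j} \right]
&= \int_{S_{2m-2}} d\mathbf{v} \int_{h^{-1}(A_\rho(\mathbf{v}))} d\mathbf{v}_1 \cdots \int_{h^{-1}(A_\rho(\mathbf{v}))} d\mathbf{v}_k\; f_{U_{i_1},\ldots,U_{i_k},U_i}(h(\mathbf{v}_1), \ldots, h(\mathbf{v}_k), h(\mathbf{v})) \\
&\le P_0^{k}\, a_{2m-2}^{k}\, M_{k|1}, \qquad (22)
\end{aligned}
\]
where $P_0$, $A_\rho(\mathbf{u})$ and the function $h(\cdot)$ are defined in Sec. 3.4. Moreover
\[ M_{k|1} = \max_{i_1 \ne \cdots \ne i_{k+1}} \left\| f_{U_{i_1},\ldots,U_{i_k} | U_{i_{k+1}}} \right\|_\infty. \]
The following simple generalization of (22) to arbitrary product indices $\phi_{ij}$ will also be needed:
\[ E\left[ \prod_{l=1}^{q} \phi_{i_l j_l} \right] \le P_0^{q}\, a_{2m-2}^{q}\, M_{|Q|}, \qquad (23) \]
where $Q = \mathrm{unique}(\{i_l, j_l\}_{l=1}^{q})$ is the set of unique indices among the distinct pairs $\{(i_l, j_l)\}_{l=1}^{q}$ and $M_{|Q|}$ is a bound on the joint pdf of $\mathbf{U}_Q$.

Define the random variable
\[ \theta_i = \binom{p-1}{\delta}^{-1} \sum_{\vec{k} \in C_i(p-1,\delta)} \prod_{j=1}^{\delta} \phi_{ik_j}. \]
We show below that for sufficiently large $p$
\[ \left| E[\phi_i] - \binom{p-1}{\delta} E[\theta_i] \right| \le \gamma_{p,\delta} \left( (p - 1) P_0 \right)^{\delta+1}, \qquad (24) \]
where $\gamma_{p,\delta} = \max_{\delta+1 \le l < p} \{ a_{2m-2}^{l} M_{l|1} \} \left( e - \sum_{l=0}^{\delta} \frac{1}{l!} \right) \left( 1 + (\delta!)^{-1} \right)$ and $M_{l|1}$ is a least upper bound on any $l$-dimensional joint pdf of the variables $\{U_j\}_{j \ne i}$ conditioned on $U_i$.

To show inequality (24) take expectations of (21) and apply the bound (22) to obtain
\[
\begin{aligned}
\left| E[\phi_i] - \binom{p-1}{\delta} E[\theta_i] \right|
&\le \sum_{l=\delta+1}^{p-1} \binom{p-1}{l} P_0^{l}\, a_{2m-2}^{l}\, M_{l|1}
+ \binom{p-1}{\delta} \sum_{l=1}^{p-1-\delta} \binom{p-1-\delta}{l} P_0^{\delta+l}\, a_{2m-2}^{\delta+l}\, M_{\delta+l|1} \\
&\le A \left( 1 + (\delta!)^{-1} \right), \qquad (25)
\end{aligned}
\]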
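where
\[ A = \sum_{l=\delta+1}^{p-1} \binom{p-1}{l} \left( (p - 1) P_0 \right)^{l} a_{2m-2}^{l}\, M_{l|1}. \]
The line (25) follows from the identity $\binom{p-1-\delta}{l} \binom{p-1}{\delta} = \binom{p-1}{l+\delta} \binom{l+\delta}{l}$ and a change of index in the second summation on the previous line. Since $(p - 1) P_0 < 1$,
\[
\begin{aligned}
|A| &\le \max_{\delta+1 \le l < p} \{ a_{2m-2}^{l} M_{l|1} \} \sum_{l=\delta+1}^{p-1} \binom{p-1}{l} \left( (p - 1) P_0 \right)^{l} \\
&\le \max_{\delta+1 \le l < p} \{ a_{2m-2}^{l} M_{l|1} \} \left( e - \sum_{l=0}^{\delta} \frac{1}{l!} \right) \left( (p - 1) P_0 \right)^{\delta+1}.
\end{aligned}
\]
Application of the mean value theorem to the integral representation (22) yields
\[ \left| E[\theta_i] - P_0^{\delta}\, J\left( \bar{f}_{U_{*1-i},\ldots,U_{*\delta-i},U_i} \right) \right| \le \tilde{\gamma}_{p,\delta} \left( (p - 1) P_0 \right)^{\delta} r, \qquad (26) \]
where
\[
\begin{aligned}
\bar{f}_{U_{*1-i},\ldots,U_{*\delta-i},U_i}(\mathbf{u}_1, \ldots, \mathbf{u}_{\delta+1})
= {} & (2\pi)^{-\delta} \binom{p-1}{\delta}^{-1} \sum_{\substack{1 \le i_1 < \cdots < i_\delta \le p \\ i \notin \{i_1, \ldots, i_\delta\}}} \int_0^{2\pi} \cdots \int_0^{2\pi} \\
& f_{U_{i_1},\ldots,U_{i_\delta},U_i}\left( e^{\sqrt{-1}\theta_1} \mathbf{u}_1, \ldots, e^{\sqrt{-1}\theta_\delta} \mathbf{u}_\delta, \mathbf{u}_{\delta+1} \right) d\theta_1 \cdots d\theta_\delta,
\end{aligned}
\]
$r = \sqrt{2(1 - \rho)}$, $\tilde{\gamma}_{p,\delta} = 2 a_{2m-2}^{\delta+1} \dot{M}_{\delta+1|1}/\delta!$ and $\dot{M}_{\delta+1|1}$ is a bound on the norm of the gradient $\nabla_{\mathbf{u}_{i_1},\ldots,\mathbf{u}_{i_\delta}} \bar{f}_{U_{*1-i},\ldots,U_{*\delta-i} | U_i}(\mathbf{u}_{i_1}, \ldots, \mathbf{u}_{i_\delta} | \mathbf{u}_i)$.

Combining (24)-(26) and the relation $r = O\left( (1 - \rho)^{1/2} \right)$,
\[ \left| E[\phi_i] - \binom{p-1}{\delta} P_0^{\delta}\, J(\bar{f}_{U_{*1},\ldots,U_{*(\delta+1)}}) \right| \le O\left( \left( (p - 1) P_0 \right)^{\delta} \max\left\{ (p - 1) P_0, (1 - \rho)^{1/2} \right\} \right). \]
Summing over $i$ and recalling the definitions (16) and (17) of $\Lambda$ and $\eta_{p,\delta}$,
\[
\begin{aligned}
\left| E[N_{\delta,\rho}] - \Lambda \right|
&\le O\left( p \left( (p - 1) P_0 \right)^{\delta} \max\left\{ (p - 1) P_0, (1 - \rho)^{1/2} \right\} \right) \\
&= O\left( \eta_{p,\delta}^{\delta} \max\left\{ \eta_{p,\delta}\, p^{-1/\delta}, (1 - \rho)^{1/2} \right\} \right).
\end{aligned}
\]
This establishes the bound (18).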
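Next we prove the bound (19) by using the Chen-Stein method [2]. Define:
\[ \tilde{N}_{\delta,\rho} = \frac{1}{\varphi(\delta)} \sum_{i_0=1}^{p} \sum_{1 \le i_1 < \cdots < i_\delta \le p} \prod_{j=1}^{\delta} \phi_{i_0 i_j}, \qquad (27) \]
where the second sum is over the indices $1 \le i_1 < \cdots < i_\delta \le p$ such that $i_j \ne i_0$, $1 \le j \le \delta$.

For $\vec{i} \stackrel{\mathrm{def}}{=} (i_0, i_1, \ldots, i_\delta)$ define the index set $B_{\vec{i}} = B_{(i_0,i_1,\ldots,i_\delta)} = \{ (j_0, j_1, \ldots, j_\delta) : j_l \in N_k(i_l) \cup \{i_l\},\; l = 0, \ldots, \delta \} \cap C_<$ where $C_< = \{ (j_0, \ldots, j_\delta) : 1 \le j_0 \le p,\; 1 \le j_1 < \cdots < j_\delta \le p,\; j_l \ne j_0,\; 1 \le l \le \delta \}$. These index the distinct sets of points $\mathbf{U}_{\vec{i}} = \{U_{i_0}, U_{i_1}, \ldots, U_{i_\delta}\}$ and their respective k-NN's. Note that $|B_{\vec{i}}| \le k^{\delta+1}$. Identifying $\tilde{N}_{\delta,\rho} = \sum_{\vec{i} \in C_<} \prod_{l=1}^{\delta} \phi_{i_0 i_l}$ and $N^*_{\delta,\rho}$ a Poisson distributed random variable with rate $E[\tilde{N}_{\delta,\rho}]$, the Chen-Stein bound [2, Theorem 1] is
\[ 2 \max_A \left| P(\tilde{N}_{\delta,\rho} \in A) - P(N^*_{\delta,\rho} \in A) \right| \le b_1 + b_2 + b_3, \qquad (28) \]
where
\[ b_1 = \sum_{\vec{i} \in C_<} \sum_{\vec{j} \in B_{\vec{i}}} E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \right] E\left[ \prod_{q=1}^{\delta} \phi_{j_0 j_q} \right], \]
\[ b_2 = \sum_{\vec{i} \in C_<} \sum_{\vec{j} \in B_{\vec{i}} - \{\vec{i}\}} E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \prod_{q=1}^{\delta} \phi_{j_0 j_q} \right], \]
and, for $p_{\vec{i}} = E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \right]$,
\[ b_3 = \sum_{\vec{i} \in C_<} E\left[ \left| E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} - p_{\vec{i}} \,\Big|\, \phi_{\vec{j}} : \vec{j} \notin B_{\vec{i}} \right] \right| \right]. \]
Over the range of indices in the sum of $b_1$, $E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \right]$ is of order $O(P_0^{\delta})$, by (23), and therefore
\[ b_1 \le O\left( p^{\delta+1} k^{\delta+1} P_0^{2\delta} \right) = O\left( \eta_{p,\delta}^{2\delta} (k/p)^{\delta+1} \right), \]
which follows from definition (17). More care is needed to bound $b_2$ due to the repetition of characteristic functions $\phi_{ij}$. Since $\vec{i} \ne \vec{j}$, $E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \prod_{q=1}^{\delta} \phi_{j_0 j_q} \right]$ is a multiplication of at least $\delta + 1$ different characteristic functions, hence by (23),
\[ E\left[ \prod_{l=1}^{\delta} \phi_{i_0 i_l} \prod_{q=1}^{\delta} \phi_{j_0 j_q} \right] = O\left( P_0^{\delta+1} \right). \]
Therefore, we conclude that
\[ b_2 \le O\left( p^{\delta+1} k^{\delta+1} P_0^{\delta+1} \right), \]
which by (17) is of order $\eta_{p,\delta}^{\delta} Q_{p,k,\delta}$.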
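Fig. 2. The complementary k-NN set $A_k(\vec i)$ illustrated for $\delta = 1$ and $k = 5$. Here we have $\vec i = (i_0, i_1)$. The vertices $i_0$, $i_1$ and their k-NNs are depicted in black and blue respectively. The complement of the union of $\{i_0, i_1\}$ and its k-NNs is the complementary k-NN set $A_k(\vec i)$ and is depicted in red.

Next we bound the term $b_3$ in (28). The set
$$A_k(\vec i) = B_{\vec i}^{c} - \{\vec i\} \tag{29}$$
indexes the complementary k-NN of $U_{\vec i}$ (see Fig. 2) so that, using the representation (23),
$$\begin{aligned}
b_3 &= \sum_{\vec i \in C^{<}} E\Big[\Big|E\Big[\prod_{l=1}^{\delta} \phi_{i_0 i_l} - p_{\vec i}\ \Big|\ U_{A_k(\vec i)}\Big]\Big|\Big] \\
&= \sum_{\vec i \in C^{<}} \int_{S_{2m-2}^{|A_k(\vec i)|}} du_{A_k(\vec i)}\ \Bigg|\prod_{l=1}^{\delta} \int_{S_{2m-2}} du_{i_0} \int_{A(r,u_{i_0})} du_{i_l} \left(\frac{f_{U_{\vec i}|U_{A_k(\vec i)}}\big(u_{\vec i}|u_{A_k(\vec i)}\big) - f_{U_{\vec i}}(u_{\vec i})}{f_{U_{\vec i}}(u_{\vec i})}\right) f_{U_{\vec i}}(u_{\vec i})\Bigg|\ f_{U_{A_k(\vec i)}}\big(u_{A_k(\vec i)}\big) \\
&\le O\big(p^{\delta+1} P_0^{\delta} \|\Delta_{p,m,k,\delta}\|_1\big) = O\big(\eta_{p,\delta}^{\delta} \|\Delta_{p,m,k,\delta}\|_1\big).
\end{aligned}$$
Note that by definition of $\tilde N_{\delta,\rho}$ we have $\tilde N_{\delta,\rho} > 0$ if and only if $N_{\delta,\rho} > 0$. This yields: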
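$$\begin{aligned}
\big|P(N_{\delta,\rho} > 0) - (1 - \exp(-\Lambda))\big| &\le \big|P(\tilde N_{\delta,\rho} > 0) - P(N_{\delta,\rho} > 0)\big| + \big|P(\tilde N_{\delta,\rho} > 0) - \big(1 - \exp(-E[\tilde N_{\delta,\rho}])\big)\big| + \big|\exp(-E[\tilde N_{\delta,\rho}]) - \exp(-\Lambda)\big| \\
&\le b_1 + b_2 + b_3 + O\big(\big|E[\tilde N_{\delta,\rho}] - \Lambda\big|\big).
\end{aligned} \tag{30}$$
Combining the above inequalities on $b_1$, $b_2$ and $b_3$ yields the first three terms in the argument of the max on the right side of (19). It remains to bound the term $|E[\tilde N_{\delta,\rho}] - \Lambda|$. Application of the mean value theorem to the multiple integral (23) gives
$$\Big|E\Big[\prod_{l=1}^{\delta} \phi_{i i_l}\Big] - P_0^{\delta} J\big(f_{U_{i_1},\dots,U_{i_\delta},U_i}\big)\Big| \le O\big(P_0^{\delta} r\big).$$
Applying relation (27) yields
$$\Big|E[\tilde N_{\delta,\rho}] - p\binom{p-1}{\delta} P_0^{\delta} J\big(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}\big)\Big| \le O\big(p^{\delta+1} P_0^{\delta} r\big) = O\big(\eta_{p,\delta}^{\delta} r\big).$$
Combine this with (30) to obtain the bound (19). This completes the proof of Theorem 2.

An immediate consequence of Theorem 2 is the following result, similar to Proposition 2 in [9], which provides asymptotic expressions for the mean number of δ-hubs and the probability of the event $N_{\delta,\rho} > 0$ as $p$ goes to $\infty$ and $\rho$ converges to 1 at a prescribed rate.

Corollary 2. Let $\rho_p \in [0, 1]$ be a sequence converging to one as $p \to \infty$ such that $\eta_{p,\delta} = p^{1/\delta}(p-1)(1-\rho_p^2)^{(m-2)} \to e_{m,\delta} \in (0, \infty)$. Then
$$\lim_{p \to \infty} E[N_{\delta,\rho_p}] = \Lambda_\infty = \frac{e_{m,\delta}^{\delta}}{\delta!} \lim_{p \to \infty} J\big(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}\big). \tag{31}$$
Assume that $k = o(p^{1/\delta})$ and that for the weak dependency coefficient $\|\Delta_{p,m,k,\delta}\|_1$, defined via (15), we have $\lim_{p \to \infty} \|\Delta_{p,m,k,\delta}\|_1 = 0$. Then
$$P(N_{\delta,\rho_p} > 0) \to 1 - \exp\big(-\Lambda_\infty/\varphi(\delta)\big). \tag{32}$$

Corollary 2 shows that in the limit $p \to \infty$, the number of detected hubs depends on the true population correlations only through the quantity $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}})$. In some cases $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}})$ can be evaluated explicitly. Similar to the argument in [9], it can be shown that if the population covariance matrix Σ is sparse in the sense that its non-zero off-diagonal entries can be arranged into a $k \times k$ submatrix by reordering rows and columns, then
$$J\big(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}\big) = 1 + O(k/p).$$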
Hence, if $k = o(p)$ as $p \to \infty$, the quantity $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}})$ converges to 1. If Σ is diagonal, then $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}) = 1$ exactly. In such cases, the limit $\Lambda_\infty$ in Corollary 2 does not depend on the unknown underlying distribution of the U-scores. As a result, the expected number of δ-hubs in $G_\rho(\Psi)$ and the probability of discovery of at least one δ-hub do not depend on the underlying distribution. We will see in Sec. 4 that this result is useful in assigning statistical significance levels to vertices of the graph $G_\rho(\Psi)$.

3.6 Phase Transitions and Critical Threshold

It can be seen from Theorem 2 and Corollary 2 that the number of δ-hub discoveries exhibits a phase transition in the high-dimensional regime where the number of variables $p$ can be very large relative to the number of samples $m$. Specifically, assume that the population covariance matrix Σ is block-sparse as in Section 3.5. Then as the correlation threshold ρ is reduced, the number of δ-hub discoveries abruptly increases to the maximum, $p$. Conversely, as ρ increases, the number of discoveries quickly approaches zero. Similarly, the family-wise error rate (i.e., the probability of discovering at least one δ-hub in a graph with no true hubs) exhibits a phase transition as a function of ρ. Figure 3 shows the family-wise error rate obtained via expression (32) for a fixed degree threshold δ and a fixed number of variables $p$, as a function of ρ and the number of samples $m$. It is seen that for a fixed value of $m$ there is a sharp transition in the family-wise error rate as a function of ρ.

Fig. 3. Family-wise error rate as a function of correlation threshold ρ and number of samples $m$ for fixed $p$ and δ (axes: number of observations $m$, applied threshold ρ). The phase transition phenomenon is clearly observable in the plot.
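The sharpness of this transition can be reproduced numerically. The following is a minimal sketch, not the authors' code: it evaluates the limiting expressions (31) and (32) with $J = 1$, reading the condition of Corollary 2 as $\eta_{p,\delta} = p^{1/\delta}(p-1)(1-\rho^2)^{m-2}$. The function names are hypothetical, the values $p = 1000$ and $m = 30$ in the demo are illustrative only, and the choice $\varphi(\delta) = 2$ for $\delta = 1$ and $1$ otherwise is an assumption, since $\varphi$ is defined elsewhere in the chapter.

```python
import math


def expected_hub_count(rho, p, m, delta):
    """Limiting mean number of delta-hubs, eta**delta / delta!, taking J = 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    # eta = p**(1/delta) * (p - 1) * (1 - rho**2)**(m - 2), hence
    # eta**delta = p * ((p - 1) * (1 - rho**2)**(m - 2))**delta.
    # Working with logarithms keeps the computation stable for large p.
    log_eta_delta = math.log(p) + delta * (
        math.log(p - 1) + (m - 2) * math.log1p(-rho * rho)
    )
    return math.exp(log_eta_delta - math.lgamma(delta + 1))


def family_wise_error_rate(rho, p, m, delta):
    """Approximate P(N > 0) as 1 - exp(-Lambda / phi(delta))."""
    # ASSUMPTION: phi(1) = 2 and phi(delta) = 1 for delta > 1.
    phi = 2.0 if delta == 1 else 1.0
    return -math.expm1(-expected_hub_count(rho, p, m, delta) / phi)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the chapter.
    for rho in (0.2, 0.4, 0.6, 0.8):
        print(rho, family_wise_error_rate(rho, p=1000, m=30, delta=1))
```

Because the expression is a limiting approximation, the computed mean can exceed $p$ for small ρ; the error rate it implies is still correctly close to one in that regime.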
The phase transition phenomenon motivates the definition of a critical threshold $\rho_{c,\delta}$ as the threshold ρ satisfying the following slope condition:
$$\partial E[N_{\delta,\rho}]/\partial\rho = -p.$$
Using (16), the solution of the above equation can be approximated via the expression below:
$$\rho_{c,\delta} = \sqrt{1 - \big(c_{m,\delta}(p-1)\big)^{-2\delta/(\delta(2m-3)-2)}}, \tag{33}$$
where $c_{m,\delta} = b_m \delta J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}})$. The screening threshold ρ should be chosen greater than $\rho_{c,\delta}$ to prevent excessively large numbers of false positives. Note that the critical threshold $\rho_{c,\delta}$ also does not depend on the underlying distribution of the U-scores when the covariance matrix Σ is block-sparse.

Expression (33) is similar to the expression obtained in [9] for the critical threshold in real-valued correlation screening. However, in the complex-valued case the coefficient $c_{m,\delta}$ and the exponent of the term $c_{m,\delta}(p-1)$ are different from the real case. This generally results in smaller values of $\rho_{c,\delta}$ for fixed $m$ and δ.

Figure 4 shows the value of $\rho_{c,\delta}$ obtained via (33) as a function of $m$ for different values of δ and $p$. The critical threshold decreases as either the sample size $m$ increases, the number of variables $p$ decreases, or the vertex degree δ increases. Note that even for ten billion ($10^{10}$) dimensions (upper triplet of curves in the figure) only a relatively small number of samples are necessary for complex-valued correlation screening to be useful: with a moderate sample size $m$ one can reliably discover connected vertices ($\delta = 1$ in the figure) having correlation greater than $\rho_{c,\delta} = .5$.

4 Application to Spectral Screening of Multivariate Gaussian Time Series

In this section, the complex-valued correlation hub screening method of Section 3 is applied to stationary multivariate Gaussian time series. Assume that the time series $X^{(1)}, \dots, X^{(p)}$ defined in Section 2 satisfy the conditions of Corollary 1. Assume also that a total of $N = n \times m$ time samples of $X^{(1)}, \dots, X^{(p)}$ are available.
We divide the $N$ samples into $m$ parts of $n$ consecutive samples and we take the $n$-point DFT of each part. Therefore, for each time series, at each frequency $f_i = (i-1)/n$, $1 \le i \le n$, $m$ samples are available. This allows us to construct a (partial) correlation graph corresponding to each frequency. We denote the (partial) correlation graph corresponding to frequency $f_i$ and correlation threshold $\rho_i$ as $G_{f_i,\rho_i}$. $G_{f_i,\rho_i}$ has $p$ vertices $v_1, v_2, \dots, v_p$ corresponding to time series $X^{(1)}, X^{(2)}, \dots, X^{(p)}$, respectively. Vertices $v_k$ and $v_l$ are connected if the magnitude of the sample (partial) correlation between the DFTs of $X^{(k)}$ and $X^{(l)}$ at frequency $f_i$ (i.e., the sample (partial) correlation between the corresponding DFT coefficients $Y^{(k)}$ and $Y^{(l)}$) is at least $\rho_i$.

Fig. 4. The critical threshold $\rho_{c,\delta}$ as a function of the sample size $m$ for $\delta = 1, 2, 3$ (curve labels) and three values of $p$ (bottom to top triplets of curves). The figure shows that the critical threshold decreases as either $m$ or δ increases. When the number of samples $m$ is small the critical threshold is close to 1, in which case reliable hub discovery is impossible. However, a relatively small increment in $m$ is sufficient to reduce the critical threshold significantly; for the largest $p$, a moderate number of samples is enough to bring $\rho_{c,1}$ down to .5.

Consider a single frequency $f_i$ and the null hypothesis, $H_0$, that the correlations among the time series $X^{(1)}, X^{(2)}, \dots, X^{(p)}$ at frequency $f_i$ are block sparse in the sense of Section 3.5. As discussed in Sec. 3.5, under $H_0$ the expected number of δ-hubs and the probability of discovery of at least one δ-hub in graph $G_{f_i,\rho_i}$ are not functions of the unknown underlying distribution of the data. Therefore the results of Corollary 2 may be used to quantify the statistical significance of declaring vertices of $G_{f_i,\rho_i}$ to be δ-hubs. The statistical significance is represented by the p-value, defined in general as the probability of having a test statistic at least as extreme as the value actually observed, assuming that the null hypothesis $H_0$ is true. In the case of correlation hub screening, the p-value $pv_\delta(j)$ assigned to vertex $v_j$ for being a δ-hub is the maximal probability that $v_j$ maintains degree δ given the observed sample correlations, assuming that the block-sparse hypothesis $H_0$ is true. The detailed procedure for assigning p-values is similar to the procedure in [9] for real-valued correlation screening and is illustrated in Fig. 5. Equation (33) helps in choosing the initial threshold ρ.
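The construction at the start of this section (split into $m$ windows, take the $n$-point DFT, correlate across windows at each frequency) and the use of expression (33) to pick an initial threshold can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, ordinary (not partial) sample correlation is used, the coefficient $c_{m,\delta}$ must be supplied by the caller because $b_m$ is defined elsewhere in the chapter, and (33) is taken in the form $\rho_{c,\delta} = \sqrt{1 - (c_{m,\delta}(p-1))^{-2\delta/(\delta(2m-3)-2)}}$.

```python
import numpy as np


def spectral_samples(x, n):
    """Split each of the p series (rows of x) into m windows of length n.

    Returns Y with shape (n, m, p): Y[i] holds the m complex samples of all
    p series at DFT bin i, i.e. frequency i / n (0-based index).
    """
    p, total = x.shape
    m = total // n
    segments = x[:, : m * n].reshape(p, m, n)  # m consecutive windows per series
    spectra = np.fft.fft(segments, axis=2)     # n-point DFT of every window
    return np.transpose(spectra, (2, 1, 0))


def correlation_magnitudes(y_f):
    """Magnitudes of the p x p complex sample correlation matrix at one frequency.

    y_f has shape (m, p). Columns with zero variance would divide by zero.
    """
    z = y_f - y_f.mean(axis=0)
    z = z / np.linalg.norm(z, axis=0)
    return np.abs(z.conj().T @ z)


def vertex_degrees(y_f, rho):
    """Degree of each vertex in the correlation graph thresholded at rho."""
    adjacency = correlation_magnitudes(y_f) >= rho
    np.fill_diagonal(adjacency, False)         # no self-loops
    return adjacency.sum(axis=1)


def critical_threshold(c, p, m, delta):
    """Expression (33) as read above; c plays the role of c_{m,delta}."""
    exponent = -2.0 * delta / (delta * (2 * m - 3) - 2)
    return float(np.sqrt(1.0 - (c * (p - 1)) ** exponent))
```

A typical use would compute `critical_threshold` first, choose `rho` above it, and then call `vertex_degrees(Y[i], rho)` separately for each frequency bin `i` of `Y = spectral_samples(x, n)`.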
Given Corollary 1, for $i \ne j$ the correlation graphs $G_{f_i,\rho_i}$ and $G_{f_j,\rho_j}$ and their associated inferences are approximately independent. Thus we can solve multiple inference problems by first performing correlation hub screening on each graph as discussed above and then aggregating the inferences at each frequency in a straightforward manner. Examples of aggregation procedures are described below.

Initialization:
1. Choose a degree threshold δ.
2. Choose an initial threshold $\rho > \rho_{c,\delta}$.
3. Calculate the degree $d_j$ of each vertex of graph $G_\rho(\Psi)$.
4. Select a value of $\delta \in \{1, \dots, \max_{1 \le j \le p} d_j\}$.
For each $j = 1, \dots, p$ find $\rho_j(\delta)$ as the δth greatest element of the $j$th row of the sample (partial) correlation matrix.
Approximate the p-value corresponding to vertex $v_j$ as $pv_\delta(j) \approx 1 - \exp\big(-E[N_{\delta,\rho_j(\delta)}]/\varphi(\delta)\big)$, where $E[N_{\delta,\rho_j(\delta)}]$ is approximated by the limiting expression (31) using $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}) = 1$.
Screen variables by thresholding the p-values $pv_\delta(j)$ at the desired significance level.

Fig. 5. Procedure for assigning p-values to the vertices of $G_\rho(\Psi)$.

4.1 Disjunctive Hubs

One task that can be easily performed is finding the p-value for a given time series to be a hub in at least one of the graphs $G_{f_1,\rho_1}, \dots, G_{f_n,\rho_n}$. More specifically, for each $j = 1, \dots, p$ denote the p-values for vertex $v_j$ being a δ-hub in $G_{f_1,\rho_1}, \dots, G_{f_n,\rho_n}$ by $pv_{f_1,\rho_1,\delta}(j), \dots, pv_{f_n,\rho_n,\delta}(j)$ respectively. These p-values are obtained using the method of Fig. 5. Then $\hat{pv}_\delta(j)$, the p-value for the vertex $v_j$ being a δ-hub in at least one of the frequency graphs $G_{f_1,\rho_1}, \dots, G_{f_n,\rho_n}$, can be approximated as:
$$P(\exists i : d_{j,f_i} \ge \delta \mid H_0) \approx \hat{pv}_\delta(j) = 1 - \prod_{i=1}^{n} \big(1 - pv_{f_i,\rho_i,\delta}(j)\big),$$
in which $d_{j,f_i}$ is the degree of $v_j$ in the graph $G_{f_i,\rho_i}$.

4.2 Conjunctive Hubs

Another property of interest is the existence of a hub at all frequencies for a particular time series. In this case we have:
$$P(\forall i : d_{j,f_i} \ge \delta \mid H_0) \approx \check{pv}_\delta(j) = \prod_{i=1}^{n} pv_{f_i,\rho_i,\delta}(j).$$
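The p-value assignment of Fig. 5 and the aggregation rules can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' code: the function names are hypothetical, $E[N_{\delta,\rho}]$ is approximated by $\eta_{p,\delta}^{\delta}/\delta!$ with $J = 1$ and $\eta_{p,\delta} = p^{1/\delta}(p-1)(1-\rho^2)^{m-2}$, and $\varphi(\delta)$ is assumed to equal 2 for $\delta = 1$ and 1 otherwise. The last function computes the probability that at least $K$ of $n$ independent events occur, which is the general persistent-hub case of Section 4.3 and reduces to the disjunctive rule for $K = 1$ and to the conjunctive rule for $K = n$.

```python
import math

import numpy as np


def hub_pvalues(corr_abs, delta, m):
    """Approximate p-values pv_delta(j) for every vertex at one frequency.

    corr_abs: p x p matrix of sample correlation magnitudes (diagonal ignored).
    m: number of samples per frequency; requires delta <= p - 1.
    """
    corr_abs = np.asarray(corr_abs, dtype=float)
    p = corr_abs.shape[0]
    off_diag = corr_abs.copy()
    np.fill_diagonal(off_diag, -np.inf)                # exclude self-correlation
    rho_j = np.sort(off_diag, axis=1)[:, -delta]       # delta-th greatest per row
    rho_j = np.clip(rho_j, 0.0, 1.0 - 1e-12)           # keep log1p finite
    log_mean = (
        np.log(p)
        + delta * (np.log(p - 1) + (m - 2) * np.log1p(-rho_j**2))
        - math.lgamma(delta + 1)
    )
    phi = 2.0 if delta == 1 else 1.0                   # ASSUMED form of phi(delta)
    return -np.expm1(-np.exp(log_mean) / phi)


def disjunctive_pvalue(pvals):
    """Hub in at least one frequency graph: 1 - prod(1 - pv_i)."""
    return float(1.0 - np.prod(1.0 - np.asarray(pvals, dtype=float)))


def conjunctive_pvalue(pvals):
    """Hub in every frequency graph: prod(pv_i)."""
    return float(np.prod(np.asarray(pvals, dtype=float)))


def persistent_pvalue(pvals, K):
    """Hub in at least K frequency graphs, assuming independence across frequencies."""
    dist = np.array([1.0])                             # dist[c] = P(exactly c events)
    for q in pvals:
        dist = np.convolve(dist, [1.0 - q, q])
    return float(dist[K:].sum())
```

The convolution evaluates the same sum over index subsets as the combinatorial formula, in time quadratic rather than exponential in the number of frequencies.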
4.3 General Persistent Hubs

The general case is the event that at least $K$ frequencies have hubs of degree at least δ at vertex $v_j$. For this general case we have:
$$P(\exists i_1, \dots, i_K : d_{j,f_{i_1}} \ge \delta, \dots, d_{j,f_{i_K}} \ge \delta \mid H_0) = \sum_{k=K}^{n}\ \sum_{\substack{i_1 < \cdots < i_k,\ i_{k+1} < \cdots < i_n \\ \{i_1,\dots,i_n\} = \{1,\dots,n\}}}\ \prod_{l=1}^{k} pv_{f_{i_l},\rho_{i_l},\delta}(j) \prod_{l'=k+1}^{n} \big(1 - pv_{f_{i_{l'}},\rho_{i_{l'}},\delta}(j)\big).$$

5 Experimental Results

5.1 Phase Transition Phenomenon and Mean Number of Hubs

We first performed numerical simulations to confirm Theorem 2 and Corollary 2 for complex-valued correlation screening. Samples were generated from $p$ uncorrelated complex Gaussian random variables. Figure 6 shows the number of discovered 1-hubs for a fixed $p$ and several sample sizes $m$, with one curve per sample size. The phase transition phenomenon is clearly observed in the plot. Table 1 shows the predicted value obtained from formula (33) for the critical threshold. As can be seen in Fig. 6, the empirical phase transition thresholds approximately match the predicted values of Table 1. Moreover, to confirm the accuracy of equation (31) in Corollary 2, we list the number of hubs for a fixed $m$ in Table 2. The left column shows the empirical average number of hubs of degree at least $\delta = 1, 2, 3, 4$ in a network of i.i.d. complex Gaussian random variables. The numbers in this column are obtained by averaging over independent experiments. The right column shows the predicted value of $E[N_{\delta,\rho}]$ obtained via formula (31) with $J(\bar f_{U_{*1},\dots,U_{*(\delta+1)}}) = 1$ for the i.i.d. case. As we see, the empirical and predicted values are close to each other.

Table 1. The value of critical threshold $\rho_{c,\delta}$ (second row) for each sample size $m$ (first row), obtained from formula (33) for $p$ complex variables and $\delta = 1$. The predicted $\rho_{c,\delta}$ approximates the phase transition thresholds in Fig. 6.

Fig. 6. Phase transition phenomenon: the number of 1-hubs in the sample correlation graph corresponding to uncorrelated complex Gaussian variables as a function of correlation threshold ρ. Each curve corresponds to a different sample size $m$.

Table 2. Empirical average number of discovered hubs vs. predicted average number of discovered hubs in an uncorrelated complex Gaussian network, with $\rho = .28$. Columns: degree threshold ($d_i \ge \delta$ for $\delta = 1, 2, 3, 4$), empirical $E[N_{\delta,\rho}]$, predicted $E[N_{\delta,\rho}]$. The empirical values are obtained by averaging over independent experiments.

5.2 Asymptotic Independence of Spectral Components for AR(1) Model

To illustrate the asymptotic independence property and convergence rate of Theorem 1, we considered the simple case of an AR(1) process,
$$X(k) = \varphi X(k-1) + \varepsilon(k), \quad k \ge 1, \tag{34}$$
in which $X(0) = 0$, $\varphi = .9$ and $\varepsilon(\cdot)$ is a stationary Gaussian process with no temporal correlation and standard deviation 1. We performed Monte-Carlo simulations to compute the correlation between spectral components at different frequencies for a range of window sizes $n$. More specifically, we fixed two distinct frequency indices $k$ and $l$ and empirically estimated $\mathrm{cor}(Y(k), Y(l))$ using Monte-Carlo trials for each value of window size $n$. Figure 7 shows the result of this experiment. It is observable that the magnitude of $\mathrm{cor}(Y(k), Y(l))$ is bounded above by a function decaying as $1/n$. This observation is consistent with Theorem 1.
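An experiment of this kind can be sketched as below. This is a minimal illustration, not the authors' simulation code: the function names, the 0-based DFT bin indexing, and every parameter value in the demo loop are assumptions chosen for illustration.

```python
import numpy as np


def ar1_paths(phi, n, trials, rng):
    """Simulate `trials` independent AR(1) paths X(1), ..., X(n) with X(0) = 0.

    The innovations are i.i.d. standard Gaussian (unit standard deviation).
    """
    eps = rng.standard_normal((trials, n))
    x = np.empty((trials, n))
    prev = np.zeros(trials)                    # X(0) = 0 for every path
    for t in range(n):
        prev = phi * prev + eps[:, t]          # recursion (34)
        x[:, t] = prev
    return x


def dft_correlation(phi, n, k, l, trials, rng):
    """Monte-Carlo estimate of cor(Y(k), Y(l)) for DFT bins k and l (0-based)."""
    spectra = np.fft.fft(ar1_paths(phi, n, trials, rng), axis=1)
    yk = spectra[:, k] - spectra[:, k].mean()
    yl = spectra[:, l] - spectra[:, l].mean()
    # np.vdot conjugates its first argument: sum(conj(yl) * yk).
    return np.vdot(yl, yk) / (np.linalg.norm(yk) * np.linalg.norm(yl))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative settings only; they are not the values used in the chapter.
    for n in (16, 32, 64, 128, 256):
        c = dft_correlation(phi=0.9, n=n, k=1, l=2, trials=5000, rng=rng)
        print(n, abs(c), 1.0 / n)
```

Because the estimate is itself noisy, with a standard error of roughly one over the square root of the number of trials, the number of trials has to grow with $n$ if the decay is to remain visible at large window sizes.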
Fig. 7. Magnitude of the correlation coefficient between two DFT components $Y(k)$ and $Y(l)$ as a function of window size $n$, empirically estimated using Monte-Carlo trials. Here $Y(\cdot)$ is the DFT of the AR(1) process (34). The magnitude of the correlation is bounded above by a function decaying as $1/n$. This observation is consistent with the convergence rate in Theorem 1.

5.3 Spectral Correlation Screening of a Band-Pass Multivariate Time Series

Next we analyzed the performance of the proposed complex-valued correlation screening framework on a synthetic data set for which the expected results are known. We synthesized a multivariate stationary Gaussian time series using the following procedure. The number of generated time samples $N$ exceeds the product $mn$ (with $m = n$); the discrepancy between $N$ and the product $mn$ is explained below. Let $X(k)$, $1 \le k \le N$, be a sequence of i.i.d. zero-mean Gaussian random variables (i.e., white Gaussian noise) with standard deviation of 1. The $p$ time series $X^{(1)}(k), \dots, X^{(p)}(k)$, $1 \le k \le N$, are obtained from $X(k)$ by band-pass filtering and adding independent white Gaussian noise. Specifically,
$$X^{(i)}(k) = h_i(k) \star X(k) + N_i(k), \quad 1 \le i \le p,\ 1 \le k \le N,$$
in which $\star$ represents the convolution operator, $h_i(\cdot)$ is the impulse response of the $i$th band-pass filter and $N_i(\cdot)$ is an independent white Gaussian noise series. Since stable filtering of a stationary series results in another stationary series, the obtained series $X^{(1)}(k), \dots, X^{(p)}(k)$ are stationary and Gaussian. The band-pass filters are indexed by $l$, $1 \le l \le 5$: the $l$th filter has pass band $f \in [(4l-1)/40,\ 4l/40]$. We approximate the ideal band-pass filters with finite impulse response (FIR) Chebyshev filters [6]. Each of the five filters is assigned to two of the $p$ time series, which therefore share the same impulse response. For all of the other values of $i$ we set $h_i(k) = 0$, $1 \le k \le N$. Figure 8 shows the signal part of the time series (i.e., $h_i(k) \star X(k)$) for four of the filtered series. It is seen that the initial samples of the signals reflect the transient response of the filters. These initial samples are not included for the purpose of correlation screening. Hence the actual number of time samples considered is $mn$. Figure 9 shows the magnitude of the DFTs of the signals, $Y^{(i)}(k)$, for several of the series. The band-pass structure of the signals is clearly observable in the figure.

Fig. 8. Signal part of the band-pass time series $X^{(i)}(k)$ (i.e., $h_i(k) \star X(k)$). Panels: Signal 1 (Freq. .1), Signal 2 (Freq. .2), Signal 3 (Freq. .3), Signal 4 (Freq. .4).

Fig. 9. DFT magnitude of the band-pass signals $h_i(k) \star X(k)$ (i.e., $20 \log_{10}(|Y^{(i)}(\cdot)|)$, in dB) as a function of frequency.

We first constructed a correlation matrix for the time series $X^{(1)}(k), \dots, X^{(p)}(k)$ from their simultaneous time samples. Figure 10 illustrates the structure of the thresholded sample correlation matrix and the corresponding correlation graph. Note that this is a real-valued correlation screening problem in the time domain. The correlation threshold used here is well above the critical threshold $\rho_{c,1}$ obtained via the corresponding formula in [9].

Fig. 10. (Left) The structure of the thresholded sample correlation matrix in the time domain. (Right) The correlation graph corresponding to the thresholded sample correlation matrix in the time domain.

To examine the spectral structure of the correlations in Fig. 10, we then performed complex-valued correlation screening on the spectra of the time series $X^{(1)}(k), \dots, X^{(p)}(k)$. Figure 11 shows the constructed correlation graphs $G_{f,\rho}$ for $f = [.1, .2, .3, .4]$ and correlation threshold $\rho = .9$, for which the $\delta = 1$ false positive rate $P(N_{\delta,\rho} > 0)$ is obtained using $\delta = 1$ in the asymptotic relation (32) (with $\Lambda_\infty = e_{m,\delta}^{\delta}/\delta!$ as specified by (31)). Note that the value of the correlation threshold is set to be higher than the critical threshold $\rho_c = .24$. It can be observed that performing complex-valued spectral correlation screening at each frequency correctly discovers the correlations between the time series which are active around that frequency. As an example, for $f = .2$ the discovered hubs (for $\delta = 1$) are the two time series that share the second band-pass filter ($l = 2$). These time series are the ones that are active at
More information1. Stochastic Processes and Stationarity
Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture Note 1 - Introduction This course provides the basic tools needed to analyze data that is observed
More informationRandom Matrix Eigenvalue Problems in Probabilistic Structural Mechanics
Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html
More informationCONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS
EVA IV, CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS Jose Olmo Department of Economics City University, London (joint work with Jesús Gonzalo, Universidad Carlos III de Madrid) 4th Conference
More informationENGR352 Problem Set 02
engr352/engr352p02 September 13, 2018) ENGR352 Problem Set 02 Transfer function of an estimator 1. Using Eq. (1.1.4-27) from the text, find the correct value of r ss (the result given in the text is incorrect).
More informationEigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage?
Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage? Stefan Thurner www.complex-systems.meduniwien.ac.at www.santafe.edu London July 28 Foundations of theory of financial
More informationSEBASTIAN RASCHKA. Introduction to Artificial Neural Networks and Deep Learning. with Applications in Python
SEBASTIAN RASCHKA Introduction to Artificial Neural Networks and Deep Learning with Applications in Python Introduction to Artificial Neural Networks with Applications in Python Sebastian Raschka Last
More informationKaczmarz algorithm in Hilbert space
STUDIA MATHEMATICA 169 (2) (2005) Kaczmarz algorithm in Hilbert space by Rainis Haller (Tartu) and Ryszard Szwarc (Wrocław) Abstract The aim of the Kaczmarz algorithm is to reconstruct an element in a
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationUnderstanding Regressions with Observations Collected at High Frequency over Long Span
Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationconditional cdf, conditional pdf, total probability theorem?
6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationStable Process. 2. Multivariate Stable Distributions. July, 2006
Stable Process 2. Multivariate Stable Distributions July, 2006 1. Stable random vectors. 2. Characteristic functions. 3. Strictly stable and symmetric stable random vectors. 4. Sub-Gaussian random vectors.
More informationUQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables
UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationDiscrete Applied Mathematics
Discrete Applied Mathematics 194 (015) 37 59 Contents lists available at ScienceDirect Discrete Applied Mathematics journal homepage: wwwelseviercom/locate/dam Loopy, Hankel, and combinatorially skew-hankel
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationModeling and Stability Analysis of a Communication Network System
Modeling and Stability Analysis of a Communication Network System Zvi Retchkiman Königsberg Instituto Politecnico Nacional e-mail: mzvi@cic.ipn.mx Abstract In this work, the modeling and stability problem
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationProf. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis
Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationA6523 Modeling, Inference, and Mining Jim Cordes, Cornell University
A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;
More information1. Fundamental concepts
. Fundamental concepts A time series is a sequence of data points, measured typically at successive times spaced at uniform intervals. Time series are used in such fields as statistics, signal processing
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationAdjusted Empirical Likelihood for Long-memory Time Series Models
Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics
More informationRandom Eigenvalue Problems Revisited
Random Eigenvalue Problems Revisited S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. Email: S.Adhikari@bristol.ac.uk URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html
More information3. Probability and Statistics
FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important
More informationChapter One. The Calderón-Zygmund Theory I: Ellipticity
Chapter One The Calderón-Zygmund Theory I: Ellipticity Our story begins with a classical situation: convolution with homogeneous, Calderón- Zygmund ( kernels on R n. Let S n 1 R n denote the unit sphere
More informationStochastic Processes: I. consider bowl of worms model for oscilloscope experiment:
Stochastic Processes: I consider bowl of worms model for oscilloscope experiment: SAPAscope 2.0 / 0 1 RESET SAPA2e 22, 23 II 1 stochastic process is: Stochastic Processes: II informally: bowl + drawing
More informationLinear algebra for computational statistics
University of Seoul May 3, 2018 Vector and Matrix Notation Denote 2-dimensional data array (n p matrix) by X. Denote the element in the ith row and the jth column of X by x ij or (X) ij. Denote by X j
More informationModeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses
Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association
More informationA Randomized Algorithm for the Approximation of Matrices
A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive
More informationChapter 6: Random Processes 1
Chapter 6: Random Processes 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.
More information5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.
88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal
More informationADDITIONAL MATHEMATICS
ADDITIONAL MATHEMATICS GCE Ordinary Level (Syllabus 4018) CONTENTS Page NOTES 1 GCE ORDINARY LEVEL ADDITIONAL MATHEMATICS 4018 2 MATHEMATICAL NOTATION 7 4018 ADDITIONAL MATHEMATICS O LEVEL (2009) NOTES
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More information