Inference with Transposable Data: Modeling the Effects of Row and Column Correlations

Genevera I. Allen
Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, & Department of Statistics, Rice University, Houston, TX

Robert Tibshirani
Departments of Health Research & Policy and Statistics, Stanford University, Stanford, CA

Summary. We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent due to latent variables or unknown batch effects. By modeling this matrix data using the matrix-variate normal distribution, we study and quantify the effects of row and column correlations on procedures for large-scale inference. We then propose a simple solution to the myriad of problems presented by unanticipated correlations: we simultaneously estimate row and column covariances and use these to sphere or de-correlate the noise in the underlying data before conducting inference. This procedure yields data with approximately independent rows and columns so that test statistics more closely follow null distributions and multiple testing procedures correctly control the desired error rates. Results on simulated models and real microarray data demonstrate major advantages of this approach: (1) increased statistical power, (2) less bias in estimating the false discovery rate, and (3) reduced variance of the false discovery rate estimators.

Keywords: multiple testing, false discovery rate, transposable regularized covariance models, large-scale inference, covariance estimation, matrix-variate normal

1. Introduction

As statisticians, we often make assumptions when constructing a model to ease computations or employ existing methodologies. When conducting inference on matrix data, we often assume that the variables along one dimension (say the columns) are independent, allowing us to pool these observations to make inferences on the variables along the other dimension (rows). In microarrays, for example, it is common to assume that the arrays are independent observations when computing test statistics, allowing us to assess differential expression in genes. Since we are testing many row variables (genes, for example) simultaneously, we correct for multiple testing using procedures that are known to control error measures only when the row variables are independent or follow limited dependence structures. Thus, when conducting inference along the row variables of matrix data, we make the following assumptions to employ existing methodologies: (i) independent column variables and (ii) independent or limited dependencies among row variables. What if these assumptions are incorrect? What if this matrix data is in fact transposable, meaning that potentially both the rows and the columns are correlated?

In this paper, we consider the problem of testing the significance of row variables in a data matrix where there are correlations among the rows, or among the columns, or among both. We study the behavior of standard statistical methodology, such as two-sample t-statistics and controlling for multiplicity via the false discovery rate (FDR), on transposable data. Then, we propose a method to directly account for these two-way dependencies by de-correlating the data matrix before conducting inference. We motivate the presence of these two-way dependencies through an example we will refer to often: testing for differential expression in the two-class microarray. Consider a microarray data set investigating differential gene expression between subjects of Asian descent and European descent, in which there are 4,167 genes and 142 subjects, with arrays processed between 2003 and 2007 (Spielman et al., 2007). In Figure 1, we display the histogram of two-sample t-statistics for this data set superimposed with the density of the theoretical null distribution, the t_(140) distribution. The test statistics are strongly over-dispersed compared to the theoretical null distribution. Many have proposed that this effect could be due to correlations among the genes (Dudoit et al., 2003; Efron, 2004; Qiu et al., 2005; Efron, 2010). Others have noted that correlations among the arrays, perhaps induced by their differing processing dates in this example, can produce the same effect (Qiu et al., 2005; Owen, 2005; Efron, 2009). What if this apparent over-dispersion is caused by both gene and array correlations? And what effect do these correlations have on standard statistical methods used for large-scale inference?

Answers to these questions have been well studied for correlations among the rows or test statistics (Dudoit et al., 2003; Efron, 2004; Owen, 2005; Qiu and Yakovlev, 2006; Efron, 2010; Schwartzman and Lin, 2011). Methods to control the false discovery rate, such as the step-up method of Benjamini and Hochberg (1995) or the permutation-based method of Storey (2002), were originally shown to have theoretical control only when test statistics are independent. Later, however, these conditions were relaxed to show that the FDR is controlled under types of weak dependence (Yekutieli and Benjamini, 1999; Storey et al., 2004; Sarkar, 2008). Benjamini and Yekutieli (2001) developed a step-up procedure that permits FDR control under arbitrary dependencies among tests, but at a great cost in terms of statistical power (Farcomeni, 2008); thus, this method is not preferred in the literature. As the conditions of weak dependence are not easily checked with real data, it is unknown whether false discovery rates are controlled in data with strong correlations among tests, such as with microarrays. In addition, many have recently noted that when tests are correlated, one needs to worry not only about the average false discovery proportion, but also about the variance of the number of false discoveries and of the FDR (Owen, 2005; Qiu and Yakovlev, 2006; Desai et al., 2009; Efron, 2010; Schwartzman and Lin, 2011).

While correlations among test statistics have been studied, especially in the context of microarray data, correlations among the columns and their effect on large-scale inference have received much less scrutiny. First, one might ask whence correlations among columns arise.
For microarrays, these can occur because of batch effects, latent variables, or instrument drift, for example (Yang et al., 2002; Fare et al., 2003; Li and Rabinovic, 2007; Leek and Storey, 2008; Efron, 2009; Leek et al., 2010a). If these are known to the statistician in advance, they can be modeled directly (Li and Rabinovic, 2007; Leek et al., 2010a). Many of these causes of potential array correlations, however, are unknown to the statistician or unavailable for data in many public gene expression repositories (Edgar et al., 2002). In addition, simply assessing the presence of column correlations in the presence of row correlations is a significant challenge (Efron, 2009; Muralidharan, 2010). Not surprisingly, then, relatively little work has been done on modeling and correcting for the effects of both row and column correlations in the context of large-scale inference. In this paper, we propose to study and develop methodology to correct for row and column correlations when conducting inference on matrix data.

Fig. 1. Histogram of two-sample T-statistics for the Spielman et al. (2007) microarray data. The theoretical null, t_(140), is superimposed, and the T-statistics are over-dispersed compared to the null distribution.

As the effect of the latter is less developed in the literature, we focus more on the behavior of common test statistics and null distributions when there are unanticipated column correlations. We show that even though many have noted that over-dispersion of test statistics can result from correlated tests (Efron, 2004, 2010), unanticipated correlations among the columns can also lead to this result. The main contribution of this paper is a novel procedure to de-correlate both the rows and columns of the data matrix prior to conducting inference. Several have proposed such methods in the context of only row correlations (Tibshirani and Wasserman, 2006; Lai, 2008; Zuber and Strimmer, 2009) or for latent variable models (Leek and Storey, 2008), but none for scenarios in which both the rows and columns are correlated with arbitrary structure. This may be surprising, as the idea seems like a simple and logical first step in tackling the problems arising from strongly correlated matrix data. It turns out, however, that estimating both row and column covariance matrices from a single matrix of data is a major challenge (Efron, 2009; Muralidharan, 2010). We model separable row and column covariances via the matrix-variate normal, which assumes that the covariance between elements of the data matrix is given by the Kronecker product of its column and row covariances (Gupta and Nagar, 1999). This matrix-variate or Kronecker product model has been used by others in the context of microarray data (Efron, 2009; Teng and Huang, 2009), but not for the challenging problem of directly estimating row and column covariances. In prior work, however, we have developed a method of simultaneously estimating row and column covariances via a penalized likelihood approach (Allen and Tibshirani, 2010). In this paper, we introduce a novel procedure that uses these covariance estimates to de-correlate or sphere the noise in the data matrix without changing the underlying signal. This sphered data, which has approximately independent rows and columns, can then be used to conduct large-scale inference. Our approach has several advantages: (i) tests are re-ordered, leading to a better ranking, a lower true false discovery proportion, and greater statistical power; (ii) estimates of the false discovery proportion are more consistent; (iii) the variance of the estimated false discovery rate is reduced.

The paper is organized as follows. In Section 2 we introduce our matrix model based on the mean-restricted matrix-variate normal distribution. We then study the behavior of test statistics when the columns are correlated (Section 2.2) and illustrate how common problems associated with microarray data can lead to unanticipated array (column) correlations (Section 2.3). In Section 3, we develop our main sphering algorithm. Results on both simulated models and real microarray data are given in Section 4, and we conclude with a discussion of our work in Section 5.

2. Framework: A Matrix-variate Model

We present a matrix decomposition model based on the matrix-variate normal distribution. Using this model, we study the effects of unanticipated column correlations on common two-sample test statistics, showing that non-zero column correlations lead to an over- or under-dispersion of the null distribution.

2.1. Matrix-variate Model

We propose to study row and column correlations through a simple matrix decomposition model based on the matrix-variate normal. We motivate the use of this distribution through the example of microarray data. With this data, the genes are often assumed to follow a multivariate normal distribution with the arrays independent and identically distributed. Since we aim to study the effects of array correlations, we need a parametric model that has the flexibility to model either array independence or various array correlation structures. To this end, we turn to the mean-restricted matrix-variate normal introduced in Allen and Tibshirani (2010), a variation of the familiar matrix-variate normal (Gupta and Nagar, 1999). For data X ∈ R^{m×n}, this distribution is denoted X ~ N_{m,n}(ν, µ, Σ, Δ) and has separate mean and covariance parameters for the rows, ν ∈ R^m and Σ ∈ R^{m×m}, and for the columns, µ ∈ R^n and Δ ∈ R^{n×n}. Thus, we can model array correlations directly through the column covariance matrix, Δ. If the data matrix is transformed into a vector of length mn, we have that vec(X) ~ N(vec(M), Ω), where M = ν 1_{(n)}^T + 1_{(m)} µ^T and Ω = Δ ⊗ Σ. Also, the commonly used multivariate normal is a special case of the distribution: if Δ = I and µ = 0, then the columns of X are independently distributed as N(ν, Σ). In fact, all marginal models of the matrix-variate normal are multivariate normal, meaning that both the genes and the arrays separately are multivariate normal. Further properties of this distribution are given in Allen and Tibshirani (2010).

In our matrix decomposition model, we assume that there is an additional signal beyond the row and column means. We then decompose the data into a mean, signal, and correlated noise matrix as follows:

X_{m×n} = M_{m×n} + S_{m×n} + N_{m×n}.   (1)

Here, M = ν 1_{(n)}^T + 1_{(m)} µ^T is the mean matrix, S is the problem-specific signal matrix, and N ~ N_{m,n}(0, 0, Σ, Δ) is the noise matrix. Thus, X − S ~ N_{m,n}(ν, µ, Σ, Δ), meaning that after removing the signal, the data follows a mean-restricted matrix-variate normal distribution. With two-class microarray data, for example, the signal matrix captures the class means. Let there be n_1 arrays in class one, indices denoted by C_1, and n_2 in class two, denoted by C_2. (For simplicity of notation, we assume that the first n_1 arrays are in class one and the last n_2 arrays are in class two.) Let the class signals be ψ_1 ∈ R^m and ψ_2 ∈ R^m. Then, the signal matrix, S, can be written as S = [ψ_1 1_{(n_1)}^T  ψ_2 1_{(n_2)}^T]. Notice that our matrix decomposition model is similar in spirit to the latent variable model of Leek and Storey (2008), also proposed in the context of large-scale inference.

There are several further remarks to make regarding this model. Prior to analyzing data, it is common to standardize the rows. Some have proposed to doubly standardize this two-way data by iteratively scaling both the rows and columns (Efron, 2009; Olshen and Rajaratnam, 2010). With our model, we center both the rows and columns through the mean matrix M, but do not directly scale them. Instead, we allow the diagonals of the covariance matrices of the rows, Σ, and columns, Δ, to capture the differences in variabilities.
Thus, our model keeps the mean and variances separate in the estimation process.
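To make the decomposition in (1) concrete, the following sketch simulates a small two-class data set X = M + S + N with matrix-variate noise generated as N = Σ^{1/2} Z Δ^{1/2} for i.i.d. standard normal Z. This is an illustration only, not code from the paper; the dimensions, signal strength, and AR(1) covariances are hypothetical choices.

import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 100, 25, 25            # rows (genes) and columns per class; illustrative sizes
n = n1 + n2

# Row and column means forming the mean matrix M, and a two-class signal S
nu = rng.normal(size=m)
mu = rng.normal(size=n)
M = np.outer(nu, np.ones(n)) + np.outer(np.ones(m), mu)
psi1, psi2 = np.zeros(m), np.zeros(m)
psi1[:10], psi2[:10] = 0.5, -0.5   # first 10 rows are non-null
S = np.hstack([np.outer(psi1, np.ones(n1)), np.outer(psi2, np.ones(n2))])

# Separable AR(1) row and column covariances Sigma (m x m) and Delta (n x n)
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
Delta = 0.9 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def msqrt(A):
    """Symmetric matrix square root via the eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T

# Matrix-variate noise: N = Sigma^{1/2} Z Delta^{1/2}, so Cov(vec N) = Delta (x) Sigma
N = msqrt(Sigma) @ rng.standard_normal((m, n)) @ msqrt(Delta)
X = M + S + N                      # the matrix decomposition model (1)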

2.2. Null Distributions: The Two-Class Problem

We study the effect of column correlations on the theoretical null distribution of two-sample test statistics computed for a single row of the data matrix. More specifically, we calculate the distributions of test statistics under our matrix decomposition model instead of the typical two-sample framework where samples are drawn independently from two populations. In the familiar two-class inference problem, we have a vector x = [x_1^T x_2^T]^T, with x_1 of length n_1 and x_2 of length n_2, where the elements of each vector are x_{1,i} ~ N(ψ_1, σ^2) and x_{2,i} ~ N(ψ_2, σ^2). We wish to test whether there is a shift in means between the two classes, namely H_0: ψ_1 = ψ_2 vs. H_1: ψ_1 ≠ ψ_2. Here, we assume that the variances, σ^2, are equal between the two classes, but note that analogous results are obtainable if the variances are assumed to be unequal, as is often the case with microarray data. If the variance, σ^2, is known, we have the familiar two-sample Z-statistic, Z = (x̄_1 − x̄_2)/(σ √c_n), which follows the distribution Z ~ N((ψ_1 − ψ_2)/(σ √c_n), 1), where x̄_k = (1/n_k) Σ_{i=1}^{n_k} x_{k,i} and c_n = 1/n_1 + 1/n_2 (Lehmann and Romano, 2005). Going back to our matrix decomposition model, we wish to know the distribution of this Z-statistic for each row when there are column correlations:

Claim 1. Let x^T = [x_1^T x_2^T] ~ N_{1,n}(0, [ψ_1 1_{(n_1)}^T  ψ_2 1_{(n_2)}^T], σ^2, Δ). Then, Z ~ N((ψ_1 − ψ_2)/(σ √c_n), η/c_n), where η = Σ_{i=1}^{n} Σ_{j=1}^{n} Δ_{ij} W_i W_j = W^T Δ W, with W_i = 1/n_1 if i ∈ C_1 and W_i = −1/n_2 if i ∈ C_2.

Thus, when the columns are correlated, the variance of the two-sample Z-statistic is inflated or deflated by η. In terms of the matrix decomposition model, the assumptions of Claim 1 correspond to a row vector that has previously been centered by ν and µ, has signal [ψ_1 1_{(n_1)}^T  ψ_2 1_{(n_2)}^T], column covariance Δ, and row variance σ^2, the diagonal element of Σ. Note that the separability of the covariance matrices of our model allows us to dissect the effects of column correlations in this manner. Notice that if Δ = I, then η = c_n and the variance of Z is one, as desired. If there is only column correlation within the two classes, then the effects of these correlations can be parsed as follows:

Corollary 1. Let x_1^T ~ N_{1,n_1}(0, ψ_1 1_{(n_1)}^T, σ^2, Δ_1) independent of x_2^T ~ N_{1,n_2}(0, ψ_2 1_{(n_2)}^T, σ^2, Δ_2). Then, Z ~ N((ψ_1 − ψ_2)/(σ √c_n), (η_1 + η_2)/c_n), where η_k = (1/n_k^2) Σ_{i=1}^{n_k} Σ_{j=1}^{n_k} Δ_{k,ij} for k = 1, 2.

These effects are explored numerically in a small study described below. We have assumed that the row variance, σ^2, was known; however, in most microarray experiments this is not known and must be estimated. With σ^2 unknown, the two-sample T-statistic is used: T = (x̄_1 − x̄_2)/(s_{x_1,x_2} √c_n), where s^2_{x_1,x_2} is the pooled estimate of the sample variance. Under the null hypothesis, T ~ t_{(n−2)}, while under the alternative, T ~ t^{(δ)}_{(n−2)}, a non-central t distribution with non-centrality parameter δ = (ψ_1 − ψ_2)/(σ √c_n) (Lehmann and Romano, 2005). When there are column correlations as in the assumptions of Claim 1, however, the distribution of T does not have a closed form. (The square of the pooled sample standard deviation is no longer distributed as a Chi-squared random variable, and the numerator and denominator of T are not independent.) Hence, we explore the effects of column correlations on the T-statistic through a small simulation study. Data is simulated according to the assumptions of Claim 1 with n = 50 columns and n_1 = n_2 = 25 in each class.
Four structured covariance matrices were used to assess the variances of the Z- and T-statistics: Δ_1 with Δ_{1,ij} = 0.9^{|i−j|}; Δ_2 block diagonal with blocks of size 10 and, within each block, Δ_{2,ij} = 0.9^{|i−j|}; Δ_3 with Δ_{3,ij} = 0.5^{|i−j|}; and Δ_4 block diagonal with blocks of size 10 and, within each block, Δ_{4,ij} = 0.5^{|i−j|}.

Positive correlation structures are used, as most observed array covariances in microarray studies are positive. We note that with negative correlations among the columns, η < c_n, resulting in under-dispersed null distributions.

Fig. 2. Comparison of theoretical null distributions for the two-sample Z-statistic (left) and T-statistic (right) under the four column correlation scenarios given in Section 2.2. Variances of the Z-statistics were calculated by the result in Claim 1, while the densities of the T-statistics were estimated via Monte Carlo simulation.

Fig. 3. Variances of the two-sample Z- and T-statistics under the four column correlation scenarios. The theoretical variance of the Z-statistic should be 1, and that of the T-statistic should be the variance of the t_(48) distribution.

Figure 2 demonstrates the effect of column correlations on the distributions of Z and T. We see that positive column correlations can cause dramatic over-dispersion of the test statistics compared to their theoretical null distribution. This is a possible explanation for the over-dispersion seen in the real microarray example displayed in Figure 1. Compared to the variance of the Z-statistic, the T-statistic is even more affected by column correlations. This is confirmed in Figure 3, where we present the variances of the Z-statistic calculated by Claim 1 and the variances of the T-statistic estimated by Monte Carlo simulation. Indeed, small amounts of correlation in the columns can cause a dramatic increase in the variance of the T-statistic.

We have shown how the distributions of the T- and Z-statistics behave when the columns or arrays are correlated. When analyzing microarrays, however, many have advocated using non-parametric null distributions estimated by permuting the class labels (Dudoit et al., 2003; Storey and Tibshirani, 2003; Tusher et al., 2001). This approach is also problematic, however, when the columns or arrays are not independent. Since, under the null hypothesis, the joint distribution of the columns is not invariant under permutations, the randomization hypothesis fails (Lehmann and Romano, 2005). Therefore, inferences drawn by using permutation null distributions instead of theoretical nulls suffer from the same troublesome effects of unanticipated column correlations.
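As a quick numerical check of Claim 1 (a sketch, not the authors' code; the AR(1) column covariance and class sizes are illustrative), one can compare the derived variance η/c_n of the two-sample Z-statistic for a single null row against a Monte Carlo estimate:

import numpy as np

rng = np.random.default_rng(1)
n1 = n2 = 25
n = n1 + n2
cn = 1.0 / n1 + 1.0 / n2

# AR(1) column covariance, as in scenario Delta_1 above
Delta = 0.9 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Contrast vector W from Claim 1: 1/n1 for class one, -1/n2 for class two
W = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
eta = W @ Delta @ W
print("derived Var(Z) = eta / c_n =", eta / cn)

# Monte Carlo: null rows x ~ N(0, Delta) (sigma = 1), Z = (xbar1 - xbar2) / sqrt(c_n)
L = np.linalg.cholesky(Delta)
x = (L @ rng.standard_normal((n, 100_000))).T
Z = (x[:, :n1].mean(axis=1) - x[:, n1:].mean(axis=1)) / np.sqrt(cn)
print("Monte Carlo Var(Z) =", Z.var())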

Our brief study of the behavior of two-sample test statistics in the presence of unanticipated column correlations reveals several problematic behaviors. Relatively small column correlations can have a large effect, leading to over- or under-dispersion of the theoretical null distribution and incorrect inferences. In the supplementary materials, we use our matrix-variate model to study the performance of large-scale inference methodology (in particular, the step-up procedure (Benjamini and Hochberg, 1995), the permutation procedure (Storey and Tibshirani, 2003), and the empirical null based local FDR procedure (Efron, 2007)) under several row and column correlation scenarios. These results reveal that when both the rows and the columns are correlated, these problems are exacerbated. Specifically, further over-dispersion of the null distribution occurs, leading to (i) biased estimates of the FDR and (ii) greater variance of the FDR estimates, whose effect is even greater than when only the rows or only the columns are correlated. Therefore, a troubling picture emerges regarding the statistical perils of performing large-scale inference on a data matrix in which the rows and columns may be correlated.

2.3. Microarrays & Unanticipated Array Correlations

Before continuing with our proposal to solve the problems associated with two-way dependencies and large-scale inference, we pause to understand and quantify some possible sources of unanticipated array correlations in microarray data. We consider models for three possible sources of array correlations in microarray data: a batch-effect model (Li and Rabinovic, 2007; Leek et al., 2010a), a latent variable model (Leek and Storey, 2008), and an instrument drift model (Fare et al., 2003). Clearly, if these sources of array correlations are known to the statistician, they should be modeled directly. Unfortunately, this information is often missing, and we seek to understand the effect of these correlations if they cannot be directly accounted for. Hence, we quantify how array correlations are induced if one fits a standard model assuming that the arrays are independent when they are in fact distributed according to these other models. In other words, we calculate the array covariance resulting from model bias.

Consider a standard model for microarray data assuming that the arrays are independent with Gaussian noise: X_{ij} = S_{ij} + ε_{ij}, where S_{ij} denotes the fixed effects from the signal of interest and ε_{ij} is a random effect, ε_{ij} ~ N(0, 1). Then, the expectation of the cross-products of the population residuals, r^{(S)}_{ij} = X_{ij} − S_{ij} = ε_{ij}, is obviously E(r^{(S)}_{ij} r^{(S)}_{ij'}) = 0 for j ≠ j'. These cross-products are non-zero, however, for the other models we consider.

Let us consider the following batch effects model: X_{ij} = S_{ij} + Σ_{k=1}^{K} β_{ik} I(j ∈ I(k)) + ε_{ij}, where I(k) denotes the batch membership and β_{ik} ~ N(µ_k, σ_k^2) independent of ε_{ij}. Thus, the batch effect is a random effect given by β_{ik}. Defining the population residuals, r^{(B)}_{ij}, in the same manner as above, we see that the expected cross-products are non-zero for arrays in the same batch: E(r^{(B)}_{ij} r^{(B)}_{ij'}) = µ_k^2 + σ_k^2 if (j, j') ∈ I(k), and 0 otherwise. Hence, if either the mean batch effect or the additional variance among arrays in the batch is large, then strong correlations among the arrays can result. Similarly, consider the following latent variable model: X_{ij} = S_{ij} + Σ_{k=1}^{K} Γ_{ik} G_{kj} + ε_{ij}, where G_{kj} is the fixed latent variable with random weights Γ_{ik} ~ N(0, 1) independent of ε_{ij}.
Then, the expected cross-products of the population residuals, r^{(L)}_{ij}, are given by: E(r^{(L)}_{ij} r^{(L)}_{ij'}) = Σ_{k=1}^{K} G_{kj} G_{kj'}. To measure the effect of instrument drift, we employ a random walk with drift model. Define D_{ij} = µ + D_{i,j−1} + ψ_j, with D_{i1} ~ N(0, σ^2) and ψ_j ~ N(0, σ^2) independent of ε_{ij}, and µ the fixed instrument drift. Then, consider the following instrument drift model: X_{ij} = S_{ij} + D_{ij} + ε_{ij}. Again, the expected cross-products of the population residuals, r^{(D)}_{ij}, are: E(r^{(D)}_{ij} r^{(D)}_{ij'}) = (j − 1)(j' − 1) µ^2 + σ^2 (j ∧ j'), where j ∧ j' denotes the minimum of j and j'. Calculations for all of these covariances are given in Appendix A.
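The induced array correlations under, say, the batch-effect model can be verified numerically. The sketch below (illustrative parameter values, not from the paper) compares the empirical cross-product of residuals for two arrays in the same batch with the derived value µ_k^2 + σ_k^2.

import numpy as np

rng = np.random.default_rng(2)
m, K, batch_size = 5000, 5, 10                 # many genes; 5 batches of 10 arrays
n = K * batch_size
batch = np.repeat(np.arange(K), batch_size)    # batch membership of each array

mu_k = np.array([-0.5, -0.25, 0.0, 0.25, 0.5]) # hypothetical mean batch effects
sigma_k = 0.5                                  # hypothetical batch standard deviation

# Residuals r_ij = beta_{i, batch(j)} + eps_ij after removing the true signal S
beta = rng.normal(loc=mu_k, scale=sigma_k, size=(m, K))  # per-gene random batch effects
r = beta[:, batch] + rng.standard_normal((m, n))

# Empirical cross-product for two arrays in batch 0 vs. the derived mu_k^2 + sigma_k^2
print("empirical:", np.mean(r[:, 0] * r[:, 1]))
print("derived  :", mu_k[0] ** 2 + sigma_k ** 2)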

Based on these simple models for microarray data, we see that large correlations among the arrays can be induced by relatively small batch effects, latent variable effects, or instrument drifts if these effects are not explicitly modeled. Putting this together with the results discussed in the previous section, these small non-zero correlations can lead to dramatically wider null distributions of common test statistics. This, in turn, leads to many more genes being rejected than are truly differentially expressed. This illustration then serves as motivation for methods of directly addressing two-way dependencies when conducting large-scale inference.

3. De-Correlating a Matrix Prior to Conducting Inference

We propose a novel method for solving the numerous problems associated with conducting large-scale inference on matrix data in which the rows and columns may be correlated. The approach has three simple steps: (1) estimate the signal in the data and subtract this to get an estimate of the two-way correlated noise; (2) simultaneously estimate separable row and column covariances of the noise and use these estimates to sphere or de-correlate the noise; (3) add the de-correlated noise and the signal to yield the sphered data on which one conducts large-scale inference.

Before we introduce our methodology, we briefly review the results and existing literature illustrating the challenges of estimating both row and column covariances from a single matrix of data. First, the problem of estimating these two separable covariances for multiple instances of independent and identically distributed matrix data has been established by Dutilleul (1999). The number of repeated matrix instances, however, must be large relative to the row and column dimensions. In our case, we have only one replicate of size m × n from which to estimate m(m−1)/2 + n(n−1)/2 parameters. Furthermore, the empirical estimates of the row and column covariances share the same information content. Assume that the data, X, has been row- and column-centered, and decompose the data according to the singular value decomposition, giving X = U D V^T. Then, the empirical covariances, ˆΣ = X X^T / m and ˆΔ = X^T X / n, can be written as ˆΣ = U D^2 U^T / m and ˆΔ = V D^2 V^T / n. That is, the empirical covariances share the same eigenvalues. Efron (2009) goes on to show that the variances of the elements of the two empirical correlation matrices are the same. Muralidharan (2010) likens this problem to estimating the variances of two random variables having only observed their sum.

Given this, there are several important points to discuss. First, if the underlying data is truly matrix-variate, meaning that neither the rows nor the columns are independent, then simply estimating the row covariance or the column covariance is insufficient. This occurs because non-zero column correlations influence the apparent row correlations and vice versa, a point discussed in detail in Efron (2009). If our ultimate goal is to de-correlate the data matrix before conducting inference, then estimating only the row covariance or only the column covariance would lead to erroneous conclusions when the data is truly transposable. Additionally, estimating the row and column covariances separately would lead to these same problems. Finally, the empirical estimates of at least one covariance matrix, and likely both, are necessarily singular. As the inverse covariance matrix is needed to de-correlate the data, this presents an additional challenge.
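The shared information content of the two empirical covariances is easy to see numerically. The following sketch (purely illustrative; it is not part of the TRCM estimation) confirms via the SVD that the non-zero eigenvalues of ˆΣ and ˆΔ are the squared singular values of the centered data, up to the 1/m and 1/n scalings used above.

import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 30
X = rng.standard_normal((m, n))
X = X - X.mean(axis=1, keepdims=True)    # row center
X = X - X.mean(axis=0, keepdims=True)    # column center

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Sigma_hat = X @ X.T / m                  # empirical row covariance (scaling as in the text)
Delta_hat = X.T @ X / n                  # empirical column covariance

eig_Sigma = np.sort(np.linalg.eigvalsh(Sigma_hat))[::-1][: len(d)]
eig_Delta = np.sort(np.linalg.eigvalsh(Delta_hat))[::-1][: len(d)]
# Both sets of non-zero eigenvalues are the squared singular values d^2, rescaled
print(np.allclose(eig_Sigma, d ** 2 / m), np.allclose(eig_Delta, d ** 2 / n))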
We propose to estimate non-singular row and column covariances, ˆΣ and ˆΔ, simultaneously via the Transposable Regularized Covariance Models framework introduced in Allen and Tibshirani (2010). Then, we use these estimates to de-correlate the noise of the data matrix, Ñ = ˆΣ^{−1/2} N ˆΔ^{−1/2}, yielding a new noise matrix, Ñ, which has approximately independent rows and columns.

(We note that if N ~ N_{n,p}(0, 0, Σ, Δ), for example, then Σ^{−1/2} N Δ^{−1/2} ~ N_{n,p}(0, 0, I_{(n)}, I_{(p)}) (Gupta and Nagar, 1999).) Finally, this new noise matrix is added to the signal estimated from the model to conduct large-scale inference.

3.1. Review: Transposable Regularized Covariance Models

The Transposable Regularized Covariance Model (TRCM) allows us to estimate non-singular row and column covariances simultaneously by maximizing a penalized log-likelihood of the matrix-variate normal distribution (Allen and Tibshirani, 2010). The model places a matrix-convex penalty on the inverse covariances, or concentration matrices, of the rows and columns, allowing one to estimate non-singular covariances. For the ultimate purpose of de-correlating a data matrix when conducting inference, we choose to employ an l1-norm penalty on the concentration matrices. This is done for both practical and theoretical reasons, which we will discuss shortly. First, let us review the model, some of its properties, and the algorithms used to find the penalized MLE. Following from the matrix decomposition model, (1), if we let N be the noise matrix remaining after removing the means and the signal in the data, then the penalized log-likelihood is as follows:

l(Σ, Δ) = (n/2) log |Σ^{−1}| + (m/2) log |Δ^{−1}| − (1/2) tr(Σ^{−1} N Δ^{−1} N^T) − λ m ||Σ^{−1}||_1 − λ n ||Δ^{−1}||_1,   (2)

where ||Δ^{−1}||_1 = Σ_{i=1}^{n} Σ_{j=1}^{n} |Δ^{−1}_{ij}| and λ is a penalty parameter controlling the amount of sparsity in the concentration matrices. Notice that the first three terms of (2) are the log-likelihood of the matrix-variate normal distribution (Gupta and Nagar, 1999). Hence, the model assumes that the row and column covariances are separable and that the joint distribution of the noise is vec(N) ~ N(0, Δ ⊗ Σ), that is, the joint covariance is given by the Kronecker product. The l1-norm penalties placed on the concentration matrices are an extension of the graphical lasso-type penalties to the matrix-variate framework (Friedman et al., 2007; Rothman et al., 2008). These penalties encourage zeros in the off-diagonals of the concentration matrices, corresponding to a selection of edges in a graph structure and indicating that the variables are conditionally independent (Dempster, 1972). Allen and Tibshirani (2010) showed that successive application of the graphical lasso algorithm, which solves the sub-gradient equations of (2), converges to a maximum of the penalized log-likelihood. The resulting estimates for ˆΣ^{−1} and ˆΔ^{−1} are necessarily non-singular, as desired.

Before discussing the rationale for our model with l1 penalties, we pause to address the logical question: why not use l2-norm penalties, as also presented in Allen and Tibshirani (2010)? Recall that the l2-norm TRCM solutions for ˆΣ and ˆΔ have the same eigenvectors as their empirical counterparts. The solutions for the eigenvalues are simply regularized versions of the empirical eigenvalues. From results in random matrix theory, we know that the empirical eigenvectors and eigenvalues of covariances can be inconsistent with high-dimensional data (Johnstone, 2001; Johnstone and Lu, 2009). Furthermore, suppose that we were to de-correlate the noise by left and right multiplying by the matrix square roots of ˆΣ^{−1} and ˆΔ^{−1}. Let N = U D V^T be the SVD of the noise, and let Λ_{ˆΣ} and Λ_{ˆΔ} be the diagonal matrices of eigenvalues of ˆΣ and ˆΔ, respectively.
Then, the sphered noise, Ñ, that would result from using the l2 TRCM estimates has the following form:

Ñ = ˆΣ^{−1/2} N ˆΔ^{−1/2} = (U Λ_{ˆΣ}^{−1/2} U^T)(U D V^T)(V Λ_{ˆΔ}^{−1/2} V^T) = U (Λ_{ˆΣ}^{−1/2} D Λ_{ˆΔ}^{−1/2}) V^T.

Hence, using the l2-norm TRCM estimates to de-correlate the noise returns a matrix with the same singular vectors as the original noise. Also, the resulting singular values are simply a regularized version of the original singular values of the noise, a result which can be calculated using the formulas for Λ_{ˆΣ} and Λ_{ˆΔ} in Allen and Tibshirani (2010). Therefore, employing l2-norm TRCM estimates changes the scale of the noise instead of projecting the noise onto directions that yield approximately independent rows and columns.

Using the l1-norm penalty in the TRCM framework, however, has many practical advantages in the context of large-scale inference. First, one usually assumes that the columns are independent, so having Δ = I should be our default position. As the penalty encourages sparsity in the off-diagonals of Δ^{−1}, estimating a diagonal covariance is a special case of this model. Furthermore, notice that the penalty parameter, λ, is modulated by the dimension of the rows and columns. (We note that λ can be estimated via cross-validation using the efficient alternating conditional expectations algorithm (Allen and Tibshirani, 2010).) Thus, the evidence of partial correlations among the columns must be strong relative to the partial correlations among the rows in order for non-zero column correlations to be estimated. Secondly, especially in the context of microarrays, it seems reasonable to assume that the covariance among the genes is sparse, as biologically, genes are likely to be correlated only with genes in the same or related pathways.

There is also a theoretical foundation motivating the use of the l1-norm TRCM estimates. These estimates, by encouraging sparsity, regularize both the eigenvectors and the eigenvalues. This turns out to be important for covariance estimation. For multivariate normal data, covariance estimators resulting from the graphical lasso penalty are consistent in both the Frobenius norm and the operator norm for estimating a true underlying sparse matrix (Rothman et al., 2008). Note that while convergence in the Frobenius norm gives convergence of the eigenvalues, convergence in the operator norm implies convergence of the eigenvectors (El Karoui, 2008). Additionally, regularizing eigenvectors using sparsity has been shown to yield consistent directions for dimension reduction (Johnstone and Lu, 2009). While these results are for multivariate data, we note that both the feature space and the sample space are permitted to increase as long as they increase at a constant ratio asymptotically. Given this, and from our experience with the l1-norm TRCM estimates, we conjecture that, under the right assumptions, one may prove that the ˆΣ and ˆΔ that maximize (2) are consistent for estimating sparse separable inverse covariance matrices of the matrix-variate normal. We leave this open problem for future work.

3.2. Sphering Algorithm

We develop a simple method to directly address the problems associated with inference on data exhibiting row and column correlations: we de-correlate the underlying data before conducting inference. Among the many advantages of this approach are that (i) it can be used with any test statistic and (ii) it can be used with any method of controlling for multiple testing.

Algorithm 1 Sphering Algorithm
(a) Estimate the row and column means, ˆν and ˆµ, forming ˆM, and the signal matrix, Ŝ.
(b) Define the noise, ˆN = X − ˆM − Ŝ. Estimate the row and column covariances of the noise, ˆΣ and ˆΔ, via TRCM.
(c) Sphere the noise: Ñ = ˆΣ^{−1/2} ˆN ˆΔ^{−1/2}. Form the sphered data matrix: X̃ = Ŝ + Ñ.
Our sphering algorithm, based on the matrix decomposition model (1), is given in Algorithm 1. This algorithm simply removes the means and signal, estimates the row and column covariances of the noise, and uses these to de-correlate or sphere the noise.
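A minimal sketch of Algorithm 1 for the two-class problem is given below. Step (b) is a placeholder: for illustration it uses sample covariances shrunk toward the identity to obtain non-singular estimates, whereas the paper prescribes the penalized TRCM estimates of Allen and Tibshirani (2010); the function and parameter names are ours.

import numpy as np

def inv_msqrt(A):
    """Symmetric inverse square root A^{-1/2} = P diag(1/sqrt(lambda)) P^T."""
    w, P = np.linalg.eigh(A)
    return P @ np.diag(1.0 / np.sqrt(w)) @ P.T

def sphere_two_class(X, n1, alpha=0.5):
    """Sketch of Algorithm 1 for the two-class problem.

    The covariance step uses shrinkage toward the identity as a simple
    stand-in for the penalized TRCM estimates; alpha is a hypothetical
    shrinkage weight, not a parameter from the paper."""
    m, n = X.shape
    # (a) row/column means and the two-class signal (class means per row)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    Xc = X - row_means - col_means + X.mean()
    psi1 = Xc[:, :n1].mean(axis=1, keepdims=True)
    psi2 = Xc[:, n1:].mean(axis=1, keepdims=True)
    S_hat = np.hstack([np.repeat(psi1, n1, axis=1),
                       np.repeat(psi2, n - n1, axis=1)])
    # (b) noise and non-singular (placeholder) row/column covariance estimates
    N_hat = Xc - S_hat
    Sigma_hat = (1 - alpha) * (N_hat @ N_hat.T / n) + alpha * np.eye(m)
    Delta_hat = (1 - alpha) * (N_hat.T @ N_hat / m) + alpha * np.eye(n)
    # (c) sphere the noise and add the signal back to form the sphered data
    N_tilde = inv_msqrt(Sigma_hat) @ N_hat @ inv_msqrt(Delta_hat)
    return S_hat + N_tilde

Two-sample statistics and any multiple testing procedure can then be applied to the returned sphered matrix exactly as they would be to the original data.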

The estimated signal, Ŝ, is problem specific. For the two-class model, for example, Ŝ = [ˆψ_1 1_{(n_1)}^T  ˆψ_2 1_{(n_2)}^T], where ˆψ_1 and ˆψ_2 are the vectors of estimated class means for each row. Note that we use the symmetric square root, defined as follows: let ˆΣ^{−1} = P Λ P^T be the eigenvalue decomposition of ˆΣ^{−1}; then the symmetric matrix square root is given by ˆΣ^{−1/2} = P Λ^{1/2} P^T. Then, adding the signal back into this sphered noise, we obtain X̃, which we call the sphered data. Thus, the sphering algorithm solves the problems associated with row and column correlations by sphering the underlying data. As this algorithm operates on the original data matrix, it can be used with any test statistic and any multiple testing procedure. To better understand the algorithm, however, we investigate some of its properties for the two-class problem:

Proposition 1. Let X ~ N_{m,n}(M + S, Σ, Δ), where M = ν 1_{(n)}^T + 1_{(m)} µ^T and S = [ψ_1 1_{(n_1)}^T  ψ_2 1_{(n_2)}^T], and let X̃ be the sphered data given by Algorithm 1. Then, (i) E(X̃) = S = [ψ_1 1_{(n_1)}^T  ψ_2 1_{(n_2)}^T]; (ii) if, in addition, we take some N_0 equal in distribution to N and independent of N, and define Ñ_0 = ˆΣ^{−1/2} N_0 ˆΔ^{−1/2}, then Ñ_0 | ˆΣ, ˆΔ ~ N_{m,n}(0, 0, Σ̃, Δ̃), where Σ̃ = ˆΣ^{−1/2} Σ ˆΣ^{−1/2} and Δ̃ = ˆΔ^{−1/2} Δ ˆΔ^{−1/2}.

Thus, the signal remains the same between X and X̃, and the covariance structure is all that changes. Each row of X̃ then becomes a linear combination of the other rows weighted by their partial correlations. The same applies to the columns. Now, let us study how sphering the data affects the Z- and T-statistics from Section 2.2. First, the Z-statistic does not change with sphering. The numerator of both the Z- and T-statistics, x̄_{1,i} − x̄_{2,i}, is given by ˆψ_{1,i} − ˆψ_{2,i}, the components of the estimated signal matrix Ŝ. The denominator of the T-statistic, namely s_{x_1,x_2}, the estimate of the noise, however, changes with sphering. Thus, since the denominator of the T-statistic changes with sphering, the ranking of the rows changes as well. This is an important point which we will discuss in more detail subsequently. Recall also that in Section 2.2, we noted that the T-statistic does not have a closed-form distribution when there are column correlations. After sphering the data, however, the T-statistic on the sphered data approximately follows a scaled t distribution under certain conditions:

Claim 2. Assume the assumptions in Proposition 1 (ii) hold. In addition, let X̃ be the sphered data defined by Ñ_0 + Ŝ, and let T̃_i be the statistic for the i-th row of the data X̃. Then, under the null hypothesis H_0: ψ_{1,i} = ψ_{2,i}, if Δ̃ = I, T̃_i is approximately distributed as (σ_i √η)/(σ̃_i √c_n) · t_{(n−2)}, where σ̃_i^2 denotes the i-th diagonal element of Σ̃. Here, c_n and η are defined as in Claim 1.

Using our sphering algorithm to de-correlate the noise in the data matrix, we obtain test statistics that follow approximately known distributions under certain conditions. The sphered column covariance, Δ̃, is assumed to be the identity. If Δ̃ is instead a diagonal matrix, then a simple scaling of the columns will give the above result. Notice that if the original data, X, has no column correlations, Δ = I, then T and T̃ both approximately follow a scaled t distribution with n − 2 degrees of freedom. Thus, if the data originally follows the correct theoretical null distribution, then sphering the data does not change its null distribution, an important property. Also, if the sphered rows are independent, Σ̃ = I, or approximately independent, then the statistics T̃_i are independent or approximately independent.

We also note that we can often assume that σ̃_i = σ_i, thus eliminating that coefficient ratio from the distribution. This is an especially reasonable assumption if the rows are scaled prior to applying the sphering algorithm. While Δ̃ and Σ̃ are not likely to be exactly the identity, we have observed in simulations that these are often diagonal or nearly diagonal.

When calculating p-values for T̃ based on the distribution given in Claim 2, we must know the value of η, which depends on the original column covariance. While one might be inclined to estimate η from ˆΔ, this is problematic for several reasons. First, ˆΔ is the penalized MLE, meaning that the estimate is biased for finite samples, and the exact formula for this bias has not yet been established. Thus, estimating η in this manner would result in a global underestimate of the population variance, η. Secondly, ˆΔ and ˆΣ are only identifiable up to a multiplicative constant (Allen and Tibshirani, 2010). Hence, the scale of the variances of the columns is not separable from that of the rows, meaning that one cannot determine the variance associated with the columns, η, from the TRCM estimates. Further research on the consistency of the estimates ˆΔ and ˆΣ is needed, as well as investigations into estimating η directly.

For our purposes, then, we propose to estimate η, and hence re-scale the distribution of the sphered test statistics, T̃, in a data-driven manner. Note that if all of the test statistics were truly from the null distribution, then we could simply re-scale the test statistics to have the same variance as that of the t_{(n−2)} distribution. As we often expect a portion of the tests to be non-null, however, we do not want these tests to contaminate our variance estimate. Thus, we propose to scale by only the central portion of the observed distribution of test statistics, as these are most likely to be truly null tests. More specifically, we estimate η by comparing the variance of the central portion of the t_{(n−2)} distribution to that of the central portion of the T̃-statistics. This procedure is outlined in Algorithm 2, where ρ_α(x) denotes the α-th quantile of x and I(·) is the indicator function.

Algorithm 2 Scaling by the central portion of T̃
(a) Let the expected proportion of null test statistics be ˆπ_0 = ˆm_0 / m.
(b) Estimate the variance of the central portion of the sphered test statistics: ˆσ^2_{T̃}(ˆπ_0) = Var̂{ T̃_i : ρ_{(1−ˆπ_0)/2}(T̃) ≤ T̃_i ≤ ρ_{(1+ˆπ_0)/2}(T̃) }.
(c) Define the central-matched T-statistics: T̄ = T̃ σ_{t(n−2)}(ˆπ_0) / ˆσ_{T̃}(ˆπ_0), where σ^2_{t(n−2)}(ˆπ_0) is the variance of the central portion of the t_{(n−2)} distribution.

The estimate of the scaling factor η is then ˆη = c_n ˆσ^2_{T̃}(ˆπ_0) / σ^2_{t(n−2)}(ˆπ_0). Thus, the resulting T̄-statistics can be tested against the t_{(n−2)} distribution. As we do not want statistics corresponding to non-null tests to contaminate the variance estimates, we recommend using a conservative estimate of π_0, such as 0.8 or 0.9 for microarrays.

We pause to ask a logical question. Based on the results in Section 2.2, why does one need to sphere the data and then estimate η, instead of simply estimating η via Algorithm 2 at the onset? Recall that the result in Claim 1, giving the altered variance of the Z-statistic, was for a single test associated with a single row of the data matrix. Also, recall that over- or under-dispersion of the test statistics can result from correlation among the rows alone (Qiu et al., 2005; Efron, 2010).
Thus, if the row variables are left un-sphered, then estimating η via central matching will result in a biased estimate that leads to incorrect inferences. As a brief illustration of this effect, we also apply central matching to the original data in the analysis of real microarray data in the next section.
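A sketch of the central-matching rescaling of Algorithm 2 applied to the sphered statistics (illustrative code, not the authors' implementation; the Monte Carlo evaluation of the reference variance and the default pi0 = 0.9 are our choices):

import numpy as np
from scipy import stats

def central_match(T_tilde, df, pi0=0.9, n_ref=200_000, seed=0):
    """Sketch of Algorithm 2: rescale the sphered statistics so that their
    central portion matches the central portion of the t_(df) distribution."""
    lo, hi = np.quantile(T_tilde, [(1 - pi0) / 2, (1 + pi0) / 2])
    sd_center = T_tilde[(T_tilde >= lo) & (T_tilde <= hi)].std()

    # Standard deviation of the central pi0 portion of t_(df),
    # evaluated here by Monte Carlo for simplicity
    ref = stats.t.rvs(df, size=n_ref, random_state=seed)
    rlo, rhi = np.quantile(ref, [(1 - pi0) / 2, (1 + pi0) / 2])
    sd_ref = ref[(ref >= rlo) & (ref <= rhi)].std()

    T_bar = T_tilde * sd_ref / sd_center          # central-matched statistics
    pvals = 2 * stats.t.sf(np.abs(T_bar), df)     # tested against t_(df)
    return T_bar, pvals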

Our sphering algorithm takes a simple but direct approach to the problems associated with correlation and large-scale inference. The noise in the data is de-correlated using the TRCM estimates of the row and column covariances, so that the resulting sphered noise is approximately independent. While we have mainly discussed the distributions of standard two-sample test statistics resulting from our algorithm, we note that, as our approach works with the original data matrix, it is general and can be used in conjunction with any test statistic and any multiple testing procedure.

4. Results

We evaluate the performance of our sphering method through simulations and a real microarray example in which batch effects have been documented.

4.1. Simulations

We test our method of directly accounting for row and column correlations when conducting inference on simulated data sets and compare the results to those of competing methods. Four simulation models are employed for this purpose: 1) our matrix-variate model with correlation structure inspired by that of the Spielman et al. (2007) microarray data, 2) a latent variable model, 3) a batch-effect model, and 4) an instrument drift model. Data of dimension 250 × 50 is simulated according to the following model: X = S + B + N, for signal matrix S, effect matrix B, and noise matrix N. The signal is constant for all four simulation models and is that of a two-class model with 25 columns in each class and 50 non-null rows: S = [ψ_1 1_{(25)}^T  ψ_2 1_{(25)}^T], where ψ_{1,1:25} = 0.5, ψ_{1,26:50} = −0.5, ψ_{2,1:25} = −0.5, ψ_{2,26:50} = 0.5, and ψ_{1,51:250} = ψ_{2,51:250} = 0. For the matrix-variate model, the effect matrix B = 0 and the noise matrix N = Σ^{1/2} Z Δ^{1/2} with Z_{ij} ~ N(0, 1). The row and column covariances are inspired by the correlation observed in the Spielman et al. (2007) data: Σ and Δ are taken as the correlation matrices of 250 randomly sampled genes and 50 columns randomly sampled according to the class labels. The latent variable simulation model is taken from Leek and Storey (2008) and consists of the noise matrix, N_{ij} ~ N(0, 1), and the effect matrix, B = Γ G. Here, the latent variables, G, of dimension 2 × 50, are given by G_{ij} ~ Bern(0.5), and the weights, Γ, of dimension 250 × 2, are given by Γ_{ij} ~ N(0, (0.5)^2). For the batch effect model, B_{ij} = Σ_{k=1}^{K} β_{ik} I(j ∈ I(k)), where I(k) indicates the k-th batch membership and β_{ik} ~ N(µ_k, (0.5)^2). We simulate K = 5 batches with ten members each and µ = [−0.5, −0.25, 0, 0.25, 0.5]. The noise, N = Σ^{1/2} Z, where Σ_{ij} = (0.9)^{|i−j|} and Z_{ij} ~ N(0, 1), independent of the effect matrix. Finally, the effect matrix for the instrument drift model is given by B_{i,j} = µ + γ B_{i,j−1} + Z_{i,j}, where the drift µ = 0.01, the shrinkage γ = 0.1, and the innovations Z_{ij} ~ N(0, (0.1)^2). As with the batch effect model, the noise N = Σ^{1/2} Z.

We compare the results of our sphering algorithm to those of competing methods, specifically to standard methodology (row variables are standardized and column variables are centered), surrogate variable analysis (Leek and Storey, 2008), the correlation sharing method (Tibshirani and Wasserman, 2006), the correlation predicted method (Lai, 2008), and the correlation adjusted method (Zuber and Strimmer, 2009). Note that all of these methods re-order the ranks of the row variables compared to those of standard methodology. This means that the true false discovery proportion (FDP) will change for each method depending on whether the true non-null rows are re-ordered correctly.
Thus, we compare results by fixing the number of tests rejected and comparing the true FDP and the estimated FDR, estimated via the step-up procedure (Benjamini and Hochberg, 1995).
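For reference, the quantities compared in this study can be computed as in the sketch below (a standard construction, not code from the paper): for a fixed number of rejections k, one simple step-up-style estimate of the FDR is m p_(k)/k, and the true FDP uses the known non-null rows.

import numpy as np

def fdr_estimate_at_k(pvals, k):
    """Step-up-style FDR estimate when the k smallest p-values are rejected:
    m * p_(k) / k, capped at 1."""
    m = len(pvals)
    p_sorted = np.sort(pvals)
    return min(1.0, m * p_sorted[k - 1] / k)

def true_fdp(rejected_idx, nonnull_idx):
    """True false discovery proportion given the known set of non-null rows."""
    rejected, nonnull = set(rejected_idx), set(nonnull_idx)
    return len(rejected - nonnull) / max(len(rejected), 1)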

Fig. 4. Simulation results comparing the average true false discovery proportion (FDP) to the average estimated false discovery rate (FDR) over 100 replicates of the four simulation models (matrix-variate, latent variable, batch effect, and instrument drift) for the six methods described in Section 4.1 (Standard, Sphered, SVA, Correlation Sharing, Correlation Predicted, and Correlation Adjusted). Results are compared when the number of rejected tests is fixed between 40 and 60 out of 250 total tests, with 50 tests being truly non-null. Rejecting 55 tests corresponds to controlling the oracle FDP at 10%. Best performing methods for each model, meaning that the most truly non-null tests are rejected while still controlling the true FDP, are denoted in bold.

Best performing methods will re-order the rows such that the true FDP is lower, meaning that the statistical power is higher, while the FDR is well estimated or slightly conservative. Methods yielding anti-conservative FDR estimates, meaning that the FDR under-estimates the true FDP, exhibit problematic tendencies, as controlling the FDR does not imply controlling the proportion of false discoveries.

In Figure 4, we display the average true FDP and estimated FDR for the six comparison methods on the four simulation models over 100 replicates for fixed numbers of rejected tests. As controlling the oracle FDP at 10% corresponds to rejecting 55 tests, we present boxplots of the true FDP and estimated FDR in Figure 5 when 55 tests are rejected. These results reveal that the sphering algorithm performs well in comparison to the other five methods. In the matrix-variate, batch effect, and instrument drift models, the sphering algorithm re-orders the row rankings in such a way as to lower the true FDP, yielding an increase in statistical power. In addition, the estimated FDR is a more consistent estimate when sphering is used, allowing one to reject more truly non-null rows than other methods. Notice also that sphering decreases the variance of the FDR estimates compared to standard methodology. The correlation adjusted method (Zuber and Strimmer, 2009) generally results in a favorable re-ordering of the row rankings, but leads to an FDR estimate with larger variance. All competing methods exhibit troubling behavior in the matrix-variate simulation. Here, all methods estimate that there are no false discoveries when in fact there are at least five false discoveries. Also, competing methods such as SVA, correlation sharing, and correlation predicted exhibit these problematic behaviors in at least two of the simulation models. In microarray analysis, this behavior would lead to identifying too many genes as significant when many are likely to be false discoveries. Overall, our sphering algorithm is the most consistent and most robust method for conducting inference on matrix data with both row and column correlations.

4.2. Results: Real Microarray Study

We compare the performance of our sphering algorithm to that of competing methods on a real microarray data set in which strong batch effects have been documented. The data presented in Spielman et al. (2007) measure the gene expression of 4,167 genes for 142 subjects, with 60 of European ancestry (CEU) and 82 of Asian (ASN) ancestry. Spielman et al. (2007) find that 78% of genes are differentially expressed between the CEU and ASN groups. Subsequent work, however, has questioned the validity of these results due to strong batch effects (Akey et al., 2007; Leek et al., 2010a). Specifically, the microarrays were processed between 2003 and 2007, with the bulk (44/60) of the CEU group and all of the ASN group processed in largely non-overlapping sets of years. Due to the strong batch effects measured by the processing year and the confounding between these batches and the two classes, it is difficult to determine the set of truly differentially expressed genes. In fact, after removing the batch effects, no genes are found to be significant (Akey et al., 2007). On this data, we apply our sphering algorithm, standard methodology, and the four competing methods described in the previous section to determine the number of genes that are significantly differentially expressed between the two groups. Prior to analysis, the genes and arrays are centered.
Two-sample t-statistics with the variance correction proposed in Tusher et al. (2001) are used. Test statistics are compared to the permutation distribution (Storey and Tibshirani, 2003) to assess significance and correct for multiple testing. The FDR is controlled at 10%. While the batches are known for this data set, we ignore this information when comparing methods in order to test the performance of each method.
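The testing pipeline used here can be sketched as follows (illustrative only: the additive constant s0 below stands in for the variance correction of Tusher et al. (2001), and the pooled permutation null follows the spirit of Storey and Tibshirani (2003); the actual analysis applies these to each method's processed data matrix):

import numpy as np

def mod_t(X, labels, s0=0.1):
    """Two-sample statistics per row with an additive constant s0 in the
    denominator, in the spirit of the Tusher et al. (2001) variance correction."""
    g1, g2 = X[:, labels == 0], X[:, labels == 1]
    n1, n2 = g1.shape[1], g2.shape[1]
    sp2 = ((n1 - 1) * g1.var(axis=1, ddof=1) +
           (n2 - 1) * g2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (g1.mean(axis=1) - g2.mean(axis=1)) / (se + s0)

def permutation_pvalues(X, labels, n_perm=1000, seed=0):
    """Two-sided p-values from a null built by permuting the class labels and
    pooling the permuted statistics across rows."""
    rng = np.random.default_rng(seed)
    obs = np.abs(mod_t(X, labels))
    null = np.concatenate([np.abs(mod_t(X, rng.permutation(labels)))
                           for _ in range(n_perm)])
    null_sorted = np.sort(null)
    exceed = len(null_sorted) - np.searchsorted(null_sorted, obs, side="left")
    return (exceed + 1) / (len(null_sorted) + 1)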

Fig. 5. Boxplots of the true false discovery proportion (FDP) and the estimated false discovery rate (FDR) for the six methods under the four simulation models (matrix-variate, latent variable, batch effect, and instrument drift), each repeated 100 times. For each method, the number of tests rejected is fixed at 55 out of 250 tests, corresponding to controlling the oracle FDP at 10%, as shown with the dotted line. Methods and simulation models are described in Section 4.1. Best performing methods re-order the tests such that the true FDP is close to the oracle FDP, while the estimated FDR is a good or slightly conservative estimate of the true FDP with small variance. The best methods are then our sphering method for the matrix-variate, batch effect, and instrument drift models, and the SVA method for the latent variable model.

Number of Significant Genes
Standard: 2040
Standard (with Central Matching): 80
Standard after removing Batch Effects: 0
SVA: 2787
Correlation Sharing: 4167
Correlation Predicted: 248
Correlation Adjusted: 178
Sphering: 28

Fig. 6. Number of significant genes found in the Spielman et al. (2007) data. Two-sample T-statistics with a variance adjustment were used, and the FDR was controlled at 10%. When the batch labels are known and the batch effects removed, no genes are found significant. Assuming the batch labels are unknown, competing methods find many more genes significant. Our sphering method appropriately estimates and removes the effect of the batches, finding only 28 genes significant.

In Figure 6, we present the number of significant genes found by each method. We find that 49% of genes are differentially expressed using the standard methodology, but after removing the batch effects by standardizing the arrays with respect to processing year, no genes are found to be significant. These results are consistent with previous re-analyses of this data (Akey et al., 2007). When employing the surrogate variable analysis and correlation sharing methods, however, even more genes are found to be significant, at 67% and 100%, respectively. The correlation predicted and correlation adjusted methods find 6% and 4% of genes significant, while our sphering algorithm finds only 0.67% of genes significant. Notice also that while central matching adjusts for some of the effects of correlation, since it does not re-order the gene rankings, it still finds 1.9% of genes significant. Thus, even without using knowledge of the processing years, the sphering method correctly adjusts for these batch effects, finding very few genes differentially expressed. The SVA and correlation sharing methods, however, display troubling behavior, as they estimate that even more genes are significant than the standard methodology. These real data results are consistent with what we previously observed in the simulation study.

To explore these results further, we present heatmap dendrograms of the top 250 genes for the original data, the data resulting from the SVA method, and our sphered data in Figure 7. Colorbars indicating the processing years as well as the group labels are shown with these heatmaps. From this, we see that the arrays in the original data cluster by processing year instead of by group status. When the SVA method is applied, this effect appears to be exacerbated, as the separation between processing years is more pronounced. In contrast, the sphered data clusters the arrays by group status and not by processing year. In addition, the arrays do not cluster by any grouping of the processing years, indicating that our method removed these strong batch effects. Instead, the true problem is clearly illustrated, namely that there is confounding between processing year and group status. These results on a real microarray example provide strong evidence for the utility of our sphering approach, revealing that the technique outperforms all competing methods.

5. Discussion

In this paper, we have demonstrated that using standard statistical methodology to conduct inference on transposable data is problematic. As a method of solving these problems, we have proposed a sphering algorithm that de-correlates the data, yielding approximately independent rows and columns, to be used before conducting large-scale inference. We have demonstrated the advantages and robustness of this method through simulations on many correlated data sets.

Fig. 7. Heatmap dendrograms of the top 250 genes for the original standardized Spielman et al. (2007) data (left), the data after removing latent variables via the SVA method (center), and the data after sphering (right). Samples are labeled according to the year in which the arrays were processed (upper colorbar) and the class labels (lower colorbar). In the original data and with the SVA method, samples cluster largely according to processing year. Samples in the sphered data do not seem to cluster according to processing year; instead, confounding of the class labels and processing years is clearly seen.
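For readers who wish to reproduce a display like Figure 7, the following is a minimal sketch of hierarchically clustering the arrays of a genes-by-arrays matrix (original, SVA-adjusted, or sphered). The choice of Ward linkage and of selecting the top genes by variance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

def cluster_arrays(X, top_k=250):
    """Cluster the arrays (columns) of X using the top_k most variable genes (rows),
    returning the reordered submatrix and the column order for a heatmap display."""
    var_order = np.argsort(-X.var(axis=1))        # most variable genes first
    X_top = X[var_order[:top_k], :]
    Z = linkage(X_top.T, method="ward")           # hierarchical clustering of the arrays
    order = dendrogram(Z, no_plot=True)["leaves"] # leaf order gives the column ordering
    return X_top[:, order], order

# Usage with any genes-by-arrays matrix; here a random toy matrix
rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 60))
heatmap_data, column_order = cluster_arrays(X)
```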
