
Posterior convergence rates for estimating large precision matrices using graphical models

BY SAYANTAN BANERJEE
Department of Statistics, North Carolina State University, 5219 SAS Hall, Campus Box 8203, Raleigh, NC 27695, USA
sbaner5@ncsu.edu

SUBHASHIS GHOSAL
Department of Statistics, North Carolina State University, 4276 SAS Hall, Campus Box 8203, Raleigh, NC 27695, USA
sghosal@ncsu.edu

SUMMARY

We consider Bayesian estimation of a $p \times p$ precision matrix, when $p$ can be much larger than the available sample size $n$. It is well known that consistent estimation in such ultra-high dimensional situations requires regularisation such as banding, tapering or thresholding. We consider a banding structure in the model and induce a prior distribution on a banded precision matrix through a Gaussian graphical model, where an edge is present only when two vertices are within a given distance. We show that, under a very mild growth condition and a proper choice of the order of the graph, the posterior distribution based on the graphical model is consistent in the $L_\infty$-operator norm uniformly over a class of precision matrices, even if the true precision matrix may not have a banded structure.

Along the way to the proof, we also establish that the maximum likelihood estimator (MLE) based on the graphical model is consistent under the same set of conditions, which is of independent interest. The consistency and the convergence rate of the posterior distribution of the precision matrix given the data are also studied. We also conduct a simulation study to compare the finite sample performance of the Bayes estimator and the MLE based on the graphical model with that obtained by using a banding operation on the sample covariance matrix.

Some key words: Precision matrix; G-Wishart; posterior consistency; convergence rate.

1. INTRODUCTION

Estimating a covariance matrix or a precision matrix (inverse covariance matrix) is one of the most important problems in multivariate analysis. Of special interest are situations where the number of underlying variables $p$ is much larger than the sample size $n$. Such situations are common in gene expression data, fMRI data and several other modern applications. Special care needs to be taken in such high-dimensional scenarios. Conventional estimators like the sample covariance matrix or the maximum likelihood estimator behave poorly when the dimensionality is much higher than the sample size.

Different regularisation based methods have been proposed and developed in recent years for dealing with high-dimensional data. These include banding, thresholding, tapering and penalisation based methods, to name a few; see, for example, Ledoit & Wolf (2004); Huang et al. (2006); Yuan & Lin (2007); Bickel & Levina (2008a,b); Karoui (2008); Friedman et al. (2008); Rothman et al. (2008); Lam & Fan (2009); Rothman et al. (2009); Cai et al. (2010, 2011). Most of these regularisation based methods for high dimensions impose a sparse structure on the covariance or the precision matrix, as in Bickel & Levina (2008a), where a rate of convergence has been derived for the estimator obtained by banding the sample covariance matrix, or by banding the Cholesky factor of the inverse sample covariance matrix, as long as $n^{-1}\log p \to 0$.

Cai et al. (2010) obtained the minimax rate under the operator norm and constructed a tapering estimator which attains the minimax rate over a smoothness class of covariance matrices. Cai & Liu (2011) proposed an adaptive thresholding procedure. More recently, Cai & Yuan (2012) introduced a data-driven block-thresholding estimator which is shown to be optimally rate adaptive over some smoothness class of covariance matrices.

There are only a few relevant works on Bayesian inference for such problems. Ghosal (2000) studied asymptotic normality of posterior distributions for exponential families when the dimension $p$ grows to infinity, but restricting to $p \ll n$. Recently, Pati et al. (2012) considered sparse Bayesian factor models for dimensionality reduction in high dimensional problems and showed consistency in the $L_2$-operator norm (also known as the spectral norm) by using a point mass mixture prior on the factor loadings, assuming such a factor model representation of the true covariance matrix.

Graphical models (Lauritzen, 1996) serve as an excellent tool for sparse covariance or inverse covariance estimation, see Dobra et al. (2004); Meinshausen & Bühlmann (2006); Yuan & Lin (2007); Friedman et al. (2008), as they capture the conditional dependency between the variables by means of a graph. Bayesian methods for inference using graphical models have also been developed, as in Roverato (2000); Atay-Kayis & Massam (2005); Letac & Massam (2007). For a complete graph corresponding to the saturated model, clearly the Wishart distribution is the conjugate prior for the precision matrix Ω; see Diaconis & Ylvisaker (1979). For an incomplete decomposable graph, a conjugate family of priors is given by the G-Wishart prior (Roverato, 2000). The equivalent prior on the covariance matrix is termed the hyper inverse Wishart distribution in Dawid & Lauritzen (1993). Letac & Massam (2007) introduced a more general family of conjugate priors for the precision matrix, known as the $W_{P_G}$-Wishart family of distributions, which also has the conjugacy property.

The properties of this family of distributions were further explored in Rajaratnam et al. (2008), who also obtained expressions for the corresponding Bayes estimators.

In this paper, we consider Bayesian estimation of the precision matrix working with a G-Wishart prior induced by a Gaussian graphical model, which has a Markov property with respect to a decomposable graph $G$. For estimators arising from the resulting conjugacy structure, we establish their consistency and derive their posterior convergence rates. More specifically, we work with a Gaussian graphical model structure which induces banding in the corresponding precision matrix. Using this graphical model ensures the decomposability of the graph, along with the presence of a perfect set of cliques, as explained in Section 2. For such a G-Wishart prior, we can compute the explicit expression of the normalising constant of the corresponding marginal distribution of the graph. For arbitrary decomposable graphs, the computation of the normalising constant requires Markov chain Monte Carlo (MCMC) based methods; see Atay-Kayis & Massam (2005); Carvalho et al. (2007); Carvalho & Scott (2009); Lenkoski & Dobra (2011); Dobra et al. (2011).

The paper is organised as follows. In the next section, we discuss some preliminaries on graphical models. In Section 3, we formulate the estimation problem and describe the corresponding model assumptions. Section 4 deals with the main results related to posterior consistency and convergence rates. We extend the results of Section 4 to estimators using a reference prior on the covariance parameter in Section 5. In Section 6, we compare the performance of the Bayesian estimator with the graphical maximum likelihood estimator (MLE) and the banding estimators proposed by Bickel & Levina (2008b). Proofs of the main results are presented in Section 7. Some auxiliary lemmas and their proofs are included in the Appendix.

2. NOTATIONS AND PRELIMINARIES ON GRAPHICAL MODELS

We first describe the notations to be used in this paper. By $t_n = O(\delta_n)$ (respectively, $o(\delta_n)$), we mean that $t_n/\delta_n$ is bounded (respectively, $t_n/\delta_n \to 0$ as $n \to \infty$). For a random sequence $X_n$, $X_n = O_P(\delta_n)$ (respectively, $X_n = o_P(\delta_n)$) means that $P(|X_n| \le M\delta_n) \to 1$ for some constant $M$ (respectively, $P(|X_n| < \epsilon\delta_n) \to 1$ for all $\epsilon > 0$). For numerical sequences $r_n$ and $s_n$, by $r_n \ll s_n$ (or, $s_n \gg r_n$) we mean that $r_n = o(s_n)$, while by $s_n \lesssim r_n$ we mean that $s_n = O(r_n)$. By $r_n \asymp s_n$, we mean that $r_n = O(s_n)$ and $s_n = O(r_n)$, while $r_n \sim s_n$ stands for $r_n/s_n \to 1$. The indicator function is denoted by $\mathbb{1}$.

We define the following norms for a vector $x \in \mathbb{R}^p$: $\|x\|_r = (\sum_{j=1}^p |x_j|^r)^{1/r}$, $\|x\|_\infty = \max_j |x_j|$. For a matrix $A = (a_{ij})$, $a_{ij}$ stands for the $(i,j)$th entry of $A$. If $A$ is a symmetric $p \times p$ matrix, let $\mathrm{eig}_1(A), \ldots, \mathrm{eig}_p(A)$ stand for its eigenvalues. We consider the following norms on $p \times p$ matrices:
$$\|A\|_r = \Big(\sum_{i=1}^p\sum_{j=1}^p |a_{ij}|^r\Big)^{1/r}, \ 1 \le r < \infty, \qquad \|A\|_\infty = \max_{i,j}|a_{ij}|, \qquad \|A\|_{(r,s)} = \sup\{\|Ax\|_s : \|x\|_r = 1\},$$
by respectively viewing $A$ as a vector in $\mathbb{R}^{p^2}$ and as an operator from $(\mathbb{R}^p, \|\cdot\|_r)$ to $(\mathbb{R}^p, \|\cdot\|_s)$, where $1 \le r, s \le \infty$. This gives $\|A\|_{(1,1)} = \max_j \sum_i |a_{ij}|$, $\|A\|_{(\infty,\infty)} = \max_i \sum_j |a_{ij}|$, $\|A\|_{(2,2)} = [\max\{\mathrm{eig}_i(A^TA) : 1 \le i \le p\}]^{1/2}$, and, for symmetric matrices, $\|A\|_{(2,2)} = \max\{|\mathrm{eig}_i(A)| : 1 \le i \le p\}$ and $\|A\|_{(1,1)} = \|A\|_{(\infty,\infty)}$. The norm $\|\cdot\|_{(r,r)}$ will be referred to as the $L_r$-operator norm. For two matrices $A$ and $B$, we say that $A \ge B$ (respectively, $A > B$) if $A - B$ is nonnegative definite (respectively, positive definite). Thus $A > 0$ for a positive definite matrix $A$, where $0$ stands for the zero matrix in such cases. The identity matrix of order $p$ will be denoted by $I_p$.
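As a concrete illustration of these norm conventions, the following minimal sketch (in Python with NumPy; the function names are chosen here purely for illustration) computes the $L_1$-, $L_2$- and $L_\infty$-operator norms of a small symmetric matrix and checks that $\|A\|_{(1,1)} = \|A\|_{(\infty,\infty)}$ in the symmetric case.

import numpy as np

def op_norm(A, r):
    """L_r-operator norm ||A||_(r,r) for r in {1, 2, np.inf}."""
    if r == 1:
        return np.abs(A).sum(axis=0).max()   # maximum absolute column sum
    if r == np.inf:
        return np.abs(A).sum(axis=1).max()   # maximum absolute row sum
    if r == 2:
        return np.linalg.eigvalsh(A.T @ A).max() ** 0.5  # spectral norm
    raise ValueError("r must be 1, 2 or inf")

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                            # symmetric test matrix

print(op_norm(A, 1), op_norm(A, np.inf))     # equal for symmetric A
print(op_norm(A, 2), np.abs(np.linalg.eigvalsh(A)).max())  # spectral norm = max |eigenvalue|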

For a set $T$, we denote the cardinality, that is, the number of elements in $T$, by $\#T$. We denote the submatrix of the matrix $A$ induced by the set $T \subset \{1, \ldots, p\}$ by $A_T$, i.e., $A_T = (a_{ij} : i, j \in T)$. By $A_T^{-1}$, we mean the inverse $(A_T)^{-1}$ of the submatrix $A_T$. For a $p \times p$ matrix $A = (a_{ij})$, let $(A_T)^0 = (a^*_{ij})$ denote the $p$-dimensional matrix such that $a^*_{ij} = a_{ij}$ for $(i,j) \in T \times T$, and $0$ otherwise. Also, we denote the banded version of $A$ by $B_k(A) = \{a_{ij}\,\mathbb{1}(|i-j| \le k)\}$, corresponding to banding parameter $k$, $k < p$.

Now we discuss some preliminaries on graphical models. An undirected graph $G$ consists of a non-empty vertex set $V = \{1, \ldots, p\}$ along with an edge set $E \subset \{(i,j) \in V \times V : i < j\}$. The vertices in $V$ are the indices of the components of a $p$-dimensional random vector $X = (X_1, \ldots, X_p)^T$. The absence of an edge $(i,j)$ corresponds to the conditional independence of $X_i$ and $X_j$ given the rest. For a Gaussian random vector $X$ with precision matrix $\Omega = (\omega_{ij})$, this is equivalent to $\omega_{ij} = 0$. Figure 1 illustrates the connection between a banded precision matrix and the corresponding graphical model. Following the notation in Letac & Massam (2007), we restrict the canonical parameter $\Omega$ to $P_G$, where $P_G$ is the cone of positive definite symmetric matrices of order $p$ having zero entries corresponding to each missing edge in $E$. Denoting the linear space of symmetric matrices of order $p$ by $M$, let $M_p^+ \subset M$ be the cone of positive definite matrices. The linear space of symmetric incomplete matrices $A = (a_{ij})$ with missing entries $a_{ij}$, $(i,j) \notin E$, will be denoted by $I_G$. The parameter space of the Gaussian graphical model can be described by the set of incomplete matrices $\Sigma = \kappa(\Omega^{-1})$, $\Omega \in P_G$; here $\kappa : M \to I_G$ is the projection of $M$ onto $I_G$; see Letac & Massam (2007).

A subgraph $G'$ of $G$ consists of a subset $V'$ of $V$ and $E' = \{(i,j) \in E : i, j \in V'\}$. A maximal saturated subgraph of $G$ is called a clique. A path in a graph is a collection of adjacent edges.
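The banding operator $B_k(\cdot)$ and the embedding $(A_T)^0$ defined above can be made concrete with a short Python/NumPy sketch; this is an added illustration with function names chosen here, not notation from the paper.

import numpy as np

def band(A, k):
    """Banded version B_k(A): keep entries with |i - j| <= k, zero out the rest."""
    p = A.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, A, 0.0)

def embed(A, T):
    """(A_T)^0: the p x p matrix agreeing with A on T x T and zero elsewhere."""
    out = np.zeros_like(A, dtype=float)
    idx = np.ix_(T, T)
    out[idx] = A[idx]
    return out

A = np.arange(36, dtype=float).reshape(6, 6)
print(band(A, 2))
print(embed(A, [1, 2, 3]))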

Fig. 1. [Left] Structure of a banded precision matrix with shaded non-zero entries. [Right] The graphical model corresponding to a banded precision matrix of dimension 6 and banding parameter 3.

A subset $S$ of $V$ is called a separator of two cliques $C_1$ and $C_2$ if every path from $C_1$ to $C_2$ passes through $S$. A graph is called decomposable if it is possible to find a set of cliques covering all vertices, connected by a set of separators. We shall only deal with decomposable graphs in this paper. For detailed concepts and notations for graphical models, we refer the reader to Lauritzen (1996). A set of cliques $\mathcal{C} = \{C_1, \ldots, C_r\}$ is said to be in perfect order if the following holds: for
$$H_1 = R_1 = C_1, \quad H_j = C_1 \cup \cdots \cup C_j, \quad R_j = C_j \setminus H_{j-1}, \quad S_j = H_{j-1} \cap C_j, \quad (j = 2, \ldots, r), \tag{1}$$
$\mathcal{S} = \{S_j : j = 2, \ldots, r\}$ is the set of minimal separators of $G$. For a decomposable graph, a perfect order of the cliques always exists. For a decomposable graph $G$ with a perfect order of the cliques $\{C_1, \ldots, C_r\}$ and precision matrix $\Omega$ given to lie in $P_G$, the incomplete matrix $\Sigma$ is defined in terms of the submatrices corresponding to the cliques; in particular, $\Sigma_{C_j}$ is positive definite for each $j = 1, \ldots, r$.
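For the banded graph of Fig. 1 ($p = 6$, $k = 3$), the quantities in (1) can be computed directly; the following short Python sketch (an illustration only) prints the cliques together with the resulting separators and residuals.

# Illustrative computation of H_j, R_j, S_j in equation (1) for the graph of Fig. 1.
p, k = 6, 3
cliques = [set(range(j, j + k + 1)) for j in range(1, p - k + 1)]  # C_j = {j, ..., j+k}

H = cliques[0]
print("C_1 =", sorted(cliques[0]))
for j, C in enumerate(cliques[1:], start=2):
    S = H & C                 # separator S_j = H_{j-1} intersect C_j
    R = C - H                 # residual  R_j = C_j \ H_{j-1}
    H = H | C                 # history   H_j = C_1 union ... union C_j
    print(f"C_{j} = {sorted(C)}, S_{j} = {sorted(S)}, R_{j} = {sorted(R)}")

The output reproduces the separators $S_j = \{j, \ldots, j+k-1\}$ used for the banded graph throughout the paper.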

Thus we have the parameter space for decomposable Gaussian graphical models restricted to the two cones
$$P_G = \{A = (a_{ij}) \in M_p^+ : a_{ij} = 0, \ (i,j) \notin E\}, \tag{2}$$
$$Q_G = \{B \in I_G : B_{C_j} > 0, \ (j = 1, \ldots, r)\}, \tag{3}$$
respectively for $\Omega$ and $\Sigma$.

The $W_{P_G}$-Wishart distribution $W_{P_G}(\alpha, \beta, D)$ has three sets of parameters $\alpha$, $\beta$ and $D$, where $\alpha = (\alpha_1, \ldots, \alpha_r)^T$ and $\beta = (\beta_2, \ldots, \beta_r)^T$ are suitable functions defined on the cliques and separators of the graph respectively, and $D$ is a scaling matrix. The G-Wishart distribution $W_G(\delta, D)$ is a special case of the $W_{P_G}$-Wishart family where
$$\alpha_i = -\frac{\delta + \#C_i - 1}{2}, \quad (i = 1, \ldots, r), \qquad \beta_i = -\frac{\delta + \#S_i - 1}{2}, \quad (i = 2, \ldots, r). \tag{4}$$

3. MODEL ASSUMPTION AND PRIOR SPECIFICATION

Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) random $p$-vectors with mean zero and covariance matrix $\Sigma$. Write $X_i = (X_{i1}, \ldots, X_{ip})^T$, and assume that the $X_i$, $(i = 1, \ldots, n)$, are multivariate Gaussian. Consistent estimators for the covariance matrix were obtained in Bickel & Levina (2008b) by banding the sample covariance matrix, assuming a certain sparsity structure on the true covariance. Our aim is to obtain consistency of the graphical MLE and Bayes estimates of the precision matrix $\Omega = \Sigma^{-1}$ under the condition $n^{-1}\log p \to 0$, where $\Omega$ ranges over some fairly natural families. For a given positive sequence $\gamma(k) \to 0$, we consider the class of positive definite symmetric matrices $\Omega = (\omega_{ij})$ given by
$$\mathcal{U}(\epsilon_0, \gamma) = \Big\{\Omega : \max_i \sum_j\{|\omega_{ij}| : |i-j| > k\} \le \gamma(k) \text{ for all } k > 0,\; 0 < \epsilon_0 \le \mathrm{eig}_{\min}(\Omega) \le \mathrm{eig}_{\max}(\Omega) \le \epsilon_0^{-1} < \infty\Big\}. \tag{5}$$
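To illustrate the two defining conditions of $\mathcal{U}(\epsilon_0, \gamma)$ in (5), the following Python/NumPy sketch checks the eigenvalue bounds and the banded tail sums for a candidate matrix; the specific choices here ($\epsilon_0 = 0.2$, an AR(1)-type precision matrix, a near-zero $\gamma$) are illustrative only.

import numpy as np

def tail_sum(Omega, k):
    """max_i sum_j {|omega_ij| : |i - j| > k}, i.e. the (inf,inf)-norm of Omega - B_k(Omega)."""
    p = Omega.shape[0]
    i, j = np.indices((p, p))
    outside = np.where(np.abs(i - j) > k, np.abs(Omega), 0.0)
    return outside.sum(axis=1).max()

def in_U(Omega, eps0, gamma):
    """Check the two conditions defining U(eps0, gamma) in (5), for k = 1, ..., p-1."""
    eigs = np.linalg.eigvalsh(Omega)
    if eigs.min() < eps0 or eigs.max() > 1.0 / eps0:
        return False
    p = Omega.shape[0]
    return all(tail_sum(Omega, k) <= gamma(k) for k in range(1, p))

p, rho = 10, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) covariance
Omega = np.linalg.inv(Sigma)
print(in_U(Omega, eps0=0.2, gamma=lambda k: 1e-8))

Here the AR(1)-type precision matrix is exactly banded with banding parameter 1, so it falls under the exact banding case listed below.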

We also define another class of positive definite symmetric matrices as
$$\mathcal{V}(K, \gamma) = \Big\{\Omega : \max_i \sum_j\{|\omega_{ij}| : |i-j| > k\} \le \gamma(k) \text{ for all } k > 0,\; \max\big(\|\Omega^{-1}\|_{(\infty,\infty)}, \|\Omega\|_{(\infty,\infty)}\big) \le K\Big\}. \tag{6}$$
These two classes are closely related, as shown by the following lemma.

LEMMA 1. For every $\epsilon_0$, there exist $K_1 \le K_2$ such that
$$\mathcal{V}(K_1, \gamma) \subset \mathcal{U}(\epsilon_0, \gamma) \subset \mathcal{V}(K_2, \gamma). \tag{7}$$

The sequence $\gamma(k)$, which bounds $\|\Omega - B_k(\Omega)\|_{(\infty,\infty)}$, has been kept flexible so as to include a number of matrix classes.
1. Exact banding: $\gamma(k) = 0$ for all $k \ge k_0$, which means that the true precision matrix is banded, with banding parameter $k_0$. For instance, any autoregressive process has such a form of precision matrix.
2. Exponential decay: $\gamma(k) = e^{-ck}$. For instance, any moving average process has such a form of precision matrix.
3. Polynomial decay: $\gamma(k) = \gamma k^{-\alpha}$, $\alpha > 0$. This class of matrices has been considered in Bickel & Levina (2008b).

We shall work with these two general classes $\mathcal{U}(\epsilon_0, \gamma)$ and $\mathcal{V}(K, \gamma)$ for estimating $\Omega$. A banding structure in the precision matrix can be induced by a Gaussian graphical model. Since $\omega_{ij} = 0$ implies that the components $X_i$ and $X_j$ of $X$ are conditionally independent given the others, we can define a Gaussian graphical model $G = (V, E)$, where $V = \{1, \ldots, p\}$ indexes the $p$ components $X_1, \ldots, X_p$, and $E$ is the corresponding edge set defined by $E = \{(i,j) : |i-j| \le k\}$, where $k$ is the size of the band. This describes a parameter space for precision matrices consisting of $k$-banded matrices, and can be used for the maximum likelihood or the Bayesian approach, where for the latter a prior distribution on these matrices must be specified.

Clearly, $G$ is an undirected, decomposable graphical model for which a perfect order of cliques exists, given by $\mathcal{C} = \{C_1, \ldots, C_{p-k}\}$, $C_j = \{j, \ldots, j+k\}$, $(j = 1, \ldots, p-k)$. The corresponding separators are given by $\mathcal{S} = \{S_2, \ldots, S_{p-k}\}$, $S_j = \{j, \ldots, j+k-1\}$, $(j = 2, \ldots, p-k)$. The choice of the perfect set of cliques is not unique, but the estimator for the precision matrix $\Omega$ remains the same under all choices of the order.

The $W_{P_G}$-family, as a prior distribution for $\Sigma$, is conjugate: if the prior distribution on $\Omega/2$ is $W_{P_G}(\alpha, \beta, D)$, then the posterior distribution of $\Omega/2$ given the sample covariance matrix $S = n^{-1}\sum_{i=1}^n X_iX_i^T$ is $W_{P_G}\{\alpha - (n/2)(1, \ldots, 1)^T, \beta - (n/2)(1, \ldots, 1)^T, D + \kappa(nS)\}$. As mentioned earlier, the G-Wishart $W_G(\delta, D)$ is a special case of $W_{P_G}(\alpha, \beta, D)$ for a suitable choice of the functions $\alpha$ and $\beta$. In our case, $\#C_j = k+1$, $(j = 1, \ldots, p-k)$, and $\#S_j = k$, $(j = 2, \ldots, p-k)$. Thus
$$\alpha_j = -\frac{\delta + k}{2}, \quad (j = 1, \ldots, p-k), \qquad \beta_j = -\frac{\delta + k - 1}{2}, \quad (j = 2, \ldots, p-k). \tag{8}$$
The posterior mean of $\Omega$, given $S$, is
$$E(\Omega \mid S) = -2\sum_{j=1}^{p-k}\Big(\alpha_j - \frac{n}{2}\Big)\Big[\big\{(D + \kappa(nS))_{C_j}\big\}^{-1}\Big]^0 + 2\sum_{j=2}^{p-k}\Big(\beta_j - \frac{n}{2}\Big)\Big[\big\{(D + \kappa(nS))_{S_j}\big\}^{-1}\Big]^0. \tag{9}$$

Taking $D = I_p$, the identity matrix of order $p$, and plugging in the values of $\alpha$ and $\beta$, the above estimator reduces to the Bayes estimator $\hat\Omega_B$ with respect to the G-Wishart prior $W_G(\delta, I_p)$:
$$\hat\Omega_B = \frac{\delta + k + n}{n}\Big[\sum_{j=1}^{p-k}\big\{(n^{-1}I_{k+1} + S_{C_j})^{-1}\big\}^0 - \sum_{j=2}^{p-k}\big\{(n^{-1}I_{k} + S_{S_j})^{-1}\big\}^0\Big] + \frac{1}{n}\sum_{j=2}^{p-k}\big\{(n^{-1}I_{k} + S_{S_j})^{-1}\big\}^0. \tag{10}$$
The graphical MLE for $\Omega$ under the graphical model with banding parameter $k$ is given by
$$\hat\Omega_M = \sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(S^{-1}_{S_j}\big)^0. \tag{11}$$
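A minimal Python/NumPy sketch of the two estimators (10) and (11) for the banded graph is given below; it is an illustration under naming conventions and an arbitrary choice $\delta = 3$ introduced here, not the implementation used for the numerical results in Section 6.

import numpy as np

def embed(M, idx, p):
    """(M)^0: place the |idx| x |idx| matrix M into a p x p zero matrix at rows/cols idx."""
    out = np.zeros((p, p))
    out[np.ix_(idx, idx)] = M
    return out

def clique_sep_indices(p, k):
    cliques = [list(range(j, j + k + 1)) for j in range(p - k)]   # C_j = {j, ..., j+k}
    seps = [list(range(j, j + k)) for j in range(1, p - k)]       # S_j = {j, ..., j+k-1}
    return cliques, seps

def graphical_mle(S, k):
    """Equation (11): embedded inverted clique blocks minus embedded inverted separator blocks."""
    p = S.shape[0]
    cliques, seps = clique_sep_indices(p, k)
    est = sum(embed(np.linalg.inv(S[np.ix_(c, c)]), c, p) for c in cliques)
    est -= sum(embed(np.linalg.inv(S[np.ix_(s, s)]), s, p) for s in seps)
    return est

def bayes_estimator(S, k, n, delta=3):
    """Equation (10): posterior mean under the G-Wishart prior W_G(delta, I_p)."""
    p = S.shape[0]
    cliques, seps = clique_sep_indices(p, k)
    C_term = sum(embed(np.linalg.inv(np.eye(k + 1) / n + S[np.ix_(c, c)]), c, p) for c in cliques)
    S_term = sum(embed(np.linalg.inv(np.eye(k) / n + S[np.ix_(s, s)]), s, p) for s in seps)
    return (delta + k + n) / n * (C_term - S_term) + S_term / n

# Toy check: with n much larger than p, both estimators are close to the true banded precision.
rng = np.random.default_rng(1)
p, k, n = 20, 1, 2000
Sigma0 = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # AR(1) covariance
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
S = X.T @ X / n
Omega0 = np.linalg.inv(Sigma0)
print(np.abs(graphical_mle(S, k) - Omega0).sum(axis=1).max())
print(np.abs(bayes_estimator(S, k, n) - Omega0).sum(axis=1).max())

The printed quantities are the $L_\infty$-operator norm errors of the two estimates in this toy setting.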

4. MAIN RESULTS

In this section, we determine the convergence rate of the Bayes estimator of the precision matrix. An important step towards this goal is to find the convergence rate of the graphical MLE, which is also of independent interest. In high-dimensional situations, even when the sample covariance matrix is singular, the graphical MLE will be positive definite if the number of elements in the cliques of the corresponding graphical model is less than the sample size. Analogous results for banded empirical covariance (or precision) matrices, or for estimators based on thresholding approaches, are typically given in terms of the $L_2$-operator norm in the literature. We, however, use the stronger $L_\infty$-operator norm (or, equivalently, the $L_1$-operator norm), so the implication of a convergence rate in our theorems is stronger.

THEOREM 1. Let $X_1, \ldots, X_n$ be random samples from a $p$-dimensional Gaussian distribution with mean zero and precision matrix $\Omega_0 \in \mathcal{U}(\epsilon_0, \gamma)$ for some $\epsilon_0 > 0$ and $\gamma(\cdot)$. Then the graphical MLE $\hat\Omega_M$ of $\Omega$, corresponding to the Gaussian graphical model with banding parameter $k$, has convergence rate given by
$$\|\hat\Omega_M - \Omega_0\|_{(\infty,\infty)} = O_P\big[\max\big\{k^{2}(n^{-1}\log p)^{1/2}, \gamma(k)\big\}\big]. \tag{12}$$
In particular, $\hat\Omega_M$ is consistent in the $L_\infty$-operator norm if $k \to \infty$ such that $k^4 n^{-1}\log p \to 0$.

The proof will use the explicit form of the graphical MLE and proceed by bounding the mean squared error in the $L_\infty$-operator norm. However, as the graphical MLE involves $(k+1)(p - k/2) = O(p)$ terms, a naive approach will lead to a factor $p$ in the estimate, which will not be able to establish consistency or a convergence rate in the truly high dimensional situation $p \gg n$. We overcome this obstacle by looking more carefully at the structure of the graphical MLE, and note that for any row $i$, the number of terms in (11) which have a non-zero $i$th row is at most $(2k+1) \ll p$. This, along with the description of the $L_\infty$-operator norm in terms of row sums, gives rise to a much smaller factor than $p$.

Now we treat Bayes estimators. Consider the G-Wishart prior $W_G(\delta, I_p)$ for $\Omega$, where the graph $G$ has banding of order $k$ and $\delta$ is a positive integer. The following result bounds the difference between $\hat\Omega_M$ and $\hat\Omega_B$.

LEMMA 2. Assume the conditions of Theorem 1 and suppose that $\Omega$ is given the G-Wishart prior $W_G(\delta, I_p)$, where the graph $G$ has banding of order $k$. Then $\|\hat\Omega_B - \hat\Omega_M\|_{(\infty,\infty)} = O_P(k^2/n)$.

The proof of the above lemma is given in Section 7. Theorem 1 and Lemma 2 together lead to the following result for the convergence rate of the Bayes estimator under the G-Wishart prior in the $L_\infty$-operator norm.

THEOREM 2. In the setting of Lemma 2, the Bayes estimator satisfies
$$\|\hat\Omega_B - \Omega_0\|_{(\infty,\infty)} = O_P\big[\max\big\{k^{2}(n^{-1}\log p)^{1/2}, \gamma(k)\big\}\big]. \tag{13}$$
In particular, the Bayes estimator $\hat\Omega_B$ is consistent in the $L_\infty$-operator norm if $k \to \infty$ such that $k^4 n^{-1}\log p \to 0$.

We now study the consistency and the convergence rate of the posterior distribution of the precision matrix given the data. The following theorem describes the behaviour of the entire posterior distribution.

THEOREM 3. In the setting of Lemma 2, the posterior distribution of the precision matrix $\Omega$ satisfies
$$E_0\big\{\mathrm{pr}\big(\|\Omega - \Omega_0\|_{(\infty,\infty)} > M\epsilon_{n,k} \mid X\big)\big\} \to 0 \tag{14}$$
for $\epsilon_{n,k} = \max\{k^{2}(n^{-1}\log p)^{1/2}, \gamma(k)\}$ and a sufficiently large constant $M > 0$. In particular, the posterior distribution is consistent in the $L_\infty$-operator norm if $k \to \infty$ such that $k^4 n^{-1}\log p \to 0$.

Remarks on the convergence rates. Observe that the convergence rates of the graphical MLE, the Bayes estimator and the posterior distribution obtained above are the same. The obtained rates can be optimised by choosing $k$ appropriately, as in a bias-variance trade-off. The fastest possible rates obtained from the theorems may be summarised for the different decay rates of $\gamma(k)$ as follows. If the true precision matrix is banded with banding parameter $k_0$, then the optimal rate of convergence $n^{-1/2}(\log p)^{1/2}$ is obtained by choosing any fixed $k \ge k_0$. When $\gamma(k)$ decays exponentially, the rate of convergence $n^{-1/2}(\log p)^{1/2}(\log n)^{2}$ can be obtained by choosing $k_n$ approximately proportional to $\log n$ with a sufficiently large constant of proportionality. If $\gamma(k)$ decays polynomially with index $\alpha$, as in Bickel & Levina (2008b), we get the consistency rate $(n^{-1}\log p)^{\alpha/(2\alpha+4)}$ corresponding to $k_n \asymp (n^{-1}\log p)^{-1/(2\alpha+4)}$.

It is to be noted that we have not assumed that the true structure of the precision matrix arises from a graphical model. The graphical model is a convenient tool to generate useful estimators through the maximum likelihood and Bayesian approaches, but the graphical model itself may be a misspecified model.
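The polynomial-decay rate quoted above follows from balancing the two terms of the bound in (12); the short calculation is:
$$k^{2}(n^{-1}\log p)^{1/2} \asymp k^{-\alpha} \;\Longleftrightarrow\; k^{\alpha+2} \asymp (n^{-1}\log p)^{-1/2} \;\Longleftrightarrow\; k_n \asymp (n^{-1}\log p)^{-1/(2\alpha+4)},$$
$$\text{so that}\quad k_n^{2}(n^{-1}\log p)^{1/2} \asymp (n^{-1}\log p)^{1/2 - 2/(2\alpha+4)} = (n^{-1}\log p)^{\alpha/(2\alpha+4)}.$$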

Further, it can be seen from the proofs of the theorems that the Gaussianity assumption on the true distribution of the observations is not essential, although the graphical model assumes Gaussianity to generate the estimators. The Gaussianity assumption is used to control certain probabilities by applying the probability inequality of Lemma A.3 of Bickel & Levina (2008b). However, it was also observed by Bickel & Levina (2008b) that one only requires bounds on the moment generating function of $X_i^2$, $(i = 1, \ldots, p)$. In particular, any thinner tailed distribution, such as one with a bounded support, will allow the arguments to go through.

5. ESTIMATION USING A REFERENCE PRIOR

A reference prior for the covariance matrix $\Sigma$, obtained in Rajaratnam et al. (2008), can also be used to induce a prior on $\Omega$. This corresponds to an improper $W_{P_G}(\alpha, \beta, 0)$ distribution for $\Omega/2$ with
$$\alpha_i = 0, \quad (i = 1, \ldots, r), \qquad \beta_2 = \tfrac{1}{2}(c_1 + c_2) - s_2, \qquad \beta_j = \tfrac{1}{2}(c_j - s_j), \quad (j = 3, \ldots, r), \tag{15}$$
where $c_j = \#C_j$ and $s_j = \#S_j$. By Corollary 4.1 in Rajaratnam et al. (2008), the posterior mean $\hat\Omega_R$ of the precision matrix is given by
$$\hat\Omega_R = \sum_{j=1}^{r}\big(S^{-1}_{C_j}\big)^0 - \big\{1 - n^{-1}(c_1 + c_2 - 2s_2)\big\}\big(S^{-1}_{S_2}\big)^0 - \sum_{j=3}^{r}\big\{1 - n^{-1}(c_j - s_j)\big\}\big(S^{-1}_{S_j}\big)^0. \tag{16}$$
Using this prior, we have an improvement in the $L_\infty$-operator norm of the difference between the Bayes estimator $\hat\Omega_R$ and the graphical MLE $\hat\Omega_M$. However, this does not lead to any faster convergence rate of the Bayes estimator.

THEOREM 4. Under the reference prior mentioned above,
$$\|\hat\Omega_R - \hat\Omega_M\|_{(\infty,\infty)} = O_P\big(k^{3/2}/n\big). \tag{17}$$
A sketch of the proof is given in Section 7.

6. NUMERICAL RESULTS

We check the performance of the Bayes estimator of the precision matrix and compare it with the graphical MLE and the banded estimators proposed in Bickel & Levina (2008b). We compare the Bayes estimator of the precision matrix and the corresponding estimator of the covariance matrix with the respective estimates given by the other two methods mentioned above. Data are simulated from $N(0, \Sigma)$, assuming specific structures of the covariance $\Sigma$ or the precision $\Omega$. For all simulations, we compute the $L_2$-operator norm of the difference between the estimate and the true parameter for sample sizes $n = 50, 100, 200$ and $p = n/2, n, 2n, 5n$, representing cases like $p < n$, $p \approx n$, $p > n$ and $p \gg n$. We simulate 100 replications in each case. Some of the simulation models are the same as those in Bickel & Levina (2008b).

Example 1 (Autoregressive process: AR(1) covariance structure). Let the true covariance matrix have entries given by
$$\sigma_{ij} = \rho^{|i-j|}, \quad 1 \le i, j \le p, \tag{18}$$
with $\rho = 0.5$ in our simulation experiment. The precision matrix is banded in this case, with banding parameter 1.
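As a quick numerical check of this banding claim (a Python/NumPy illustration with a small $p$ chosen here and $\rho = 0.5$ as in the experiment), the precision matrix of the AR(1) covariance in (18) is tridiagonal:

import numpy as np

p, rho = 8, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # sigma_ij = rho^|i-j|
Omega = np.linalg.inv(Sigma)
print(np.round(Omega, 6))
# Entries with |i - j| > 1 are zero up to rounding, so the banding parameter is 1.
mask = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) > 1
print(np.max(np.abs(Omega[mask])))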

Example 2 (Autoregressive process: AR(4) covariance structure). The elements of the true precision matrix are given by
$$\omega_{ij} = \mathbb{1}(|i-j| = 0) + \mathbb{1}(|i-j| = 1) + \mathbb{1}(|i-j| = 2) + \mathbb{1}(|i-j| = 3) + \mathbb{1}(|i-j| = 4). \tag{19}$$
This is the precision matrix corresponding to an AR(4) process.

Example 3 (Long range dependence). We consider a fractional Gaussian noise process, that is, the increment process of fractional Brownian motion. The elements of the true covariance matrix are given by
$$\sigma_{ij} = \tfrac{1}{2}\big[(|i-j|+1)^{2H} - 2|i-j|^{2H} + \big||i-j|-1\big|^{2H}\big], \quad 1 \le i, j \le p, \tag{20}$$
where $H \in [0.5, 1]$ is the Hurst parameter. We take $H = 0.7$ in the simulation example. The corresponding precision matrix does not fall in the polynomial smoothness class used in the theorems. We include this example in the simulation study to check how the proposed method performs when the assumptions of the theorems are not met.

Table 1 shows the simulation results for the different scenarios and compares the performance of the Bayes estimator with the MLE and the banded estimator (denoted by BL) in terms of the $L_2$-operator norm of the difference of the precision and covariance matrices from their respective true values. The maximum likelihood and Bayes estimates of the covariance matrix are obtained by inverting the estimated precision matrix, while for the banding approach the estimate of the covariance matrix is obtained by banding the sample covariance matrix and that of the precision matrix is obtained by the Cholesky-based method as in Bickel & Levina (2008b). The first column of the table, with entries $\Omega_1$, $\Omega_2$, $\Omega_3$, indicates that the data generating mechanism follows the process in Example 1, Example 2 and Example 3, respectively. The estimates for the first two examples have been computed using the value of the banding parameter of the true precision matrix. For Example 3, we used $k = 1$, the value which apparently gave the best result.
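The fractional Gaussian noise covariance in (20) can be generated directly; the following Python/NumPy sketch (an illustration added here, with $H = 0.7$ as above and a dimension chosen only for display) also reports how slowly the off-band mass of the corresponding precision matrix decays.

import numpy as np

def fgn_cov(p, H):
    """Fractional Gaussian noise covariance, equation (20)."""
    h = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
    return 0.5 * ((h + 1) ** (2 * H) - 2 * h ** (2 * H) + np.abs(h - 1) ** (2 * H))

p = 50
Sigma = fgn_cov(p, H=0.7)
Omega = np.linalg.inv(Sigma)
# Off-band entries of the precision matrix decay slowly but are not exactly zero.
for k in (1, 5, 10):
    mask = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) > k
    print(k, np.abs(np.where(mask, Omega, 0.0)).sum(axis=1).max())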

7. PROOFS

In this section we provide the proofs of the theorems and lemmas stated in Section 4. The proofs will require some additional lemmas, which we include in the Appendix.

Proof of Theorem 1. The $L_\infty$-operator norm of the difference between the graphical MLE $\hat\Omega_M$ and the true precision matrix $\Omega_0$ can be bounded as
$$\|\hat\Omega_M - \Omega_0\|_{(\infty,\infty)} \le \|\hat\Omega_M - B_k(\Omega_0)\|_{(\infty,\infty)} + \|\Omega_0 - B_k(\Omega_0)\|_{(\infty,\infty)}. \tag{21}$$
As shown in Lauritzen (1996), in a graphical model
$$\Omega = \sum_{j=1}^{p-k}\big(\Omega_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(\Omega_{S_j}\big)^0 = \sum_{j=1}^{p-k}\big(\Sigma^{-1}_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(\Sigma^{-1}_{S_j}\big)^0.$$
Hence the first term can be bounded by
$$\Big\|\sum_{j=1}^{p-k}\big\{(S^{-1}_{C_j})^0 - (\Sigma^{-1}_{C_j})^0\big\}\Big\|_{(\infty,\infty)} + \Big\|\sum_{j=2}^{p-k}\big\{(S^{-1}_{S_j})^0 - (\Sigma^{-1}_{S_j})^0\big\}\Big\|_{(\infty,\infty)}.$$

Let us first bound the first term. Using the fact that only $(2k+1)$ of the terms inside the norms in the above expression have a given row non-zero, it follows that
$$\Big\|\sum_{j=1}^{p-k}\big\{(S^{-1}_{C_j})^0 - (\Sigma^{-1}_{C_j})^0\big\}\Big\|_{(\infty,\infty)} = \max_l \sum_{l'}\Big|\sum_{j=1}^{p-k}\big\{(S^{-1}_{C_j})^0 - (\Sigma^{-1}_{C_j})^0\big\}_{(l,l')}\Big| \le (2k+1)\max_j\max_l \sum_{l'}\big|\big(S^{-1}_{C_j} - \Sigma^{-1}_{C_j}\big)_{(l,l')}\big| = (2k+1)\max_j\big\|S^{-1}_{C_j} - \Sigma^{-1}_{C_j}\big\|_{(\infty,\infty)}, \tag{22}$$
where the subscript $(l, l')$ on the matrices above stands for their respective $(l, l')$th entries. Using the multiplicative inequality $\|AB\| \le \|A\|\,\|B\|$ of operator norms, we have
$$\max_j\big\|S^{-1}_{C_j} - \Sigma^{-1}_{C_j}\big\|_{(\infty,\infty)} = \max_j\big\|\Sigma^{-1}_{C_j}(\Sigma_{C_j} - S_{C_j})S^{-1}_{C_j}\big\|_{(\infty,\infty)} \le \max_j\Big(\big\|\Sigma^{-1}_{C_j}\big\|_{(\infty,\infty)}\,\big\|\Sigma_{C_j} - S_{C_j}\big\|_{(\infty,\infty)}\,\big\|S^{-1}_{C_j}\big\|_{(\infty,\infty)}\Big). \tag{23}$$
By the assumption on the class of matrices and Lemma 1, $\|\Sigma^{-1}_{C_j}\|_{(\infty,\infty)}$ is bounded by $K_2$. From Lemma 5,
$$\mathrm{pr}\Big(\max_j\big\|S^{-1}_{C_j}\big\|_{(\infty,\infty)} \ge M_1\Big) \le p\,\max_j\,\mathrm{pr}\Big(\big\|S^{-1}_{C_j}\big\|_{(\infty,\infty)} \ge M_1\Big) \le M_1' p k^2 \exp(-m_1 n k^{-2})$$
for some constants $M_1, M_1', m_1 > 0$, while from Lemma 4,
$$\mathrm{pr}\Big(\max_j\big\|\Sigma_{C_j} - S_{C_j}\big\|_{(\infty,\infty)} \ge t\Big) \le M_2 p k^2 \exp(-m_2 n k^{-2} t^2)$$
for $|t| \le m_2'$, for some constants $M_2, m_2, m_2' > 0$.

We choose $t = Ak(n^{-1}\log p)^{1/2}$ for some sufficiently large $A$ to get the bound
$$\Big\|\sum_{j=1}^{p-k}\big\{(S^{-1}_{C_j})^0 - (\Sigma^{-1}_{C_j})^0\big\}\Big\|_{(\infty,\infty)} = O_P\big\{k^{2}(n^{-1}\log p)^{1/2}\big\}. \tag{24}$$
By a similar argument, we can establish
$$\Big\|\sum_{j=2}^{p-k}\big\{(S^{-1}_{S_j})^0 - (\Sigma^{-1}_{S_j})^0\big\}\Big\|_{(\infty,\infty)} = O_P\big\{k^{2}(n^{-1}\log p)^{1/2}\big\}. \tag{25}$$
Therefore, in view of the assumption $\|\Omega_0 - B_k(\Omega_0)\|_{(\infty,\infty)} \le \gamma(k)$, we obtain the result.

Proof of Lemma 2. The $L_\infty$-operator norm of $\hat\Omega_B - \hat\Omega_M$ can be bounded by
$$\frac{1}{n}\Big\|\sum_{j=2}^{p-k}\big\{(n^{-1}I_k + S_{S_j})^{-1}\big\}^0\Big\|_{(\infty,\infty)} \tag{26}$$
$$+\ \frac{\delta + k + n}{n}\Big\|\sum_{j=1}^{p-k}\big\{(n^{-1}I_{k+1} + S_{C_j})^{-1}\big\}^0 - \sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0\Big\|_{(\infty,\infty)} \tag{27}$$
$$+\ \frac{\delta + k + n}{n}\Big\|\sum_{j=2}^{p-k}\big\{(n^{-1}I_{k} + S_{S_j})^{-1}\big\}^0 - \sum_{j=2}^{p-k}\big(S^{-1}_{S_j}\big)^0\Big\|_{(\infty,\infty)} \tag{28}$$
$$+\ \Big(\frac{\delta + k + n}{n} - 1\Big)\Big\|\sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(S^{-1}_{S_j}\big)^0\Big\|_{(\infty,\infty)}. \tag{29}$$

Now, (26) above is
$$\frac{1}{n}\max_l\sum_{l'}\Big|\Big[\sum_{j=2}^{p-k}\big\{(n^{-1}I_k + S_{S_j})^{-1}\big\}^0\Big]_{(l,l')}\Big| \le \frac{2k+1}{n}\max_j\max_l\sum_{l'}\Big|\big\{(n^{-1}I_k + S_{S_j})^{-1}\big\}_{(l,l')}\Big| = \frac{2k+1}{n}\max_j\big\|(n^{-1}I_k + S_{S_j})^{-1}\big\|_{(\infty,\infty)},$$
which is bounded by a multiple of
$$\frac{k^{3/2}}{n}\max_j\big\|(n^{-1}I_k + S_{S_j})^{-1}\big\|_{(2,2)} \le \frac{k^{3/2}}{n}\max_j\big\|S^{-1}_{S_j}\big\|_{(2,2)} \le \frac{k^{3/2}}{n}\max_j\big\|S^{-1}_{S_j}\big\|_{(\infty,\infty)}. \tag{30}$$
In view of Lemma 5, we have that, for some $M_3, M_3', m_3 > 0$,
$$\mathrm{pr}\Big(\max_j\big\|S^{-1}_{S_j}\big\|_{(\infty,\infty)} \ge M_3\Big) \le M_3' p k^2 \exp(-m_3 n k^{-2}),$$
which converges to zero if $k^2(\log p)/n \to 0$. This leads to the estimate
$$\frac{1}{n}\Big\|\sum_{j=2}^{p-k}\big\{(n^{-1}I_k + S_{S_j})^{-1}\big\}^0\Big\|_{(\infty,\infty)} = O_P\big(k^{3/2}/n\big). \tag{31}$$
For (27), we observe that
$$\Big\|\sum_{j=1}^{p-k}\big\{(n^{-1}I_{k+1} + S_{C_j})^{-1}\big\}^0 - \sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0\Big\|_{(\infty,\infty)} \le (2k+1)\max_j\big\|(n^{-1}I_{k+1} + S_{C_j})^{-1} - S^{-1}_{C_j}\big\|_{(\infty,\infty)} \lesssim k^{3/2}\max_j\big\|(n^{-1}I_{k+1} + S_{C_j})^{-1} - S^{-1}_{C_j}\big\|_{(2,2)},$$
and that
$$\big\|(n^{-1}I_{k+1} + S_{C_j})^{-1} - S^{-1}_{C_j}\big\|_{(2,2)} \le \big\|(n^{-1}I_{k+1} + S_{C_j})^{-1}\big\|_{(2,2)}\,\big\|n^{-1}I_{k+1}\big\|_{(2,2)}\,\big\|S^{-1}_{C_j}\big\|_{(2,2)} \le n^{-1}\big\|S^{-1}_{C_j}\big\|^2_{(2,2)} \le n^{-1}\big\|S^{-1}_{C_j}\big\|^2_{(\infty,\infty)}.$$
Now, under $k^2(\log p)/n \to 0$, an application of Lemma 5 leads to the bound $O_P(k^{3/2}/n)$ for (27). A similar argument gives rise to the same $O_P(k^{3/2}/n)$ bound for (28).

Finally, consider (29). As argued in bounding (26), we have that
$$\Big\|\sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(S^{-1}_{S_j}\big)^0\Big\|_{(\infty,\infty)} \le (2k+1)\Big\{\max_j\big\|S^{-1}_{C_j}\big\|_{(\infty,\infty)} + \max_j\big\|S^{-1}_{S_j}\big\|_{(\infty,\infty)}\Big\} = O_P(k),$$
under the assumption $k^2(\log p)/n \to 0$, by another application of Lemma 5. Since $n^{-1}(\delta + k + n) - 1 = O(k/n)$, it follows that (29) is $O_P(k^2/n)$, which is the weakest estimate among all the terms in the bound for $\|\hat\Omega_B - \hat\Omega_M\|_{(\infty,\infty)}$. The result thus follows.

Proof of Theorem 2. The proof directly follows from Theorem 1 and Lemma 2 using the triangle inequality.

Proof of Theorem 3. The posterior distribution of the precision matrix $\Omega$ given the data $X$ is a G-Wishart distribution $W_G(\delta + n, I_p + nS)$. We can write $\Omega$ as
$$\Omega = \sum_{j=1}^{p-k}\big(\Omega_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(\Omega_{S_j}\big)^0 = \sum_{j=1}^{p-k}\big\{(\Sigma^{-1})_{C_j}\big\}^0 - \sum_{j=2}^{p-k}\big\{(\Sigma^{-1})_{S_j}\big\}^0 = \sum_{j=1}^{p-k}\big(W_{C_j}\big)^0 - \sum_{j=2}^{p-k}\big(W_{S_j}\big)^0, \tag{32}$$
where $W_{C_j} = (\Sigma_{C_j})^{-1}$, $W_{S_j} = (\Sigma_{S_j})^{-1}$, and the equality of the expressions follows from Lauritzen (1996). Note that the equality of the expressions for $\Omega$ and $W$ also implies that $E(\Omega \mid X) = E(W \mid X)$. The submatrix $\Sigma_{C_j}$ for any clique $C_j$ has an inverse Wishart distribution with parameters $\delta + n$ and scale matrix $(I_p + nS)_{C_j}$, $(j = 1, \ldots, p-k)$. Thus, $W_{C_j} = (\Sigma_{C_j})^{-1}$ has a Wishart distribution induced by the corresponding inverse Wishart distribution. In particular, if $i \in C_j$, then $\tau_{in}^{-1}w_{ii}$ has a chi-square distribution with $(\delta + n)$ degrees of freedom, where $\tau_{in}$ is the $(i,i)$th entry of $\{(I + nS_{C_j})^{-1}\}^0$.

Fix a clique $C = C_j$ and define $T_n = \mathrm{diag}(\tau_{in} : i \in C)$. For $i, j \in C$, let $w^*_{ij} = w_{ij}/(\tau_{in}\tau_{jn})^{1/2}$ and $W^*_C = (w^*_{ij} : i, j \in C)$. Then $W^*_C$ given $X$ has a Wishart distribution with parameters $\delta + n$ and scale matrix $T_n^{-1/2}\{(I_{k+1} + nS_C)\}^{-1}T_n^{-1/2}$.

We first note that $\max_i \tau_{in} = O_P(n^{-1})$. To see this, observe that $(I_{k+1} + nS_C)^{-1} \le n^{-1}S_C^{-1}$, so that $\max_i \tau_{in} \le n^{-1}\|S_C^{-1}\|_{(2,2)} = O_P(n^{-1})$ in view of Lemma 5. On the other hand, from Lemma 4 it follows that $\max_j\|S_{C_j}\|_{(2,2)} = O_P(1)$, so that, with probability tending to one, $S_{C_j} \le LI_{C_j}$ for some constant $L > 0$, and hence $(I + nS)^{-1}_{C_j} \ge (1 + nL)^{-1}I_{C_j}$, simultaneously for all cliques. Hence $\max_i \tau_{in}^{-1} \le (1 + nL) = O_P(n)$. Consequently, with probability tending to one, the maximum eigenvalue of $T_n^{-1/2}\{(I_{k+1} + nS_C)\}^{-1}T_n^{-1/2}$ is bounded by a constant depending only on $\epsilon_0$, simultaneously for all cliques. Hence, applying Lemma A.3 of Bickel & Levina (2008b), it follows that for all $i, j$,
$$\mathrm{pr}\big\{|w^*_{ij} - E(w^*_{ij} \mid X)| \ge t\big\} \le M_4\exp\{-m_4(\delta + n)t^2\}, \quad |t| \le m_4', \tag{33}$$
for some constants $M_4, m_4, m_4' > 0$ depending on $\epsilon_0$ only. Now, as a G-Wishart prior gives rise to a $k$-banded structure, arguing as in the bounding of (26) and using (32), we have that, for some $M_5, m_5, m_5' > 0$ and all $t \le m_5'$,
$$\mathrm{pr}\big(\|\Omega - \hat\Omega_B\|_{(\infty,\infty)} \ge t \mid X\big) \le M_5\,p\exp\{-m_5 n(2k+1)^{-2}t^2\}. \tag{34}$$
The reduction in the number of terms in the rows from $p$ to $(2k+1)$ arises from the fact that the G-Wishart posterior preserves the banded structure of the precision matrix. Choosing $t = Ak(n^{-1}\log p)^{1/2}$, with $A$ sufficiently large, we get
$$E_0\big[\mathrm{pr}\big\{\|\Omega - \hat\Omega_B\|_{(\infty,\infty)} \ge Ak(n^{-1}\log p)^{1/2} \mid X\big\}\big] \to 0. \tag{35}$$

Therefore, using Theorem 2,
$$E_0\big\{\mathrm{pr}\big(\|\Omega - \Omega_0\|_{(\infty,\infty)} > 2\epsilon_n \mid X\big)\big\} \le \mathrm{pr}_0\big(\|\hat\Omega_B - \Omega_0\|_{(\infty,\infty)} > \epsilon_n\big) + E_0\big\{\mathrm{pr}\big(\|\Omega - \hat\Omega_B\|_{(\infty,\infty)} > \epsilon_n \mid X\big)\big\},$$
which converges to zero if $\epsilon_n = \max\{Ak(n^{-1}\log p)^{1/2}, \gamma(k)\}$.

Proof of Theorem 4. In our scenario, the Bayes estimator under the reference prior is given by the expression
$$\hat\Omega_R = E(\Omega \mid S) = \sum_{j=1}^{p-k}\big(S^{-1}_{C_j}\big)^0 - (1 - n^{-1})\big(S^{-1}_{S_2}\big)^0 - (1 - n^{-1})\sum_{j=3}^{p-k}\big(S^{-1}_{S_j}\big)^0.$$
Therefore
$$\|\hat\Omega_R - \hat\Omega_M\|_{(\infty,\infty)} = \Big\|n^{-1}\big(S^{-1}_{S_2}\big)^0 + n^{-1}\sum_{j=3}^{p-k}\big(S^{-1}_{S_j}\big)^0\Big\|_{(\infty,\infty)} = n^{-1}\Big\|\sum_{j=2}^{p-k}\big(S^{-1}_{S_j}\big)^0\Big\|_{(\infty,\infty)}.$$
The rest of the proof proceeds as in Lemma 2.

A. PROOFS OF AUXILIARY RESULTS

In this section we give proofs of some lemmas we have used in the paper, which are of some general interest. The first lemma deals with various equivalence relations between matrix norms and is easily found in standard textbooks.

LEMMA 3. For a symmetric matrix $A$ of order $k$, we have the following:
1. $\|A\|_{(2,2)} \le \|A\|_{(\infty,\infty)} \le \sqrt{k}\,\|A\|_{(2,2)}$;
2. $\|A\|_\infty \le \|A\|_{(2,2)} \le \|A\|_{(\infty,\infty)} \le k\|A\|_\infty$.
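A quick numerical spot-check of the inequalities in Lemma 3 (an illustrative Python/NumPy sketch with an arbitrary random symmetric matrix chosen here):

import numpy as np

rng = np.random.default_rng(2)
k = 7
B = rng.standard_normal((k, k))
A = (B + B.T) / 2

spec = np.abs(np.linalg.eigvalsh(A)).max()      # ||A||_(2,2)
linf_op = np.abs(A).sum(axis=1).max()           # ||A||_(inf,inf)
entry_max = np.abs(A).max()                     # ||A||_inf

assert spec <= linf_op <= np.sqrt(k) * spec + 1e-12              # item 1
assert entry_max <= spec and linf_op <= k * entry_max + 1e-12    # item 2
print("Lemma 3 inequalities hold for this draw.")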

Now we prove the lemma concerning the equivalence of the classes of matrices considered for the precision matrix Ω.

Proof of Lemma 1. We rewrite the class of matrices defined in (5) as
$$\mathcal{U}(\epsilon_0, \gamma) = \Big\{\Omega : \max_i \sum_j\{|\omega_{ij}| : |i-j| > k\} \le \gamma(k) \text{ for all } k > 0,\; \max\big(\|\Omega^{-1}\|_{(2,2)}, \|\Omega\|_{(2,2)}\big) \le \epsilon_0^{-1}\Big\}. \tag{A1}$$
Now, $\max(\|\Omega^{-1}\|_{(\infty,\infty)}, \|\Omega\|_{(\infty,\infty)}) \le K_1$ implies $\max(\|\Omega^{-1}\|_{(2,2)}, \|\Omega\|_{(2,2)}) \le \epsilon_0^{-1}$ for $K_1 = \epsilon_0^{-1}$, using Lemma 3. Thus $\mathcal{V}(K_1, \gamma) \subset \mathcal{U}(\epsilon_0, \gamma)$.

To see the other way, note that, for any fixed $k_0$,
$$\|\Omega\|_{(\infty,\infty)} \le \|\Omega - B_{k_0}(\Omega)\|_{(\infty,\infty)} + \|B_{k_0}(\Omega)\|_{(\infty,\infty)} \le \gamma(k_0) + (2k_0+1)\|\Omega\|_\infty \le \gamma(k_0) + (2k_0+1)\|\Omega\|_{(2,2)} \le \gamma(k_0) + (2k_0+1)\epsilon_0^{-1}. \tag{A2}$$
Choosing $K_2 = \gamma(k_0) + (2k_0+1)\epsilon_0^{-1}$ gives $\mathcal{U}(\epsilon_0, \gamma) \subset \mathcal{V}(K_2, \gamma)$.

LEMMA 4. Let $Z_i$, $(i = 1, \ldots, n)$, be i.i.d. $k$-dimensional random vectors distributed as $N(0, D)$ with $\max(\|D^{-1}\|_{(\infty,\infty)}, \|D\|_{(\infty,\infty)}) \le K$. Then for the sample variance $S_n = n^{-1}\sum_{i=1}^n Z_iZ_i^T$, we have
$$\mathrm{pr}\big(\|S_n - D\|_{(\infty,\infty)} \ge t\big) \le Mk^2\exp(-mnk^{-2}t^2), \quad |t| \le m', \tag{A3}$$
where $M, m, m' > 0$ depend on $K$ only. In particular, if $k^2(\log k)/n \to 0$, then $\|S_n\|_{(\infty,\infty)} = O_P(1)$.

Proof. The proof directly follows from Lemma A.3 of Bickel & Levina (2008b) on noting from Lemma 3 that $\|S_n - D\|_{(\infty,\infty)} \le k\|S_n - D\|_\infty$.
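As a quick Monte Carlo illustration of Lemma 4 (a sketch assuming NumPy, with dimension, sample size and scale matrix chosen here only for illustration), the $(\infty,\infty)$-norm deviation of the sample covariance from $D$ stays small when $k^2\log k/n$ is small:

import numpy as np

rng = np.random.default_rng(3)
k, n, reps = 10, 2000, 200
D = 0.5 ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # a well-conditioned choice of D

devs = []
for _ in range(reps):
    Z = rng.multivariate_normal(np.zeros(k), D, size=n)
    Sn = Z.T @ Z / n
    devs.append(np.abs(Sn - D).sum(axis=1).max())   # ||S_n - D||_(inf,inf)
print(np.mean(devs), np.quantile(devs, 0.95))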

LEMMA 5. Let $Z_i$, $(i = 1, \ldots, n)$, be i.i.d. $k$-dimensional random vectors distributed as $N(0, D)$ with $\max(\|D^{-1}\|_{(\infty,\infty)}, \|D\|_{(\infty,\infty)}) \le K$. Then for the sample variance $S_n = n^{-1}\sum_{i=1}^n Z_iZ_i^T$, we have
$$\mathrm{pr}\big(\|S_n^{-1}\|_{(\infty,\infty)} \ge M\big) \le M'k^2\exp(-mnk^{-2}C^2), \tag{A4}$$
where $M > K$, $C = K^{-1} - M^{-1}$, and $M', m > 0$ depend on $M$ and $K$ only.

Proof. Note that
$$\|S_n^{-1}\|_{(\infty,\infty)} \le \|D^{-1}\|_{(\infty,\infty)} + \|S_n^{-1} - D^{-1}\|_{(\infty,\infty)} = \|D^{-1}\|_{(\infty,\infty)} + \|D^{-1}(S_n - D)S_n^{-1}\|_{(\infty,\infty)} \le K\big(1 + \|S_n - D\|_{(\infty,\infty)}\|S_n^{-1}\|_{(\infty,\infty)}\big). \tag{A5}$$
This implies that
$$\|S_n^{-1}\|_{(\infty,\infty)} \le \frac{K}{1 - K\|S_n - D\|_{(\infty,\infty)}}.$$
Thus, using Lemma 4, we obtain
$$\mathrm{pr}\big(\|S_n^{-1}\|_{(\infty,\infty)} \ge M\big) \le \mathrm{pr}\Big(\frac{K}{1 - K\|S_n - D\|_{(\infty,\infty)}} \ge M\Big) \le \mathrm{pr}\big(\|S_n - D\|_{(\infty,\infty)} \ge K^{-1} - M^{-1}\big) \le M'k^2\exp(-mnk^{-2}). \tag{A6}$$

REFERENCES

ATAY-KAYIS, A. & MASSAM, H. (2005). A Monte-Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika 92.
BICKEL, P. & LEVINA, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36.
BICKEL, P. & LEVINA, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36.
CAI, T. & LIU, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106.
CAI, T., LIU, W. & LUO, X. (2011). A constrained l1-minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106.
CAI, T. & YUAN, M. (2012). Adaptive covariance matrix estimation through block thresholding. Ann. Statist. 40.
CAI, T., ZHANG, C. & ZHOU, H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38.
CARVALHO, C., MASSAM, H. & WEST, M. (2007). Simulation of hyper-inverse Wishart distributions in graphical models. Biometrika 94.
CARVALHO, C. & SCOTT, J. (2009). Objective Bayesian model selection in Gaussian graphical models. Biometrika.
DAWID, A. & LAURITZEN, S. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist. 21.
DIACONIS, P. & YLVISAKER, D. (1979). Conjugate priors for exponential families. Ann. Statist. 7.
DOBRA, A., HANS, C., JONES, B., NEVINS, J., YAO, G. & WEST, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90.
DOBRA, A., LENKOSKI, A. & RODRIGUEZ, A. (2011). Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. J. Amer. Statist. Assoc. 106.
FRIEDMAN, J., HASTIE, T. & TIBSHIRANI, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9.
GHOSAL, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal. 74.
HUANG, J., LIU, N., POURAHMADI, M. & LIU, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93.
KAROUI, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36.
LAM, C. & FAN, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37.
LAURITZEN, S. (1996). Graphical Models, vol. 17. Oxford University Press, USA.
LEDOIT, O. & WOLF, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88.
LENKOSKI, A. & DOBRA, A. (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J. Comput. Graphical Statist. 20.
LETAC, G. & MASSAM, H. (2007). Wishart distributions for decomposable graphs. Ann. Statist. 35.
MEINSHAUSEN, N. & BÜHLMANN, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34.
PATI, D., BHATTACHARYA, A., PILLAI, N. & DUNSON, D. (2012). Posterior contraction in sparse Bayesian factor models for massive covariance matrices.
RAJARATNAM, B., MASSAM, H. & CARVALHO, C. (2008). Flexible covariance estimation in graphical Gaussian models. Ann. Statist. 36.
ROTHMAN, A., BICKEL, P., LEVINA, E. & ZHU, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Statist. 2.
ROTHMAN, A., LEVINA, E. & ZHU, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104.
ROVERATO, A. (2000). Cholesky decomposition of a hyper inverse Wishart matrix. Biometrika 87.
YUAN, M. & LIN, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94.

Table 1. Simulation results for different structures of precision matrices. (Rows: models $\Omega_1$, $\Omega_2$, $\Omega_3$ of Examples 1-3, for the sample sizes $n$ and dimensions $p$ described in Section 6; columns: $\|\hat\Omega - \Omega\|_{(2,2)}$ and $\|\hat\Sigma - \Sigma\|_{(2,2)}$ for the MLE, Bayes and BL estimators.)



Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Jérôme Thai 1 Timothy Hunter 1 Anayo Akametalu 1 Claire Tomlin 1 Alex Bayen 1,2 1 Department of Electrical Engineering

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1

More information

Sparse Covariance Matrix Estimation with Eigenvalue Constraints

Sparse Covariance Matrix Estimation with Eigenvalue Constraints Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,

More information

Algebraic Representations of Gaussian Markov Combinations

Algebraic Representations of Gaussian Markov Combinations Submitted to the Bernoulli Algebraic Representations of Gaussian Markov Combinations M. SOFIA MASSA 1 and EVA RICCOMAGNO 2 1 Department of Statistics, University of Oxford, 1 South Parks Road, Oxford,

More information

Sparse Permutation Invariant Covariance Estimation

Sparse Permutation Invariant Covariance Estimation Sparse Permutation Invariant Covariance Estimation Adam J. Rothman University of Michigan Ann Arbor, MI 48109-1107 e-mail: ajrothma@umich.edu Peter J. Bickel University of California Berkeley, CA 94720-3860

More information

Bayesian model selection in graphs by using BDgraph package

Bayesian model selection in graphs by using BDgraph package Bayesian model selection in graphs by using BDgraph package A. Mohammadi and E. Wit March 26, 2013 MOTIVATION Flow cytometry data with 11 proteins from Sachs et al. (2005) RESULT FOR CELL SIGNALING DATA

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

VARIABLE SELECTION AND INDEPENDENT COMPONENT

VARIABLE SELECTION AND INDEPENDENT COMPONENT VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS Richard Samworth University of Cambridge Joint work with Rajen Shah and Ming Yuan My core research interests A broad range of methodological

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

A multiple testing approach to the regularisation of large sample correlation matrices

A multiple testing approach to the regularisation of large sample correlation matrices A multiple testing approach to the regularisation of large sample correlation matrices Natalia Bailey Department of Econometrics and Business Statistics, Monash University M. Hashem Pesaran University

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

arxiv: v1 [stat.me] 16 Feb 2018

arxiv: v1 [stat.me] 16 Feb 2018 Vol., 2017, Pages 1 26 1 arxiv:1802.06048v1 [stat.me] 16 Feb 2018 High-dimensional covariance matrix estimation using a low-rank and diagonal decomposition Yilei Wu 1, Yingli Qin 1 and Mu Zhu 1 1 The University

More information

arxiv: v1 [math.st] 31 Jan 2008

arxiv: v1 [math.st] 31 Jan 2008 Electronic Journal of Statistics ISSN: 1935-7524 Sparse Permutation Invariant arxiv:0801.4837v1 [math.st] 31 Jan 2008 Covariance Estimation Adam Rothman University of Michigan Ann Arbor, MI 48109-1107.

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

arxiv: v2 [math.st] 2 Jul 2017

arxiv: v2 [math.st] 2 Jul 2017 A Relaxed Approach to Estimating Large Portfolios Mehmet Caner Esra Ulasan Laurent Callot A.Özlem Önder July 4, 2017 arxiv:1611.07347v2 [math.st] 2 Jul 2017 Abstract This paper considers three aspects

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS. (Apr. 2018; first draft Dec. 2015) 1. Introduction

QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS. (Apr. 2018; first draft Dec. 2015) 1. Introduction QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS YVES F. ATCHADÉ Apr. 08; first draft Dec. 05 Abstract. This paper deals with the Bayesian estimation of high dimensional Gaussian graphical

More information

Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data

Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 9-2016 Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data T. Tony Cai University

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

A General Framework for High-Dimensional Inference and Multiple Testing

A General Framework for High-Dimensional Inference and Multiple Testing A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional

More information

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12

Partitioned Covariance Matrices and Partial Correlations. Proposition 1 Let the (p + q) (p + q) covariance matrix C > 0 be partitioned as C = C11 C 12 Partitioned Covariance Matrices and Partial Correlations Proposition 1 Let the (p + q (p + q covariance matrix C > 0 be partitioned as ( C11 C C = 12 C 21 C 22 Then the symmetric matrix C > 0 has the following

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Testing Equality of Natural Parameters for Generalized Riesz Distributions

Testing Equality of Natural Parameters for Generalized Riesz Distributions Testing Equality of Natural Parameters for Generalized Riesz Distributions Jesse Crawford Department of Mathematics Tarleton State University jcrawford@tarleton.edu faculty.tarleton.edu/crawford April

More information

Likelihood Analysis of Gaussian Graphical Models

Likelihood Analysis of Gaussian Graphical Models Faculty of Science Likelihood Analysis of Gaussian Graphical Models Ste en Lauritzen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 2 Slide 1/43 Overview of lectures Lecture 1 Markov Properties

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

On the inconsistency of l 1 -penalised sparse precision matrix estimation

On the inconsistency of l 1 -penalised sparse precision matrix estimation On the inconsistency of l 1 -penalised sparse precision matrix estimation Otte Heinävaara Helsinki Institute for Information Technology HIIT Department of Computer Science University of Helsinki Janne

More information

Causal Inference: Discussion

Causal Inference: Discussion Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement

More information

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case Areal data models Spatial smoothers Brook s Lemma and Gibbs distribution CAR models Gaussian case Non-Gaussian case SAR models Gaussian case Non-Gaussian case CAR vs. SAR STAR models Inference for areal

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information