QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS (Apr. 2018; first draft Dec. 2015)
YVES F. ATCHADÉ

Abstract. This paper deals with the Bayesian estimation of high-dimensional Gaussian graphical models. We develop a quasi-Bayesian implementation of the neighborhood selection method of Meinshausen and Bühlmann (2006) for the estimation of large Gaussian graphical models. The method produces a product-form quasi-posterior distribution that can be efficiently explored by parallel computing. We derive a non-asymptotic bound on the contraction rate of the quasi-posterior distribution. The result shows that the proposed quasi-posterior distribution contracts towards the true precision matrix at a rate given by the worst contraction rate of the linear regressions that are involved in the neighborhood selection. We develop a Markov Chain Monte Carlo algorithm for approximate computations, following an approach from Atchadé (2015). We illustrate the methodology with a simulation study.

1. Introduction

We consider the problem of fitting large Gaussian graphical models with a diverging number of parameters from limited data. This amounts to estimating a sparse precision matrix $\vartheta \in \mathcal{M}_p^+$ from $p$-dimensional Gaussian observations $y^{(i)} \in \mathbb{R}^p$, $i = 1, \dots, n$, where $\mathcal{M}_p^+$ denotes the cone of symmetric positive definite matrices in $\mathbb{R}^{p\times p}$. The frequentist approach to this problem has generated an impressive literature over the last decade or so (see for instance Bühlmann and van de Geer (2011), Hastie et al. (2015), and the references therein). There is an interest, particularly in biomedical research, in statistical methodologies that allow practitioners to incorporate external information in fitting such graphical models (Mukherjee and Speed (2008); Peterson et al. (2015)). This problem naturally calls for a Bayesian formulation, and significant progress has been made in

2000 Mathematics Subject Classification: 60F5, 60G4. Key words and phrases:
Gaussian graphical models, quasi-Bayesian inference, pseudo-likelihood, posterior contraction, forward-backward approximation, Markov Chain Monte Carlo. Y. F. Atchadé: University of Michigan, 1085 South University, Ann Arbor, 48109, MI, United States. E-mail address: yvesa@umich.edu.
recent years (Dobra et al. (2011); Lenkoski and Dobra (2011); Khondker et al. (2013); Peterson et al. (2015); Banerjee and Ghosal (2015)). However, most existing Bayesian methods for fitting graphical models do not scale well with the number of nodes in the graph. The main difficulty is computational, and hinges on the ability to handle interesting prior distributions on $\mathcal{M}_p^+$ when $p$ is large. The most commonly used class of prior distributions for Gaussian graphical models is the class of G-Wishart distributions (Atay-Kayis and Massam (2005)). However, G-Wishart distributions have intractable normalizing constants, and become impractical for inferring large graphical models, due to the cost of approximating those normalizing constants (Dobra et al. (2011); Lenkoski and Dobra (2011)). Following the development of the Bayesian lasso of Park and Casella (2008) and other Bayesian shrinkage priors for linear regressions (Carvalho et al. (2010)), several authors have proposed prior distributions on $\mathcal{M}_p^+$ obtained by putting conditionally independent shrinkage priors on the entries of the matrix, subject to a positive definiteness constraint (Khondker et al. (2013)). However, this approach does not give a direct estimation of the graph structure, which in many applications is the key quantity of interest. Furthermore, dealing with the positive definiteness constraint in the posterior distribution requires careful MCMC design, and becomes a limiting factor for large $p$.

The above discussion suggests that when dealing with large graphical models, some form of approximate inference is inescapable. Building on Atchadé (2017), we propose a quasi-Bayesian approach for fitting large Gaussian graphical models. Our general approach to the problem consists in working with a larger pseudo-model $\{\check f_\theta,\ \theta \in \check\Theta\}$, where $\check\Theta \supseteq \mathcal{M}_p^+$. By pseudo-model we mean that the function $z \mapsto \check f_\theta(z)$ is typically not a density on $\mathbb{R}^p$, but $\check f_\theta$ is chosen such that the function $\theta \mapsto \sum_{i=1}^n \log \check f_\theta(y^{(i)})$ is a good candidate for M-estimation of $\vartheta$.
The enlargement of the model space from $\mathcal{M}_p^+$ to $\check\Theta$ allows us to relax the positive definiteness constraint. With a prior distribution $\Pi$ on $\check\Theta$, we obtain a quasi-posterior distribution denoted $\check\Pi_{n,p}$. $\check\Pi_{n,p}$ is not a posterior distribution in the strict sense of the term, since $\check f_\theta$ is not a proper likelihood function. In the specific case of Gaussian graphical models, we take $\check\Theta$ as the space of matrices with positive diagonals, and take $z \mapsto \check f_\theta(z)$ as the pseudo-model underpinning the neighborhood selection method of Meinshausen and Bühlmann (2006). This choice gives a quasi-posterior distribution $\check\Pi_{n,p}$ that factorizes, and leads to a drastic improvement in the computing time needed for MCMC computation when a parallel computing architecture is used. We illustrate the method in Section 4 using simulated data where the number of nodes in the graph is $p \in \{100, 500, 1000\}$.

The idea of replacing the likelihood function by a pseudo-likelihood or quasi-likelihood function in a Bayesian inference is not new, and has been developed in other
contexts, such as Bayesian semi-parametric inference (Kato (2013); Li and Jiang (2014), and the references therein), and approximate Bayesian computation (Fearnhead and Prangle (2012)). See also Atchadé (2017) for more references, and for contraction properties.

We study the contraction properties of the quasi-posterior distribution $\check\Pi_{n,p}$ as $n, p \to \infty$. Under the assumption that there exists a well-conditioned true precision matrix, we show that $\check\Pi_{n,p}$ contracts at the rate $s\sqrt{\log(p)/n}$ (see Theorem 3 for a precise statement), where $s$ can be viewed as an upper bound on the largest degree in the undirected graph defined by the true precision matrix. This convergence rate corresponds to the worst convergence rate that one gets from the Bayesian analysis of the linear regressions involved in the neighborhood selection. The condition on the sample size $n$ for the results mentioned above to hold is $n \gtrsim s^2\log(p)$, which shows that the quasi-posterior distribution can concentrate around the true precision matrix even in cases where $p$ exceeds $n$.

The rest of the paper is organized as follows. Section 2 provides a general discussion of quasi-models and quasi-Bayesian inference. The section ends with the introduction of the proposed quasi-posterior distribution, based on the neighborhood selection method of Meinshausen and Bühlmann (2006). We specialize the discussion to Gaussian graphical models in Section 3. The theoretical analysis focuses on the Gaussian case, and is presented in Section 3, but the proofs are postponed to Section 5. The simulation study is presented in Section 4. A MATLAB implementation of the method is available from the author's website.

2. Quasi-Bayesian inference of graphical models

For integers $p \ge 1$, and $i \in \{1, \dots, p\}$, let $\mathcal{Y}_i$ be a nonempty subset of $\mathbb{R}$, and set $\mathcal{Y} = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_p$, that we assume is equipped with a reference sigma-finite product measure $dy$.
We first consider a class of Markov random field distributions $\{f_\omega,\ \omega \in \Omega\}$ for the joint modeling of $\mathcal{Y}$-valued random variables. Let $\mathcal{M}_p$ denote the set of all real symmetric $p \times p$ matrices, equipped with the inner product $\langle A, B\rangle_F = \sum_{i \le j} A_{ij} B_{ij}$, and the norm $\|A\|_F = \sqrt{\langle A, A\rangle_F}$. As above, $\mathcal{M}_p^+$ denotes the subset of $\mathcal{M}_p$ of positive definite matrices. For $i = 1, \dots, p$, and $1 \le j < k \le p$, let $B_i : \mathcal{Y}_i \to \mathbb{R}$ and $B_{jk} : \mathcal{Y}_j \times \mathcal{Y}_k \to \mathbb{R}$ be non-zero measurable functions that we assume known. (Footnote: the contraction rate stated in the introduction is measured in the $\mathrm{L}_{2,\infty}$ matrix norm, defined as the largest $\ell_2$ column norm.)
From these functions we define an $\mathcal{M}_p$-valued function $B : \mathcal{Y} \to \mathcal{M}_p$ by

$$B(y)_{ij} = \begin{cases} B_i(y_i) & \text{if } i = j, \\ B_{ij}(y_i, y_j) & \text{if } i < j, \\ B_{ji}(y_j, y_i) & \text{if } j < i. \end{cases}$$

These functions define the parameter space

$$\Omega = \left\{ \omega \in \mathcal{M}_p :\ Z(\omega) = \int_{\mathcal{Y}} e^{\langle \omega, B(y)\rangle_F}\, dy < \infty \right\}.$$

We assume that $\mathcal{Y}$ and $B$ are such that $\Omega$ is non-empty, and we consider the exponential family $\{f_\omega,\ \omega \in \Omega\}$ of densities $f_\omega$ on $\mathcal{Y}$ given by

$$f_\omega(y) = \exp\left( \langle \omega, B(y)\rangle_F - \log Z(\omega) \right), \quad y \in \mathcal{Y}. \tag{1}$$

The model $\{f_\omega,\ \omega \in \Omega\}$ can be useful to capture the dependence structure between a set of $p$ random variables taking values in $\mathcal{Y}$. If $(Y_1, \dots, Y_p) \sim f_\omega$, then the parameter $\omega$ encodes the conditional independence structure among the $p$ variables $Y_1, \dots, Y_p$. In particular, for $i \neq j$, $\omega_{ij} = 0$ means that $Y_i$ and $Y_j$ are conditionally independent given all the other variables. The random variables $Y_1, \dots, Y_p$ can then be represented by an undirected graph with an edge between $i$ and $j$ if and only if $\omega_{ij} \neq 0$. This type of model is very useful in practice to tease out direct and indirect connections between sets of random variables. The version posited in (1) can accommodate mixed measurements, where some of the $y_i$ take discrete values while others take continuous values.

Example 1 (Gaussian graphical models). One recovers the Gaussian graphical model by taking $\mathcal{Y}_i = \mathbb{R}$, $B_i(x) = -x^2/2$, $B_{ij}(x, y) = -xy$, $i < j$. In this case $\mathcal{Y} = \mathbb{R}^p$ equipped with the Lebesgue measure, and $\Omega = \mathcal{M}_p^+$.

Example 2 (Potts models). For an integer $M \ge 2$, one recovers the $M$-state Potts model by taking $\mathcal{Y}_i = \{1, \dots, M\}$. In this case $\mathcal{Y} = \{1, \dots, M\}^p$ equipped with the counting measure. Since $\mathcal{Y}$ is a finite set, we have $\Omega = \mathcal{M}_p$. An important special case of the Potts model is a version of the Ising model, where $M = 2$, $B_i(x) = x$, and $B_{ij}(x, y) = xy$.

Suppose that we observe data $y^{(1)}, \dots, y^{(n)}$, where $y^{(i)} = (y^{(i)}_1, \dots, y^{(i)}_p) \in \mathcal{Y}$ is viewed as a column vector. We set $x = [y^{(1)}, \dots, y^{(n)}]' \in \mathbb{R}^{n\times p}$. Given a prior distribution $\Pi$ on $\Omega$, and given the data $x$, the resulting posterior distribution for learning $\omega$ is

$$\Pi_n(A \mid x) = \frac{\int_A \prod_{i=1}^n f_\omega(y^{(i)})\, \Pi(d\omega)}{\int_\Omega \prod_{i=1}^n f_\omega(y^{(i)})\, \Pi(d\omega)}, \quad A \subseteq \Omega. \tag{2}$$
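To make Example 2 concrete, the following numpy sketch (ours, not from the paper) evaluates the Ising-type sufficient statistic $B(y)$ and the density (1), computing the normalizer $Z(\omega)$ by brute-force enumeration, which is feasible only for very small $p$. For simplicity the code sums over all entries of the matrix, which double-counts the off-diagonal terms relative to the convention above; this only rescales the off-diagonal parameters.

```python
import itertools
import numpy as np

def B_ising(y):
    """Sufficient-statistic matrix B(y) for the Ising-type model:
    B_i(x) = x on the diagonal, B_ij(x, y) = x*y off-diagonal."""
    y = np.asarray(y, dtype=float)
    B = np.outer(y, y)
    np.fill_diagonal(B, y)   # B(y)_ii = y_i, B(y)_ij = y_i * y_j
    return B

def logZ_brute(omega, p):
    """Exact normalizer Z(omega) by enumerating all 2^p spin configurations."""
    states = itertools.product([-1.0, 1.0], repeat=p)
    energies = [np.sum(omega * B_ising(s)) for s in states]
    return float(np.log(np.sum(np.exp(energies))))

def log_f(omega, y, logZ):
    """log f_omega(y) = <omega, B(y)> - log Z(omega)."""
    return np.sum(omega * B_ising(y)) - logZ

p = 3
rng = np.random.default_rng(0)
A = rng.normal(size=(p, p))
omega = (A + A.T) / 2        # a symmetric parameter in M_p
logZ = logZ_brute(omega, p)
probs = [np.exp(log_f(omega, s, logZ))
         for s in itertools.product([-1.0, 1.0], repeat=p)]
```

By construction the probabilities over the $2^p$ states sum to one, which is a quick sanity check of the normalizer.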
However, as discussed in the introduction, this posterior distribution is typically intractable. In the frequentist literature, a commonly used approach to circumvent computational difficulties with graphical models consists in replacing the likelihood function by a pseudo-likelihood function. For $\omega \in \mathcal{M}_p$, let $\omega_{\cdot j}$ denote the $j$-th column of $\omega$. Note that in the present case, if $(Y_1, \dots, Y_p) \sim f_\omega$, then for $1 \le j \le p$, the conditional distribution of $Y_j$ given $\{Y_k,\ k \neq j\}$ depends on $\omega$ only through the $j$-th column $\omega_{\cdot j}$. We write this conditional distribution as $u \mapsto f_j(u \mid \omega_{\cdot j}, y_{-j})$, where for $y \in \mathcal{Y}$, $y_{-j} = (y_1, \dots, y_{j-1}, y_{j+1}, \dots, y_p)$, with obvious modifications when $j = 1, p$. Let

$$\bar\Omega = \left\{ \omega \in \mathcal{M}_p :\ u \mapsto f_j(u \mid \omega_{\cdot j}, y_{-j}) \text{ is a well-defined density on } \mathcal{Y}_j, \text{ for all } y \in \mathcal{Y} \text{ and all } 1 \le j \le p \right\}.$$

Note that $\Omega \subseteq \bar\Omega$. The most commonly used pseudo-likelihood method consists in replacing the initial likelihood contribution $f_\omega(y^{(i)})$ by

$$\bar f_\omega(y^{(i)}) = \prod_{j=1}^p f_j\left( y^{(i)}_j \mid \omega_{\cdot j}, y^{(i)}_{-j} \right), \quad \omega \in \bar\Omega.$$

This pseudo-likelihood approach typically brings important simplifications. For instance, in the Gaussian case, the parameter space $\bar\Omega$ corresponds to the space of symmetric matrices with positive diagonal elements, which has a simpler geometry compared to $\mathcal{M}_p^+$. And in the case of discrete graphical models, the conditional models typically have tractable normalizing constants. The idea goes back at least to Besag (1974), and penalized versions of pseudo-likelihood functions have been employed by several authors to fit high-dimensional graphical models. In a Bayesian setting this approach works well for small to moderate size graphs. The issue is that the dimension of the space $\bar\Omega \subseteq \mathcal{M}_p$ grows as $O(p^2)$, and MCMC simulation for exploring probability distributions on such very large spaces is inherently a difficult problem (for example, already for $p = 200$ the dimension of $\bar\Omega$ is larger than $10^4$). A related pseudo-likelihood for this problem is suggested by the neighborhood selection method of Meinshausen and Bühlmann (2006).
The idea consists in relaxing the symmetry constraint in $\bar\Omega$. For $1 \le j \le p$, we set

$$\Omega_j = \left\{ \theta \in \mathbb{R}^p :\ u \mapsto f_j(u \mid \theta, y_{-j}) \text{ is a well-defined density on } \mathcal{Y}_j, \text{ for all } y \in \mathcal{Y} \right\}.$$

We note that if $\omega \in \bar\Omega$, then $\omega_{\cdot j} \in \Omega_j$. Hence these sets $\Omega_j$ are nonempty, and we define $\check\Omega = \Omega_1 \times \cdots \times \Omega_p$, that we identify as a subset of the space of $p \times p$ real
matrices $\mathbb{R}^{p\times p}$. In particular, if $\omega \in \check\Omega$, and consistently with our notation above, $\omega_{\cdot j}$ denotes the $j$-th column of $\omega$. We consider the pseudo-model $\{\check f_\omega,\ \omega \in \check\Omega\}$, where

$$\check f_\omega(y) = \prod_{j=1}^p f_j\left( y_j \mid \omega_{\cdot j}, y_{-j} \right), \quad \omega \in \check\Omega,\ y \in \mathcal{Y}. \tag{3}$$

One can then maximize a penalized version of $\omega \mapsto \sum_{i=1}^n \log \check f_\omega(y^{(i)})$, and this corresponds to the neighborhood selection method of Meinshausen and Bühlmann (2006); see also Sun and Zhang (2013). Notice that by definition $\check\Omega$ is a product space, whereas $\bar\Omega$ is not, due to the symmetry constraint. This implies that $\omega \mapsto \check f_\omega(y)$ factorizes along the columns of $\omega$, whereas $\omega \mapsto \bar f_\omega(y)$ typically does not. One can take advantage of this separability for fast computation if the penalty is also separable. With a prior distribution $\Pi$ on $\check\Omega$, the quasi-likelihood function $\omega \mapsto \check f_\omega$ leads to a quasi-posterior distribution given by

$$\check\Pi_{n,p}(A \mid x) = \frac{\int_A \prod_{i=1}^n \check f_\omega(y^{(i)})\, \Pi(d\omega)}{\int_{\check\Omega} \prod_{i=1}^n \check f_\omega(y^{(i)})\, \Pi(d\omega)}, \quad A \subseteq \check\Omega.$$

Let us assume that the prior distribution factorizes: $\Pi(d\omega) = \prod_{j=1}^p \Pi_j(d\omega_{\cdot j})$. Then we are led to the quasi-posterior distribution

$$\check\Pi_{n,p}(du_1, \dots, du_p \mid x) = \prod_{j=1}^p \check\Pi_{n,p,j}(du_j \mid x), \tag{4}$$

where

$$\check\Pi_{n,p,j}(du \mid x) = \frac{\prod_{i=1}^n f_j\left( y^{(i)}_j \mid u, y^{(i)}_{-j} \right) \Pi_j(du)}{\int_{\Omega_j} \prod_{i=1}^n f_j\left( y^{(i)}_j \mid u, y^{(i)}_{-j} \right) \Pi_j(du)}$$

is a probability measure on $\Omega_j$. Basically, relaxing the symmetry allows us to factorize the quasi-likelihood function, and this leads to a factorized quasi-posterior distribution, as in (4). Each component of this quasi-posterior distribution can then be explored independently. Despite its simplicity, when used in a parallel computing environment, this approach increases by one order of magnitude the size of the graphical models that can be estimated.

3. Gaussian graphical models

Here we consider the Gaussian case where $\mathcal{Y}_i = \mathbb{R}$, $B_i(x) = -x^2/2$, and $B_{ij}(x, y) = -xy$. Hence in this case $\Omega = \mathcal{M}_p^+$, $\bar\Omega$ corresponds to the set of symmetric matrices with positive diagonal elements, and $\check\Omega$ is the space of $p \times p$ real matrices (not necessarily symmetric) with positive diagonal.
Assuming that the diagonal elements are known and given, we shall identify $\check\Omega$ with the matrix space $\mathbb{R}^{(p-1)\times p}$.
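In this Gaussian setting, the quasi-likelihood defined next is, up to constants, a product of $p$ decoupled least-squares terms, one per column of the data matrix. A minimal numpy sketch of this factorization (function and variable names are ours, not from the paper's MATLAB code):

```python
import numpy as np

def col_loglik(j, theta_j, x, sigma2_j):
    """Quasi-log-likelihood contribution of column j: a Gaussian
    least-squares term regressing x_j on the remaining columns."""
    x_j = x[:, j]
    x_mj = np.delete(x, j, axis=1)   # x with the j-th column removed
    resid = x_j - x_mj @ theta_j
    return -0.5 * np.sum(resid**2) / sigma2_j

def pseudo_loglik(theta, x, sigma2):
    """log q(theta; x) = sum_j log q_j(theta_{.j}; x): the product-form
    quasi-likelihood, so each column can be handled in parallel."""
    n, p = x.shape
    return sum(col_loglik(j, theta[:, j], x, sigma2[j]) for j in range(p))

rng = np.random.default_rng(1)
n, p = 50, 5
x = rng.normal(size=(n, p))
theta = rng.normal(size=(p - 1, p))  # one (p-1)-vector per node
sigma2 = np.ones(p)
total = pseudo_loglik(theta, x, sigma2)
parts = [col_loglik(j, theta[:, j], x, sigma2[j]) for j in range(p)]
```

Because the objective is an exact sum over columns, the per-column terms can be computed on separate workers and simply added, which is the separability the text exploits.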
If $\vartheta \in \mathcal{M}_p^+$, and $(Y_1, \dots, Y_p) \sim f_\vartheta$, it is well known that for all $j \in \{1, \dots, p\}$, the conditional distribution of $Y_j$ given all the other $Y_k = y_k$, $k \neq j$, is

$$\mathbf{N}\left( -\sum_{k \neq j} \frac{\vartheta_{kj}}{\vartheta_{jj}}\, y_k,\ \frac{1}{\vartheta_{jj}} \right), \tag{5}$$

where $\mathbf{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Given data $x \in \mathbb{R}^{n\times p}$, given $\sigma_j^2 > 0$, and given these conditional distributions, the product of the quasi-model (3) across the data set gives (up to normalizing constants that we ignore) the quasi-likelihood

$$q(\theta; x) = \prod_{j=1}^p q_j(\theta_{\cdot j}; x), \quad\text{with}\quad q_j(\theta_j; x) = \exp\left( -\frac{1}{2\sigma_j^2}\, \left\| x_j - x_{-j}\theta_j \right\|_2^2 \right), \quad \theta \in \mathbb{R}^{(p-1)\times p}, \tag{6}$$

where $x_{-j} \in \mathbb{R}^{n\times(p-1)}$ is the matrix obtained from $x$ by removing the $j$-th column, and $x_j$ (resp. $\theta_{\cdot j}$) denotes the $j$-th column of $x$ (resp. $\theta$). Given (5), it is clear that $\sigma_j^2$ is a proxy for $1/\vartheta_{jj}$. For the time being, we shall assume that the variance terms $\vartheta_{jj}$ are known, and we will set $\sigma_j^2 = 1/\vartheta_{jj}$. In practice we use an empirical Bayes approach, described below, whereby $\vartheta_{jj}$ is obtained from the data. We combine (6) with a prior distribution $\Pi(d\theta) = \prod_{j=1}^p \Pi_j(d\theta_{\cdot j})$ to obtain a quasi-posterior distribution on $\mathbb{R}^{(p-1)\times p}$ given by

$$\check\Pi_{n,p}(d\theta \mid x) = \prod_{j=1}^p \check\Pi_{n,p,j}(d\theta_{\cdot j} \mid x, \sigma_j^2), \tag{7}$$

where $\check\Pi_{n,p,j}(\cdot \mid x, \sigma_j^2)$ is the probability measure on $\mathbb{R}^{p-1}$ given by

$$\check\Pi_{n,p,j}(dz \mid x, \sigma_j^2) \propto q_j(z; x)\, \Pi_j(dz).$$

Again, the main appeal of $\check\Pi_{n,p}$ is its factorized form, which implies that Monte Carlo samples from $\check\Pi_{n,p}$ can be obtained by sampling in parallel from the $p$ distributions $\check\Pi_{n,p,j}$.

3.1. Prior distribution. We address here the choice of the prior distribution $\Pi_j$. For each $j \in \{1, \dots, p\}$, we build the prior $\Pi_j$ on $\mathbb{R}^{p-1}$ as in Castillo et al. (2015). First, let $\Delta_{p-1} = \{0, 1\}^{p-1}$, and let $\{\pi_\delta,\ \delta \in \Delta_{p-1}\}$ denote a discrete probability distribution on $\Delta_{p-1}$, which we assume to be the same for all components $j$. We take $\Pi_j$ as
the distribution of the random variable $u \in \mathbb{R}^{p-1}$ obtained as follows:

$$\delta_{1:(p-1)} \overset{\text{i.i.d.}}{\sim} \mathbf{Ber}(q); \text{ given } \delta,\ u_1, \dots, u_{p-1} \text{ are conditionally independent, with } u_k \mid \delta \sim \begin{cases} \mathbf{Dirac}(0) & \text{if } \delta_k = 0, \\ \mathbf{Laplace}(\rho_j \sigma_j) & \text{if } \delta_k = 1, \end{cases} \tag{8}$$

where $q \in (0, 1)$ and $\rho_j > 0$ are hyper-parameters, $\sigma_j^2$ is as in (6), $\mathbf{Dirac}(0)$ is the Dirac measure on $\mathbb{R}$ with mass at $0$, and for $\rho > 0$, $\mathbf{Laplace}(\rho)$ denotes the Laplace distribution with density $(\rho/2)\, e^{-\rho|x|}$, $x \in \mathbb{R}$.

3.2. Posterior contraction and rate. We study here the behavior of the posterior distribution given in (7), for large $n, p$, and when the prior is as in (8). We will assume that the observed data matrix $x \in \mathbb{R}^{n\times p}$ is a realization of a random matrix $X$ whose rows are i.i.d. random vectors from a mean-zero Gaussian distribution on $\mathbb{R}^p$ with precision matrix $\vartheta$, with known diagonal elements. More precisely:

H1. For some $\vartheta \in \mathcal{M}_p^+$, $X = Z\vartheta^{-1/2}$, where $Z \in \mathbb{R}^{n\times p}$ is a random matrix with i.i.d. standard normal entries.

From the true precision matrix $\vartheta$, we now derive the true value of the parameter $\theta_\star \in \mathbb{R}^{(p-1)\times p}$ towards which $\check\Pi_{n,p}$ is expected to converge. For $j = 1, \dots, p$,

$$\theta_{\star,kj} = -\frac{\vartheta_{kj}}{\vartheta_{jj}} \ \text{ for } k = 1, \dots, j-1, \quad\text{and}\quad \theta_{\star,kj} = -\frac{\vartheta_{(k+1)j}}{\vartheta_{jj}} \ \text{ for } k = j, \dots, p-1.$$

Let $\delta_\star \in \{0, 1\}^{(p-1)\times p}$ be the sparsity structure of $\theta_\star$, defined as $\delta_{\star,kj} = \mathbb{1}\{|\theta_{\star,kj}| > 0\}$. We set

$$s_j = \sum_{k=1}^{p-1} \mathbb{1}\{|\theta_{\star,kj}| > 0\}, \quad j = 1, \dots, p, \quad\text{and}\quad s = \max_{1\le j\le p} s_j.$$

Hence $s_j$ is the degree of node $j$, and $s$ is the maximum node degree in the undirected graph defined by $\vartheta$. The asymptotic behavior of $\check\Pi_{n,p}$ depends crucially on certain restricted and $m$-sparse eigenvalues of the true precision matrix $\vartheta$, that we introduce next. We set

$$\underline\kappa = \inf\left\{ \frac{u'\vartheta u}{\|u\|_2^2} :\ u \in \mathbb{R}^p,\ u \neq 0,\ \text{s.t. } \sum_{k:\ \delta_{\star,kj}=0} |u_k| \le 7 \sum_{k:\ \delta_{\star,kj}=1} |u_k| \ \text{for some } j \right\}, \tag{9}$$

and for $1 \le s \le p$,

$$\underline\kappa(s) = \inf\left\{ \frac{u'\vartheta u}{\|u\|_2^2} :\ u \in \mathbb{R}^p,\ u \neq 0,\ \|u\|_0 \le s \right\}, \quad \bar\kappa(s) = \sup\left\{ \frac{u'\vartheta u}{\|u\|_2^2} :\ u \in \mathbb{R}^p,\ u \neq 0,\ \|u\|_0 \le s \right\}. \tag{10}$$

In the above equations, we use the conventions $\inf \emptyset = +\infty$ and $\sup \emptyset = 0$.
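For illustration, sampling one column vector from the spike-and-slab prior (8) is straightforward. In this hedged sketch the Laplace rate is a plain hyper-parameter `rho` rather than the data-driven $\rho_j\sigma_j$ used in the theory, and the function name is ours:

```python
import numpy as np

def sample_prior(p, u=0.5, rho=1.0, rng=None):
    """Draw one column vector from the spike-and-slab prior (8):
    delta_k ~ Bernoulli(q) with q = 1/p^(u+1); u_k = 0 when delta_k = 0,
    and u_k ~ Laplace(rho), density (rho/2) exp(-rho|x|), when delta_k = 1."""
    rng = rng or np.random.default_rng()
    q = p ** -(u + 1.0)
    delta = rng.random(p - 1) < q       # sparsity pattern
    # numpy's Laplace uses a scale parameter b = 1/rho
    vals = rng.laplace(loc=0.0, scale=1.0 / rho, size=p - 1)
    return np.where(delta, vals, 0.0), delta
```

With $q = p^{-(u+1)}$ the expected number of active coordinates is below one for large $p$, so draws from this prior are extremely sparse, which is the point of the construction.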
We study the contraction of $\check\Pi_{n,p}$ in the norm $\|\theta\|_{2,\infty} = \max_{1\le j\le p} \|\theta_{\cdot j}\|_2$.

Theorem 3. Assume H1 and the prior (8) with $q = \frac{1}{p^{u+1}}$ for some absolute constant $u > 0$, and

$$\rho_j = 4 \max_{1\le k\le p} \|X_{\cdot k}\|_2\, \sqrt{\vartheta_{jj}\,\log(p)}, \quad 1 \le j \le p.$$

For $1 \le j \le p$, suppose that $\sigma_j^2 = 1/\vartheta_{jj}$, and set

$$\zeta_j = \frac{4(u+2)}{u}\, \frac{\bar\kappa(s_j)}{\underline\kappa}\, s_j + \frac{4\,\bar\kappa(s_j)\log(p)}{\underline\kappa}\, s_j, \quad \bar s_j = s_j + \zeta_j, \quad\text{and}\quad \bar s = \max_{1\le j\le p} \bar s_j. \tag{11}$$

If $\min(\underline\kappa, \underline\kappa(\bar s)) > 0$, then there exist absolute constants $a_0 > 0$, $a_1 > 0$, $a_2 > 0$, $M_0$ such that for all $p \ge a_0$ and

$$n \ge a_1 \left( \bar s^2 + \frac{\bar\kappa(\bar s)}{\underline\kappa^2} \right) \log(p), \tag{12}$$

the following two statements hold:

$$\mathbb{E}\left[ \check\Pi_{n,p}\left( \left\{ \theta \in \mathbb{R}^{(p-1)\times p} :\ \|\theta_{\cdot j}\|_0 \ge \zeta_j \ \text{for some } j \right\} \,\middle|\, X \right) \right] \le 3\left( e^{-a_2 n} + \frac{1}{p} \right), \tag{13}$$

$$\mathbb{E}\left[ \check\Pi_{n,p}\left( \left\{ \theta \in \mathbb{R}^{(p-1)\times p} :\ \|\theta - \theta_\star\|_{2,\infty} > M_0\,\epsilon \right\} \,\middle|\, X \right) \right] \le 3\left( e^{-a_2 n} + \frac{1}{p} \right), \tag{14}$$

where $\epsilon > 0$ is given by

$$\epsilon = \frac{\bar\kappa(\bar s)}{\underline\kappa(\bar s)}\, \sqrt{\frac{\bar s\,\log(p)}{n}}.$$

Proof. See Section 5.2.

Remark 4. Under H1 and the assumed prior, (13) says that for $n, p$ large, if $\theta \sim \check\Pi_{n,p}(\cdot \mid X)$, then with high probability $\|\theta_{\cdot j}\|_0 < \zeta_j$ for all $j \in \{1, \dots, p\}$. Note that if $\bar\kappa(\bar s)/\underline\kappa$ and $\bar\kappa(\bar s)\log(p)/\underline\kappa$ are small (meaning $\vartheta$ is well-conditioned), then $\zeta_j$ is of the same order as $s_j$. In other words, the main conclusion of (13) is that $\check\Pi_{n,p}(\cdot \mid X)$ concentrates most of its probability mass on matrices that are sparse, with a sparsity structure that mirrors that of $\vartheta$, provided that $\vartheta$ is well-conditioned. The behavior of the posterior distribution in practice suggests that the large constant 69 appearing in the proof is most likely an artifact of the techniques used there, and can probably be improved. (14) says that the contraction rate of $\check\Pi_{n,p}$ towards $\theta_\star$ in the $\|\cdot\|_{2,\infty}$ norm is $O(\epsilon)$. This corresponds to the worst among the rates of contraction of the $p$ linear regression
problems performed during the neighborhood selection procedure. This rate is similar to the rate of convergence of the frequentist neighborhood selection method of Meinshausen and Bühlmann (2006), which is of order

$$s\sqrt{\frac{\log(p)}{n}}; \tag{15}$$

see the discussion in Section 3.4 of Ravikumar et al. (2011).

4. Numerical experiments

4.1. Fully Bayesian quasi-posterior distribution. Recall that the prior distribution of $\delta_k$ is $\delta_k \sim \mathbf{Ber}(q)$, with $q = p^{-(u+1)}$. The posterior distribution $\check\Pi_{n,p}$ is fairly robust to the choice of $u$, so throughout the simulations we set $u = 0.5$. In contrast, $\check\Pi_{n,p}$ is sensitive to $\rho_j$, so we use a fully Bayesian approach, with a prior distribution $\rho_j \sim \phi$, where $\phi$ is the uniform distribution $\mathbf{U}(a_1, a_2)$ for $a_1 = 10^{-5}$ and $a_2 = 10^5$. Given $\sigma_j^2$, we obtain a fully specified quasi-posterior distribution

$$\prod_{j=1}^p \Pi_{n,p,j}(\delta, d\theta, d\rho_j \mid x, \sigma_j^2), \tag{16}$$

where the $j$-th component $\Pi_{n,p,j}(\cdot \mid x, \sigma_j^2)$ can be written as follows. For $\delta \in \Delta_{p-1}$, let $\mu_\delta$ be the product measure on $\mathbb{R}^{p-1}$ defined as $\mu_\delta(du) = \prod_{k=1}^{p-1} \nu_{\delta_k}(du_k)$, where $\nu_0(dz)$ is the Dirac mass at $0$, and $\nu_1(dz)$ is the Lebesgue measure on $\mathbb{R}$. Then

$$\Pi_{n,p,j}(\delta, d\theta, d\rho_j \mid x, \sigma_j^2) \propto q_j(\theta; x)\, q^{\|\delta\|_0} (1-q)^{p-1-\|\delta\|_0} \left( \frac{\rho_j\sigma_j}{2} \right)^{\|\delta\|_0} e^{-\rho_j\sigma_j \|\theta\|_1}\, \phi(\rho_j)\, \mu_\delta(d\theta)\, d\rho_j. \tag{17}$$

The quasi-posterior distribution (17) depends on the choice of $\sigma_j^2$. Ideally we would like to set $\sigma_j^2 = 1/\vartheta_{jj}$. However, this quantity is unknown. In the simulations we choose $\sigma_j^2$ by empirical Bayes. More precisely, following Reid et al. (2013), we estimate $\sigma_j^2$ by

$$\hat\sigma_j^2 = \frac{1}{n - \hat s_{\lambda_n}} \left\| x_j - x_{-j} \hat\beta_{\lambda_n} \right\|_2^2, \tag{18}$$

where $\hat\beta_\lambda$ is the lasso estimate at regularization level $\lambda$ in the linear regression of $x_j$ (the $j$-th column of $x$) on $x_{-j}$ (the remaining columns). In the procedure, $\lambda_n$ is selected by 10-fold cross-validation, and $\hat s_{\lambda_n}$ is the number of non-zero components of $\hat\beta_{\lambda_n}$. We explore this approach in the simulations.

Given $j \in \{1, \dots, p\}$, sampling from the distribution $\Pi_{n,p,j}(\cdot \mid x)$ given in (17) is a difficult computational task, due to the discrete-continuous mixture prior on $\delta$. Here we follow the approach developed by the author in Atchadé (2015), which produces
11 QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS approximate samples from 7 by sampling from its forward-backward approximation γ denoted Π n,p,j δ, dθ, dρ j x, σj however other approximations schemes could be used as well Narisetty and He 04; Schreck et al. 03. The parameter γ 0, /4] controls the quality of the approximation. In all the simulation below, we use γ = Simulation set ups. We generate a data matrix x R n p with i.i.d. rows from N0, ϑ, ϑ M + p. Throughout we set the sample size to n = 50, and we consider three settings. a: ϑ is generated as in Setting c below, but using p = 00 nodes. b: In this case p = 500, and we take ϑ from the R-package space based on the work Peng et al These authors have designed a precision matrix ϑ that is modular with 5 modules of 00 nodes each. Inside each module, there are 3 hubs with degree around 5, and 97 other nodes with degree at most 4. The total number of edges is 587. The resulting partial correlations fall within 0.67, 0.0] [0.0, As explained in Peng et al. 009, this type of networks are useful models for biological networks. c: In this case p =, 000, and we build ϑ as follows. First we generate a symmetric sparse matrix B such that the number of off-diagonal non-zeros entries is roughly p. We magnified the signal by adding 3 to all the nonzeros entries of B subtracting 3 for negative non-zero entries. Then we set ϑ = B + ɛ λ min BI p, where λ min B is the smallest eigenvalue of B, with ɛ =. In this example, values of the partial correlations are typically in the range 0.46, 0.8] [0.8, To evaluate the effect of the hyper-parameter σj, we report two sets of results. One where σj = /ϑ jj, and another set of results where ϑ jj is assumed unknown and we select σj from the data, using the cross-validation estimator described in 8. In order to mitigate the uncertainty in some of the results reported below, we repeat all the MCMC simulations 0 times. 
Hence, to summarize: for each setting a, b, and c, we generate one precision matrix $\vartheta$. Given $\vartheta$, we generate 10 datasets, and for each dataset we run two MCMC samplers: one where the $\sigma_j^2$'s are taken as the $1/\vartheta_{jj}$'s, and one where they are estimated from the data. (The precision matrix used in Setting b corresponds to the example "Hub network" in Section 3 of Peng et al. (2009). A non-sparse version of $\vartheta$ is attached to the space package.)
4.3. Computation details. All the simulations were performed on a high-performance computer using 100 cores and Matlab 7.4. To simulate from $\Pi^\gamma_{n,p,j}(\cdot \mid x, \sigma_j^2)$ for a given $j$, we run the MCMC sampler for 50,000 iterations and discard the first 10,000 iterations as burn-in. From the MCMC output, we estimate the structure $\delta \in \{0, 1\}^{p\times p}$ as follows. We set the diagonal of $\hat\delta$ to one, and for each off-diagonal entry $(i, j)$, we set $\hat\delta_{ij} = 1$ if the sample average estimate of $\delta_{ij}$ from the $j$-th chain and the sample average estimate of $\delta_{ji}$ from the $i$-th chain are both larger than $0.5$; otherwise $\hat\delta_{ij} = 0$. Obviously, other symmetrization rules could be adopted. Given the estimate $\hat\delta$, say, of $\delta$, we estimate $\vartheta \in \mathbb{R}^{p\times p}$ as follows. We set the diagonal of $\hat\vartheta$ to $1/\sigma_j^2$. For $i \neq j$, if $\hat\delta_{ij} = 0$, we set $\hat\vartheta_{ij} = \hat\vartheta_{ji} = 0$. Otherwise we estimate $\hat\vartheta_{ij} = \hat\vartheta_{ji}$ as $0.5\,(\sigma_j^{-2}\,\tilde\vartheta_{ij} + \sigma_i^{-2}\,\tilde\vartheta_{ji})$, where $\tilde\vartheta_{ij}$ (resp. $\tilde\vartheta_{ji}$) is the Monte Carlo sample average estimate of $\vartheta_{ij}$ from the $j$-th chain (resp. $i$-th chain). For all off-diagonal components $(i, j)$ such that $\hat\delta_{ij} = 1$, Bayesian posterior intervals can also be produced by taking the union of the 95% posterior intervals from the $i$-th and $j$-th chains. When $\hat\delta_{ij} = 0$, those intervals are set to $\{0\}$.

4.4. Results. We evaluate the behavior of the quasi-posterior distribution (17) on three simulated datasets. As a benchmark, we also report the results obtained using the elastic net estimator

$$\hat\vartheta_{\text{glasso}} = \operatorname*{Argmin}_{\theta \in \mathcal{M}_p^+}\ \left\{ -\log\det(\theta) + \mathrm{Tr}(\theta S) + \lambda \sum_{i,j} \left( \alpha\,|\theta_{ij}| + \frac{1-\alpha}{2}\, \theta_{ij}^2 \right) \right\},$$

where $S = (1/n)\, x'x$, $\alpha = 0.9$, and $\lambda > 0$ is a regularization parameter. We choose $\lambda$ by minimizing

$$-\log\det(\hat\theta_\lambda) + \mathrm{Tr}(\hat\theta_\lambda S) + \frac{\log(n)}{n} \sum_{i<j} \mathbb{1}\{|\hat\theta_{\lambda,ij}| > 0\}$$

over a finite set of values of $\lambda$. Our goal is not to compare the quasi-Bayesian method to the graphical lasso, since the former utilizes vastly more computing power than the latter. Rather, we report these numbers as references that help better understand the behavior of the proposed methodology.
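The AND-rule symmetrization just described can be sketched as follows, assuming the per-chain posterior averages have been collected into $p \times p$ arrays (all names are ours):

```python
import numpy as np

def symmetrize(incl, theta_bar, sigma2, thresh=0.5):
    """AND-rule symmetrization:
    incl[i, j]      = posterior mean of delta_ij from chain j,
    theta_bar[i, j] = posterior mean of the (i, j) precision entry, chain j.
    Keep edge (i, j) only if both chains give inclusion probability > thresh,
    then average the two chains' precision estimates."""
    delta_hat = (incl > thresh) & (incl.T > thresh)   # symmetric by design
    np.fill_diagonal(delta_hat, True)                 # diagonal fixed to one
    avg = 0.5 * (theta_bar / sigma2[None, :] + theta_bar.T / sigma2[:, None])
    theta_hat = np.where(delta_hat, avg, 0.0)
    np.fill_diagonal(theta_hat, 1.0 / sigma2)
    return delta_hat, theta_hat
```

The AND rule makes the estimated graph symmetric automatically, since `(incl > t) & (incl.T > t)` is invariant under transposition, and the averaged matrix inherits symmetry from the two mirrored terms.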
We look at the performance of the method by computing the relative Frobenius-norm error, the sensitivity, and the precision of the estimated matrix as obtained above. These quantities are defined respectively as

$$\mathcal{E} = \frac{\|\hat\vartheta - \vartheta\|_F}{\|\vartheta\|_F}, \quad \mathrm{SEN} = \frac{\sum_{i<j} \mathbb{1}\{|\vartheta_{ij}| > 0\}\, \mathbb{1}\{\mathrm{sign}(\hat\vartheta_{ij}) = \mathrm{sign}(\vartheta_{ij})\}}{\sum_{i<j} \mathbb{1}\{|\vartheta_{ij}| > 0\}},$$

and

$$\mathrm{PREC} = \frac{\sum_{i<j} \mathbb{1}\{|\hat\vartheta_{ij}| > 0\}\, \mathbb{1}\{\mathrm{sign}(\hat\vartheta_{ij}) = \mathrm{sign}(\vartheta_{ij})\}}{\sum_{i<j} \mathbb{1}\{|\hat\vartheta_{ij}| > 0\}}. \tag{19}$$
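A direct numpy transcription of (19), assuming exact zeros encode absent edges:

```python
import numpy as np

def metrics(theta_hat, theta_true):
    """Relative Frobenius error, sensitivity and precision as in (19),
    computed over the strict upper triangle (i < j)."""
    E = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
    iu = np.triu_indices_from(theta_true, k=1)
    h, t = theta_hat[iu], theta_true[iu]
    # an edge is "correctly recovered" if both entries are nonzero
    # with matching signs
    correct = (h != 0) & (t != 0) & (np.sign(h) == np.sign(t))
    sen = correct.sum() / max((t != 0).sum(), 1)
    prec = correct.sum() / max((h != 0).sum(), 1)
    return E, sen, prec
```

The `max(..., 1)` guards keep the ratios defined when the true or estimated graph is empty; the paper's settings never hit that corner case.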
We average these statistics over the 10 simulation replications. We also compute the same quantities for the elastic net estimator $\hat\vartheta_{\text{glasso}}$. The results are reported in Tables 1-3. They suggest that the quasi-Bayesian procedure generally has good contraction properties in the Frobenius norm, and hence in the $\mathrm{L}_{2,\infty}$ norm. The results also suggest that the quasi-Bayesian procedure tends to produce high false-negative rates, but has excellent false-positive rates, even with $p = 1000$. The same conclusion seems to hold across all three network settings considered in the simulations. We also notice with satisfaction that there seems to be little difference between the results where $\vartheta_{jj}$ is assumed known and the results where $\vartheta_{jj}$ is estimated from the data.

Table 1. Relative error $\mathcal{E}$, sensitivity SEN, and precision PREC (in %), as defined in (19), for Setting a with $p = 100$ nodes, for the three estimators: $\vartheta_{jj}$ known, empirical Bayes, and glasso. Based on 10 simulation replications; each MCMC run is 50,000 iterations.

Table 2. Same quantities as Table 1, for Setting b with $p = 500$ nodes.

Table 3. Same quantities as Table 1, for Setting c with $p = 1000$ nodes.
5. Proof of Theorem 3

First we shall establish, from first principles, some contraction properties for posterior distributions in linear regression models. We will then reduce the proof of Theorem 3 to the linear regression case.

5.1. Posterior contraction of high-dimensional linear regression models. Let $X \in \mathbb{R}^{n\times p}$ be a design matrix, $\theta_\star \in \mathbb{R}^p$, and $\sigma_0^2 > 0$. Suppose that

$$Z \sim \mathbf{N}(X\theta_\star,\ \sigma_0^2 I_n). \tag{20}$$

We will write $\mathbb{P}$ and $\mathbb{E}$ for the probability measure and expectation operator under the distribution of $Z$ assumed in (20). Let $\Delta = \{0, 1\}^p$, and let $\{\omega_\delta,\ \delta \in \Delta\}$ be a probability distribution on $\Delta$. For $\sigma^2 > 0$, $\lambda > 0$, we consider the posterior distribution

$$\Pi_n(d\theta \mid Z) = C_n(Z)^{-1}\, e^{-\frac{1}{2\sigma^2}\|Z - X\theta\|_2^2} \sum_{\delta \in \Delta} \omega_\delta \left( \frac{\lambda}{2} \right)^{\|\delta\|_0} e^{-\lambda\|\theta\|_1}\, \mu_\delta(d\theta). \tag{21}$$

We make the following assumption on the distribution $\{\omega_\delta,\ \delta \in \Delta\}$.

H2. For all $\delta \in \Delta$,

$$\omega_\delta = \frac{g_{\|\delta\|_0}}{\binom{p}{\|\delta\|_0}}.$$

Furthermore, there exist universal constants $c_1, c_2, c_3, c_4$ such that for all $s = 1, \dots, p$,

$$\frac{c_1}{p^{c_3}}\, g_{s-1} \le g_s \le \frac{c_2}{p^{c_4}}\, g_{s-1}.$$

Remark 5. We note here that if $\delta_k \overset{\text{i.i.d.}}{\sim} \mathbf{Ber}(q)$, where $q = \frac{1}{p^{u+1}}$ as assumed in (8), then

$$w_\delta = \frac{g_{\|\delta\|_0}}{\binom{p}{\|\delta\|_0}}, \quad\text{where}\quad g_s = \binom{p}{s} q^s (1-q)^{p-s}.$$

Furthermore, it is easy to check that $\{g_s\}$ satisfies the double inequality in H2 with $c_1 = 0.5$, $c_2 = 2$, $c_3 = u + 2$, and $c_4 = u$. In other words, the prior distribution chosen in (8) satisfies H2.

Let $\delta_\star$ denote the sparsity structure of $\theta_\star$; that is, for all $j \in \{1, \dots, p\}$, $\delta_{\star,j} = 1$ if and only if $|\theta_{\star,j}| > 0$. Set $s_\star = \|\theta_\star\|_0$,

$$\mathcal{C}_\star = \left\{ \theta \in \mathbb{R}^p :\ \sum_{j:\ \delta_{\star,j}=0} |\theta_j| \le 7 \sum_{j:\ \delta_{\star,j}=1} |\theta_j| \right\},$$

and define

$$\underline v = \inf\left\{ \frac{u' X' X u}{n\,\|u\|_2^2} :\ u \neq 0,\ u \in \mathcal{C}_\star \right\}.$$

For an integer $s \ge 1$, we define

$$\underline v(s) = \inf\left\{ \frac{u' X' X u}{n\,\|u\|_2^2} :\ u \neq 0,\ \|u\|_0 \le s \right\}, \quad \bar v(s) = \sup\left\{ \frac{u' X' X u}{n\,\|u\|_2^2} :\ u \neq 0,\ \|u\|_0 \le s \right\}.$$
Theorem 6. Assume (20) and H2, and suppose that $\underline v > 0$. Then there exists a constant $A_0$, depending only on the constants $c_1, c_2, c_3, c_4$ of H2, such that the following statements hold.

(1) For all $p \ge A_0$ and $\zeta > 0$, with $\underline\kappa_1 = n\underline v/\sigma^2$,

$$\mathbb{E}\left[ \Pi_n\left( \{\theta \in \mathbb{R}^p :\ \|\theta\|_0 \ge s_\star + \zeta\} \mid Z \right) \right] \le 2p\, e^{-\frac{\lambda^2\sigma^4}{8\sigma_0^2 \max_{1\le j\le p}\|X_{\cdot j}\|_2^2}} + 2\cdot 4^{s_\star}\, e^{\frac{2\lambda^2 s_\star}{\underline\kappa_1}} \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} \left( \frac{4c_2}{p^{c_4}} \right)^{\zeta}.$$

(2) For all $p \ge A_0$, $M \ge 96$, and integer $s \ge s_\star$ such that $\underline v(s) > 0$, set

$$\underline\kappa_1 = \frac{n\,\underline v(s)}{\sigma^2}, \quad \epsilon = \frac{\lambda\sqrt{s}}{\underline\kappa_1}, \quad A_\epsilon = \left\{ \theta \in \mathbb{R}^p :\ \|\theta - \theta_\star\|_0 \le s,\ \|\theta - \theta_\star\|_2 \ge M\epsilon \right\}.$$

Then

$$\mathbb{E}\left[ \Pi_n(A_\epsilon \mid Z) \right] \le 2p\, e^{-\frac{\lambda^2\sigma^4}{8\sigma_0^2\max_{1\le j\le p}\|X_{\cdot j}\|_2^2}} + \frac{(9p)^{s}\, e^{-\underline\kappa_1 M^2\epsilon^2/3}}{1 - e^{-\underline\kappa_1 M^2\epsilon^2/3}} + 2\, p^{c_3 s_\star + 1} \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} e^{-\frac{\underline\kappa_1 M^2\epsilon^2}{64}}.$$

Proof. We start the proof with some notation. We set

$$f_{n,\theta}(z) = \left( 2\pi\sigma^2 \right)^{-n/2} e^{-\frac{1}{2\sigma^2}\|z - X\theta\|_2^2}, \quad z \in \mathbb{R}^n,\ \theta \in \mathbb{R}^p,$$

and

$$\mathcal{L}_{n,\theta_\star}(\theta; z) = \log f_{n,\theta}(z) - \log f_{n,\theta_\star}(z) - \left\langle \nabla_\theta \log f_{n,\theta_\star}(z),\ \theta - \theta_\star \right\rangle = -\frac{1}{2\sigma^2}\, (\theta - \theta_\star)' X' X (\theta - \theta_\star).$$

We will need the following lemmas, which are special cases of Lemma 2 and Lemma 4 of Atchadé (2017), respectively.
Lemma 7. The normalizing constant of $\Pi_n$ satisfies, for all $z \in \mathbb{R}^n$,

$$C_n(z) \ge \omega_{\delta_\star}\, e^{-\lambda\|\theta_\star\|_1} \left( \frac{\lambda}{\lambda + \sqrt{n\,\bar v(s_\star)}/\sigma} \right)^{s_\star} e^{-\frac{1}{2\sigma^2}\|z - X\theta_\star\|_2^2},$$

where $s_\star = \|\theta_\star\|_0$.

Lemma 8 (Existence of tests). Fix $M \ge 2$ and an integer $s \ge s_\star$, and suppose that $\underline v(s) > 0$. Set

$$\underline\kappa_1 = \frac{n\,\underline v(s)}{\sigma^2}, \quad \epsilon = \frac{\lambda\sqrt{s}}{\underline\kappa_1}.$$

There exists a measurable function $\phi : \mathbb{R}^n \to [0, 1]$ such that

$$\mathbb{E}\left[ \phi(Z) \right] \le \frac{(9p)^{s}\, e^{-\underline\kappa_1 M^2\epsilon^2/3}}{1 - e^{-\underline\kappa_1 M^2\epsilon^2/3}}.$$

Furthermore, for any $\theta \in \mathbb{R}^p$ such that $\|\theta - \theta_\star\|_0 \le s$ and $\|\theta - \theta_\star\|_2 > jM\epsilon$ for some $j \ge 1$,

$$\int \left( 1 - \phi(z) \right) f_{n,\theta}(z)\, dz \le e^{-\frac{\underline\kappa_1}{3}(jM\epsilon)^2}.$$
Proof of Theorem 6, Part (1). Set $B = \{\theta \in \mathbb{R}^p :\ \|\theta\|_0 \ge s_\star + \zeta\}$, and

$$\mathcal{E} = \left\{ z \in \mathbb{R}^n :\ \left\| \nabla_\theta \log f_{n,\theta_\star}(z) \right\|_\infty \le \frac{\lambda}{2} \right\}.$$

We set $\underline\kappa_1 = n\underline v/\sigma^2$. By Lemma 7 and Fubini's theorem,

$$\mathbb{E}\left[ \Pi_n(B \mid Z) \right] \le \mathbb{P}(Z \notin \mathcal{E}) + \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} \sum_{\delta:\ \|\delta\|_0 \ge s_\star + \zeta} \frac{\omega_\delta}{\omega_{\delta_\star}} \left( \frac{\lambda}{2} \right)^{\|\delta\|_0} \int_{\mathbb{R}^p} \mathbb{E}\left[ \frac{f_{n,\theta}(Z)}{f_{n,\theta_\star}(Z)}\, \mathbb{1}_{\mathcal{E}}(Z) \right] e^{\lambda\|\theta_\star\|_1 - \lambda\|\theta\|_1}\, \mu_\delta(d\theta).$$

The integrand of the integral in the last display is upper bounded by

$$\Psi(\theta) = \exp\left( \frac{\lambda}{2}\|\theta - \theta_\star\|_1 + \lambda\|\theta_\star\|_1 - \lambda\|\theta\|_1 \right) \mathbb{E}\left[ e^{\mathcal{L}_{n,\theta_\star}(\theta; Z)}\, \mathbb{1}_{\mathcal{E}}(Z) \right].$$

We have

$$\frac{\lambda}{2}\|\theta - \theta_\star\|_1 + \lambda\|\theta_\star\|_1 - \lambda\|\theta\|_1 \le -\frac{\lambda}{2} \sum_{j:\ \delta_{\star,j}=0} |\theta_j - \theta_{\star,j}| + \frac{3\lambda}{2} \sum_{j:\ \delta_{\star,j}=1} |\theta_j - \theta_{\star,j}|.$$

Hence, if $\theta - \theta_\star \notin \mathcal{C}_\star$, using the concavity of $\mathcal{L}_{n,\theta_\star}$,

$$\Psi(\theta) \le e^{-\frac{\lambda}{4}\|\theta - \theta_\star\|_1}.$$

However, if $\theta - \theta_\star \in \mathcal{C}_\star$, then

$$\mathbb{E}\left[ e^{\mathcal{L}_{n,\theta_\star}(\theta; Z)}\, \mathbb{1}_{\mathcal{E}}(Z) \right] \le e^{-\frac{n\underline v}{2\sigma^2}\|\theta - \theta_\star\|_2^2},$$

and

$$\Psi(\theta) \le e^{2\lambda\sqrt{s_\star}\,\|\theta - \theta_\star\|_2 - \frac{n\underline v}{2\sigma^2}\|\theta - \theta_\star\|_2^2}\, e^{-\frac{\lambda}{4}\|\theta - \theta_\star\|_1} \le e^{\frac{2\lambda^2 s_\star}{\underline\kappa_1}}\, e^{-\frac{\lambda}{4}\|\theta - \theta_\star\|_1}.$$

We conclude, using $\int e^{-\frac{\lambda}{4}\|\theta - \theta_\star\|_1}\,\mu_\delta(d\theta) \le (8/\lambda)^{\|\delta\|_0}$, that

$$\mathbb{E}\left[ \Pi_n(B \mid Z) \right] \le \mathbb{P}(Z \notin \mathcal{E}) + e^{\frac{2\lambda^2 s_\star}{\underline\kappa_1}} \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} \sum_{\delta:\ \|\delta\|_0 \ge s_\star + \zeta} \frac{\omega_\delta}{\omega_{\delta_\star}}\, 4^{\|\delta\|_0}.$$

Using H2,

$$\sum_{\delta:\ \|\delta\|_0 \ge s_\star + \zeta} \frac{\omega_\delta}{\omega_{\delta_\star}}\, 4^{\|\delta\|_0} \le 4^{s_\star} \sum_{j \ge s_\star + \zeta} \left( \frac{4c_2}{p^{c_4}} \right)^{j - s_\star},$$

and for $p$ large enough so that $4c_2\, p^{-c_4} < 1/2$, the latter sum is bounded by $2\left( \frac{4c_2}{p^{c_4}} \right)^{\zeta}$. It remains only to bound $\mathbb{P}(Z \notin \mathcal{E})$. Since $\nabla_\theta \log f_{n,\theta_\star}(z) = X'(z - X\theta_\star)/\sigma^2$, and since $Z - X\theta_\star \sim \mathbf{N}(0, \sigma_0^2 I_n)$, standard Gaussian tail bounds give

$$\mathbb{P}(Z \notin \mathcal{E}) \le 2p\, \exp\left( -\frac{\lambda^2\sigma^4}{8\sigma_0^2 \max_{1\le j\le p}\|X_{\cdot j}\|_2^2} \right).$$

Proof of Theorem 6, Part (2). We set $\underline\kappa_1 = n\,\underline v(s)/\sigma^2$ and $\epsilon = \lambda\sqrt{s}/\underline\kappa_1$. We also set $A_\epsilon = \{\theta \in \mathbb{R}^p :\ \|\theta - \theta_\star\|_0 \le s,\ \|\theta - \theta_\star\|_2 > M\epsilon\}$. We have

$$\Pi_n(A_\epsilon \mid Z) \le \mathbb{1}_{\mathcal{E}^c}(Z) + \phi(Z) + \mathbb{1}_{\mathcal{E}}(Z)\, (1 - \phi(Z))\, \Pi_n(A_\epsilon \mid Z).$$
Then by Lemma 7 and Fubini's theorem,

$$\mathbb{E}\left[ \Pi_n(A_\epsilon \mid Z) \right] \le \mathbb{P}(Z \notin \mathcal{E}) + \mathbb{E}[\phi(Z)] + \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} \sum_{\delta \in \Delta} \frac{\omega_\delta}{\omega_{\delta_\star}} \left( \frac{\lambda}{2} \right)^{\|\delta\|_0} \int_{A_\epsilon} \mathbb{E}\left[ (1 - \phi(Z))\, \frac{f_{n,\theta}(Z)}{f_{n,\theta_\star}(Z)}\, \mathbb{1}_{\mathcal{E}}(Z) \right] e^{\lambda\|\theta_\star\|_1 - \lambda\|\theta\|_1}\, \mu_\delta(d\theta).$$

We write $A_\epsilon = \cup_{j \ge 1} A_\epsilon(j)$, where

$$A_\epsilon(j) = \left\{ \theta \in \mathbb{R}^p :\ \|\theta - \theta_\star\|_0 \le s,\ jM\epsilon < \|\theta - \theta_\star\|_2 \le (j+1)M\epsilon \right\}.$$

Therefore, using Lemma 8, and given that $M \ge 96$,

$$\int_{A_\epsilon(j)} \mathbb{E}\left[ (1 - \phi(Z))\, \frac{f_{n,\theta}(Z)}{f_{n,\theta_\star}(Z)}\, \mathbb{1}_{\mathcal{E}}(Z) \right] e^{\lambda\|\theta_\star\|_1 - \lambda\|\theta\|_1}\, \mu_\delta(d\theta) \le e^{-\frac{\underline\kappa_1}{3}(jM\epsilon)^2}\, e^{3\lambda\sqrt{s}\,(j+1)M\epsilon} \left( \frac{2}{\lambda} \right)^{\|\delta\|_0} \le e^{-\frac{\underline\kappa_1}{64}(jM\epsilon)^2} \left( \frac{2}{\lambda} \right)^{\|\delta\|_0}.$$

It is easy to check using H2 that

$$\sum_{\delta \in \Delta} \frac{\omega_\delta}{\omega_{\delta_\star}} \le p\, p^{c_3 s_\star}.$$

We can then conclude that

$$\mathbb{E}\left[ \Pi_n(A_\epsilon \mid Z) \right] \le \mathbb{P}(Z \notin \mathcal{E}) + \mathbb{E}[\phi(Z)] + p^{c_3 s_\star + 1} \left( 1 + \frac{\sqrt{n\,\bar v(s_\star)}}{\lambda\sigma} \right)^{s_\star} \sum_{j \ge 1} e^{-\frac{\underline\kappa_1}{64}(jM\epsilon)^2},$$

as claimed.

5.2. Proof of Theorem 3. We rely on the behavior of some restricted and $m$-sparse eigenvalues that we introduce first. For $z \in \mathbb{R}^{n\times q}$, $q \ge 1$, and $s \ge 1$, we define

$$\kappa(s, z) = \inf_{\delta \in \{0,1\}^q:\ \|\delta\|_0 \le s}\ \inf\left\{ \frac{\theta' z'z\, \theta}{n\,\|\theta\|_2^2} :\ \theta \neq 0,\ \sum_{k:\ \delta_k=0} |\theta_k| \le 7 \sum_{k:\ \delta_k=1} |\theta_k| \right\},$$

$$\underline\kappa(s, z) = \inf\left\{ \frac{\theta' z'z\, \theta}{n\,\|\theta\|_2^2} :\ \theta \neq 0,\ \|\theta\|_0 \le s \right\}, \quad \bar\kappa(s, z) = \sup\left\{ \frac{\theta' z'z\, \theta}{n\,\|\theta\|_2^2} :\ \theta \neq 0,\ \|\theta\|_0 \le s \right\}.$$
In the above definition, we adopt the conventions $\inf \emptyset = +\infty$ and $\sup \emptyset = 0$. We are interested in the behavior of $\kappa(s, X)$, $\underline\kappa(s, X)$ and $\bar\kappa(s, X)$, when $X$ is the random matrix obtained from assumption H. We will use the following result, taken from Raskutti et al. (2010), Theorem 1, and Rudelson and Zhou (2013), Theorem 3.1, which relates the behavior of $\kappa(s, X)$, $\underline\kappa(s, X)$ and $\bar\kappa(s, X)$ to the corresponding terms $\kappa$, $\underline\kappa(s)$ and $\bar\kappa(s)$ of the true precision matrix $\vartheta$ introduced in (9)-(10).

Lemma 9. Assume H. Then there exist finite universal constants $a_1 > 0$, $a_2 > 0$ such that the following hold.
(1) If $\kappa > 0$, then for all $n \ge a_1\, \kappa^{-2}\, s \log p$,
\[
\mathbb{P}\left[64\, \kappa(s, X) < \kappa\right] \le e^{-a_2 n}.
\]
(2) Let $s \le p$ be such that $\bar\kappa(s) > 0$. Then for all $n \ge a_1\, s \log p$,
\[
\mathbb{P}\left[4\, \underline\kappa(s, X) < \underline\kappa(s)\ \text{ or }\ 4\, \bar\kappa(s, X) > 9\, \bar\kappa(s)\right] \le e^{-a_2 n}.
\]

5.2.1. Proof of Theorem 3, Part (1). We have
\[
\check\Pi_{n,p}(\mathrm{d}\theta \mid X) = \prod_{j=1}^p \check\Pi_{n,p,j}(\mathrm{d}\theta_j \mid X),
\]
where for $j \in \{1, \dots, p\}$, $\check\Pi_{n,p,j}(\mathrm{d}\theta_j \mid X)$ is given by
\[
\check\Pi_{n,p,j}(\mathrm{d}u \mid X) \propto q_j(u; X) \sum_{\delta} \pi_\delta\, \rho_j^{\|\delta\|_0}\, e^{-\rho_j \sigma_j \|u\|_1}\, \mu_\delta(\mathrm{d}u),
\quad \text{and} \quad
\nabla \log q_j(u; X) = \frac{1}{\sigma_j^2}\, X_{-j}'\left(X_j - X_{-j} u\right).
\]
For $q \ge 1$, we define
\[
\mathcal{G}_{n,q} = \left\{z \in \mathbb{R}^{n \times q}:\ \bar\kappa(\bar s, z) \le \frac{9}{4}\, \bar\kappa(\bar s),\ \bar\kappa(1, z) \le \frac{9}{4}\, \bar\kappa(1),\ \text{and}\ \kappa(s, z) \ge \frac{\kappa}{64}\right\}.
\]
For any $k_j \ge 0$, we start by noting that
\[
\mathbb{E}\left[\check\Pi_{n,p}\left(\left\{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j\|_0 > k_j\ \text{for some}\ j\right\}\ \middle|\ X\right)\right]
\le \mathbb{P}\left(X \notin \mathcal{G}_{n,p}\right) + \sum_{j=1}^p \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p}}(X)\, \check\Pi_{n,p,j}(A_j \mid X)\right],
\]
where $A_j = \{u \in \mathbb{R}^{p-1}:\ \|u\|_0 > k_j\}$. We notice that if $X \in \mathcal{G}_{n,p}$, then $X_{-j} \in \mathcal{G}_{n,p-1}$ for any $1 \le j \le p$. We recall that the notation $X_{-j}$ denotes the matrix obtained by
removing the $j$-th column of $X$. Hence
\[
\mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p}}(X)\, \check\Pi_{n,p,j}(A_j \mid X)\right]
\le \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p-1}}(X_{-j})\, \check\Pi_{n,p,j}(A_j \mid X)\right]
= \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p-1}}(X_{-j})\, \mathbb{E}\left(\check\Pi_{n,p,j}(A_j \mid X)\ \middle|\ X_{-j}\right)\right].
\]
We conclude that
\[
\mathbb{E}\left[\check\Pi_{n,p}\left(\left\{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j\|_0 > k_j\ \text{for some}\ j\right\}\ \middle|\ X\right)\right]
\le \mathbb{P}\left(X \notin \mathcal{G}_{n,p}\right) + \sum_{j=1}^p \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p-1}}(X_{-j})\, T_j\right], \tag{3}
\]
where $T_j = \mathbb{E}\left(\check\Pi_{n,p,j}(A_j \mid X) \mid X_{-j}\right)$.

The key idea of the proof is to notice that $T_j$ is an expected quasi-posterior probability in the linear regression model $X_j = X_{-j}\beta + \eta$, where $\eta \sim \mathbf{N}(0, (1/\vartheta_{jj}) I_n)$. Therefore, by Theorem 6, Part (1), we have
\[
T_j \le 2p \exp\left(-\frac{\vartheta_{jj}\, \rho_j^2}{8 \max_{k \ne j} \|X_k\|_2^2}\right)
+ 2 \cdot 4^{s_j} \left(1 + \frac{\sigma_j^2 L_j}{\rho_j^2 s_j}\right) e^{\frac{\rho_j^2 s_j}{\tau_j \sigma_j^2}} \left(\frac{4 c_1}{p^{c_4}}\right)^{k_j - s_j}, \tag{4}
\]
where $L_j = n\, \bar\kappa(\bar s, X_{-j})$, and $\tau_j = n\, \kappa(s, X_{-j})$. Given the choice of $\rho_j$, we see that the first term on the right-hand side of (4) is bounded by $2p\, e^{-3\log p} = 2/p^2$. Using the fact that for $X_{-j} \in \mathcal{G}_{n,p-1}$ we have $L_j \le \frac{9}{4}\, n\, \bar\kappa(\bar s)$ and $\tau_j \ge \frac{1}{64}\, n\, \kappa$, it is easy to show that the second term on the right-hand side of (4) is bounded by
\[
\exp\left[s_j \log p \left(\frac{169\, \sigma_j^2 \vartheta_{jj}\, \bar\kappa(\bar s)}{\kappa} + \frac{\sigma_j^2 \vartheta_{jj}\, \bar\kappa(\bar s)}{4 \kappa \log p} + \log(4e)\right) - c_4 \left(k_j - s_j\right) \log p\right].
\]
With $k_j = \zeta_j$ as given in the statement of the theorem, this latter expression is bounded by $1/p^2$. This concludes the proof.
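The reduction used above — each $T_j$ is a quasi-posterior probability in the regression of column $X_j$ on the remaining columns $X_{-j}$ — is the neighborhood-selection device of Meinshausen and Bühlmann (2006) that underlies the whole construction. A minimal frequentist sketch of that device may help fix ideas: nodewise lasso regressions combined by the OR-rule, with a hand-rolled coordinate descent. The tuning parameter, dimensions, and the chain-graph example are hypothetical, not taken from the paper's simulation study.

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate-descent lasso: minimize (1/2n)||y - A b||_2^2 + lam ||b||_1."""
    n, q = A.shape
    b = np.zeros(q)
    col2 = (A ** 2).sum(axis=0) / n          # per-coordinate curvature A_k' A_k / n
    r = y - A @ b                             # current residual
    for _ in range(n_iter):
        for k in range(q):
            if col2[k] == 0.0:
                continue
            rho = A[:, k] @ r / n + col2[k] * b[k]                    # partial correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col2[k]   # soft-thresholding
            r += A[:, k] * (b[k] - new)       # keep the residual in sync
            b[k] = new
    return b

def neighborhood_selection(X, lam):
    """Regress each column on the others; symmetrize the estimated graph by the OR-rule."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        b = lasso_cd(X[:, idx], X[:, j], lam)
        adj[j, idx] = b != 0.0
    return adj | adj.T

# Hypothetical example: data from a chain-graph precision matrix.
rng = np.random.default_rng(1)
p, n = 10, 400
Theta = np.eye(p) + np.diag(0.45 * np.ones(p - 1), 1) + np.diag(0.45 * np.ones(p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)
adj = neighborhood_selection(X, lam=0.2)
print(adj.sum())   # total count of recovered (symmetrized) edge indicators
```

The product-form quasi-posterior studied in the paper replaces each of these $p$ penalized regressions by a quasi-posterior distribution, which is what allows the parallel exploration mentioned in the introduction.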
5.2.2. Proof of Theorem 3, Part (2). We use the same approach as above. We define $\bar s_j = s_j + \zeta_j$ ($\bar s_j = 1$ if $s_j = 0$), and $\bar s = \max_j \bar s_j$, and we set
\[
\mathcal{G}_{n,q} = \left\{z \in \mathbb{R}^{n \times q}:\ \bar\kappa(\bar s, z) \le \frac{9}{4}\, \bar\kappa(\bar s),\ \text{and}\ \underline\kappa(\bar s, z) \ge \frac{1}{4}\, \underline\kappa(\bar s)\right\}.
\]
We also define $\mathcal{U} = \{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j - \theta_{\star,j}\|_2 > \epsilon_j\ \text{for some}\ j\}$, $\bar{\mathcal{U}} = \mathcal{U} \cap \{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j - \theta_{\star,j}\|_0 \le s_j + \zeta_j\ \text{for all}\ j\}$, and
\[
\check\Pi_{n,p}(\mathcal{U} \mid X) \le \check\Pi_{n,p}\left(\left\{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j - \theta_{\star,j}\|_0 > s_j + \zeta_j\ \text{for some}\ j\right\} \mid X\right) + \mathbf{1}_{\mathcal{G}_{n,p}^c}(X) + \mathbf{1}_{\mathcal{G}_{n,p}}(X)\, \check\Pi_{n,p}(\bar{\mathcal{U}} \mid X). \tag{5}
\]
If for some $j$, $\|\theta_j - \theta_{\star,j}\|_0 > s_j + \zeta_j$, then we necessarily have $\|\theta_j\|_0 > \zeta_j$. Therefore, by Theorem 3, Part (1), we have
\[
\mathbb{E}\left[\check\Pi_{n,p}\left(\left\{\theta \in \mathbb{R}^{p \times p}:\ \|\theta_j - \theta_{\star,j}\|_0 > s_j + \zeta_j\ \text{for some}\ j\right\} \mid X\right)\right] \le e^{-a_2 n} + \frac{4}{p}. \tag{6}
\]
By Lemma 9, for $n \ge a_1\, \bar s \log p$,
\[
\mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p}^c}(X)\right] = \mathbb{P}\left[X \notin \mathcal{G}_{n,p}\right] \le e^{-a_2 n}. \tag{7}
\]
It remains to control the last term on the right-hand side of (5). To do so, we note that if $X \in \mathcal{G}_{n,p}$, then $X_{-j} \in \mathcal{G}_{n,p-1}$ for all $1 \le j \le p$. Hence
\[
\mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p}}(X)\, \check\Pi_{n,p}(\bar{\mathcal{U}} \mid X)\right]
\le \sum_{j=1}^p \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p-1}}(X_{-j})\, \check\Pi_{n,p,j}(A_j \mid X)\right]
\le \sum_{j=1}^p \mathbb{E}\left[\mathbf{1}_{\mathcal{G}_{n,p-1}}(X_{-j})\, \mathbb{E}\left(\check\Pi_{n,p,j}(A_j \mid X) \mid X_{-j}\right)\right], \tag{8}
\]
where $A_j = \{u \in \mathbb{R}^{p-1}:\ \|u - \theta_{\star,j}\|_2 > \epsilon_j,\ \text{and}\ \|u - \theta_{\star,j}\|_0 \le \bar s_j\}$. As in the proof of Theorem 3, Part (1), we note that under the conditional distribution of $X_j$ given $X_{-j}$, the term $\check\Pi_{n,p,j}(A_j \mid X)$ can be viewed as the posterior distribution in the linear regression model $X_j = X_{-j}\beta + \eta$, where $\eta \sim \mathbf{N}(0, (1/\vartheta_{jj}) I_n)$. Therefore, using Theorem 6, Part (2), and for any constant $M_0 \ge 96$, we have
\[
\mathbb{E}\left(\check\Pi_{n,p,j}(A_j \mid X) \mid X_{-j}\right)
\le 2p \exp\left(-\frac{\vartheta_{jj}\, \rho_j^2}{8 \max_{k \ne j} \|X_k\|_2^2}\right)
+ \frac{e^{\bar s_j \log(9p)}\, e^{-M_0^2 \tau_j \epsilon_j^2 / 3}}{1 - e^{-M_0^2 \tau_j \epsilon_j^2 / 3}}
+ 2 p^{c_3} \left(1 + \frac{\sigma_j^2 L_j}{\rho_j^2 s_j}\right) \frac{e^{-M_0^2 \tau_j \epsilon_j^2 / 64}}{1 - e^{-M_0^2 \tau_j \epsilon_j^2 / 64}}, \tag{9}
\]
where $\epsilon_j = \rho_j s_j^{1/2} / \tau_j$, $\tau_j = n\, \underline\kappa(\bar s_j, X_{-j})$, and $L_j = n\, \bar\kappa(\bar s_j, X_{-j})$. As seen in the proof of Theorem 3, Part (1), the first term on the right-hand side of (9) is upper bounded by $2/p^2$. We have
\[
\frac{M_0^2\, \tau_j\, \epsilon_j^2}{3} \ge \frac{54\, M_0^2}{3}\, \sigma_j^2 \vartheta_{jj}\, s_j \log p.
\]
Hence, for $p \ge 4e$ and $\frac{54 M_0^2}{3}\, \sigma_j^2 \vartheta_{jj} \ge 4$, the second term on the right-hand side of (9) is also upper bounded by $1/p^2$. For $\frac{54 M_0^2}{64}\, \sigma_j^2 \vartheta_{jj} \ge 4$, the third term is upper bounded by
\[
4 \exp\left[-s_j \log p \left(\frac{54 M_0^2}{64}\, \sigma_j^2 \vartheta_{jj} - c_3 - c_4 - \frac{\sigma_j^2 \vartheta_{jj}\, \bar\kappa(\bar s_j)}{4 \kappa \log p}\right)\right] \le \frac{1}{p^2},
\]
by choosing $M_0$ large enough. This concludes the proof.

Acknowledgements. The author would like to thank Shuheng Zhou for very helpful conversations. This work is partially supported by the NSF.

References

Atay-Kayis, A. and Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika.
Atchadé, Y. A. (2017). On the contraction properties of some high-dimensional quasi-posterior distributions. Ann. Statist.
Atchadé, Y. F. (2015). A Moreau-Yosida approximation scheme for high-dimensional quasi-posterior distributions. ArXiv e-prints.
Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. Journal of Multivariate Analysis.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B.
Bühlmann, P. and van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer Series in Statistics, Springer, Heidelberg.
Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika.
Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist.
Dobra, A., Lenkoski, A. and Rodriguez, A. (2011). Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. J. Amer. Statist. Assoc.
Fearnhead, P. and Prangle, D. (2010). Constructing summary statistics for approximate Bayesian computation: semi-automatic ABC. Technical Report, Lancaster University, UK.
Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
Kato, K. (2013). Quasi-Bayesian analysis of nonparametric instrumental variables models. Ann. Statist.
Khondker, Z. S., Zhu, H., Chu, H., Lin, W. and Ibrahim, J. G. (2013). The Bayesian covariance lasso. Stat. Interface.
Lenkoski, A. and Dobra, A. (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J. Comput. Graph. Statist. Supplementary material available online.
Li, C. and Jiang, W. (2014). Model selection for likelihood-free Bayesian methods based on moment conditions: theory and numerical examples. ArXiv e-prints.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist.
Mukherjee, S. and Speed, T. P. (2008). Network inference using informative priors. Proceedings of the National Academy of Sciences.
Narisetty, N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist.
Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc.
Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association.
Peterson, C., Stingo, F. C. and Vannucci, M. (2015). Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association.
Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res.
Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electron. J. Stat.
Reid, S., Tibshirani, R. and Friedman, J. (2013). A study of error variance estimation in lasso regression. ArXiv e-prints.
Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inf. Theory.
Schreck, A., Fort, G., Le Corff, S. and Moulines, E. (2013). A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection. ArXiv e-prints.
Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled lasso. Journal of Machine Learning Research.
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationVariational Learning : From exponential families to multilinear systems
Variational Learning : From exponential families to multilinear systems Ananth Ranganathan th February 005 Abstract This note aims to give a general overview of variational inference on graphical models.
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationNear Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing
Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationRegularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008
Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)
More informationdans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés
Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,
More information