
QUASI-BAYESIAN ESTIMATION OF LARGE GAUSSIAN GRAPHICAL MODELS

YVES F. ATCHADÉ

(Apr. 2018; first draft Dec. 2015)

Abstract. This paper deals with the Bayesian estimation of high-dimensional Gaussian graphical models. We develop a quasi-Bayesian implementation of the neighborhood selection method of Meinshausen and Bühlmann (2006) for the estimation of large Gaussian graphical models. The method produces a product-form quasi-posterior distribution that can be efficiently explored by parallel computing. We derive a non-asymptotic bound on the contraction rate of the quasi-posterior distribution. The result shows that the proposed quasi-posterior distribution contracts towards the true precision matrix at a rate given by the worst contraction rate of the linear regressions that are involved in the neighborhood selection. We develop a Markov Chain Monte Carlo algorithm for approximate computations, following an approach from Atchadé (2015). We illustrate the methodology with a simulation study.

1. Introduction

We consider the problem of fitting large Gaussian graphical models with a diverging number of parameters from limited data. This amounts to estimating a sparse precision matrix ϑ ∈ M⁺_p from p-dimensional Gaussian observations y_i ∈ R^p, i = 1,...,n, where M⁺_p denotes the cone of symmetric positive definite matrices in R^{p×p}. The frequentist approach to this problem has generated an impressive literature over the last decade or so (see for instance Bühlmann and van de Geer (2011), Hastie et al. (2015), and the references therein). There is an interest, particularly in biomedical research, for statistical methodologies that allow practitioners to incorporate external information when fitting such graphical models (Mukherjee and Speed (2008); Peterson et al. (2015)). This problem naturally calls for a Bayesian formulation, and significant progress has been made in recent years (Dobra et al. (2011); Lenkoski and Dobra (2011); Khondker et al. (2013); Peterson et al. (2015); Banerjee and Ghosal (2015)).

2000 Mathematics Subject Classification: 60F15, 60G42. Key words and phrases: Gaussian graphical models, quasi-Bayesian inference, pseudo-likelihood, posterior contraction, forward-backward approximation, Markov Chain Monte Carlo. Y. F. Atchadé: University of Michigan, 1085 South University, Ann Arbor, 48109, MI, United States. E-mail address: yvesa@umich.edu.

However, most existing Bayesian methods for fitting graphical models do not scale well with the number of nodes in the graph. The main difficulty is computational, and hinges on the ability to handle interesting prior distributions on M⁺_p when p is large. The most commonly used class of prior distributions for Gaussian graphical models is the class of G-Wishart distributions (Atay-Kayis and Massam (2005)). However, G-Wishart distributions have intractable normalizing constants, and become impractical for inferring large graphical models, due to the cost of approximating these normalizing constants (Dobra et al. (2011); Lenkoski and Dobra (2011)). Following the development of the Bayesian lasso of Park and Casella (2008) and other Bayesian shrinkage priors for linear regression (Carvalho et al. (2010)), several authors have proposed prior distributions on M⁺_p obtained by putting conditionally independent shrinkage priors on the entries of the matrix, subject to a positive definiteness constraint (Khondker et al. (2013)). However, this approach does not give a direct estimate of the graph structure, which in many applications is the key quantity of interest. Furthermore, dealing with the positive definiteness constraint in the posterior distribution requires careful MCMC design, and becomes a limiting factor for large p.

The above discussion suggests that when dealing with large graphical models, some form of approximate inference is inescapable. Building on Atchadé (2017), we propose a quasi-Bayesian approach for fitting large Gaussian graphical models. Our general approach to the problem consists in working with a larger pseudo-model {f̌_θ, θ ∈ Θ̌}, where Θ̌ ⊇ M⁺_p. By pseudo-model we mean that the function z ↦ f̌_θ(z) is typically not a density on R^p, but f̌_θ is chosen such that the function θ ↦ Σ_{i=1}^n log f̌_θ(y_i) is a good candidate for M-estimation of ϑ. The enlargement of the model space from M⁺_p to Θ̌ allows us to relax the positive definiteness constraint. With a prior distribution Π on Θ̌, we obtain a quasi-posterior distribution denoted Π̌_{n,p}; Π̌_{n,p} is not a posterior distribution in the strict sense of the term, since f̌_θ is not a proper likelihood function. In the specific case of Gaussian graphical models, we take Θ̌ as the space of matrices with positive diagonals, and take z ↦ f̌_θ(z) as the pseudo-model underpinning the neighborhood selection method of Meinshausen and Bühlmann (2006). This choice gives a quasi-posterior distribution Π̌_{n,p} that factorizes, and leads to a drastic improvement in the computing time needed for MCMC computation when a parallel computing architecture is used. We illustrate the method in Section 4 using simulated data where the number of nodes in the graph is p ∈ {100, 500, 1000}.

The idea of replacing the likelihood function by a pseudo-likelihood or quasi-likelihood function in a Bayesian inference is not new and has been developed in other contexts, such as in Bayesian semi-parametric inference (Kato (2013); Li and Jiang (2014), and the references therein) and in approximate Bayesian computation (Fearnhead and Prangle (2010)). See also Atchadé (2017) for more references, and for contraction properties.

We study the contraction properties of the quasi-posterior distribution Π̌_{n,p} as n, p → ∞. Under the assumption that there exists a well-conditioned true precision matrix, we show that Π̌_{n,p} contracts at the rate √(s log(p)/n) (see Theorem 3 for a precise statement), where s can be viewed as an upper bound on the largest degree in the undirected graph defined by the true precision matrix. (The contraction rate is measured in the matrix norm defined as the largest of the L2 norms of the columns.) This convergence rate corresponds to the worst convergence rate that one gets from the Bayesian analysis of the linear regressions involved in the neighborhood selection. The condition on the sample size n for the results mentioned above to hold is of order s log(p), which shows that the quasi-posterior distribution can concentrate around the true precision matrix even in cases where p exceeds n.

The rest of the paper is organized as follows. Section 2 provides a general discussion of quasi-models and quasi-Bayesian inference. The section ends with the introduction of the proposed quasi-Bayesian distribution, based on the neighborhood selection method of Meinshausen and Bühlmann (2006). We specialize the discussion to Gaussian graphical models in Section 3. The theoretical analysis focuses on the Gaussian case and is also presented in Section 3, but the proofs are postponed to Section 5. The simulation study is presented in Section 4. A MATLAB implementation of the method is available from the author's website.

2. Quasi-Bayesian inference of graphical models

For an integer p, and i ∈ {1,...,p}, let Y_i be a nonempty subset of R, and set Y = Y_1 × ··· × Y_p, which we assume is equipped with a reference sigma-finite product measure dy. We first consider a class of Markov random field distributions {f_ω, ω ∈ Ω} for the joint modeling of Y-valued random variables. Let M_p denote the set of all real symmetric p×p matrices, equipped with the inner product ⟨A, B⟩_F = Σ_{i≤j} A_{ij} B_{ij} and norm ||A||_F = √⟨A, A⟩_F. As above, M⁺_p denotes the subset of M_p of positive definite matrices. For i = 1,...,p and 1 ≤ j < k ≤ p, let B_i : Y_i → R and B_{jk} : Y_j × Y_k → R be non-zero measurable functions that we assume known.

From these functions we define an M_p-valued function B : Y → M_p by

    B(y)_{ij} = B_i(y_i) if i = j,   B(y)_{ij} = B_{ij}(y_i, y_j) if i < j,   B(y)_{ij} = B_{ji}(y_j, y_i) if j < i.

These functions define the parameter space

    Ω = { ω ∈ M_p : Z(ω) := ∫_Y exp( ⟨ω, B(y)⟩_F ) dy < ∞ }.       (1)

We assume that Y and B are such that Ω is non-empty, and we consider the exponential family {f_ω, ω ∈ Ω} of densities f_ω on Y given by

    f_ω(y) = exp( ⟨ω, B(y)⟩_F − log Z(ω) ),   y ∈ Y.       (2)

The model {f_ω, ω ∈ Ω} can be useful to capture the dependence structure between a set of p random variables taking values in Y. If (Y_1,...,Y_p) ~ f_ω, then the parameter ω encodes the conditional independence structure among the p variables Y_1,...,Y_p. In particular, for i ≠ j, ω_{ij} = 0 means that Y_i and Y_j are conditionally independent given all the other variables. The random variables Y_1,...,Y_p can then be represented by an undirected graph where there is an edge between i and j if and only if ω_{ij} ≠ 0. This type of model is very useful in practice to tease out direct and indirect connections between sets of random variables. The version posited in (2) can accommodate mixed measurements where some of the y_i take discrete values while others take continuous values.

Example 1 (Gaussian graphical models). One recovers the Gaussian graphical model by taking Y_i = R, B_i(x) = −x²/2, and B_{ij}(x, y) = −xy for i < j. In this case Y = R^p equipped with the Lebesgue measure, and Ω = M⁺_p.

Example 2 (Potts models). For an integer M ≥ 2, one recovers the M-state Potts model by taking Y_i = {1,...,M}. In this case, Y = {1,...,M}^p equipped with the counting measure. Since Y is a finite set, we have Ω = M_p. An important special case of the Potts model is a version of the Ising model where M = 2, B_i(x) = x, and B_{ij}(x, y) = xy.

Suppose that we observe data y^(1),...,y^(n), where y^(i) = (y^(i)_1,...,y^(i)_p) ∈ Y is viewed as a column vector. We set x = [y^(1),...,y^(n)]' ∈ R^{n×p}. Given a prior distribution Π on Ω, and given the data x, the resulting posterior distribution for learning ω is

    Π_n(A | x) = ∫_A ∏_{i=1}^n f_ω(y^(i)) Π(dω) / ∫_Ω ∏_{i=1}^n f_ω(y^(i)) Π(dω),   A ⊆ Ω.
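To make the notation concrete, here is a small numerical sketch of the sufficient-statistic matrix B(y) and the unnormalized log-density ⟨ω, B(y)⟩_F from (2) in the Gaussian case of Example 1. This is illustrative Python, not part of the paper and not the author's MATLAB implementation; the function names are chosen here.

```python
import numpy as np

def B_gaussian(y):
    """Sufficient-statistic matrix B(y) of Example 1: B_i(x) = -x^2/2, B_ij(x, y) = -x*y."""
    y = np.asarray(y, dtype=float)
    B = -np.outer(y, y)                # off-diagonal entries: -y_i * y_j
    np.fill_diagonal(B, -0.5 * y**2)   # diagonal entries: -y_i^2 / 2
    return B

def log_unnormalized_density(omega, y):
    """<omega, B(y)>_F = sum over i <= j of omega_ij * B(y)_ij (upper triangle incl. diagonal)."""
    B = B_gaussian(y)
    iu = np.triu_indices_from(B)
    return float(np.sum(omega[iu] * B[iu]))

# For a positive definite omega this equals -0.5 * y' omega y, i.e. the Gaussian
# log-density up to the normalizing constant log Z(omega).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); omega = A @ A.T + 3 * np.eye(3)
y = rng.standard_normal(3)
print(log_unnormalized_density(omega, y), -0.5 * y @ omega @ y)
```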

However, and as discussed in the introduction, this posterior distribution is typically intractable. In the frequentist literature, a commonly used approach to circumvent computational difficulties with graphical models consists in replacing the likelihood function by a pseudo-likelihood function. For ω ∈ M_p, let ω_{·i} denote the i-th column of ω. Note that in the present case, if (Y_1,...,Y_p) ~ f_ω, then for 1 ≤ j ≤ p, the conditional distribution of Y_j given {Y_k, k ≠ j} depends on ω only through the j-th column ω_{·j}. We write this conditional distribution as u ↦ f_j(u | y_{−j}; ω_{·j}), where for y ∈ Y, y_{−j} = (y_1,...,y_{j−1}, y_{j+1},...,y_p), with obvious modifications when j = 1 or j = p. Let

    Ω̄ = { ω ∈ M_p : u ↦ f_j(u | y_{−j}; ω_{·j}) is a well-defined density on Y_j, for all y ∈ Y and all 1 ≤ j ≤ p }.

Note that Ω ⊆ Ω̄. The most commonly used pseudo-likelihood method consists in replacing the initial likelihood contribution f_ω(y^(i)) by

    f̄_ω(y^(i)) = ∏_{j=1}^p f_j(y^(i)_j | y^(i)_{−j}; ω_{·j}),   ω ∈ Ω̄.

This pseudo-likelihood approach typically brings important simplifications. For instance, in the Gaussian case, the parameter space Ω̄ corresponds to the space of symmetric matrices with positive diagonal elements, which has a simpler geometry than M⁺_p. And in the case of discrete graphical models, the conditional models typically have tractable normalizing constants. The idea goes back at least to Besag (1974), and penalized versions of pseudo-likelihood functions have been employed by several authors to fit high-dimensional graphical models. In a Bayesian setting this approach works well for small to moderate size graphs. The issue is that the dimension of the space Ω̄ ⊆ M_p grows as O(p²), and MCMC simulation for exploring probability distributions on such very large spaces is inherently a difficult problem (for example, with only a few hundred nodes the dimension of Ω̄ already exceeds 10⁴).

A related pseudo-likelihood for this problem is suggested by the neighborhood selection of Meinshausen and Bühlmann (2006). The idea consists in relaxing the symmetry constraint in Ω̄. For 1 ≤ j ≤ p, we set

    Ω_j = { θ ∈ R^p : u ↦ f_j(u | y_{−j}; θ) is a well-defined density on Y_j, for all y ∈ Y }.

We note that if ω ∈ Ω̄, then ω_{·j} ∈ Ω_j. Hence these sets Ω_j are nonempty, and we define Ω̌ = Ω_1 × ··· × Ω_p, which we identify with a subset of the space of p×p real matrices R^{p×p}.
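As an illustration of why the pseudo-likelihood is cheap to evaluate in the Gaussian case, the following sketch (illustrative Python, not from the paper) computes log f̄_ω(y) as a sum of p univariate conditional Gaussian log-densities; each term uses only one column of ω, and no determinant of a p×p matrix is needed.

```python
import numpy as np

def gaussian_pseudo_loglik(omega, y):
    """log f-bar_omega(y) = sum_j log N(y_j | -sum_{k!=j}(omega_kj/omega_jj) y_k, 1/omega_jj).

    Uses only the columns of omega and its positive diagonal; no determinant or
    Cholesky factor of the full p x p matrix is required.
    """
    y = np.asarray(y, dtype=float)
    p = len(y)
    total = 0.0
    for j in range(p):
        prec = omega[j, j]                                    # conditional precision of Y_j
        others = np.delete(np.arange(p), j)
        mean = -np.dot(omega[others, j], y[others]) / prec    # conditional mean
        resid = y[j] - mean
        total += 0.5 * np.log(prec / (2.0 * np.pi)) - 0.5 * prec * resid**2
    return total

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); omega = A @ A.T + 4 * np.eye(4)
print(gaussian_pseudo_loglik(omega, rng.standard_normal(4)))
```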

In particular, if ω ∈ Ω̌ then, consistently with our notation above, ω_{·j} denotes the j-th column of ω. We consider the pseudo-model {f̌_ω, ω ∈ Ω̌}, where

    f̌_ω(y) = ∏_{j=1}^p f_j(y_j | y_{−j}; ω_{·j}),   ω ∈ Ω̌,  y ∈ Y.       (3)

One can then maximize a penalized version of ω ↦ Σ_{i=1}^n log f̌_ω(y^(i)), and this corresponds to the neighborhood selection method of Meinshausen and Bühlmann (2006); see also Sun and Zhang (2013). Notice that by definition Ω̌ is a product space, whereas Ω̄ is not, due to the symmetry constraint. This implies that ω ↦ f̌_ω(y) factorizes along the columns of ω, whereas ω ↦ f̄_ω(y) typically does not. One can take advantage of this separability for fast computation if the penalty is also separable. With a prior distribution Π on Ω̌, the quasi-likelihood function ω ↦ f̌_ω leads to a quasi-posterior distribution given by

    Π̌_{n,p}(A | x) = ∫_A ∏_{i=1}^n f̌_ω(y^(i)) Π(dω) / ∫_Ω̌ ∏_{i=1}^n f̌_ω(y^(i)) Π(dω),   A ⊆ Ω̌.

Let us assume that the prior distribution factorizes: Π(dω) = ∏_{j=1}^p Π_j(dω_{·j}). Then we are led to the quasi-posterior distribution

    Π̌_{n,p}(du_1,...,du_p | x) = ∏_{j=1}^p Π̌_{n,p,j}(du_j | x),       (4)

where

    Π̌_{n,p,j}(du | x) = ∏_{i=1}^n f_j(y^(i)_j | y^(i)_{−j}; u) Π_j(du) / ∫_{Ω_j} ∏_{i=1}^n f_j(y^(i)_j | y^(i)_{−j}; u) Π_j(du)

is a probability measure on Ω_j. Basically, relaxing the symmetry allows us to factorize the quasi-likelihood function, and this leads to a factorized quasi-posterior distribution, as in (4). Each component of this quasi-posterior distribution can then be explored independently. Despite its simplicity, when used in a parallel computing environment, this approach increases by one order of magnitude the size of the graphical models that can be estimated.
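Before specializing to the Gaussian case, here is the parallel exploration pattern enabled by the factorization (4), as a minimal sketch. This is illustrative Python (not the author's MATLAB code); `sample_column` is a placeholder name for whatever per-column sampler one chooses.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def sample_column(j, x, n_iter=1000):
    """Placeholder for an MCMC sampler targeting the j-th factor in (4).

    In the Gaussian case this would be a sparse Bayesian regression of column x[:, j]
    on the remaining columns x[:, k], k != j.  Here we only return a dummy draw.
    """
    rng = np.random.default_rng(j)
    p = x.shape[1]
    return rng.standard_normal(p - 1)   # stand-in for posterior draws of theta_{.j}

def explore_quasi_posterior(x, n_workers=4):
    p = x.shape[1]
    # Each column's quasi-posterior is independent of the others, so the p samplers
    # can run concurrently; results are collected column by column.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        draws = list(pool.map(sample_column, range(p), [x] * p))
    return draws

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((50, 20))
    draws = explore_quasi_posterior(x)
    print(len(draws), draws[0].shape)
```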

3. Gaussian graphical models

Here we consider the Gaussian case where Y_i = R, B_i(x) = −x²/2, and B_{ij}(x, y) = −xy. Hence in this case, Ω = M⁺_p, Ω̄ corresponds to the set of symmetric matrices with positive diagonal elements, and Ω̌ is the space of p×p real matrices (not necessarily symmetric) with positive diagonal. Assuming that the diagonal elements are known and given, we shall identify Ω̌ with the matrix space R^{(p−1)×p}.

If ϑ ∈ M⁺_p and (Y_1,...,Y_p) ~ f_ϑ, it is well known that for all j ∈ {1,...,p}, the conditional distribution of Y_j given all the other components Y_k = y_k, k ≠ j, is

    N( −Σ_{k≠j} (ϑ_{kj}/ϑ_{jj}) y_k ,  1/ϑ_{jj} ),       (5)

where N(µ, σ²) denotes the Gaussian distribution with mean µ and variance σ². Given data x ∈ R^{n×p}, given σ_j² > 0, and given these conditional distributions, the product of the quasi-model (3) across the data set gives (up to normalizing constants that we ignore) the quasi-likelihood

    q(θ; x) = ∏_{j=1}^p q_j(θ_{·j}; x),   with   q_j(θ_{·j}; x) = exp( −||x_j − x_{−j} θ_{·j}||² / (2σ_j²) ),   θ ∈ R^{(p−1)×p},       (6)

where x_{−j} ∈ R^{n×(p−1)} is the matrix obtained from x by removing the j-th column, and x_j (resp. θ_{·j}) denotes the j-th column of x (resp. θ). Given (5), it is clear that σ_j² is a proxy for 1/ϑ_{jj}. For the time being, we shall assume that the variance terms ϑ_{jj} are known, and we will set σ_j² = 1/ϑ_{jj}. In practice we use an empirical Bayes approach, described below, whereby ϑ_{jj} is obtained from the data. We combine (6) with a prior distribution Π(dθ) = ∏_{j=1}^p Π_j(dθ_{·j}) to obtain a quasi-posterior distribution on R^{(p−1)×p} given by

    Π̌_{n,p}(dθ | x) = ∏_{j=1}^p Π̌_{n,p,j}(dθ_{·j} | x, σ_j²),       (7)

where Π̌_{n,p,j}(· | x, σ_j²) is the probability measure on R^{p−1} given by

    Π̌_{n,p,j}(dz | x, σ_j²) ∝ q_j(z; x) Π_j(dz).

Again, the main appeal of Π̌_{n,p} is its factorized form, which implies that Monte Carlo samples from Π̌_{n,p} can be obtained by sampling in parallel from the p distributions Π̌_{n,p,j}.
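For concreteness, the column-wise log quasi-likelihood in (6) is just a Gaussian least-squares criterion. A minimal sketch (illustrative Python; names chosen here rather than taken from the paper):

```python
import numpy as np

def log_qj(theta_j, x, j, sigma2_j):
    """log q_j(theta_j; x) = -||x_j - x_{-j} theta_j||^2 / (2 sigma_j^2), as in (6).

    theta_j has length p-1 and multiplies the columns of x other than column j.
    """
    x = np.asarray(x, dtype=float)
    x_j = x[:, j]                          # response: the j-th column of the data
    x_minus_j = np.delete(x, j, axis=1)    # design: the remaining p-1 columns
    resid = x_j - x_minus_j @ theta_j
    return -0.5 * np.dot(resid, resid) / sigma2_j

rng = np.random.default_rng(2)
x = rng.standard_normal((30, 5))
print(log_qj(np.zeros(4), x, j=2, sigma2_j=1.0))
```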

3.1. Prior distribution. We address here the choice of the prior distribution Π_j. For each j ∈ {1,...,p}, we build the prior Π_j on R^{p−1} as in Castillo et al. (2015). First, let Δ = {0,1}^{p−1}, and let {π_δ, δ ∈ Δ} denote a discrete probability distribution on Δ, which we assume to be the same for all the components j. We take Π_j as the distribution of the random variable u ∈ R^{p−1} obtained as follows:

    δ_1,...,δ_{p−1} i.i.d. Ber(q);  given δ, the components u_1,...,u_{p−1} are conditionally independent, with
    u_k | δ ~ Dirac(0) if δ_k = 0,   and   u_k | δ ~ Laplace(ρ_j/σ_j²) if δ_k = 1,       (8)

where q ∈ (0,1) and ρ_j > 0 are hyper-parameters, σ_j² is as in (6), Dirac(0) is the Dirac measure on R with mass at 0, and for ρ > 0, Laplace(ρ) denotes the Laplace distribution with density (ρ/2) e^{−ρ|x|}, x ∈ R.

3.2. Posterior contraction and rate. We study here the behavior of the quasi-posterior distribution given in (7), for large n and p, when the prior is as in (8). We will assume that the observed data matrix x ∈ R^{n×p} is a realization of a random matrix X whose rows are i.i.d. random vectors from a mean-zero Gaussian distribution on R^p with precision matrix ϑ, with known diagonal elements. More precisely:

H1. For some ϑ ∈ M⁺_p, X = Z ϑ^{−1/2}, where Z ∈ R^{n×p} is a random matrix with i.i.d. standard normal entries.

From the true precision matrix ϑ, we now derive the true value of the parameter θ* ∈ R^{(p−1)×p} towards which Π̌_{n,p} is expected to converge. For j = 1,...,p,

    θ*_{kj} = −ϑ_{kj}/ϑ_{jj} for k = 1,...,j−1,   and   θ*_{kj} = −ϑ_{k+1,j}/ϑ_{jj} for k = j,...,p−1.

Let δ* ∈ {0,1}^{(p−1)×p} be the sparsity structure of θ*, defined as δ*_{kj} = 1{|θ*_{kj}| > 0}. We set

    s_j = Σ_{k=1}^{p−1} 1{|θ*_{kj}| > 0},   j = 1,...,p,   and   s = max_{1≤j≤p} s_j.

Hence s_j is the degree of node j, and s is the maximum node degree in the undirected graph defined by ϑ. The asymptotic behavior of Π̌_{n,p} depends crucially on certain restricted and m-sparse eigenvalues of the true precision matrix ϑ, which we introduce next. We set

    κ = inf{ u'ϑu / ||u||₂² : u ∈ R^p, u ≠ 0, s.t. Σ_{k: δ*_{·,k}=0} |u_k| ≤ 7 Σ_{k: δ*_{·,k}=1} |u_k| },       (9)

and, for 1 ≤ s ≤ p,

    κ(s) = inf{ u'ϑu / ||u||₂² : u ∈ R^p, 0 < ||u||₀ ≤ s },   κ̄(s) = sup{ u'ϑu / ||u||₂² : u ∈ R^p, 0 < ||u||₀ ≤ s }.       (10)

In the above equations, we use the convention that inf ∅ = +∞ and sup ∅ = 0.
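To fix ideas, here is a small sketch of one draw from the spike-and-slab prior (8) for a single column. This is illustrative Python; the parameter values are placeholders, not values taken from the paper.

```python
import numpy as np

def draw_prior_column(p, q, laplace_rate, rng):
    """One draw of u in R^{p-1} from the prior (8): delta_k ~ Ber(q) i.i.d.;
    u_k = 0 if delta_k = 0, and u_k ~ Laplace(laplace_rate) otherwise."""
    delta = rng.random(p - 1) < q
    u = np.zeros(p - 1)
    # numpy's Laplace uses the scale b = 1/rate, with density (1/(2b)) exp(-|x|/b)
    u[delta] = rng.laplace(loc=0.0, scale=1.0 / laplace_rate, size=int(delta.sum()))
    return delta.astype(int), u

rng = np.random.default_rng(3)
p, u_const = 100, 0.5
q = p ** (-(1.0 + u_const))         # q = 1/p^(u+1), the choice analyzed in Theorem 3 below
delta, u = draw_prior_column(p, q=q, laplace_rate=5.0, rng=rng)
print(int(delta.sum()), int(np.count_nonzero(u)))
```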

We study the contraction of Π̌_{n,p} in the norm |||θ||| := max_{1≤j≤p} ||θ_{·j}||₂.

Theorem 3. Assume H1 and the prior (8) with q = 1/p^{u+1} for some absolute constant u > 0, and with

    ρ_j = max_{1≤k≤p} ||X_k||₂ √( 24 log(p) / ϑ_{jj} ),   1 ≤ j ≤ p.

For 1 ≤ j ≤ p, suppose that σ_j² = 1/ϑ_{jj}, and set

    ζ_j = (4/u) ( (u + 2) s_j κ̄(s_j)/κ + 4 κ̄(s_j) log(p) s_j/κ ),   s̄_j = s_j + ζ_j,   and   s̄ = max_{1≤j≤p} s̄_j.

If min(κ, κ(s̄)) > 0, then there exist absolute constants a₀ > 0, a₁ > 0, a₂ > 0 and M₀ such that for all p ≥ a₀ and all

    n ≥ a₁ ( s̄ + κ̄(s̄)/κ(s̄) ) log(p),

the following two statements hold:

    E[ Π̌_{n,p}( { θ ∈ R^{(p−1)×p} : ||θ_{·j}||₀ ≥ ζ_j for some j } | X ) ] ≤ e^{−a₂ n} + 2/p,       (13)

    E[ Π̌_{n,p}( { θ ∈ R^{(p−1)×p} : |||θ − θ*||| > M₀ ɛ } | X ) ] ≤ 3 ( e^{−a₂ n} + 2/p ),       (14)

where ɛ > 0 is given by

    ɛ = ( κ̄(s̄)/κ(s̄) ) √( s̄ log(p)/n ).

Proof. See Section 5.2.

Remark 4. Under H1 and the assumed prior, (13) says that for n, p large, if θ ~ Π̌_{n,p}(· | X), then with high probability ||θ_{·j}||₀ < ζ_j for all j ∈ {1,...,p}. Note that if the ratios κ̄(s̄)/κ and κ̄(s̄) log(p)/κ are small (meaning that ϑ is well-conditioned), then ζ_j is of the same order as s_j. In other words, the main conclusion of (13) is that Π̌_{n,p}(· | X) concentrates most of its probability mass on matrices that are sparse, with a sparsity structure that mirrors that of ϑ, provided that ϑ is well-conditioned. The behavior of the posterior distribution in practice suggests that the large numerical constant appearing in the theorem is most likely an artifact of the techniques used in the proof, and can probably be improved.

(14) says that the contraction rate of Π̌_{n,p} towards θ* in the norm |||·||| is O(ɛ). This corresponds to the worst among the rates of contraction of the p linear regression problems performed during the neighborhood selection procedure.

This rate is similar to the rate of convergence of the frequentist neighborhood selection method of Meinshausen and Bühlmann (2006), which is of order √( s log(p)/n ); see the discussion in Section 3.4 of Ravikumar et al. (2011).

4. Numerical experiments

4.1. Fully Bayesian quasi-posterior distribution. Recall that the prior distribution of δ_k is δ_k ~ Ber(q), with q = p^{−(1+u)}. The posterior distribution Π̌_{n,p} is fairly robust to the choice of u, so throughout the simulations we set u = 0.5. In contrast, Π̌_{n,p} is sensitive to ρ_j, so we use a fully Bayesian approach, with a prior distribution ρ_j ~ φ, where φ is the uniform distribution U(a₁, a₂) with a₁ = 10⁻⁵ and a₂ = 10⁵. Given σ_j², we obtain a fully specified quasi-posterior distribution

    ∏_{j=1}^p Π_{n,p,j}(δ, dθ, dρ_j | x, σ_j²),       (16)

where the j-th component Π_{n,p,j}(· | x, σ_j²) can be written as follows. For δ ∈ Δ, let µ_δ be the product measure on R^{p−1} defined as µ_δ(du) = ∏_{k=1}^{p−1} ν_{δ_k}(du_k), where ν_0(dz) is the Dirac mass at 0 and ν_1(dz) is the Lebesgue measure on R. Then

    Π_{n,p,j}(δ, dθ, dρ_j | x, σ_j²) ∝ q_j(θ; x) q^{||δ||₀} (1 − q)^{p−1−||δ||₀} ( ρ_j/(2σ_j²) )^{||δ||₀} e^{−(ρ_j/σ_j²) ||θ||₁} φ(ρ_j) µ_δ(dθ) dρ_j.       (17)

The quasi-posterior distribution (17) depends on the choice of σ_j². Ideally we would like to set σ_j² = 1/ϑ_{jj}; however, this quantity is unknown. In the simulations we choose σ_j² by empirical Bayes. More precisely, following Reid et al. (2013), we estimate σ_j² by

    σ̂_j² = ||x_j − x_{−j} β̂_{λ_n}||² / (n − ŝ_{λ_n}),       (18)

where β̂_λ is the lasso estimate at regularization level λ in the linear regression of x_j (the j-th column of x) on x_{−j} (the remaining columns). In this procedure, λ_n is selected by 10-fold cross-validation, and ŝ_{λ_n} is the number of non-zero components of β̂_{λ_n}. We explore this approach in the simulations.
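A minimal sketch of the empirical-Bayes variance estimate (18), assuming scikit-learn's LassoCV for the 10-fold cross-validated lasso fit (the paper's own implementation is in MATLAB; the function and variable names here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sigma2_hat(x, j, n_folds=10, random_state=0):
    """Residual-variance estimate (18) of Reid et al. (2013) for column j:
    sigma2_hat_j = ||x_j - x_{-j} beta_hat||^2 / (n - s_hat),
    where beta_hat is the lasso fit at the 10-fold cross-validated penalty and
    s_hat is its number of non-zero coefficients."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    y = x[:, j]
    X = np.delete(x, j, axis=1)
    # no intercept, assuming mean-zero data as in the model of Section 3
    fit = LassoCV(cv=n_folds, fit_intercept=False, random_state=random_state).fit(X, y)
    resid = y - fit.predict(X)
    s_hat = int(np.count_nonzero(fit.coef_))
    return float(resid @ resid) / max(n - s_hat, 1)

rng = np.random.default_rng(4)
x = rng.standard_normal((200, 30))
print(sigma2_hat(x, j=0))
```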

Given j ∈ {1,...,p}, sampling from the distribution Π_{n,p,j}(· | x, σ_j²) given in (17) is a difficult computational task, due to the discrete-continuous mixture prior on δ. Here we follow the approach developed by the author in Atchadé (2015), which produces approximate samples from (17) by sampling from its forward-backward approximation, denoted Π^γ_{n,p,j}(δ, dθ, dρ_j | x, σ_j²); however, other approximation schemes could be used as well (Narisetty and He (2014); Schreck et al. (2013)). The parameter γ ∈ (0, 1/4] controls the quality of the approximation. In all the simulations below, we use the same fixed value of γ.

4.2. Simulation setups. We generate a data matrix x ∈ R^{n×p} with i.i.d. rows from N(0, ϑ⁻¹), ϑ ∈ M⁺_p. Throughout we set the sample size to n = 50, and we consider three settings.

(a) ϑ is generated as in Setting (c) below, but using p = 100 nodes.

(b) In this case p = 500, and we take ϑ from the R package space, based on the work of Peng et al. (2009). These authors have designed a precision matrix ϑ that is modular, with 5 modules of 100 nodes each. Inside each module, there are 3 hubs with degree around 15, and 97 other nodes with degree at most 4. The total number of edges is 587. The resulting partial correlations fall within [−0.67, −0.10] ∪ [0.10, 0.67]. As explained in Peng et al. (2009), this type of network is a useful model for biological networks.

(c) In this case p = 1,000, and we build ϑ as follows. First we generate a symmetric sparse matrix B such that the number of off-diagonal non-zero entries is roughly p. We magnify the signal by adding 3 to all the positive non-zero entries of B (and subtracting 3 from the negative non-zero entries). Then we set ϑ = B + (ε − λ_min(B)) I_p, where λ_min(B) is the smallest eigenvalue of B, for a fixed ε > 0. In this example, the values of the partial correlations are typically in the range [−0.46, −0.28] ∪ [0.28, 0.46].

To evaluate the effect of the hyper-parameter σ_j², we report two sets of results: one where σ_j² = 1/ϑ_{jj}, and another set of results where ϑ_{jj} is assumed unknown and we select σ_j² from the data, using the cross-validation estimator described in (18). In order to mitigate the uncertainty in some of the results reported below, we repeat all the MCMC simulations 10 times. Hence, to summarize: for each setting (a), (b), and (c), we generate one precision matrix ϑ; given ϑ, we generate 10 datasets, and for each dataset we run two MCMC samplers (one where the σ_j² are taken as the 1/ϑ_{jj}, and one where they are estimated from the data).

(The precision matrix used in setting (b) corresponds to the example "Hub network" in Section 3 of Peng et al. (2009); a non-sparse version of ϑ is attached to the space package.)
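A sketch of the construction described in setting (c), as illustrative Python; the value of ε and the exact sparsity level of B used in the paper are assumptions here.

```python
import numpy as np

def make_precision(p=1000, eps=1.0, signal=3.0, seed=0):
    """Setting (c): sparse symmetric B with roughly p off-diagonal non-zeros,
    signal magnified by +/- `signal`, then shifted to be positive definite:
    theta = B + (eps - lambda_min(B)) I_p."""
    rng = np.random.default_rng(seed)
    B = np.zeros((p, p))
    n_edges = p // 2                                  # about p non-zeros after symmetrization
    rows = rng.integers(0, p, size=n_edges)
    cols = rng.integers(0, p, size=n_edges)
    vals = rng.standard_normal(n_edges)
    keep = rows < cols                                # strictly upper-triangular positions
    B[rows[keep], cols[keep]] = vals[keep] + signal * np.sign(vals[keep])
    B = B + B.T                                       # symmetric, zero diagonal
    lam_min = np.linalg.eigvalsh(B)[0]                # smallest eigenvalue of B
    theta = B + (eps - lam_min) * np.eye(p)           # positive definite by construction
    return theta

theta = make_precision(p=200)                         # smaller p to keep the example fast
print(np.linalg.eigvalsh(theta)[0] > 0)
```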

4.3. Computation details. All the simulations were performed on a high-performance computer, using 100 cores and MATLAB 7.4. To simulate from Π^γ_{n,p,j}(· | x, σ_j²) for a given j, we run the MCMC sampler for 50,000 iterations and discard the first 10,000 iterations as burn-in. From the MCMC output, we estimate the structure δ ∈ {0,1}^{p×p} as follows. We set the diagonal of δ to one, and for each off-diagonal entry (i, j) of δ, we estimate δ_{ij} as equal to 1 if the sample average estimate of δ_{ij} from the j-th chain and the sample average estimate of δ_{ji} from the i-th chain are both larger than 0.5; otherwise we set δ_{ij} = 0. Obviously, other symmetrization rules could be adopted. Given the estimate δ̂, say, of δ, we estimate ϑ ∈ R^{p×p} as follows. We set the diagonal of ϑ to 1/σ_j². For i ≠ j, if δ̂_{ij} = 0, we set ϑ_{ij} = ϑ_{ji} = 0. Otherwise we estimate ϑ_{ij} = ϑ_{ji} by

    0.5 ( (1/σ_j²) ϑ̄_{ij} + (1/σ_i²) ϑ̄_{ji} ),

where ϑ̄_{ij} (resp. ϑ̄_{ji}) is the Monte Carlo sample average estimate of ϑ_{ij} from the j-th chain (resp. the i-th chain). For all the off-diagonal components (i, j) such that δ̂_{ij} = 1, Bayesian posterior intervals can also be produced by taking the union of the 95% posterior intervals from the i-th and j-th chains. When δ̂_{ij} = 0, those intervals are set to {0}.

4.4. Results. We evaluate the behavior of the quasi-posterior distribution (7) on the three simulated datasets. As a benchmark, we also report the results obtained using the elastic-net version of the graphical lasso estimator

    ϑ̂_glasso = Argmin_{θ ∈ M⁺_p} { −log det(θ) + Tr(θS) + λ Σ_{i,j} ( α|θ_{ij}| + ((1−α)/2) θ_{ij}² ) },

where S = (1/n) x'x, α = 0.9, and λ > 0 is a regularization parameter. We choose λ by minimizing

    −log det(θ̂_λ) + Tr(θ̂_λ S) + (log(n)/n) Σ_{i<j} 1{ |θ̂_{λ,ij}| > 0 }

over a finite set of values of λ. Our goal is not to compare the quasi-Bayesian method to the graphical lasso, since the former utilizes vastly more computing power than the latter. Rather, we report these numbers as references that help better understand the behavior of the proposed methodology. We look at the performance of the method by computing the relative Frobenius-norm error, the sensitivity, and the precision of the estimated matrix obtained as above. These quantities are defined respectively as

    E = ||ϑ̂ − ϑ||_F / ||ϑ||_F,
    SEN = Σ_{i<j} 1{|ϑ_{ij}| > 0} 1{sign(ϑ̂_{ij}) = sign(ϑ_{ij})} / Σ_{i<j} 1{|ϑ_{ij}| > 0},
    PREC = Σ_{i<j} 1{|ϑ̂_{ij}| > 0} 1{sign(ϑ̂_{ij}) = sign(ϑ_{ij})} / Σ_{i<j} 1{|ϑ̂_{ij}| > 0}.       (19)
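The three performance measures in (19) are straightforward to compute; a small sketch (illustrative Python, not part of the paper):

```python
import numpy as np

def performance(theta_hat, theta_true):
    """Relative Frobenius error E, sensitivity SEN and precision PREC of (19),
    computed over the strictly upper-triangular (off-diagonal) entries."""
    iu = np.triu_indices_from(theta_true, k=1)
    t, th = theta_true[iu], theta_hat[iu]
    true_edge = t != 0
    est_edge = th != 0
    sign_ok = np.sign(th) == np.sign(t)
    E = np.linalg.norm(theta_hat - theta_true, "fro") / np.linalg.norm(theta_true, "fro")
    SEN = np.sum(true_edge & sign_ok) / max(np.sum(true_edge), 1)
    PREC = np.sum(est_edge & sign_ok) / max(np.sum(est_edge), 1)
    return E, SEN, PREC

truth = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, -0.4], [0.0, -0.4, 2.0]])
est = np.array([[2.1, 0.4, 0.0], [0.4, 1.9, 0.0], [0.0, 0.0, 2.2]])
print(performance(est, truth))   # here SEN = 0.5: one of the two true edges is recovered
```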

We average these statistics over the 10 simulation replications. We also compute the same quantities for the elastic-net estimator ϑ̂_glasso. These results are reported in Tables 1-3. The results suggest that the quasi-Bayesian procedure generally has good contraction properties in the Frobenius norm, and hence in the |||·||| norm. The results also suggest that the quasi-Bayesian procedure tends to produce high false-negative rates, but has excellent false-positive rates, even with p = 1,000. The same conclusion seems to hold across all three network settings considered in the simulations. We also notice with satisfaction that there seems to be little difference between the results where ϑ_{jj} is assumed known and the results where ϑ_{jj} is estimated from the data.

Table 1. Relative error E, sensitivity SEN and precision PREC (all in %), as defined in (19), for Setting (a) with p = 100 nodes; columns correspond to ϑ_{jj} known, Empirical Bayes, and Glasso. Based on 10 simulation replications; each MCMC run is 50,000 iterations. [Numerical entries missing.]

Table 2. Relative error E, sensitivity SEN and precision PREC (all in %), as defined in (19), for Setting (b) with p = 500 nodes; columns correspond to ϑ_{jj} known, Empirical Bayes, and Glasso. Based on 10 simulation replications; each MCMC run is 50,000 iterations. [Numerical entries missing.]

Table 3. Relative error E, sensitivity SEN and precision PREC (all in %), as defined in (19), for Setting (c) with p = 1,000 nodes; columns correspond to ϑ_{jj} known, Empirical Bayes, and Glasso. Based on 10 simulation replications; each MCMC run is 50,000 iterations. [Numerical entries missing.]
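As a rough illustration of the benchmark selection rule described in Section 4.4, the following sketch picks the regularization level over a grid using scikit-learn's graphical lasso. Note that this routine implements the pure L1 penalty (the α = 1 special case), not the elastic-net penalty with α = 0.9 used in the paper, and the grid of λ values is an arbitrary choice here.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def select_glasso(x, lambdas):
    """Fit the graphical lasso on a grid of penalties and pick the one minimizing
    -log det(theta) + Tr(theta S) + (log(n)/n) * (number of selected edges)."""
    n = x.shape[0]
    S = (x.T @ x) / n
    best = None
    for lam in lambdas:
        _, theta = graphical_lasso(S, alpha=lam)       # returns (covariance, precision)
        iu = np.triu_indices_from(theta, k=1)
        n_edges = int(np.count_nonzero(theta[iu]))
        _, logdet = np.linalg.slogdet(theta)
        crit = -logdet + np.trace(theta @ S) + np.log(n) / n * n_edges
        if best is None or crit < best[0]:
            best = (crit, lam, theta)
    return best[1], best[2]

rng = np.random.default_rng(5)
x = rng.standard_normal((200, 20))
lam, theta = select_glasso(x, lambdas=[0.05, 0.1, 0.2, 0.4])
print(lam, theta.shape)
```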

5. Proof of Theorem 3

First we shall establish from first principles some contraction properties for posterior distributions in linear regression models. We will then reduce the proof of Theorem 3 to the linear regression case.

5.1. Posterior contraction for high-dimensional linear regression models. Let X ∈ R^{n×p} be a design matrix, θ* ∈ R^p, and σ₀² > 0. Suppose that

    Z ~ N( Xθ*, σ₀² I_n ).       (20)

We will write P and E for the probability measure and expectation operator under the distribution of Z assumed in (20). Let Δ = {0,1}^p, and let {ω_δ, δ ∈ Δ} be a probability distribution on Δ. For σ > 0 and λ > 0, we consider the posterior distribution

    Π_n(dθ | Z) = C_n(Z)⁻¹ exp( −||Z − Xθ||²/(2σ²) ) Σ_{δ∈Δ} ω_δ (λ/2)^{||δ||₀} exp( −λ||θ||₁ ) µ_δ(dθ),

where C_n(Z) is the normalizing constant. We make the following assumption on the distribution {ω_δ, δ ∈ Δ}.

H2. For all δ ∈ Δ, ω_δ = g(||δ||₀) / C(p, ||δ||₀), where C(p, s) denotes the binomial coefficient. Furthermore, there exist universal constants c₁, c₂, c₃, c₄ such that for all s = 1,...,p,

    c₁ p^{−c₃} g(s−1) ≤ g(s) ≤ c₂ p^{−c₄} g(s−1).

Remark 5. We note here that if the δ_k are i.i.d. Ber(q) with q = 1/p^{u+1}, as assumed in (8), then ω_δ = g(||δ||₀)/C(p, ||δ||₀), where g(s) = C(p, s) q^s (1−q)^{p−s}. Furthermore, it is easy to check that {g(s)} satisfies the double inequality in H2 with c₁ = 0.5, c₂ = 1, c₃ = u + 1, and c₄ = u. In other words, the prior distribution chosen in (8) satisfies H2.

Let δ* denote the sparsity structure of θ*; that is, for all j ∈ {1,...,p}, δ*_j = 1 if and only if |θ*_j| > 0. Set

    C = { θ ∈ R^p : Σ_{j: δ*_j = 0} |θ_j| ≤ 7 Σ_{j: δ*_j = 1} |θ_j| },

and define

    v = inf{ u'X'Xu / (n||u||₂²) : u ≠ 0, u ∈ C }.

For an integer s ≥ 1, we define

    v(s) = inf{ u'X'Xu / (n||u||₂²) : u ≠ 0, ||u||₀ ≤ s },   v̄(s) = sup{ u'X'Xu / (n||u||₂²) : u ≠ 0, ||u||₀ ≤ s }.

Theorem 6. Assume (20) and H2, and suppose that v > 0. Then there exists a constant A₀, depending only on the constants c₁, c₂, c₃, c₄ of H2, such that the following statements hold.

(1) For all p ≥ A₀ and all ζ > 0,

    E[ Π_n( { θ ∈ R^p : ||θ||₀ ≥ s* + ζ } | Z ) ]
        ≤ 2p exp( −λ²σ⁴ / (8σ₀² max_{1≤j≤p} ||X_{·,j}||²) )
          + 4^{s*} ( e^{2λ²σ² s*/(n v)} + ( 1 + √(n v̄(s*))/(λσ) )^{s*} ) ( 4 c₂ p^{−c₄} )^{ζ}.

(2) For all p ≥ A₀, all M ≥ 96, and any integer s̄ ≥ s* such that v(s̄) > 0, set

    κ = n v(s̄)/σ²,   ɛ = λ √s̄ / κ,   and   A_ɛ = { θ ∈ R^p : ||θ − θ*||₀ ≤ s̄, ||θ − θ*||₂ > Mɛ }.

Then

    E[ Π_n(A_ɛ | Z) ] ≤ 2p exp( −λ²σ⁴ / (8σ₀² max_{1≤j≤p} ||X_{·,j}||²) )
        + (9p)^{s̄} e^{−κM²ɛ²/3} / (1 − e^{−κM²ɛ²/3})
        + ( 1 + √(n v̄(s*))/(λσ) )^{s*} ( 4 p^{c₃} )^{s*} e^{−κM²ɛ²/64} / (1 − e^{−κM²ɛ²/64}).

Proof. We start the proof with some notation. We set

    f_{n,θ}(z) = (2πσ²)^{−n/2} exp( −||z − Xθ||²/(2σ²) ),   z ∈ R^n, θ ∈ R^p,

and

    L_{n,θ*}(θ; z) = log f_{n,θ}(z) − log f_{n,θ*}(z) − ⟨∇_θ log f_{n,θ*}(z), θ − θ*⟩ = −(θ − θ*)'X'X(θ − θ*)/(2σ²).

We will need the following two lemmas, which are special cases of corresponding lemmas in Atchadé (2017).

Lemma 7. The normalizing constant of Π_n satisfies, for all z ∈ R^n,

    C_n(z) ≥ ω_{δ*} e^{−λ||θ*||₁} ( λ / (λ + √(n v̄(s*))/σ) )^{s*},

where s* = ||θ*||₀.

Lemma 8 (Existence of tests). Fix M ≥ 1, an integer s̄ ≥ s*, and suppose that v(s̄) > 0. Set

    κ = n v(s̄)/σ²,   ɛ = λ √s̄ / κ.

There exists a measurable function φ : R^n → [0, 1] such that

    E[ φ(Z) ] ≤ (9p)^{s̄} e^{−κM²ɛ²/3} / (1 − e^{−κM²ɛ²/3}).

Furthermore, for any θ ∈ R^p such that ||θ − θ*||₀ ≤ s̄ and ||θ − θ*||₂ > jMɛ for some integer j ≥ 1,

    ∫ [1 − φ(z)] f_{n,θ}(z) dz ≤ e^{−κ j² M²ɛ²/3}.

Proof of Theorem 6, Part (1). Set B = { θ ∈ R^p : ||θ||₀ ≥ s* + ζ }, and

    E = { z ∈ R^n : ||∇_θ log f_{n,θ*}(z)||_∞ ≤ λ/2 }.

We set κ̄₁ = n v̄(s*)/σ². By Lemma 7 and Fubini's theorem,

    E[ Π_n(B | Z) ] ≤ P(Z ∉ E) + ( 1 + √κ̄₁/λ )^{s*} Σ_{δ: ||δ||₀ ≥ s*+ζ} (ω_δ/ω_{δ*}) (λ/2)^{||δ||₀} ∫_{R^p} E[ (f_{n,θ}(Z)/f_{n,θ*}(Z)) 1_E(Z) ] e^{λ||θ*||₁ − λ||θ||₁} µ_δ(dθ).

The integrand of the integral in the last display is upper bounded by

    Ψ(θ) := exp( (λ/2)||θ − θ*||₁ + λ||θ*||₁ − λ||θ||₁ ) E[ e^{L_{n,θ*}(θ;Z)} 1_E(Z) ].

We have

    (λ/2)||θ − θ*||₁ + λ||θ*||₁ − λ||θ||₁ ≤ −(λ/2) ||(θ − θ*)_{δ*^c}||₁ + (3λ/2) ||(θ − θ*)_{δ*}||₁.

Hence, if θ − θ* ∉ C, using the concavity of L_{n,θ*},

    Ψ(θ) ≤ e^{−(λ/4)||θ − θ*||₁} e^{−(λ/4)||(θ − θ*)_{δ*^c}||₁ + (7λ/4)||(θ − θ*)_{δ*}||₁} ≤ e^{−(λ/4)||θ − θ*||₁}.

However, if θ − θ* ∈ C, then

    E[ e^{L_{n,θ*}(θ;Z)} 1_E(Z) ] ≤ e^{−(n v/(2σ²)) ||θ − θ*||₂²},

and

    Ψ(θ) ≤ e^{−(λ/4)||θ − θ*||₁ + (7λ/4)√s* ||θ − θ*||₂ − (n v/(2σ²))||θ − θ*||₂²} ≤ e^{2λ²s*/κ₁} e^{−(λ/4)||θ − θ*||₁},

where κ₁ := n v/σ².

We conclude that

    E[ Π_n(B | Z) ] ≤ P(Z ∉ E) + ( e^{2λ²s*/κ₁} + ( 1 + √κ̄₁/λ )^{s*} ) Σ_{δ: ||δ||₀ ≥ s*+ζ} (ω_δ/ω_{δ*}) 4^{||δ||₀},

where we have used that ∫_{R^p} e^{−(λ/4)||θ − θ*||₁} µ_δ(dθ) = (8/λ)^{||δ||₀}. Using H2,

    Σ_{δ: ||δ||₀ ≥ s*+ζ} (ω_δ/ω_{δ*}) 4^{||δ||₀} ≤ 4^{s*} Σ_{j = s*+ζ}^{p} ( 4 c₂ p^{−c₄} )^{j − s*}.

For p large enough so that 4 c₂ p^{−c₄} < 1/2, we have Σ_{j ≥ s*+ζ} (4 c₂ p^{−c₄})^{j − s*} ≤ 2 (4 c₂ p^{−c₄})^{ζ}. It follows that

    ( e^{2λ²s*/κ₁} + ( 1 + √κ̄₁/λ )^{s*} ) Σ_{δ: ||δ||₀ ≥ s*+ζ} (ω_δ/ω_{δ*}) 4^{||δ||₀} ≤ 2 · 4^{s*} ( e^{2λ²s*/κ₁} + ( 1 + √κ̄₁/λ )^{s*} ) ( 4 c₂ p^{−c₄} )^{ζ}.

It remains only to bound the term P(Z ∉ E). Since ∇_θ log f_{n,θ*}(z) = X'(z − Xθ*)/σ², and since Z − Xθ* ~ N(0, σ₀² I_n), standard Gaussian exponential bounds give

    P(Z ∉ E) ≤ 2p exp( −λ²σ⁴ / (8σ₀² max_{1≤j≤p} ||X_{·,j}||²) ).

Proof of Theorem 6, Part (2). We set κ̄₁ = n v̄(s*)/σ², κ = n v(s̄)/σ², and ɛ = λ√s̄/κ. We also set A_ɛ = { θ ∈ R^p : ||θ − θ*||₀ ≤ s̄, ||θ − θ*||₂ > Mɛ }. We have

    Π_n(A_ɛ | Z) ≤ 1_{E^c}(Z) + 1_E(Z) Π_n(A_ɛ | Z) ≤ 1_{E^c}(Z) + φ(Z) + 1_E(Z) (1 − φ(Z)) Π_n(A_ɛ | Z).

Then, by Lemma 7 and Fubini's theorem,

    E[ Π_n(A_ɛ | Z) ] ≤ P(Z ∉ E) + E[ φ(Z) ] + ( 1 + √κ̄₁/λ )^{s*} Σ_{δ ∈ Δ} (ω_δ/ω_{δ*}) (λ/2)^{||δ||₀} ∫_{A_ɛ} E[ (1 − φ(Z)) (f_{n,θ}(Z)/f_{n,θ*}(Z)) 1_E(Z) ] e^{λ||θ*||₁ − λ||θ||₁} µ_δ(dθ).

We write A_ɛ = ∪_{j≥1} A_ɛ(j), where

    A_ɛ(j) = { θ ∈ R^p : ||θ − θ*||₀ ≤ s̄,  jMɛ < ||θ − θ*||₂ ≤ (j+1)Mɛ }.

Therefore, using Lemma 8,

    ∫_{A_ɛ(j)} E[ (1 − φ(Z)) (f_{n,θ}(Z)/f_{n,θ*}(Z)) 1_E(Z) ] e^{λ||θ*||₁ − λ||θ||₁} µ_δ(dθ)
        ≤ e^{−κ j² M²ɛ²/3} e^{3λ√s̄ (j+1)Mɛ} ∫ e^{−(λ/2)||θ − θ*||₁} µ_δ(dθ)
        ≤ e^{−κ j² M²ɛ²/64} ∫ e^{−(λ/2)||θ − θ*||₁} µ_δ(dθ),

given that M ≥ 4. It is easy to check, using H2, that

    Σ_{δ ∈ Δ} (ω_δ/ω_{δ*}) (λ/2)^{||δ||₀} ∫ e^{−(λ/2)||θ − θ*||₁} µ_δ(dθ) ≤ 2 ( 4 p^{c₃} )^{s*}.

We can then conclude that

    E[ Π_n(A_ɛ | Z) ] ≤ P(Z ∉ E) + E[ φ(Z) ] + 2 ( 1 + √κ̄₁/λ )^{s*} ( 4 p^{c₃} )^{s*} Σ_{j≥1} e^{−κ j² M²ɛ²/64},

as claimed.

5.2. Proof of Theorem 3. We rely on the behavior of some restricted and m-sparse eigenvalue concepts that we introduce first. For z ∈ R^{n×q}, for some q ≥ 1, and for s ≥ 1, we define

    κ(s, z) = inf_{δ ∈ {0,1}^q: ||δ||₀ ≤ s} inf{ θ'z'zθ / (n||θ||₂²) : θ ∈ R^q, θ ≠ 0, Σ_{k: δ_k=0} |θ_k| ≤ 7 Σ_{k: δ_k=1} |θ_k| },

    κ̲(s, z) = inf{ θ'z'zθ / (n||θ||₂²) : θ ∈ R^q, θ ≠ 0, ||θ||₀ ≤ s },   κ̄(s, z) = sup{ θ'z'zθ / (n||θ||₂²) : θ ∈ R^q, θ ≠ 0, ||θ||₀ ≤ s }.

In the above definition, we use the convention that inf ∅ = +∞ and sup ∅ = 0. We are interested in the behavior of κ(s, X), κ̲(s, X) and κ̄(s, X) when X is the random matrix obtained from assumption H1. We will use the following result, taken from Raskutti et al. (2010) and Rudelson and Zhou (2013), which relates the behavior of κ(s, X), κ̲(s, X) and κ̄(s, X) to the corresponding quantities κ, κ(s) and κ̄(s) of the true precision matrix ϑ introduced in (9)-(10).

Lemma 9. Assume H1. Then there exist finite universal constants a₁ > 0, a₂ > 0 such that the following hold.

(1) If κ > 0, then for all n ≥ a₁ (κ̄(s)/κ)² s log(p),

    P[ 64 κ(s, X) < κ ] ≤ e^{−a₂ n}.

(2) Let s ≤ p be such that κ(s) > 0. Then for all n ≥ a₁ s log(p),

    P[ 4 κ̲(s, X) < κ(s)  or  4 κ̄(s, X) > 9 κ̄(s) ] ≤ e^{−a₂ n}.

5.2.1. Proof of Theorem 3, Part (1). We have Π̌_{n,p}(dθ | X) = ∏_{j=1}^p Π̌_{n,p,j}(dθ_{·j} | X), where, for j ∈ {1,...,p}, Π̌_{n,p,j}(dθ_{·j} | X) is given by

    Π̌_{n,p,j}(du | X) ∝ q_j(u; X) Σ_{δ ∈ Δ} π_δ ( ρ_j/(2σ_j²) )^{||δ||₀} e^{−(ρ_j/σ_j²)||u||₁} µ_δ(du),

and

    ∇_u log q_j(u; X) = (1/σ_j²) X_{−j}'( X_j − X_{−j} u ).

For q ≥ 1, we define

    G_{n,q} = { z ∈ R^{n×q} : κ̄(s, z) ≤ (9/4) κ̄(s),  κ̲(s, z) ≥ κ(s)/4,  and  κ(s, z) ≥ κ/64 }.

For any k_j ≥ 0, we start by noting that

    E[ Π̌_{n,p}( { θ ∈ R^{(p−1)×p} : ||θ_{·j}||₀ ≥ k_j for some j } | X ) ] ≤ P(X ∉ G_{n,p}) + Σ_{j=1}^p E[ 1_{G_{n,p}}(X) Π̌_{n,p,j}(A_j | X) ],

where A_j = { u ∈ R^{p−1} : ||u||₀ ≥ k_j }. We notice that if X ∈ G_{n,p}, then X_{−j} ∈ G_{n,p−1} for any 1 ≤ j ≤ p. We recall that the notation X_{−j} denotes the matrix obtained by removing the j-th column of X.

Hence

    E[ 1_{G_{n,p}}(X) Π̌_{n,p,j}(A_j | X) ] ≤ E[ 1_{G_{n,p−1}}(X_{−j}) Π̌_{n,p,j}(A_j | X) ] = E[ 1_{G_{n,p−1}}(X_{−j}) E( Π̌_{n,p,j}(A_j | X) | X_{−j} ) ].

We conclude that

    E[ Π̌_{n,p}( { θ ∈ R^{(p−1)×p} : ||θ_{·j}||₀ ≥ k_j for some j } | X ) ] ≤ P(X ∉ G_{n,p}) + Σ_{j=1}^p E[ 1_{G_{n,p−1}}(X_{−j}) T_j ],       (23)

where

    T_j := E( Π̌_{n,p,j}(A_j | X) | X_{−j} ).

The key idea of the proof is to notice that T_j is an expected quasi-posterior probability in the linear regression model X_j = X_{−j} β + η, where η ~ N(0, (1/ϑ_{jj}) I_n). Therefore, by Theorem 6, Part (1), we have

    T_j ≤ 2p exp( −ϑ_{jj} ρ_j² / (8 max_{k≠j} ||X_k||²) ) + 4^{s_j} ( e^{2ρ_j² s_j/(σ_j² τ_j)} + ( 1 + σ_j √L_j / ρ_j )^{s_j} ) ( 4 c₂ p^{−c₄} )^{k_j − s_j},       (24)

where L_j = n κ̄(s_j, X_{−j}) and τ_j = n κ(s_j, X_{−j}). Given the choice of ρ_j, we see that the first term on the right-hand side of (24) is bounded by

    2p exp( −3 log(p) ) = 2/p².

Using the fact that, for X_{−j} ∈ G_{n,p−1}, we have L_j ≤ (9/4) n κ̄(s) and τ_j ≥ n κ/64, it is easy to show that the second term on the right-hand side of (24) is bounded by

    4 exp( 169 s_j log(p) σ_j²ϑ_{jj} κ̄(s)/κ + 4 σ_j²ϑ_{jj} κ̄(s_j) log(p) s_j/κ + log(4ep) − c₄ (k_j − s_j) log(p) ).

With k_j = ζ_j as given in the statement of the theorem, this latter expression is bounded by 1/p². This concludes the proof.

5.2.2. Proof of Theorem 3, Part (2). We use the same approach as above. We define s̄_j = s_j + ζ_j (with s̄_j = 1 if s_j = 0) and s̄ = max_j s̄_j, and we set

    G_{n,q} = { z ∈ R^{n×q} : κ̄(s̄, z) ≤ (9/4) κ̄(s̄),  and  κ̲(s̄, z) ≥ κ(s̄)/4 }.

We also define

    U = { θ ∈ R^{(p−1)×p} : ||θ_{·j} − θ*_{·j}||₂ > M₀ ɛ_j for some j },   Ū = U ∩ { θ : ||θ_{·j} − θ*_{·j}||₀ ≤ s_j + ζ_j for all j },

and

    Π̌_{n,p}(U | X) ≤ Π̌_{n,p}( { θ : ||θ_{·j} − θ*_{·j}||₀ > s_j + ζ_j for some j } | X ) + 1_{G^c_{n,p}}(X) + 1_{G_{n,p}}(X) Π̌_{n,p}(Ū | X).       (25)

If for some j, ||θ_{·j} − θ*_{·j}||₀ > s_j + ζ_j, then we necessarily have ||θ_{·j}||₀ > ζ_j. Therefore, by Theorem 3, Part (1), we have

    E[ Π̌_{n,p}( { θ ∈ R^{(p−1)×p} : ||θ_{·j} − θ*_{·j}||₀ > s_j + ζ_j for some j } | X ) ] ≤ e^{−a₂ n} + 4/p.       (26)

By Lemma 9, for n ≥ a₁ s̄ log(p),

    E[ 1_{G^c_{n,p}}(X) ] = P[ X ∉ G_{n,p} ] ≤ e^{−a₂ n}.       (27)

It remains to control the last term on the right-hand side of (25). To do so, we note that if X ∈ G_{n,p}, then X_{−j} ∈ G_{n,p−1} for all 1 ≤ j ≤ p. Hence

    E[ 1_{G_{n,p}}(X) Π̌_{n,p}(Ū | X) ] ≤ Σ_{j=1}^p E[ 1_{G_{n,p−1}}(X_{−j}) Π̌_{n,p,j}(A_j | X) ] ≤ Σ_{j=1}^p E[ 1_{G_{n,p−1}}(X_{−j}) E( Π̌_{n,p,j}(A_j | X) | X_{−j} ) ],       (28)

where A_j = { u ∈ R^{p−1} : ||u − θ*_{·j}||₂ > M₀ ɛ_j, ||u − θ*_{·j}||₀ ≤ s̄_j }. As in the proof of Theorem 3, Part (1), we note that under the conditional distribution of X_j given X_{−j}, the term Π̌_{n,p,j}(A_j | X) can be viewed as the posterior distribution in the linear regression model X_j = X_{−j} β + η, where η ~ N(0, (1/ϑ_{jj}) I_n). Therefore, using Theorem 6, Part (2), and for any constant M₀ ≥ 96, we have

    E( Π̌_{n,p,j}(A_j | X) | X_{−j} ) ≤ 2p exp( −ϑ_{jj} ρ_j² / (8 max_{k≠j} ||X_k||²) )
        + e^{s̄_j log(9p)} e^{−M₀² τ_j ɛ_j²/3} / (1 − e^{−M₀² τ_j ɛ_j²/3})
        + ( 1 + σ_j √L_j / ρ_j )^{s_j} ( 4 p^{c₃} )^{s_j} e^{−M₀² τ_j ɛ_j²/64} / (1 − e^{−M₀² τ_j ɛ_j²/64}),       (29)

where ɛ_j = ρ_j √s̄_j / τ_j, τ_j = n κ(s̄_j, X_{−j}), and L_j = n κ̄(s̄_j, X_{−j}). As seen in the proof of Theorem 3, Part (1), the first term on the right-hand side of (29) is upper bounded by 1/p². Using the bounds on τ_j and L_j that hold on the event {X_{−j} ∈ G_{n,p−1}}, together with the choice of ρ_j, one checks that for p ≥ 4e and for M₀ chosen large enough (as a function of the constants c₃, c₄ and of σ_j²ϑ_{jj}, κ(s̄_j) and κ̄(s̄_j) only), the quantities M₀²τ_jɛ_j²/3 and M₀²τ_jɛ_j²/64 both exceed a multiple of s̄_j log(p) that makes the second and third terms on the right-hand side of (29) each upper bounded by 1/p². Collecting the bounds (26), (27) and (29) concludes the proof.

Acknowledgements

The author would like to thank Shuheng Zhou for very helpful conversations. This work is partially supported by the NSF.

References

Atay-Kayis, A. and Massam, H. (2005). A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models. Biometrika.

Atchadé, Y. A. (2017). On the contraction properties of some high-dimensional quasi-posterior distributions. Ann. Statist.

Atchadé, Y. F. (2015). A Moreau-Yosida approximation scheme for high-dimensional quasi-posterior distributions. ArXiv e-prints.

Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. Journal of Multivariate Analysis.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B.

Bühlmann, P. and van de Geer, S. (2011). Statistics for high-dimensional data: methods, theory and applications. Springer Series in Statistics, Springer, Heidelberg.

Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika.

Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist.

Dobra, A., Lenkoski, A. and Rodriguez, A. (2011). Bayesian inference for general Gaussian graphical models with application to multivariate lattice data. J. Amer. Statist. Assoc.

Fearnhead, P. and Prangle, D. (2010). Constructing summary statistics for approximate Bayesian computation: semi-automatic ABC. Technical Report, Lancaster University, UK.

Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.

Kato, K. (2013). Quasi-Bayesian analysis of nonparametric instrumental variables models. Ann. Statist.

Khondker, Z. S., Zhu, H., Chu, H., Lin, W. and Ibrahim, J. G. (2013). The Bayesian covariance lasso. Stat. Interface.

Lenkoski, A. and Dobra, A. (2011). Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J. Comput. Graph. Statist. Supplementary material available online.

Li, C. and Jiang, W. (2014). Model selection for likelihood-free Bayesian methods based on moment conditions: theory and numerical examples. ArXiv e-prints.

Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Annals of Statistics.

Mukherjee, S. and Speed, T. P. (2008). Network inference using informative priors. Proceedings of the National Academy of Sciences.

Narisetty, N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist.

Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc.

Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association.

Peterson, C., Stingo, F. C. and Vannucci, M. (2015). Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association.

Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res.

Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Stat.

Reid, S., Tibshirani, R. and Friedman, J. (2013). A study of error variance estimation in lasso regression. ArXiv e-prints.

Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inf. Theory.

Schreck, A., Fort, G., Le Corff, S. and Moulines, E. (2013). A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection. ArXiv e-prints.

Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled lasso. Journal of Machine Learning Research.


More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

An Introduction to Graphical Lasso

An Introduction to Graphical Lasso An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Scalable MCMC for the horseshoe prior

Scalable MCMC for the horseshoe prior Scalable MCMC for the horseshoe prior Anirban Bhattacharya Department of Statistics, Texas A&M University Joint work with James Johndrow and Paolo Orenstein September 7, 2018 Cornell Day of Statistics

More information

Mixed and Covariate Dependent Graphical Models

Mixed and Covariate Dependent Graphical Models Mixed and Covariate Dependent Graphical Models by Jie Cheng A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) in The University of

More information

Joint Gaussian Graphical Model Review Series I

Joint Gaussian Graphical Model Review Series I Joint Gaussian Graphical Model Review Series I Probability Foundations Beilun Wang Advisor: Yanjun Qi 1 Department of Computer Science, University of Virginia http://jointggm.org/ June 23rd, 2017 Beilun

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

High-dimensional statistics: Some progress and challenges ahead

High-dimensional statistics: Some progress and challenges ahead High-dimensional statistics: Some progress and challenges ahead Martin Wainwright UC Berkeley Departments of Statistics, and EECS University College, London Master Class: Lecture Joint work with: Alekh

More information

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results David Prince Biostat 572 dprince3@uw.edu April 19, 2012 David Prince (UW) SPICE April 19, 2012 1 / 11 Electronic

More information

VCMC: Variational Consensus Monte Carlo

VCMC: Variational Consensus Monte Carlo VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

New ways of dimension reduction? Cutting data sets into small pieces

New ways of dimension reduction? Cutting data sets into small pieces New ways of dimension reduction? Cutting data sets into small pieces Roman Vershynin University of Michigan, Department of Mathematics Statistical Machine Learning Ann Arbor, June 5, 2012 Joint work with

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Approximate Likelihoods

Approximate Likelihoods Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models 1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Multivariate Bernoulli Distribution 1

Multivariate Bernoulli Distribution 1 DEPARTMENT OF STATISTICS University of Wisconsin 1300 University Ave. Madison, WI 53706 TECHNICAL REPORT NO. 1170 June 6, 2012 Multivariate Bernoulli Distribution 1 Bin Dai 2 Department of Statistics University

More information

Learning Markov Network Structure using Brownian Distance Covariance

Learning Markov Network Structure using Brownian Distance Covariance arxiv:.v [stat.ml] Jun 0 Learning Markov Network Structure using Brownian Distance Covariance Ehsan Khoshgnauz May, 0 Abstract In this paper, we present a simple non-parametric method for learning the

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models arxiv:1006.3316v1 [stat.ml] 16 Jun 2010 Contents Han Liu, Kathryn Roeder and Larry Wasserman Carnegie Mellon

More information

Bayesian structure learning in graphical models

Bayesian structure learning in graphical models Bayesian structure learning in graphical models Sayantan Banerjee 1, Subhashis Ghosal 2 1 The University of Texas MD Anderson Cancer Center 2 North Carolina State University Abstract We consider the problem

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Variational Learning : From exponential families to multilinear systems

Variational Learning : From exponential families to multilinear systems Variational Learning : From exponential families to multilinear systems Ananth Ranganathan th February 005 Abstract This note aims to give a general overview of variational inference on graphical models.

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008 Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)

More information

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,

More information