Sparse Graph Learning via Markov Random Fields
1 Sparse Graph Learning via Markov Random Fields
Xin Sui, Shao Tang. Sep 23, 2016.
2 Outline
1. Introduction to graph learning
2. Markov Random Fields
3. Gaussian MRF
4. Extensions to the nonparanormal family
5. Ising models
6. Challenges in Poisson models
7. Mixed models
3 Introduction
Let $X \in \mathbb{R}^{n \times p}$ be $n$ samples of $p$ random variables $X = \{X_1, X_2, \dots, X_p\}$ with probability measure $P(X)$. The goal is to reconstruct the underlying dependencies among those variables, e.g., in social network analysis, reverse-engineering of gene networks, or discovering functional brain connectivity patterns. By capturing those dependencies in a graph, a probabilistic graphical model provides a convenient tool for visualization and inference. See Chapter 9 of Hastie et al. (2015) and Chapter 8 of Rish and Grabarnik (2014).
4 Graph
Formally, a graph $G$ is a pair $G = (V, E)$, where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges connecting pairs of vertices $(i, j)$. In our setting, $V = \{1, 2, \dots, p\}$ corresponds to the set of variables, while the edge set $E$ captures their dependencies. Undirected graphical models (those with $(i, j) \in E \Leftrightarrow (j, i) \in E$ for all $i, j \in V$) are easier to handle, although they cannot capture causal dependencies. We focus on learning Markov Random Fields, which capture conditional dependencies in graphs.
5 Markov Random Fields
A Markov Random Field (MRF) is an undirected graphical model $(X, P(X), G)$ in which the undirected graph $G$ satisfies the global Markov property: for every set $S \subseteq V$ that separates the graph into disconnected components $A, B \subseteq V$, we have $X_A \perp X_B \mid X_S$. Here $X_A$ denotes the subset of random variables associated with the vertex subset $A$. Sets $A$ and $B$ are separated by $S$ iff $(s, t) \notin E$ for all $s \in A$, $t \in B$, where $A, B, S$ form a partition of $V$. (Example figure omitted in this transcription.)
6 Local Markov property
The local Markov property reveals pairwise independencies (given all other variables) in any MRF and is thus important in our development. In an MRF $(X, P(X), G)$ with $G = (V, E)$: for any $s, t \in V$ with $s \neq t$,
$$(s, t) \notin E \implies X_s \perp X_t \mid X_{\setminus\{s,t\}}.$$
Here $X_{\setminus\{s,t\}}$ is the set of all random variables except $X_s$ and $X_t$ (this statement is often called the pairwise Markov property). Proof. Let $A = \{s\}$, $B = \{t\}$, $S = V \setminus \{s, t\}$. Then when $(s, t) \notin E$, $S$ separates $A$ and $B$, and by the global Markov property $X_s \perp X_t \mid X_{\setminus\{s,t\}}$.
7 Examples of MRFs
Two-dimensional grids, useful in computer vision with each vertex denoting a pixel.
The pairwise MRF model
$$P_{\theta^*}(X) = \exp\Big\{ \sum_{s=1}^p \theta^*_s X_s + \sum_{s=1}^p h_{\theta^*}(X_s) + \sum_{s=1}^p \sum_{t: t \neq s} \theta^*_{st} X_s X_t - A(\theta^*) \Big\},$$
for some base function $h_{\theta^*}(X_s)$. Here $A(\theta^*)$ is the log partition function. This family includes the Gaussian MRF, Ising models, etc.; those specific applications are our focus.
8 Hammersley-Clifford Theorem
Hammersley-Clifford Theorem: for any strictly positive distribution, the global Markov property holds iff the factorization property holds. Given a graph $G$, it states the sufficient and necessary condition on the joint distribution $P(X)$ such that $(X, P(X), G)$ is an MRF.
Factorization property: $P(X)$ factorizes over $G$ if it admits the decomposition
$$P(X_1, \dots, X_p) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C).$$
$Z$: normalization factor (partition function). $\mathcal{C}$: the set of all maximal cliques in the graph. $\psi_C$: positive real-valued function (compatibility function). A clique $C \subseteq V$ is a fully-connected subset of the vertex set; a clique is maximal if it is not strictly contained in any other clique.
9 Example
In the graph $G$ below (figure omitted in this transcription), the sets $A = \{1,2,3\}$, $B = \{3,4,5\}$, $C = \{4,6\}$, $D = \{5,7\}$ are the maximal cliques. The joint distribution $P$ must have the form
$$P(X) = \frac{1}{Z}\, \psi_{123}(X_1, X_2, X_3)\, \psi_{345}(X_3, X_4, X_5)\, \psi_{46}(X_4, X_6)\, \psi_{57}(X_5, X_7)$$
such that $(X, P(X), G)$ is an MRF.
10 Minimal I-map
The factorization of a joint distribution is not unique: if $G = (V, E)$ is a graph such that the triplet $(X, P(X), G)$ is an MRF, then for any $\tilde{G} = (V, \tilde{E})$ with $\tilde{E} \supseteq E$, $(X, P(X), \tilde{G})$ is also an MRF. Define the minimal I-map $G^* = (V, E^*)$ of a joint distribution $P(X)$ to be the sparsest graph such that $(X, P(X), G^*)$ is an MRF. It is our sole interest to estimate the minimal I-map, because it is the most informative among all possible graphs.
11 Pairwise MRF models revisited
The pairwise MRF model is
$$P_{\theta^*}(X) = \exp\Big\{ \sum_{s=1}^p \theta^*_s X_s + \sum_{s=1}^p h_{\theta^*}(X_s) + \sum_{s=1}^p \sum_{t: t \neq s} \theta^*_{st} X_s X_t - A(\theta^*) \Big\}.$$
In the pairwise MRF $(X, P_{\theta^*}(X), G^*)$, for $s \neq t$:
- From the model definition: $\theta^*_{st} = 0 \iff X_s \perp X_t \mid X_{\setminus\{s,t\}}$.
- From the pairwise Markov property: $(s, t) \notin E^* \implies X_s \perp X_t \mid X_{\setminus\{s,t\}}$.
- From the definition of the minimal I-map: $X_s \perp X_t \mid X_{\setminus\{s,t\}} \implies (s, t) \notin E^*$.
Hence $\theta^*_{st} = 0 \iff (s, t) \notin E^*$: it suffices to estimate $\theta^*_{st}$.
12 Example of Gaussian MRF on stock data
We first show an example of a Gaussian MRF. This dataset contains the closing prices of 452 stocks over 1258 trading days. We use the first 20 stocks to plot their dependencies (figure omitted). Each cluster of stocks shows dependencies within it.
13 Gaussian MRFs
When $X \sim N(0, \Sigma^*)$ (assume $\mu = 0$ for simplicity), the probability density function (PDF) takes the form
$$f_{\Theta^*}(X) = \exp\Big\{ -\frac{1}{2} \sum_{s,t=1}^p \Theta^*_{st} X_s X_t - A(\Theta^*) \Big\}.$$
$\Theta^* = (\Sigma^*)^{-1}$: precision matrix; $A(\Theta^*) = -\frac{1}{2} \log\det[\Theta^*/(2\pi)]$: log partition function.
$f_{\Theta^*}(X)$ is exactly the pairwise model $P_{\theta^*}(X)$ with $\theta^*_s = 0$, $h_{\theta^*}(X_s) = -\Theta^*_{ss} X_s^2/2$, and $\theta^*_{st} = -\Theta^*_{st}/2$. Then $(s, t) \notin E^* \iff \Theta^*_{st} = 0$: it suffices to estimate $\Theta^*$! In estimation, sparsity-inducing penalties on the precision matrix are used to enforce sparsity on the graph. The $\ell_1$-penalized optimization problem is called the graphical lasso.
14 Optimization criterion for graphical lasso
Optimization criterion (penalized maximum likelihood):
$$\min_{\Theta \in S^p_{++}} f(\Theta) := -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda \|\Theta\|_1$$
$S^p_{++}$: set of positive definite symmetric matrices of dimension $p \times p$. $S$: sample covariance matrix (with the mean known). $\lambda$: tuning parameter for the graphical lasso. The problem is strictly convex (strictly convex objective function over a convex domain). The dual problem is
$$\max_{\|W - S\|_\infty \le \lambda} \log\det W + p.$$
It is concave and smooth, and its solution gives an estimate of the covariance matrix. Estimation methods: neighborhood selection, glasso, projected gradient, greedy coordinate ascent, to name only a few.
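To make the criterion concrete, here is a minimal sketch (our addition, not the speakers' code) that fits the graphical lasso with scikit-learn; the simulated data, variable names, and the choice alpha=0.1 are illustrative assumptions.

```python
# Minimal graphical lasso sketch using scikit-learn (illustrative only).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))      # placeholder data; substitute real samples

model = GraphicalLasso(alpha=0.1)    # alpha plays the role of lambda above
model.fit(X)

Theta_hat = model.precision_         # sparse estimate of the precision matrix
edges = np.abs(Theta_hat) > 1e-8     # nonzero pattern = estimated edge set
np.fill_diagonal(edges, False)
print(np.argwhere(np.triu(edges)))   # list the recovered edges (s, t)
```

The nonzero pattern of the estimated precision matrix is read off directly as the estimated graph, exactly as the equivalence $(s,t) \notin E^* \iff \Theta^*_{st} = 0$ suggests.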
15 Neighborhood selection in Gaussian graph learning
Neighborhood selection is an approximation method for this optimization; see Meinshausen and Bühlmann (2006) for reference. For every random variable $X_s$, regress it on all the other variables $X_{\setminus s}$ with the model $X_s \approx \sum_{t: t \neq s} \beta_{st} X_t$. Under the assumption $\mu = 0$, $X_s \mid X_{\setminus s} \sim N(Y_s, 1/\Theta^*_{ss})$, where $Y_s = -\sum_{t: t \neq s} (\Theta^*_{st}/\Theta^*_{ss}) X_t$. Set $\hat\Theta_{st} = 0$ iff $\hat\beta_{st} = 0$. Simple and scalable. Moreover, the estimation of $\mathrm{sign}(\Theta^*)$ is consistent, where $\mathrm{sign}(\cdot)$ operates componentwise. The estimation of $\Theta^*$ itself is not necessarily consistent, and it may violate the symmetry and positive-definiteness constraints.
16 Algorithm 1: Neighborhood-based graph selection for the graphical lasso
1: for each vertex $s = 1, 2, \dots, p$ do
2: Solve the neighborhood prediction problem
$$\hat\beta_s = \arg\min_\beta \frac{1}{2n} \|X_{\cdot s} - X_{\cdot \setminus s}\beta\|_2^2 + \lambda \|\beta\|_1$$
3: Compute the estimate $\hat N(s) = J(\hat\beta_s)$ of the neighborhood set $N(s)$.
4: end for
5: Combine the neighborhood estimates $\{\hat N(s), s \in V\}$ via the AND or OR rule to form a graph estimate $\hat G = (V, \hat E)$.
Here $X_{\cdot s}$ is the $s$-th column of $X$, while $X_{\cdot \setminus s}$ is $X$ without this column; $J(\hat\beta_s)$ is the support of $\hat\beta_s$. Neighborhood set: $N(s) := \{t \in V : (s, t) \in E^*\}$. AND/OR rule: $(s, t) \in \hat E$ iff $\hat\beta_{st} \neq 0$ AND/OR $\hat\beta_{ts} \neq 0$.
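A compact sketch of Algorithm 1 (our addition; the function name and the use of scikit-learn's Lasso are assumptions). Conveniently, scikit-learn's Lasso minimizes exactly the objective above, $\frac{1}{2n}\|y - Z\beta\|_2^2 + \alpha\|\beta\|_1$.

```python
# Neighborhood selection via nodewise lasso regressions (illustrative sketch).
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam, rule="and"):
    """Estimate the edge set by l1-penalized nodewise regressions."""
    n, p = X.shape
    B = np.zeros((p, p))                       # B[s, t] stores beta_hat_{st}
    for s in range(p):
        others = [t for t in range(p) if t != s]
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X[:, others], X[:, s])
        B[s, others] = fit.coef_
    nz = B != 0
    E = (nz & nz.T) if rule == "and" else (nz | nz.T)   # AND / OR rule
    return np.argwhere(np.triu(E, k=1))

# Example usage: edges = neighborhood_selection(X, lam=0.1, rule="or")
```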
17 Projected gradient approach
Recall the dual problem $\max_{\|W - S\|_\infty \le \lambda} \log\det W + p$. In the dual problem, iterate between (1) and (2):
1. a gradient update $W \leftarrow W + \alpha W^{-1}$, where $\alpha > 0$;
2. a projection $W \leftarrow \arg\min_Z \{\|Z - W\|_F^2 : \|Z - S\|_\infty \le \lambda\}$.
Same order of time complexity as glasso, $O(p^3)$, but empirically outperforms it by a factor of about 2.
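A hedged sketch of the two-step iteration (our addition): the constant step size, the starting point, and the fixed iteration count are illustrative assumptions. Note that projection onto $\{Z : \|Z - S\|_\infty \le \lambda\}$ is just elementwise clipping.

```python
# Projected gradient ascent on the graphical lasso dual (illustrative sketch).
import numpy as np

def projected_gradient_dual(S, lam, alpha=0.05, iters=500):
    p = S.shape[0]
    W = S + lam * np.eye(p)                 # feasible, positive definite start
    for _ in range(iters):
        W = W + alpha * np.linalg.inv(W)    # gradient of log det W is W^{-1}
        # Projection onto {Z : |Z - S|_inf <= lam} is elementwise clipping.
        W = np.clip(W, S - lam, S + lam)
        W = (W + W.T) / 2                   # guard symmetry numerically
    return W                                # estimate of the covariance matrix

# Theta_hat = np.linalg.inv(projected_gradient_dual(S, lam=0.1))
```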
18 Greedy coordinate ascent on the primal problem
Recall the primal problem
$$\min_{\Theta \in S^p_{++}} f(\Theta) := -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda \|\Theta\|_1.$$
At each iteration $t$, the update is $\Theta^{(t+1)} \leftarrow \Theta^{(t)} + \hat\theta^{(t)} (e_r e_s^T + e_s e_r^T)$, where
$$(\hat\theta^{(t)}, r, s) = \arg\min_{\theta, i, j} f\big(\Theta^{(t)} + \theta (e_i e_j^T + e_j e_i^T)\big)$$
($e_j \in \mathbb{R}^p$ has only its $j$-th entry nonzero, equal to 1). Advantages (see the sketch below):
1. Time complexity $O(p^2)$, better scaling with respect to $p$ than glasso.
2. Naturally preserves the sparsity of the solution.
3. With a small fixed $\lambda$, it can directly produce a greedy solution path.
4. Massive parallelization is straightforward.
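For illustration, here is a deliberately naive sketch of one greedy step (our addition): it scans all off-diagonal coordinates with a 1-D numerical minimization, whereas an efficient implementation would exploit closed-form rank-two determinant updates to reach the stated $O(p^2)$ cost.

```python
# One greedy coordinate step on the primal (naive, illustrative sketch).
import numpy as np
from scipy.optimize import minimize_scalar

def f(Theta, S, lam):
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:                            # outside the positive definite cone
        return np.inf
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()

def greedy_step(Theta, S, lam):
    p = Theta.shape[0]
    best_val, best_Theta = f(Theta, S, lam), None
    for r in range(p):
        for s in range(r + 1, p):            # scan off-diagonal coordinates
            E = np.zeros((p, p))
            E[r, s] = E[s, r] = 1.0
            res = minimize_scalar(lambda t: f(Theta + t * E, S, lam),
                                  bounds=(-1.0, 1.0), method="bounded")
            if res.fun < best_val:
                best_val, best_Theta = res.fun, Theta + res.x * E
    return best_Theta if best_Theta is not None else Theta
```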
19 Nonparanormal family
Gaussian graph learning techniques extend naturally to the case where the joint distribution is nonparanormal.
Definition. A random vector $X = (X_1, X_2, \dots, X_p)^T$ has a nonparanormal distribution if there exist functions $(f_1, f_2, \dots, f_p)$ such that $(f_1(X_1), f_2(X_2), \dots, f_p(X_p)) \sim N(0, \Sigma^*_0)$ with $\mathrm{diag}(\Sigma^*_0) = 1$.
Property: $f_j = \Phi^{-1} \circ F_j$ for all $j$. Here $\Phi(\cdot)$ is the CDF of the standard normal distribution, and $F_j$ is the marginal CDF of $X_j$.
Limitation: it does not work for discrete random variables.
Connection: if each $f_j$ is differentiable, then $X_i \perp X_j \mid X_{\setminus\{i,j\}} \iff (\Theta^*_0)_{ij} = 0$ (Liu et al., 2009). It suffices to estimate $\Sigma^*_0$, or $\Theta^*_0$, as in the Gaussian case. However, estimating the sample covariance matrix $S_0$ is not easy: we first need to estimate the $f_j$'s. This leads to the copula transform method.
20 Estimation of S
Copula transform method: estimate $F_j$ by its (Winsorized) empirical CDF, then use $\hat f_j = \Phi^{-1} \circ \hat F_j$ for all $j$ to compute $S$. Alternatively, Xue et al. (2012b) and Liu et al. (2012) proposed a rank-based method, noting that each $f_j(x)$ is monotonically increasing in $x$:
1. For each $X_{ij}$, compute its rank statistic $\hat r_{ij}$ based on its rank within variable $j$.
2. Compute $\hat\rho$, the estimated correlation matrix of the rank statistics.
3. Approximate $S$ by $2\sin(\pi\hat\rho/6)$, where $\sin(\cdot)$ is applied componentwise (see the sketch below).
With rank-based methods there is no need to tune the Winsorization parameter; they trade some estimation efficiency for robustness.
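A small sketch of steps 1-3 (our addition, assuming SciPy's Spearman correlation as the rank-correlation estimate $\hat\rho$):

```python
# Rank-based correlation estimate for the nonparanormal (illustrative sketch).
import numpy as np
from scipy.stats import spearmanr

def nonparanormal_corr(X):
    """Estimate the latent Gaussian correlation via Spearman's rho."""
    rho, _ = spearmanr(X)                     # p x p rank correlation (p > 2)
    Sigma_hat = 2 * np.sin(np.pi * rho / 6)   # componentwise transform
    np.fill_diagonal(Sigma_hat, 1.0)
    return Sigma_hat

# Sigma_hat can then be plugged into any graphical lasso solver in place of S.
```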
21 Ising model
The Ising model is a pairwise MRF in which each random variable $X_s$ takes values in $\{-1, +1\}$ for all $s \in V$:
$$P_{\theta^*}(X) = \exp\Big\{ \sum_{s \in V} \theta^*_s X_s + \sum_{(s,t) \in E} \theta^*_{st} X_s X_t - A(\theta^*) \Big\}.$$
It originated from studies of ferromagnetism in physics and is related to the Restricted Boltzmann Machine (RBM) in the machine learning literature. In fact, if we partition $V$ into $V_1$ and $V_2$, set $\theta^*_{st} = 0$ whenever $s, t \in V_k$ for the same $k \in \{1, 2\}$, and leave $X_j$ unobserved for all $j \in V_2$, the Ising model reduces to an RBM.
22 Example: image de-noising (Bishop, 2006). (Figure omitted.)
23 Example: politician networks
Example from Hastie et al. (2015): politician networks estimated from voting records of the U.S. Senate ( ), with Democratic/Republican/Independent senators coded as blue/red/yellow nodes, respectively. Panel (b) is a smaller subgraph of the same social network. The subgraph shows a strong bipartite tendency, with clustering within party lines; a few senators show cross-party connections.
24 Limitation
In this slide only, let $P(X)$ be the true joint distribution of $X$ rather than the distribution in the pairwise MRF model. Recall that if $X$ is multivariate Gaussian, then $P(X)$ can be represented exactly as a pairwise MRF distribution. However, when the $X_s$ are Bernoulli random variables, $P(X)$ may not fit into the pairwise MRF framework, e.g., when
$$P(X_1, X_2, X_3) = \frac{1}{Z} e^{X_1 X_2 X_3}$$
for some normalizing constant $Z$. The Ising model may be extended by adding higher-order interaction terms.
25 Neighborhood selection in Ising models
$A(\theta^*)$ is a summation over $2^p$ terms and is computationally intractable:
$$A(\theta^*) = \log \sum_{x \in \{-1,+1\}^p} \exp\Big\{ \sum_{s \in V} \theta^*_s x_s + \sum_{(s,t) \in E} \theta^*_{st} x_s x_t \Big\}$$
Again, neighborhood selection can be used to approximate a solution; see Ravikumar et al. (2010) for reference. It can be shown that
$$P(X_s \mid X_{\setminus s}) = \frac{e^{2 X_s Y_s}}{1 + e^{2 X_s Y_s}},$$
where $Y_s = \theta^*_s + \sum_{t: t \neq s} \theta^*_{st} X_t$. In logistic regression with $X_s$ as the response and $X_{\setminus s}$ as the predictors, the conditional probability has the same form. In the algorithm, we only need to change the squared error loss of the Gaussian case to the logistic loss. Again, only sign-consistency results are established in the literature (Xue et al., 2012a).
26 Algorithm 2: Neighborhood-based graph selection for Ising models
1: for each vertex $s = 1, 2, \dots, p$ do
2: Solve the neighborhood prediction problem
$$(\hat\theta_s, \hat\theta_{s,\setminus s}) = \arg\min_{\theta_s, \theta_{s,\setminus s}} -\frac{1}{n} \sum_{i=1}^n \log P(X_{is} \mid X_{i,\setminus s}; \theta_s, \theta_{s,\setminus s}) + \lambda \|\theta_{s,\setminus s}\|_1$$
3: Compute the estimate $\hat N(s) = J(\hat\theta_{s,\setminus s})$ of the neighborhood set $N(s)$.
4: end for
5: Combine the neighborhood estimates $\{\hat N(s), s \in V\}$ via the AND or OR rule to form a graph estimate $\hat G = (V, \hat E)$.
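A minimal sketch of Algorithm 2 (our addition), mapping the penalized conditional log-likelihood onto scikit-learn's $\ell_1$-logistic regression: liblinear minimizes $C\sum_i \mathrm{loss}_i + \|w\|_1$, which matches the objective above with $C = 1/(n\lambda)$.

```python
# Nodewise l1-logistic regression for Ising graph selection (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood_selection(X, lam, rule="and"):
    """X has entries in {-1, +1}; returns the estimated edge list."""
    n, p = X.shape
    W = np.zeros((p, p))
    for s in range(p):
        others = [t for t in range(p) if t != s]
        clf = LogisticRegression(penalty="l1", C=1.0 / (n * lam),
                                 solver="liblinear")
        clf.fit(X[:, others], X[:, s])
        W[s, others] = clf.coef_.ravel()
    nz = W != 0
    E = (nz & nz.T) if rule == "and" else (nz | nz.T)   # AND / OR rule
    return np.argwhere(np.triu(E, k=1))
```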
27 Pseudo-likelihood approach
Another approach is to use the pseudo-likelihood $P(X_1, X_2, \dots, X_p) \approx \prod_{i=1}^p P(X_i \mid X_{\setminus i}) =: L(\theta)$. The approximation is exact when all $X_i$'s are independent; in general, the sparser the graph, the better the approximation.
Optimization criterion: $\min_\theta -\log L(\theta) + \lambda \|\theta\|_1$.
Optimization methods. Höfling and Tibshirani (2009): at each iteration $t$, approximate $-\log L$ by its second-order Taylor expansion at $\theta^{(t)}$, keeping only the diagonal elements of the Hessian (denote this by $Q(\theta; \theta^{(t)})$); then $\theta^{(t+1)} \leftarrow \arg\min_\theta Q(\theta; \theta^{(t)}) + \lambda \|\theta\|_1$. Xue et al. (2012a) used coordinate descent to update each parameter iteratively; at each iteration, a quadratic function is used to majorize $-\log L$. It differs from Höfling and Tibshirani (2009) in that (a) it updates only one parameter per iteration, and (b) it bounds $\partial^2 (-\log L) / \partial \theta_{st}^2$ instead of computing its value for every parameter $\theta_{st}$.
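For concreteness, a sketch of the Ising negative log pseudo-likelihood itself (our addition), using the conditional formula from the previous slide; plugging this into any $\ell_1$-penalized optimizer gives the criterion above.

```python
# Negative log pseudo-likelihood for the Ising model (illustrative sketch).
import numpy as np

def ising_neg_log_pl(theta_node, Theta, X):
    """theta_node: (p,) fields; Theta: (p, p) symmetric with zero diagonal;
    X: (n, p) data with entries in {-1, +1}."""
    Y = theta_node + X @ Theta        # Y[i, s] = theta_s + sum_t theta_st X_it
    # -log P(X_is | X_i,/s) = log(1 + exp(-2 X_is Y_is)) from the slide's formula
    return np.logaddexp(0.0, -2.0 * X * Y).sum() / X.shape[0]
```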
28 Comparison between NS and PL
Both are approximation methods: the intractability of $A(\theta)$ is avoided by working with conditional likelihoods. Generally, $\hat\beta_{ij} \neq \hat\beta_{ji}$ in the neighborhood selection method. Theoretically, $\hat\beta_{NS}$ (the neighborhood selection estimate of $\beta$) is sign-consistent under some conditions (Ravikumar et al., 2010). Under similar conditions, $\hat\beta_{PL}$ (the pseudo-likelihood estimate of $\beta$) is not only sign-consistent; its estimation error is also bounded in terms of $\lambda$ and $|E^*|$, the number of edges in the true model (Xue et al., 2012a).
29 Comparison in experiments
Computational time: both methods are fast in practice. Neighborhood selection is faster on large datasets, solely because each parameter is updated only once, whereas in pseudo-likelihood approaches parameters are updated until a convergence criterion is met.
Accuracy: when estimating the edges there is hardly any difference, but pseudo-likelihood approaches are slightly better at estimating the joint distribution. We show some results from Höfling and Tibshirani (2009).
30 Computational time
(Results table omitted.) P: number of variables. N: number of observations. Neigh: average number of neighbors per node. This comparison may be misleading: neighborhood selection is faster on larger datasets! In my own experiment, PL took s while NS took s on a dataset of size (n, p) = (1000, 50).
31 Edge recovery
(Figure omitted.)
32 Estimation of the joint distribution
(Figure omitted.)
33 Challenges in Poisson models
Karlis (2003) studies the case $X_s = Y_0 + Y_s$, where $Y_j$, $j = 0, 1, \dots, p$, are independent Poisson random variables; this construction can only model positive correlations. The pairwise Poisson graphical model is
$$P_{\theta^*}(X) = \exp\Big\{ \sum_{s \in V} (\theta^*_s X_s - \log X_s!) + \sum_{(s,t) \in E} \theta^*_{st} X_s X_t - A(\theta^*) \Big\}.$$
$A(\theta^*) < +\infty$ only if $\theta^*_{st} \le 0$ for all $(s, t) \in E$, so this model can only capture negative correlations. The issue may be addressed by truncating the Poisson distribution or modifying the loss function (see, e.g., Yang et al. (2013)), but these techniques are ad hoc and have limited applicability.
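To see why positive interactions are ruled out, here is a short supporting calculation (our addition, not on the original slide): already with $p = 2$, zero node potentials, and $\theta_{12} > 0$, the partition sum diverges along the diagonal.

```latex
% Supporting calculation (our addition): divergence of the partition sum
% for p = 2, theta_1 = theta_2 = 0, and theta_12 > 0.
\sum_{x_1 \ge 0} \sum_{x_2 \ge 0}
  \frac{e^{\theta_{12} x_1 x_2}}{x_1!\, x_2!}
\;\ge\; \sum_{k \ge 0} \frac{e^{\theta_{12} k^2}}{(k!)^2}
= \infty
```

The sum diverges because $(k!)^2 = e^{O(k \log k)}$ grows strictly slower than $e^{\theta_{12} k^2}$; hence $A(\theta^*) = +\infty$ whenever some $\theta^*_{st} > 0$.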
34 Mixed models
We would like to model dependencies among variables that come from different distributions. Using a different notation system in this slide, let $X = (X_1, X_2, \dots, X_p)$ be $p$ continuous random variables, and let $Y = (Y_1, Y_2, \dots, Y_q)$ be $q$ discrete ones, with $Y_j$ taking $L_j$ possible states. In the pairwise model, $P(X, Y)$ is proportional to
$$\exp\Big\{ \sum_{s=1}^p \gamma_s X_s - \frac{1}{2} \sum_{s=1}^p \sum_{t=1}^p \theta_{st} X_s X_t + \sum_{s=1}^p \sum_{j=1}^q \rho_{sj}[Y_j] X_s + \sum_{j=1}^q \sum_{r=1}^q \psi_{jr}[Y_j, Y_r] \Big\}$$
Each $\rho_{sj}$ is a vector of $L_j$ parameters, and each $\psi_{jr}$ is a matrix with $L_j \times L_r$ elements. $P(X_s \mid X_{\setminus s}, Y)$ is Gaussian, while $P(Y_j \mid X, Y_{\setminus j})$ is multinomial, so neighborhood selection still applies (see the sketch below). Caveats: the Gaussian approximation of the continuous variables may be inappropriate, and the model cannot handle Poisson data.
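A rough sketch of the nodewise fits (our addition): $\ell_1$-penalized linear regressions for the Gaussian conditionals and $\ell_1$-penalized multinomial logistic regressions for the discrete ones. Feeding integer-coded $Y$ directly as features, rather than one-hot indicators for $\rho_{sj}[Y_j]$, is a simplification of the full model.

```python
# Nodewise regressions for the mixed model (illustrative sketch).
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def mixed_neighborhoods(Xc, Yd, lam):
    """Xc: (n, p) continuous data; Yd: (n, q) integer-coded discrete data."""
    n, p = Xc.shape
    q = Yd.shape[1]
    fits = {}
    for s in range(p):                     # Gaussian conditional -> lasso
        Z = np.hstack([np.delete(Xc, s, axis=1), Yd])
        fits[("x", s)] = Lasso(alpha=lam).fit(Z, Xc[:, s])
    for j in range(q):                     # multinomial conditional -> logistic
        Z = np.hstack([Xc, np.delete(Yd, j, axis=1)])
        fits[("y", j)] = LogisticRegression(penalty="l1", solver="saga",
                                            C=1.0 / (n * lam),
                                            max_iter=1000).fit(Z, Yd[:, j])
    return fits
```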
35 Bibliography I
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, 2015.
Holger Höfling and Robert Tibshirani. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10(Apr), 2009.
Dimitris Karlis. An EM algorithm for multivariate Poisson distribution and related models. Journal of Applied Statistics, 30(1):63-77, 2003.
Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10(Oct), 2009.
Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 2012.
Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006.
36 Bibliography II
Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty. High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. The Annals of Statistics, 38(3), 2010.
Irina Rish and Genady Grabarnik. Sparse Modeling: Theory, Algorithms, and Applications. CRC Press, 2014.
Lingzhou Xue, Hui Zou, and Tianxi Cai. Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. The Annals of Statistics, 2012a.
Lingzhou Xue and Hui Zou. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. The Annals of Statistics, 40(5), 2012b.
Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, and Zhandong Liu. On Poisson graphical models. In Advances in Neural Information Processing Systems, 2013.