Sparse Graph Learning via Markov Random Fields


1 Sparse Graph Learning via Markov Random Fields
Xin Sui, Shao Tang
Sep 23, 2016

2 Outline
1 Introduction to graph learning
2 Markov Random Fields
3 Gaussian MRF
4 Extensions to the nonparanormal family
5 Ising models
6 Challenges in Poisson
7 Mixed models

3 Introduction
Let $X \in \mathbb{R}^{n \times p}$ be $n$ samples of $p$ random variables $X = \{X_1, X_2, \ldots, X_p\}$ with probability measure $P(X)$. The goal is to reconstruct the underlying dependencies among these variables, e.g., in social network analysis, reverse-engineering of gene networks, or discovering functional brain connectivity patterns. By capturing these dependencies in a graph, a probabilistic graphical model provides a convenient visualization and inference tool. See Chapter 9 of Hastie et al. (2015) and Chapter 8 of Rish and Grabarnik (2014).

4 Graph
Formally, a graph $G$ is a pair $G = (V, E)$, where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges connecting pairs of vertices $(i, j)$. In our setting, $V = \{1, 2, \ldots, p\}$ corresponds to the set of variables, while the edge set $E$ captures their dependencies. Undirected graphical models (for any $i, j \in V$, $(i, j) \in E \Leftrightarrow (j, i) \in E$) are easier to handle, although they cannot capture causal dependencies. We focus on learning Markov Random Fields, which capture conditional dependencies in graphs.

5 Markov Random Fields
A Markov Random Field (MRF) is an undirected graphical model $(X, P(X), G)$ whose undirected graph $G$ satisfies the global Markov property: for every set $S \subseteq V$ that separates the graph into disconnected components $A, B \subseteq V$, we have $X_A \perp X_B \mid X_S$. Here, $X_A$ denotes the subset of random variables associated with the vertex subset $A$. Sets $A$ and $B$ are separated by $S$ iff for any $s \in A$, $t \in B$, $(s, t) \notin E$, where $A, B, S$ form a partition of $V$. (Example figure omitted.)

6 Local Markov property
The local Markov property reveals pairwise independencies (given all other variables) in any MRF and is thus important in our development. In an MRF $(X, P(X), G)$ with $G = (V, E)$, for any $s, t \in V$ with $s \neq t$:
$$(s, t) \notin E \;\Rightarrow\; X_s \perp X_t \mid X_{/\{s,t\}}.$$
Here, $X_{/\{s,t\}}$ is the set of all random variables except $X_s$ and $X_t$ (this statement is also called the pairwise Markov property).
Proof. Let $A = \{s\}$, $B = \{t\}$, $S = V/\{s, t\}$. Then when $(s, t) \notin E$, $S$ separates $A$ and $B$, and by the global Markov property, $X_s \perp X_t \mid X_{/\{s,t\}}$.

7 Examples of MRFs
Two-dimensional grids, useful in computer vision with each vertex denoting a pixel.
The pairwise MRF model:
$$P_\theta(X) = \exp\Big(\sum_{s=1}^p \theta_s X_s + \sum_{s=1}^p h_\theta(X_s) + \sum_{s=1}^p \sum_{t: t \neq s} \theta_{st} X_s X_t - A(\theta)\Big)$$
for some base function $h_\theta(X_s)$. Here, $A(\theta)$ is the log partition function. This family includes the Gaussian MRF, Ising models, etc.; those specific applications are our focus.

8 Hammersley-Clifford Theorem
Hammersley-Clifford Theorem: for any strictly positive distribution, the global Markov property holds iff the factorization property holds. Given a graph $G$, it states the necessary and sufficient condition on the joint distribution $P(X)$ for $(X, P(X), G)$ to be an MRF.
Factorization property: $P(X)$ factorizes over $G$ if it has the decomposition
$$P(X_1, \ldots, X_p) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(X_C).$$
$Z$: normalization factor (partition function). $\mathcal{C}$: the set of all maximal cliques in the graph. $\psi_C$: positive real-valued function (compatibility function). A clique $C \subseteq V$ is a fully-connected subset of the vertex set; a clique is maximal if it is not strictly contained in any other clique.

9 Example
In the graph $G$ below, the sets $A, B, C, D$ are the maximal cliques. The joint distribution $P$ must have the form
$$P(X) = \frac{1}{Z}\, \psi_{123}(X_1, X_2, X_3)\, \psi_{345}(X_3, X_4, X_5)\, \psi_{46}(X_4, X_6)\, \psi_{57}(X_5, X_7)$$
for $(X, P(X), G)$ to be an MRF. (Graph figure omitted.)

10 Minimal I-map
Since the factorization of a joint distribution is not unique, if $G = (V, E)$ is a graph such that the triplet $(X, P(X), G)$ is an MRF, then for any $\tilde{G} = (V, \tilde{E})$ with $\tilde{E} \supseteq E$, $(X, P(X), \tilde{G})$ is also an MRF. Define the minimal I-map $G^* = (V, E^*)$ of a joint distribution $P(X)$ to be the sparsest graph such that $(X, P(X), G^*)$ is an MRF. Our sole interest is to estimate the minimal I-map, because it is the most informative among all possible graphs.

11 Pairwise MRF models revisited
The pairwise MRF model is
$$P_\theta(X) = \exp\Big(\sum_{s=1}^p \theta_s X_s + \sum_{s=1}^p h_\theta(X_s) + \sum_{s=1}^p \sum_{t: t \neq s} \theta_{st} X_s X_t - A(\theta)\Big).$$
In the pairwise MRF $(X, P_\theta(X), G^*)$, for $s \neq t$:
- From the model definition: $\theta_{st} = 0 \Rightarrow X_s \perp X_t \mid X_{/\{s,t\}}$.
- From the pairwise Markov property: $(s, t) \notin E^* \Rightarrow X_s \perp X_t \mid X_{/\{s,t\}}$.
- From the definition of the minimal I-map: $X_s \perp X_t \mid X_{/\{s,t\}} \Rightarrow (s, t) \notin E^*$.
Together these give $\theta_{st} = 0 \Leftrightarrow (s, t) \notin E^*$, so it suffices to estimate the $\theta_{st}$.

12 Example of Gaussian MRF on stock data
We first show an example of a Gaussian MRF. The dataset contains the closing prices of 452 stocks over 1258 trading days. We use the first 20 stocks to plot their dependencies; clusters of stocks show dependencies within them. (Graph figure omitted.)

13 Gaussian MRFs
When $X \sim N(0, \Sigma^*)$ (assume $\mu = 0$ for simplicity), the probability density function (PDF) takes the form
$$f_{\Theta^*}(X) = \exp\Big(-\frac{1}{2} \sum_{s,t=1}^p \Theta^*_{st} X_s X_t - A(\Theta^*)\Big).$$
$\Theta^* = (\Sigma^*)^{-1}$: the precision matrix; $A(\Theta^*) = -\frac{1}{2}\log\det[\Theta^*/(2\pi)]$: the log partition function.
$f_{\Theta^*}(X)$ is exactly the pairwise model $P_\theta(X)$ with $\theta_s = 0$, $h_\theta(X_s) = -\Theta^*_{ss} X_s^2/2$, and $\theta_{st} = -\Theta^*_{st}/2$. Hence $(s, t) \notin E^* \Leftrightarrow \Theta^*_{st} = 0$: it suffices to estimate $\Theta^*$!
In estimation, sparsity-inducing penalties on the precision matrix are preferred to enforce sparsity on the graph. The $\ell_1$-penalized optimization problem is called the graphical lasso.
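Not on the original slides: a small numpy check of this zero-pattern correspondence, using a hypothetical chain-graph precision matrix (all names and constants here are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5

# Hypothetical chain graph: edges only between consecutive variables,
# so Theta_st = 0 whenever |s - t| > 1.
Theta = 2.0 * np.eye(p)
for s in range(p - 1):
    Theta[s, s + 1] = Theta[s + 1, s] = 0.6

Sigma = np.linalg.inv(Theta)  # covariance implied by the precision matrix
X = rng.multivariate_normal(np.zeros(p), Sigma, size=100_000)

# Inverting the sample covariance approximately recovers the zero pattern:
# entries with |s - t| > 1 come out near zero.
Theta_hat = np.linalg.inv(np.cov(X, rowvar=False))
print(np.round(Theta_hat, 2))
```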

14 Optimization criterion for graphical lasso
Optimization criterion (penalized maximum likelihood):
$$\min_{\Theta \in S^p_{++}} f(\Theta) := -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda \|\Theta\|_1.$$
$S^p_{++}$: the set of $p \times p$ symmetric positive definite matrices. $S$: the sample covariance matrix (mean given). $\lambda$: the tuning parameter of the graphical lasso. The problem is strictly convex (strictly convex objective over a convex domain).
The dual problem is
$$\max_{W \in S_\rho} \log\det W + p, \qquad S_\rho := \{W : \|W - S\|_\infty \le \rho\}.$$
It is concave and smooth, and its solution gives an estimate of the covariance matrix.
Estimation methods: neighborhood selection, glasso, projected gradient, greedy coordinate ascent, to name only a few.
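As a concrete baseline (mine, not the speakers'): scikit-learn's GraphicalLasso solves this criterion up to its own scaling conventions. A minimal sketch, assuming X is a centered $n \times p$ data matrix:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# alpha plays the role of lambda (up to scikit-learn's scaling conventions).
model = GraphicalLasso(alpha=0.05).fit(X)   # X: n x p, columns centered

Theta_hat = model.precision_                # sparse precision estimate
off_diag = ~np.eye(Theta_hat.shape[0], dtype=bool)
edges = np.argwhere(np.triu(np.abs(Theta_hat) > 1e-8) & off_diag)
print(edges)                                # estimated edge list (s, t)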

15 Neighborhood selection in Gaussian graph learning
Neighborhood selection is an approximate optimization method; see Meinshausen and Bühlmann (2006) for reference. For each random variable $X_s$, regress it on all the other random variables $X_{/s}$ with the model $X_s \approx \sum_{t: t \neq s} \beta_{st} X_t$. Under the assumption $\mu = 0$,
$$X_s \mid X_{/s} \sim N(Y_s,\ 1/\Theta^*_{ss}), \qquad Y_s = -\sum_{t: t \neq s} (\Theta^*_{st}/\Theta^*_{ss}) X_t.$$
Let $\hat\Theta_{st} = 0$ iff $\hat\beta_{st} = 0$.
Simple and scalable. Moreover, the estimation of $\mathrm{sign}(\Theta^*)$ is consistent, where $\mathrm{sign}(\cdot)$ operates component-wise. The estimation of $\Theta^*$ itself is not necessarily consistent, and it may violate the symmetry and positive-definiteness constraints.

16 Algorithm 1: Neighborhood-based graph selection for graphical lasso
1: for each vertex $s = 1, 2, \ldots, p$ do
2:   Solve the neighborhood prediction problem $\hat\beta_s = \arg\min_\beta \frac{1}{2n}\|X_{\cdot,s} - X_{\cdot,/s}\beta\|_2^2 + \lambda\|\beta\|_1$
3:   Compute the estimate $\hat N(s) = J(\hat\beta_s)$ of the neighborhood set $N(s)$
4: end for
5: Combine the neighborhood estimates $\{\hat N(s), s \in V\}$ via the AND or OR rule to form a graph estimate $\hat G = (V, \hat E)$.
$X_{\cdot,s}$ is the $s$th column of $X$, while $X_{\cdot,/s}$ is $X$ without this column. $J(\hat\beta_s)$ is the support of $\hat\beta_s$. Neighborhood set: $N(s) := \{t \in V : (s, t) \in E^*\}$.
AND/OR rule: $(s, t) \in \hat E$ iff $\hat\beta_{st} \neq 0$ AND/OR $\hat\beta_{ts} \neq 0$. (A runnable sketch follows.)
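A runnable sketch of Algorithm 1 (my own implementation, not the authors' code), using scikit-learn's Lasso, whose objective matches the per-node criterion above with alpha = $\lambda$:

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_select(X, lam, rule="and"):
    """Algorithm 1: lasso-regress each column on the others, then combine."""
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for s in range(p):
        rest = np.delete(np.arange(p), s)
        # sklearn's Lasso minimizes (1/2n)||y - Zb||_2^2 + alpha*||b||_1,
        # i.e. the neighborhood prediction problem with alpha = lambda.
        beta = Lasso(alpha=lam, fit_intercept=False).fit(X[:, rest], X[:, s]).coef_
        support[s, rest] = beta != 0        # estimated neighborhood N(s)
    # AND rule: keep (s, t) only if both directions select it; OR rule: either.
    return support & support.T if rule == "and" else support | support.T
```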

17 Projected gradient approach
Recall the dual problem $\max_{W \in S_\rho} \log\det W + p$. In the dual problem, iterate between (1) and (2):
1. a gradient update $W \leftarrow W + \alpha W^{-1}$, where $\alpha > 0$;
2. a projection $W \leftarrow \arg\min_Z \{\|Z - W\|_2 : Z \in S_\rho\}$.
Same order of time complexity as glasso, $O(p^3)$, but empirically outperforms it by a factor of 2.
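A sketch of the two-step iteration, assuming (as is standard for the graphical lasso dual) that $S_\rho = \{W : \|W - S\|_\infty \le \rho\}$, so the Frobenius projection reduces to entrywise clipping; step size and iteration count are illustrative:

```python
import numpy as np

def projected_gradient_dual(S, rho, alpha=0.05, n_iter=500):
    """Maximize log det W over the box ||W - S||_inf <= rho."""
    W = S + rho * np.eye(len(S))             # feasible, positive definite start
    for _ in range(n_iter):
        W = W + alpha * np.linalg.inv(W)     # gradient step: grad of log det W is W^{-1}
        W = np.clip(W, S - rho, S + rho)     # projection onto the box S_rho
        W = (W + W.T) / 2                    # guard symmetry against round-off
    return W                                 # covariance estimate; inv(W) estimates the precision
```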

18 Greedy coordinate ascent on the primal problem
Recall the primal problem $\min_{\Theta \in S^p_{++}} f(\Theta) := -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda\|\Theta\|_1$. At each iteration $t$, the update is
$$\Theta^{(t+1)} \leftarrow \Theta^{(t)} + \hat\theta^{(t)}(e_r e_s^T + e_s e_r^T), \qquad (\hat\theta^{(t)}, r, s) = \arg\min_{\theta, i, j} f\big(\Theta^{(t)} + \theta(e_i e_j^T + e_j e_i^T)\big),$$
where $e_j \in \mathbb{R}^p$ has only the $j$th entry non-zero, taking the value 1.
Advantages:
1. Time complexity $O(p^2)$: better scaling with respect to $p$ than glasso.
2. Naturally preserves the sparsity of the solution.
3. Fixing a small $\lambda$, it can directly obtain a greedy solution path.
4. Massive parallelization is straightforward.
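A deliberately naive sketch of one greedy step, scanning every symmetric coordinate with a scalar line search. The actual algorithm reaches $O(p^2)$ per iteration via much cheaper rank-2 determinant updates; this version only states the update rule:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(Theta, S, lam):
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return 1e12                          # outside the positive definite cone
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()

def greedy_step(Theta, S, lam):
    """Try every symmetric coordinate (i, j); apply the single best update."""
    p = len(Theta)
    best_val, best = f(Theta, S, lam), None
    for i in range(p):
        for j in range(i, p):
            E = np.zeros((p, p))
            E[i, j] += 1.0
            E[j, i] += 1.0                   # direction e_i e_j^T + e_j e_i^T
            res = minimize_scalar(lambda t: f(Theta + t * E, S, lam))
            if res.fun < best_val:
                best_val, best = res.fun, (res.x, E)
    return Theta if best is None else Theta + best[0] * best[1]
```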

19 Nonparanormal family
Gaussian graph learning techniques can be naturally extended to the case where the joint distribution is nonparanormal.
Definition. A random vector $X = (X_1, X_2, \ldots, X_p)^T$ has a nonparanormal distribution if there exist functions $(f_1, f_2, \ldots, f_p)$ such that $(f_1(X_1), f_2(X_2), \ldots, f_p(X_p)) \sim N(0, \Sigma^o)$ with $\mathrm{diag}(\Sigma^o) = 1$.
Property: $f_j = \Phi^{-1} \circ F_j$ for all $j$. Here, $\Phi(\cdot)$ is the CDF of the standard normal distribution, and $F_j$ is the marginal CDF of $X_j$.
Limitation: it does not work for discrete random variables.
Connection. If each $f_j$ is differentiable, then $X_i \perp X_j \mid X_{/\{i,j\}} \Leftrightarrow (\Theta^o)_{ij} = 0$ (Liu et al., 2009). It suffices to estimate $\Sigma^o$, or $\Theta^o$, as in the Gaussian case. However, estimating the sample covariance matrix $S^o$ is not easy: we first need to estimate the $f_j$'s. This leads to the copula transform method.
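Not on the slides: a sketch of the transform $\hat f_j = \Phi^{-1} \circ \hat F_j$ used on the next slide, with the empirical CDF Winsorized away from $\{0, 1\}$ so that $\Phi^{-1}$ stays finite. The truncation level below is the one commonly used in the nonparanormal literature; treat it as an assumption:

```python
import numpy as np
from scipy.stats import norm

def copula_transform(X):
    """Map each column to the Gaussian scale via f_j = Phi^{-1} o F_j."""
    n, p = X.shape
    delta = 1.0 / (4 * n**0.25 * np.sqrt(np.pi * np.log(n)))  # truncation level
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1     # per-column ranks
    F = ranks / n                                             # empirical CDF values
    return norm.ppf(np.clip(F, delta, 1 - delta))             # Winsorize, then invert
```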

20 Estimation of S
Copula transform method: estimate $F_j$ by its (Winsorized) empirical CDF, then use $\hat f_j = \Phi^{-1} \circ \hat F_j$ for all $j$ to compute $S$.
Alternatively, Xue et al. (2012b) and Liu et al. (2012) proposed a rank-based method, noticing that each $f_j(x)$ is monotonically increasing in $x$:
1. For each $X_{ij}$, compute its rank statistic $\hat r_{ij}$ based on its rank within variable $j$.
2. Compute $\hat\rho$, the estimated correlation matrix between the rank statistics.
3. Approximate $S$ by $2\sin(\pi\hat\rho/6)$, where $\sin(\cdot)$ is applied component-wise.
With rank-based methods, there is no need to tune the Winsorization parameter; they also sacrifice some estimation efficiency for robustness. (A sketch follows.)
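A sketch of the rank-based pipeline, feeding the transformed correlation matrix into scikit-learn's covariance-input solver; in practice the transformed matrix may first need projecting onto the positive semidefinite cone, which this sketch skips:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.covariance import graphical_lasso

def nonparanormal_precision(X, alpha=0.05):
    """Rank-based correlation estimate fed into the graphical lasso."""
    rho, _ = spearmanr(X)                 # Spearman correlation of rank statistics
    S = 2 * np.sin(np.pi * rho / 6)       # component-wise sine transform
    np.fill_diagonal(S, 1.0)
    _, Theta_hat = graphical_lasso(S, alpha=alpha)
    return Theta_hat
```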

21 Ising model
The Ising model is a pairwise MRF in which each random variable $X_s$ takes discrete values in $\{-1, +1\}$ for all $s \in V$:
$$P_\theta(X) = \exp\Big(\sum_{s \in V} \theta_s X_s + \sum_{(s,t) \in E} \theta_{st} X_s X_t - A(\theta)\Big).$$
It originated from studies of ferromagnetism in physics, and is related to the Restricted Boltzmann Machine (RBM) in the machine learning literature. In fact, if we partition $V$ into $V_1$ and $V_2$, then when $\theta_{st} = 0$ for all $s, t \in V_k$ ($k = 1, 2$), and the $X_j$'s are unobserved for all $j \in V_2$, the Ising model degenerates to the RBM.
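Not on the slides: a minimal Gibbs sampler for this model, using the node conditionals derived on slide 25, $P(X_s = +1 \mid X_{/s}) = 1/(1 + e^{-2Y_s})$:

```python
import numpy as np

def ising_gibbs(theta, Theta, n_samples=1000, burn_in=500, seed=0):
    """Gibbs sampling from an Ising model with fields theta, couplings Theta.

    Theta is symmetric with zero diagonal; samples take values in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    p = len(theta)
    x = rng.choice([-1, 1], size=p)
    out = []
    for it in range(burn_in + n_samples):
        for s in range(p):
            y = theta[s] + Theta[s] @ x           # Y_s (diagonal of Theta is zero)
            x[s] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * y)) else -1
        if it >= burn_in:
            out.append(x.copy())
    return np.array(out)
```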

22 Example: image de-noising (Bishop, 2006)
(Illustrative figures omitted.)

23 Example: politician networks
Example from Hastie et al. (2015): politician networks estimated from voting records of the U.S. Senate (…), with Democratic/Republican/Independent senators coded as blue/red/yellow nodes, respectively. Panel (b) shows a smaller subgraph of the same social network. The subgraph shows a strong bipartite tendency, with clustering within party lines; a few senators show cross-party connections. (Figures omitted.)

24 Limitation
In this slide only, let $P(X)$ be the true joint distribution of $X$, rather than the distribution in the pairwise MRF model. Recall that if $X$ is multivariate Gaussian, then $P(X)$ can be represented exactly as a pairwise MRF distribution. However, when the $X_s$ are Bernoulli random variables, $P(X)$ may not fit into the pairwise MRF framework! E.g.,
$$P(X_1, X_2, X_3) = \frac{1}{Z} e^{X_1 X_2 X_3}$$
for some normalizing constant $Z$. The Ising model may be extended by adding higher-order interaction terms.

25 Neighborhood selection in Ising models
$A(\theta)$ is a summation over $2^p$ terms and is computationally intractable:
$$A(\theta) = \log \sum_{x \in \{-1,+1\}^p} \exp\Big(\sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t\Big).$$
Again, neighborhood selection can be used to approximate a solution; see Ravikumar et al. (2010) for reference. It can be shown that
$$P(X_s \mid X_{/s}) = \frac{e^{2 X_s Y_s}}{1 + e^{2 X_s Y_s}}, \qquad Y_s = \theta_s + \sum_{t: t \neq s} \theta_{st} X_t.$$
In logistic regression with $X_s$ as the response and $X_{/s}$ as the predictors, the conditional probability has the same form. In the algorithm, we only need to change the squared-error loss of the Gaussian case to the logistic loss. Again, only sign-consistency results are established in the literature (Xue et al., 2012a).

26 Algorithm 2: Neighborhood-based graph selection for Ising models
1: for each vertex $s = 1, 2, \ldots, p$ do
2:   Solve the neighborhood prediction problem $(\hat\theta_s, \hat\theta_{s,/s}) = \arg\min_{\theta_s, \theta_{s,/s}} -\frac{1}{n}\sum_{i=1}^n \log P(X_{is} \mid X_{i,/s}, \theta_s, \theta_{s,/s}) + \lambda\|\theta_{s,/s}\|_1$
3:   Compute the estimate $\hat N(s) = J(\hat\theta_{s,/s})$ of the neighborhood set $N(s)$
4: end for
5: Combine the neighborhood estimates $\{\hat N(s), s \in V\}$ via the AND or OR rule to form a graph estimate $\hat G = (V, \hat E)$. (A runnable sketch follows.)
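A runnable sketch of Algorithm 2 (mine, not the authors' code), using scikit-learn's $\ell_1$-penalized logistic regression; the mapping from $\lambda$ to sklearn's C follows the usual inverse-scaling convention and should be treated as approximate:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood_select(X, lam, rule="and"):
    """Algorithm 2: l1-logistic-regress each {-1,+1} column on the others."""
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for s in range(p):
        rest = np.delete(np.arange(p), s)
        clf = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=1.0 / (n * lam)).fit(X[:, rest], X[:, s])
        support[s, rest] = clf.coef_.ravel() != 0   # estimated neighborhood N(s)
    return support & support.T if rule == "and" else support | support.T
```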

27 Pseudo likelihood approach
Another approach is to use the pseudo likelihood $P(X_1, X_2, \ldots, X_p) \approx \prod_{i=1}^p P(X_i \mid X_{/i}) =: L(\theta)$. The approximation is exact when all the $X_i$'s are independent; in general, the sparser the graph, the better the approximation.
Optimization criterion: $\min_\theta -\log L(\theta) + \lambda\|\theta\|_1$.
Optimization methods:
- Höfling and Tibshirani (2009): at each iteration $t$, approximate $-\log L$ by its second-order Taylor expansion at $\theta^{(t)}$, keeping only the diagonal elements of the Hessian (denote this by $Q(\theta; \theta^{(t)})$). Then $\theta^{(t+1)} \leftarrow \arg\min_\theta Q(\theta; \theta^{(t)}) + \lambda\|\theta\|_1$.
- Xue et al. (2012a) used coordinate descent to iteratively update each parameter; at each iteration, a quadratic function is used to majorize $-\log L$. It differs from Höfling and Tibshirani (2009) in that (a) it updates only one parameter per iteration, and (b) it bounds $\partial^2(-\log L)/\partial\theta_{st}^2$ instead of calculating its value for every parameter $\theta_{st}$.
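For concreteness, the (unpenalized) Ising negative log pseudo-likelihood is easy to write down; a minimal sketch of the objective that both methods above minimize once the $\ell_1$ term is added:

```python
import numpy as np

def neg_log_pseudolik(theta, Theta, X):
    """Average -log pseudo-likelihood of Ising data X (entries in {-1, +1}).

    theta: length-p fields; Theta: symmetric couplings with zero diagonal.
    Uses P(x_s | x_/s) = 1 / (1 + exp(-2 x_s Y_s)) for each node s.
    """
    Y = theta + X @ Theta                       # Y_{is} = theta_s + sum_t Theta_st X_it
    # -log P(X_is | rest) = log(1 + exp(-2 X_is Y_is)), summed over nodes
    return np.logaddexp(0.0, -2.0 * X * Y).sum() / len(X)
```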

28 Comparison between NS and PL
Both are approximation methods: the intractability of $A(\theta)$ is avoided by taking advantage of conditional likelihoods. Generally, $\hat\beta_{ij} \neq \hat\beta_{ji}$ in the neighborhood selection method.
Theoretically, $\hat\beta^{\mathrm{NS}}$ (the estimate of $\beta$ in neighborhood selection) is sign consistent under some conditions (Ravikumar et al., 2010). Under similar conditions, $\hat\beta^{\mathrm{PL}}$ (the estimate of $\beta$ in pseudo likelihood) is not only sign consistent, but also has estimation error $O(\lambda E^*)$ (Xue et al., 2012a), where $E^*$ is the number of edges in the true model.

29 Comparison in experiments
Computational time: both methods are fast in implementation. Neighborhood selection is faster on large datasets, solely because each parameter is updated only once, whereas in pseudo likelihood approaches it must be updated until a convergence criterion is met.
Accuracy: when estimating the edges, there is hardly any difference, but pseudo likelihood approaches are slightly better at estimating the joint distribution.
We show some results from Höfling and Tibshirani (2009).

30 Computational time
(Timing table from Höfling and Tibshirani (2009) omitted.) P: number of variables; N: number of observations; Neigh: average number of neighbors per node.
This may be misleading: neighborhood selection is faster on larger datasets! In my own experiment, PL took … s while NS took … s on a dataset of size $(n, p) = (1000, 50)$.

31 Edge recovery
(Edge-recovery comparison figure from Höfling and Tibshirani (2009) omitted.)

32 Estimation of the joint distribution
(Joint-distribution comparison figure from Höfling and Tibshirani (2009) omitted.)

33 Challenges in Poisson
Karlis (2003) studies the case $X_s = Y_0 + Y_s$, where $Y_j$, $j = 0, 1, \ldots, p$, are independent Poisson random variables. This construction can only model positive correlations.
The pairwise Poisson graphical model is
$$P_\theta(X) = \exp\Big(\sum_{s \in V} (\theta_s X_s - \log X_s!) + \sum_{(s,t) \in E} \theta_{st} X_s X_t - A(\theta)\Big).$$
Here $A(\theta) < +\infty$ only if $\theta_{st} \le 0$ for all $(s, t) \in E$, so this model can only capture negative correlations. The issue may be addressed by truncating the Poisson distribution or modifying the loss function (see, e.g., Yang et al. (2013)); however, these techniques are ad hoc and have limited applicability.

34 Mixed models
We would like to model dependencies among variables that come from different distributions. Using a different notation system in this slide, let $X = (X_1, X_2, \ldots, X_p)$ be $p$ continuous random variables, and $Y = (Y_1, Y_2, \ldots, Y_q)$ be $q$ discrete ones, the $j$th having $L_j$ possible states. In the pairwise model, $P(X, Y)$ is proportional to
$$\exp\Big(\sum_{s=1}^p \gamma_s X_s - \frac{1}{2}\sum_{s=1}^p \sum_{t=1}^p \theta_{st} X_s X_t + \sum_{s=1}^p \sum_{j=1}^q \rho_{sj}[Y_j] X_s + \sum_{j=1}^q \sum_{r=1}^q \psi_{jr}[Y_j, Y_r]\Big).$$
Each $\rho_{sj}$ is a vector of $L_j$ parameters, and each $\psi_{jr}$ is a matrix with $L_j L_r$ elements. $P(X_s \mid X_{/s}, Y)$ is Gaussian, while $P(Y_j \mid X, Y_{/j})$ is multinomial, so neighborhood selection still applies. The Gaussian approximation of the continuous random variables may not be appropriate; also, the model cannot handle Poisson data.

35 Bibliography I
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, 2015.
Holger Höfling and Robert Tibshirani. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10(Apr), 2009.
Dimitris Karlis. An EM algorithm for multivariate Poisson distribution and related models. Journal of Applied Statistics, 30(1):63-77, 2003.
Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10(Oct), 2009.
Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 2012.
Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 2006.

36 Bibliography II
Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty. High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. The Annals of Statistics, 38(3), 2010.
Irina Rish and Genady Grabarnik. Sparse Modeling: Theory, Algorithms, and Applications. CRC Press, 2014.
Lingzhou Xue, Hui Zou, and Tianxi Cai. Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. The Annals of Statistics, 2012a.
Lingzhou Xue and Hui Zou. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. The Annals of Statistics, 40(5), 2012b.
Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, and Zhandong Liu. On Poisson graphical models. In Advances in Neural Information Processing Systems, 2013.
