Introduction to graphical models: Lecture III
- Loraine Sullivan
- 6 years ago
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
Some introductory lectures, January
Introduction
Markov random fields (undirected graphical models) are central in many application areas of science and engineering. Some fundamental problems:
- counting/integrating: computing marginal distributions and partition functions
- optimization: computing most probable configurations (or the top M configurations)
- model selection: fitting and selecting models on the basis of data
Graph structure and factorization
Markov random field: a random vector $(X_1, \ldots, X_p)$ with distribution factoring according to a graph $G = (V, E)$.
[Figure: example graph on vertices A, B, C, D]
Hammersley-Clifford theorem: factorization over cliques
$$Q(x_1, \ldots, x_p; \theta) = \frac{1}{Z(\theta)} \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C(x_C) \Big\}$$
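To make the factorization concrete, here is a minimal Python sketch that evaluates $Z(\theta)$ by brute-force enumeration for a tiny MRF. It assumes binary variables $x_s \in \{0, 1\}$, and the clique list, potentials and function names are hypothetical (the slide's figure is lost). The sum has $2^p$ terms, which is exactly why computing $Z(\theta)$ is hard in general.

```python
import itertools
import math

def clique_energy(x, cliques):
    """Sum of clique potentials theta_C(x_C) for a full configuration x."""
    return sum(potential(tuple(x[i] for i in clique)) for clique, potential in cliques)

def partition_function(p, cliques):
    """Brute-force Z(theta) over all binary configurations x in {0, 1}^p (2^p terms)."""
    return sum(math.exp(clique_energy(x, cliques))
               for x in itertools.product((0, 1), repeat=p))

def probability(x, cliques, Z):
    """Q(x; theta) = exp(sum_C theta_C(x_C)) / Z(theta)."""
    return math.exp(clique_energy(x, cliques)) / Z

# Hypothetical example on A, B, C, D (indices 0..3) with cliques {A, B, C} and {C, D}.
cliques = [
    ((0, 1, 2), lambda xc: 0.5 * (xc[0] + xc[1] + xc[2] == 3)),
    ((2, 3), lambda xc: 1.0 * (xc[0] == xc[1])),
]
Z = partition_function(4, cliques)
total = sum(probability(x, cliques, Z) for x in itertools.product((0, 1), repeat=4))
```

Dividing by $Z$ is what makes the probabilities sum to one; the enumeration is only a didactic device.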
Graphical model selection
- let $G = (V, E)$ be an undirected graph on $p = |V|$ vertices
- a pairwise graphical model factorizes over the edges of the graph:
$$Q(x_1, \ldots, x_p; \theta) \propto \exp\Big\{ \sum_{(s,t) \in E} \theta_{st}(x_s, x_t) \Big\}$$
- given $n$ independent and identically distributed (i.i.d.) samples of $X = (X_1, \ldots, X_p)$, identify the underlying graph structure
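The data-generating side of the problem can be sketched as follows, assuming a pairwise Ising model with spins $x_s \in \{-1, +1\}$ and edge potentials $\theta_{st}(x_s, x_t) = \theta_{st} x_s x_t$ (an assumption; the slide allows general edge potentials). For small $p$ one can draw exact i.i.d. samples by enumerating all $2^p$ configurations; the function names are hypothetical.

```python
import itertools
import numpy as np

def all_configs(p):
    """All 2^p spin configurations as a (2^p, p) array with entries in {-1, +1}."""
    return np.array(list(itertools.product((-1, 1), repeat=p)), dtype=float)

def ising_probs(theta):
    """Exact probabilities Q(x; theta) proportional to exp(sum_{s<t} theta_st x_s x_t).

    theta: symmetric (p, p) array with zero diagonal; theta[s, t] != 0 iff (s, t) in E.
    """
    X = all_configs(theta.shape[0])
    # 0.5 * x^T theta x equals sum_{s<t} theta_st x_s x_t for symmetric, zero-diagonal theta
    energy = 0.5 * np.einsum("ij,jk,ik->i", X, theta, X)
    w = np.exp(energy - energy.max())  # subtract the max for numerical stability
    return X, w / w.sum()

def sample_ising(theta, n, rng):
    """Draw n i.i.d. samples exactly (feasible only for small p)."""
    X, probs = ising_probs(theta)
    idx = rng.choice(len(X), size=n, p=probs)
    return X[idx]

# Hypothetical example: chain graph 0-1-2-3 with coupling 0.5 on each edge.
p = 4
theta = np.zeros((p, p))
for s in range(p - 1):
    theta[s, s + 1] = theta[s + 1, s] = 0.5
samples = sample_ising(theta, n=1000, rng=np.random.default_rng(0))
```

The rows of `samples` play the role of the $n$ i.i.d. observations from which the edge set must be recovered.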
Various classes of methods
1. Exact solutions
- Chow-Liu algorithm for trees (Chow & Liu, 1967)
- computationally intractable for hypertrees (Srebro & Karger, 2001)
2. Testing-based approaches
- PC algorithm (Spirtes et al., 2000; Kalisch & Bühlmann, 2008)
- thresholding (Bresler et al., 2008; Anandkumar et al., 2010)
3. Penalized forms of global likelihood
- combinatorial penalties (AIC, BIC, GIC, etc.)
- $\ell_1$ and related penalties
- classical analysis of penalized Gaussian MLE: Yuan & Lin, 2006
- some fast algorithms: d'Asprémont et al., 2007; Friedman et al., 2008
4. Pseudolikelihoods and neighborhood regression
- pseudolikelihood consistency for Gaussians (Besag, 1977)
- pseudolikelihood and BIC criterion (Csiszar & Talata, 2006)
- neighborhood regression for Gaussian MRFs (e.g., Meinshausen & Bühlmann, 2005; Wainwright, 2006; Zhao & Yu, 2006)
- logistic regression for Ising models (Ravikumar et al., 2010)
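The Chow-Liu algorithm in the first class reduces tree learning to a maximum-weight spanning tree problem whose edge weights are empirical pairwise mutual informations. Below is a minimal Python sketch for discrete data, assuming samples are stored as an integer array of shape (n, p) and using Prim's algorithm for the spanning tree; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def empirical_mutual_info(a, b):
    """Plug-in mutual information (in nats) between two discrete sample vectors."""
    n = len(a)
    joint = {}
    for u, v in zip(a, b):
        joint[(u, v)] = joint.get((u, v), 0) + 1
    pa = {u: np.mean(a == u) for u in np.unique(a)}
    pb = {v: np.mean(b == v) for v in np.unique(b)}
    return sum((c / n) * np.log((c / n) / (pa[u] * pb[v])) for (u, v), c in joint.items())

def chow_liu_tree(X):
    """Edge set of a maximum-weight spanning tree (Prim's algorithm), where the
    weight of (s, t) is the empirical mutual information I(X_s; X_t)."""
    n, p = X.shape
    W = np.zeros((p, p))
    for s in range(p):
        for t in range(s + 1, p):
            W[s, t] = W[t, s] = empirical_mutual_info(X[:, s], X[:, t])
    in_tree, edges = {0}, set()
    while len(in_tree) < p:
        best = None
        for s in in_tree:
            for t in range(p):
                if t not in in_tree and (best is None or W[s, t] > best[0]):
                    best = (W[s, t], s, t)
        _, s, t = best
        edges.add((min(s, t), max(s, t)))
        in_tree.add(t)
    return edges

def noisy_copy(x, flip_prob, rng):
    """Copy a binary vector, flipping each entry independently with probability flip_prob."""
    return np.where(rng.random(len(x)) < flip_prob, 1 - x, x)

# Hypothetical data: a binary Markov chain X0 -> X1 -> X2 -> X3.
rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, size=2000)
x1 = noisy_copy(x0, 0.1, rng)
x2 = noisy_copy(x1, 0.1, rng)
x3 = noisy_copy(x2, 0.1, rng)
tree = chow_liu_tree(np.column_stack([x0, x1, x2, x3]))
```

By the data-processing inequality, non-adjacent pairs in a chain carry less mutual information than adjacent ones, so the spanning tree picks out the chain when the estimates are accurate.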
1. Global maximum likelihood
- given i.i.d. samples $X_1^n := \{X_1, \ldots, X_n\}$, one might consider methods based on the global likelihood
$$\ell(\theta; X_1^n) := \frac{1}{n} \sum_{i=1}^n \log Q(X_i; \theta)$$
- maximum likelihood for a graphical model in exponential form:
$$\hat\theta = \arg\max_{\theta} \Big\{ \sum_{(s,t) \in E} \underbrace{\hat{\mathbb{E}}[\theta_{st}(X_s, X_t)]}_{\text{empirical moments}} - \log Z(\theta) \Big\}$$
- the exact likelihood involves the log partition function $\log Z(\theta)$:
  - it can be computed for Gaussian MRFs (log-determinant)
  - it is intractable for Ising models (binary pairwise MRFs) (Welsh, 1993)
- possible solutions: MCMC methods; stochastic approximation methods; variational approximations (mean field, Bethe and belief propagation)
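As one illustration of the MCMC route, note that for $\theta_{st}(x_s, x_t) = \theta_{st} x_s x_t$ the gradient of the likelihood is $\hat{\mathbb{E}}[X_s X_t] - \mathbb{E}_\theta[X_s X_t]$, so what is needed are model moments. The sketch below estimates them with a single-site Gibbs sampler, assuming spins in $\{-1, +1\}$ and no node potentials; the burn-in, sweep counts and names are arbitrary, hypothetical choices.

```python
import numpy as np

def gibbs_moments(theta, n_sweeps, rng, burn_in=500):
    """Estimate E_theta[X_s X_t] for Q(x) proportional to exp(sum_{s<t} theta_st x_s x_t),
    with x in {-1, +1}^p and theta symmetric with zero diagonal."""
    p = theta.shape[0]
    x = rng.choice([-1.0, 1.0], size=p)
    moments = np.zeros((p, p))
    for sweep in range(burn_in + n_sweeps):
        for s in range(p):
            field = theta[s] @ x  # theta[s, s] = 0, so x_s itself does not contribute
            # P(X_s = +1 | rest) = exp(field) / (exp(field) + exp(-field))
            prob_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[s] = 1.0 if rng.random() < prob_plus else -1.0
        if sweep >= burn_in:
            moments += np.outer(x, x)
    return moments / n_sweeps

# Two spins with coupling 0.5: the exact moment E[X_0 X_1] is tanh(0.5).
theta = np.array([[0.0, 0.5], [0.5, 0.0]])
est = gibbs_moments(theta, n_sweeps=20000, rng=np.random.default_rng(0))
```

A stochastic-approximation scheme would plug such estimates into a gradient step on $\theta$; the sampler above only supplies the moments.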
Gaussian graphs: sparse inverse covariances
[Figure: zero pattern of inverse covariance]
Gaussian graphical model specified by a sparse inverse covariance $\Theta$, with $\Theta_{st} = 0$ whenever $(s,t) \notin E$:
$$Q(x_1, \ldots, x_p; \Theta) = \frac{\sqrt{\det(\Theta)}}{(2\pi)^{p/2}} \exp\Big(-\frac{1}{2} x^T \Theta x\Big)$$
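A minimal numpy sketch, using a hypothetical chain-structured precision matrix, shows why the graph is read off $\Theta$ rather than off the covariance: $\Theta$ has exact zeros on non-edges while $\Theta^{-1}$ is dense. It also evaluates the log-density directly from $\Theta$ via a log-determinant. The function names and the weight `rho` are assumptions.

```python
import numpy as np

def chain_precision(p, rho=0.4):
    """Tridiagonal precision matrix: ones on the diagonal, rho on chain edges (s, s+1)."""
    theta = np.eye(p)
    for s in range(p - 1):
        theta[s, s + 1] = theta[s + 1, s] = rho
    return theta

def gaussian_logpdf(x, theta):
    """log Q(x; Theta) = 0.5 log det(Theta) - (p/2) log(2 pi) - 0.5 x^T Theta x."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        raise ValueError("Theta must be positive definite")
    p = len(x)
    return 0.5 * logdet - 0.5 * p * np.log(2 * np.pi) - 0.5 * x @ theta @ x

theta = chain_precision(5)
sigma = np.linalg.inv(theta)  # dense: every pair of variables is correlated
print(np.round(sigma, 3))
```

Marginal correlations propagate along paths in the graph, whereas zeros of $\Theta$ encode conditional independence given all other variables.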
Gaussian $\ell_1$-penalized MLE
Estimator: $\ell_1$-regularized log-determinant program
$$\hat\Theta = \arg\min_{\Theta \succ 0} \Big\{ \underbrace{-\log\det\Theta + \langle \hat\Sigma_n, \Theta \rangle}_{\text{negative Gaussian log likelihood}} + \underbrace{\lambda_n \sum_{i \neq j} |\Theta_{ij}|}_{\text{regularization}} \Big\}$$
where $\hat\Sigma_n$ is the sample covariance and $\langle \hat\Sigma_n, \Theta \rangle = \operatorname{trace}(\hat\Sigma_n \Theta)$.
Results on this method:
- analysis under classical scaling ($n \to \infty$ with $p$ fixed) (Yuan & Lin, 2006)
- some fast algorithms (d'Asprémont et al., 2007; Friedman et al., 2008)
- high-dimensional analysis of Frobenius norm error (Rothman et al., 2008)
- high-dimensional variable selection and $\ell_\infty$ bounds (Ravikumar et al., 2011)
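The program is convex, so even a generic method solves it. As a self-contained illustration (not the block coordinate descent algorithm of Friedman et al.), the sketch below runs plain proximal gradient descent: a gradient step on $-\log\det\Theta + \langle \hat\Sigma_n, \Theta \rangle$, soft-thresholding of the off-diagonal entries, and backtracking that rejects any step leaving the positive definite cone. The initialization, tolerances and function names are assumptions.

```python
import numpy as np

def soft_threshold_offdiag(A, t):
    """Soft-threshold the off-diagonal entries of A at level t; keep the diagonal."""
    out = np.sign(A) * np.maximum(np.abs(A) - t, 0.0)
    np.fill_diagonal(out, np.diag(A))
    return out

def smooth_loss(theta, S):
    """-log det(Theta) + <S, Theta>; +inf outside the positive definite cone."""
    sign, logdet = np.linalg.slogdet(theta)
    return np.inf if sign <= 0 else -logdet + np.sum(S * theta)

def is_pos_def(A):
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

def graphical_lasso_pg(S, lam, max_iter=2000, tol=1e-8):
    """Proximal gradient for min over Theta > 0 of
    -log det Theta + <S, Theta> + lam * sum_{i != j} |Theta_ij|."""
    theta = np.diag(1.0 / np.diag(S))  # optimal solution when all off-diagonals are zero
    for _ in range(max_iter):
        W = np.linalg.inv(theta)
        grad = S - (W + W.T) / 2  # gradient of the smooth part, symmetrized
        f_old = smooth_loss(theta, S)
        step = 1.0
        while True:
            cand = soft_threshold_offdiag(theta - step * grad, step * lam)
            if is_pos_def(cand):
                diff = cand - theta
                # backtracking: accept once the quadratic upper bound holds
                if smooth_loss(cand, S) <= f_old + np.sum(grad * diff) + np.sum(diff ** 2) / (2 * step):
                    break
            step *= 0.5
        converged = np.max(np.abs(cand - theta)) < tol
        theta = cand
        if converged:
            break
    return theta

# Hypothetical usage: population covariance of a chain-structured Gaussian.
theta_true = np.eye(5)
for s in range(4):
    theta_true[s, s + 1] = theta_true[s + 1, s] = 0.4
S = np.linalg.inv(theta_true)
theta_hat = graphical_lasso_pg(S, lam=0.05)
```

The estimate can be checked against the optimality conditions: $(\hat\Sigma_n - \hat\Theta^{-1})_{ij} = -\lambda_n \operatorname{sign}(\hat\Theta_{ij})$ on nonzero off-diagonal entries, $|(\hat\Sigma_n - \hat\Theta^{-1})_{ij}| \leq \lambda_n$ on zero ones, and zero on the diagonal.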
High-dimensional analysis
- classical analysis: dimension $p$ fixed, sample size $n \to +\infty$
- high-dimensional analysis: allow the dimension $p$, the sample size $n$, and the maximum degree $d$ to increase at arbitrary rates
- take $n$ i.i.d. samples from the MRF defined by $G_{p,d}$
- study the probability of success as a function of three parameters:
$$\mathrm{Success}(n, p, d) = Q[\text{Method recovers graph } G_{p,d} \text{ from } n \text{ samples}]$$
- the theory is non-asymptotic: explicit probabilities for finite $(n, p, d)$
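Operationally, $\mathrm{Success}(n, p, d)$ can be estimated by Monte Carlo. The sketch below does this for a Gaussian chain graph. As a stand-in for the $\ell_1$ methods of this lecture it uses a deliberately simple estimator, thresholding the inverse sample covariance, which is only sensible when $n > p$. The threshold, chain weight, trial counts and function names are hypothetical.

```python
import numpy as np

def chain_precision(p, rho=0.4):
    """Tridiagonal precision matrix: ones on the diagonal, rho on chain edges."""
    theta = np.eye(p)
    for s in range(p - 1):
        theta[s, s + 1] = theta[s + 1, s] = rho
    return theta

def edge_set(theta, thresh):
    """Edges (s, t), s < t, whose entry exceeds thresh in absolute value."""
    p = theta.shape[0]
    return {(s, t) for s in range(p) for t in range(s + 1, p) if abs(theta[s, t]) > thresh}

def success_probability(n, p, trials, rng, thresh=0.2):
    """Monte Carlo estimate of Q[method recovers G from n samples] for a chain graph."""
    theta_star = chain_precision(p)
    sigma = np.linalg.inv(theta_star)
    true_edges = edge_set(theta_star, 0.0)
    hits = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
        S = X.T @ X / n  # sample covariance (the mean is known to be zero)
        hits += edge_set(np.linalg.inv(S), thresh) == true_edges
    return hits / trials

rng = np.random.default_rng(0)
for p in (10, 20):
    for n in (50, 200, 1000):
        print(f"p={p:3d} n={n:5d} n/log p={n / np.log(p):7.1f} "
              f"success={success_probability(n, p, trials=20, rng=rng):.2f}")
```

Printing $n/\log p$ next to each estimate mirrors the rescaling used in the plots that follow.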
Empirical behavior: Unrescaled plots
[Figure: chain graph; probability of success versus raw sample size $n$, with curves for several values of $p$, including $p = 100$ and $p = 225$]
Empirical behavior: Appropriately rescaled
[Figure: chain graph; probability of success versus rescaled sample size $n/\log p$, with curves for several values of $p$, including $p = 100$ and $p = 225$]
Sufficient conditions for consistent model selection
- graph sequences $G_{p,d} = (V, E)$ with $p$ vertices and maximum degree $d$
- suitable regularity conditions on the Hessian of the log-determinant, $\Gamma^* := (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$
Theorem: For a multivariate Gaussian with sample size $n > c_1 \tau d^2 \log p$ and regularization parameter $\lambda_n \geq c_2 \sqrt{\tau \log p / n}$, with probability greater than $1 - 2\exp(-c_3 (\tau - 2) \log p)$:
(a) No false inclusions: the regularized log-determinant estimate $\hat\Theta$ returns an edge set $\hat E \subseteq E$.
(b) $\ell_\infty$-control: the estimate satisfies $\max_{i,j} |\hat\Theta_{ij} - \Theta^*_{ij}| \leq 2 c_4 \sqrt{\tau \log p / n}$.
(c) Model selection consistency: if $\theta_{\min} \geq c_4 \sqrt{\tau \log p / n}$, then $\hat E = E$.
Some other graphs
[Figure: (a) 4-nearest-neighbor grid, $d = 4$; (b) star graph, $d \in \{O(\log p), \alpha p\}$]
Results for 4-grid graphs
[Figure: 4-nearest-neighbor grid; vertical axis: success probability $Q[\hat E = E]$; horizontal axis: rescaled sample size $n/\log p$; curves for several values of $p$, including $p = 100$ and $p = 225$]
Results for star graphs
[Figure: star graph; vertical axis: success probability $Q[\hat E = E]$; horizontal axis: rescaled sample size $n/\log p$; curves for several values of $p$, including $p = 100$ and $p = 225$]
Proof sketch: Primal-dual certificate
- construct a candidate primal-dual pair $(\tilde\Theta, \hat z) \in \mathbb{R}^{p \times p} \times \mathbb{R}^{p \times p}$
- this is a proof technique, not a practical algorithm!
(A) Solve the restricted log-determinant program
$$\tilde\Theta = \arg\min_{\Theta \succ 0,\ \Theta_{S^c} = 0} \Big\{ -\log\det\Theta + \langle \hat\Sigma_n, \Theta \rangle + \lambda_n \sum_{i \neq j} |\Theta_{ij}| \Big\}$$
thereby obtaining the candidate solution $\tilde\Theta = (\tilde\Theta_S, 0_{S^c})$.
(B) Choose $\hat z_S \in \mathbb{R}^{|S|}$ as an element of the subdifferential $\partial \|\tilde\Theta_S\|_1$.
(C) Using the optimality conditions from the original convex program, solve for $\hat z_{S^c}$ and check whether strict dual feasibility $|\hat z_j| < 1$ holds for all $j \in S^c$.
Lemma: The full convex program recovers the neighborhood $\Longleftrightarrow$ the primal-dual witness succeeds.
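Step (C) is short to express in code once a candidate $\tilde\Theta$ from step (A) is available. The zero-subgradient condition of the full program, $\hat\Sigma_n - \tilde\Theta^{-1} + \lambda_n \hat z = 0$, determines the off-diagonal dual variables as $\hat z = (\tilde\Theta^{-1} - \hat\Sigma_n)/\lambda_n$. The minimal sketch below, with hypothetical names, computes $\hat z$ on $S^c$ and tests strict dual feasibility. It assumes $\tilde\Theta$ already solves the restricted program.

```python
import numpy as np

def dual_witness_check(theta_tilde, S_hat, lam, support):
    """Compute z_hat on off-diagonal entries outside `support` and test strict dual feasibility.

    support: set of (i, j) pairs (both orders) where theta_tilde may be nonzero.
    Returns (max |z_hat_ij| over S^c, whether that maximum is < 1).
    """
    p = theta_tilde.shape[0]
    z_hat = (np.linalg.inv(theta_tilde) - S_hat) / lam
    off_support = [(i, j) for i in range(p) for j in range(p)
                   if i != j and (i, j) not in support]
    worst = max((abs(z_hat[i, j]) for i, j in off_support), default=0.0)
    return worst, worst < 1.0

# Hypothetical example with an empty edge set: restricted to the diagonal,
# the program is solved by Theta_ii = 1 / S_ii.
S_hat = np.array([[1.0, 0.2], [0.2, 1.0]])
theta_tilde = np.diag(1.0 / np.diag(S_hat))
print(dual_witness_check(theta_tilde, S_hat, lam=0.5, support=set()))  # max |z_hat| = 0.4
print(dual_witness_check(theta_tilde, S_hat, lam=0.1, support=set()))  # max |z_hat| = 2.0
```

With $\lambda_n = 0.5$ the witness succeeds; with $\lambda_n = 0.1$ strict dual feasibility fails, so the certificate does not establish recovery at that regularization level.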
2. Pseudolikelihood and neighborhood approaches
Markov properties encode neighborhood structure:
$$\underbrace{(X_s \mid X_{V \setminus s})}_{\text{condition on full graph}} \stackrel{d}{=} \underbrace{(X_s \mid X_{N(s)})}_{\text{condition on Markov blanket}}$$
[Figure: node $X_s$ and its Markov blanket $N(s) = \{t, u, v, w\}$, with neighbors $X_t, X_u, X_v, X_w$]
- basis of the pseudolikelihood method (Besag, 1974)
- basis of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abbeel et al., 2006; Meinshausen & Bühlmann, 2006)
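This Markov property is what makes the pseudolikelihood cheap: each conditional $Q(x_s \mid x_{V \setminus s})$ involves only the neighbors of $s$ and no partition function. Below is a minimal sketch for an Ising model with spins in $\{-1, +1\}$ and no node potentials (both assumptions), with a brute-force check that the closed-form conditional matches the one obtained from the unnormalized joint. Function names are hypothetical.

```python
import itertools
import numpy as np

def log_conditional(x, s, theta):
    """log Q(x_s | rest) = x_s * eta - log(2 cosh eta), with eta = sum_{t != s} theta_st x_t.

    Only neighbors t with theta[s, t] != 0 (the Markov blanket of s) contribute to eta.
    """
    eta = theta[s] @ x - theta[s, s] * x[s]
    return x[s] * eta - np.logaddexp(eta, -eta)

def log_pseudolikelihood(X, theta):
    """Besag's log pseudolikelihood: sum over samples i and nodes s of log Q(x_is | x_i,rest)."""
    return sum(log_conditional(x, s, theta) for x in X for s in range(X.shape[1]))

def brute_force_conditional(x, s, theta):
    """Same conditional from the unnormalized joint exp(sum_{s<t} theta_st x_s x_t)."""
    def unnorm(y):
        return np.exp(0.5 * y @ theta @ y)  # assumes a zero diagonal
    y_plus, y_minus = x.copy(), x.copy()
    y_plus[s], y_minus[s] = 1.0, -1.0
    return np.log(unnorm(x) / (unnorm(y_plus) + unnorm(y_minus)))

# Hypothetical 3-node chain with couplings 0.5 and -0.3.
theta = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, -0.3], [0.0, -0.3, 0.0]])
max_gap = max(abs(log_conditional(np.array(x), s, theta)
                  - brute_force_conditional(np.array(x), s, theta))
              for x in itertools.product((-1.0, 1.0), repeat=3) for s in range(3))
```

Maximizing `log_pseudolikelihood` over $\theta$ replaces the intractable $\log Z(\theta)$ by $p$ local normalizers per sample.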
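Graph selection via neighborhood regression
[Figure: columns $X_{\setminus s}$ used to predict column $X_s$]
Predict $X_s$ based on $X_{\setminus s} := \{X_t, t \neq s\}$.
1. For each node $s \in V$, compute the (regularized) maximum likelihood estimate:
$$\hat\theta[s] := \arg\min_{\theta \in \mathbb{R}^{p-1}} \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^n \mathcal{L}(\theta; X_{is}, X_{i, \setminus s})}_{\text{local negative log-likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}$$
2. Estimate the local neighborhood $\hat N(s)$ as the support of the regression vector $\hat\theta[s] \in \mathbb{R}^{p-1}$.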
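A minimal sketch of steps 1 and 2 for the Ising case follows. It assumes spins in $\{-1, +1\}$ and no node potentials, so the local conditional is the logistic model $Q(x_s \mid x_{\setminus s}) = 1/(1 + \exp(-2 x_s \theta^T x_{\setminus s}))$. The $\ell_1$-penalized problem is solved by plain proximal gradient (ISTA) rather than any particular published solver. The step size rule, iteration count, $\lambda$, chain weights and all names are assumptions.

```python
import itertools
import numpy as np

def sample_ising(theta, n, rng):
    """Exact i.i.d. samples from Q(x) proportional to exp(sum_{s<t} theta_st x_s x_t) (small p only)."""
    p = theta.shape[0]
    configs = np.array(list(itertools.product((-1.0, 1.0), repeat=p)))
    energy = 0.5 * np.einsum("ij,jk,ik->i", configs, theta, configs)
    weights = np.exp(energy - energy.max())
    return configs[rng.choice(len(configs), size=n, p=weights / weights.sum())]

def l1_logistic_ista(Z, y, lam, n_iter=3000):
    """min_theta (1/n) sum_i log(1 + exp(-2 y_i z_i^T theta)) + lam * ||theta||_1."""
    n, k = Z.shape
    # The Hessian of the average loss is bounded by Z^T Z / n, so 1/L is a safe step size.
    step = n / np.linalg.eigvalsh(Z.T @ Z).max()
    theta = np.zeros(k)
    for _ in range(n_iter):
        margins = 2.0 * y * (Z @ theta)
        # derivative of log(1 + exp(-m)) is -1 / (1 + exp(m)); dm/dtheta = 2 y z
        grad = -(Z.T @ (2.0 * y / (1.0 + np.exp(margins)))) / n
        w = theta - step * grad
        theta = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
    return theta

def neighborhood_selection(X, lam):
    """Estimate N(s) as the support of the l1-regularized logistic regression of X_s on the rest."""
    n, p = X.shape
    neighborhoods = {}
    for s in range(p):
        others = [t for t in range(p) if t != s]
        theta_s = l1_logistic_ista(X[:, others], X[:, s], lam)
        neighborhoods[s] = {t for t, c in zip(others, theta_s) if c != 0.0}
    return neighborhoods

# Hypothetical example: chain 0-1-2-3-4 with coupling 1.0 on every edge.
p = 5
theta_star = np.zeros((p, p))
for s in range(p - 1):
    theta_star[s, s + 1] = theta_star[s + 1, s] = 1.0
X = sample_ising(theta_star, n=2000, rng=np.random.default_rng(0))
nbrs = neighborhood_selection(X, lam=0.05)
print(nbrs)
```

Whether spurious neighbors also survive depends on $\lambda_n$ and $n$ relative to the conditions in the theorem for Ising selection.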
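Empirical behavior: Unrescaled plots
[Figure: star graph with a linear fraction of neighbors; probability of success versus number of samples, with curves for $p = 64$, $p = 100$ and a third value of $p$]
Empirical behavior: Appropriately rescaled
[Figure: star graph with a linear fraction of neighbors; probability of success versus control parameter, with curves for $p = 64$, $p = 100$ and a third value of $p$]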
Sufficient conditions for consistent Ising selection
- graph sequences $G_{p,d} = (V, E)$ with $p$ vertices and maximum degree $d$
- edge weights $|\theta_{st}| \geq \theta_{\min}$ for all $(s,t) \in E$
- draw $n$ i.i.d. samples, and analyze the probability of success indexed by $(n, p, d)$
Theorem (Ravikumar, W. & Lafferty, 2010): Under incoherence conditions, with sample size $n > c_1 d^3 \log p$ and regularization parameter $\lambda_n \geq c_2 \sqrt{\log p / n}$, with probability greater than $1 - 2\exp(-c_3 \lambda_n^2 n)$:
(a) Correct exclusion: the estimated signed neighborhood $\hat N(s)$ correctly excludes all edges not in the true neighborhood.
(b) Correct inclusion: for $\theta_{\min} \geq c_4 \sqrt{d}\, \lambda_n$, the method selects the correct signed neighborhood.
US Senate network (voting)
3. Info. theory: Graph selection as channel coding
Graphical model selection is an unorthodox channel coding problem:
- codewords/codebook: graph $G$ in some graph class $\mathcal{G}$
- channel use: draw sample $X_i = (X_{i1}, \ldots, X_{ip})$ from the Markov random field $Q_{\theta(G)}$
- decoding problem: use the $n$ samples $\{X_1, \ldots, X_n\}$ to correctly distinguish the codeword
[Figure: channel diagram $G \to Q(X \mid G) \to X_1, \ldots, X_n$]
Channel capacity for graph decoding is determined by the balance between:
- the log number of models
- the relative distinguishability of different models
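The "log number of models" can be made concrete for tiny $p$ by brute-force counting of graphs with maximum degree at most $d$. The sketch pairs that count with a crude Fano-type bound added here purely for illustration (it is not the Santhanam-Wainwright bound). With $G$ drawn uniformly from the class, and since each Ising sample lies in $\{-1,+1\}^p$, $n$ samples carry at most $n p \log 2$ nats about $G$. Fano's inequality then forces error probability at least $1/2$ unless $n \geq (\tfrac{1}{2}\log|\mathcal{G}| - \log 2)/(p \log 2)$. Function names are hypothetical.

```python
import itertools
import math

def count_bounded_degree_graphs(p, d):
    """Brute-force |G_{d,p}|: labeled graphs on p vertices with maximum degree <= d."""
    pairs = list(itertools.combinations(range(p), 2))
    count = 0
    for mask in range(1 << len(pairs)):
        degree = [0] * p
        for k, (s, t) in enumerate(pairs):
            if (mask >> k) & 1:
                degree[s] += 1
                degree[t] += 1
        if max(degree) <= d:
            count += 1
    return count

def fano_min_samples(log_num_graphs, p):
    """Samples needed for error probability < 1/2 under Fano, using I(G; X^n) <= n p log 2."""
    return (0.5 * log_num_graphs - math.log(2)) / (p * math.log(2))

for p in range(3, 7):
    m = count_bounded_degree_graphs(p, d=2)
    print(p, m, round(fano_min_samples(math.log(m), p), 3))
```

This bound only captures the bulk (cardinality) effect; it ignores how hard the models are to distinguish, which is where the edge weights enter.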
Necessary conditions for $\mathcal{G}_{d,p}$
- $G \in \mathcal{G}_{d,p}$: graphs with $p$ nodes and maximum degree $d$
- Ising models with:
  - minimum edge weight: $|\theta_{st}| \geq \theta_{\min}$ for all edges
  - maximum neighborhood weight: $\omega(\theta) := \max_{s \in V} \sum_{t \in N(s)} |\theta_{st}|$
Theorem (Santhanam & W., 2012): If the sample size $n$ is upper bounded by
$$n < \max\Big\{ \frac{d}{8} \log \frac{p}{8d},\ \frac{\exp(\omega(\theta)/4)\, d\, \theta_{\min} \log(pd/8)}{128 \exp(3\theta_{\min}/2)},\ \frac{\log p}{2\theta_{\min} \tanh(\theta_{\min})} \Big\}$$
then the probability of error of any algorithm over $\mathcal{G}_{d,p}$ is at least $1/2$.
Interpretation:
- naive bulk effect: arises from the log cardinality $\log |\mathcal{G}_{d,p}|$
- d-clique effect: difficulty of separating models that contain a near d-clique
- small weight effect: it is difficult to detect edges with small weights
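Plugging numbers into the three terms shows which effect dominates in a given regime. A minimal sketch, assuming natural logarithms; the function name and the example parameter values are hypothetical.

```python
import math

def necessary_sample_size(p, d, theta_min, omega):
    """Return the three terms of the Santhanam-Wainwright lower bound and their maximum.

    If n is below the maximum, any algorithm errs with probability >= 1/2 over G_{d,p}.
    """
    bulk = d / 8 * math.log(p / (8 * d))
    clique = (math.exp(omega / 4) * d * theta_min * math.log(p * d / 8)
              / (128 * math.exp(3 * theta_min / 2)))
    small_weight = math.log(p) / (2 * theta_min * math.tanh(theta_min))
    return {"bulk": bulk, "d-clique": clique, "small weight": small_weight,
            "max": max(bulk, clique, small_weight)}

# Weak edges with omega at its smallest possible value d * theta_min.
terms = necessary_sample_size(p=1000, d=4, theta_min=0.1, omega=4 * 0.1)
# Strong edges forming a heavy neighborhood.
heavy = necessary_sample_size(p=1000, d=20, theta_min=1.0, omega=20.0)
print(terms)
print(heavy)
```

With weak edges the small weight term dominates; with strong edges and a large neighborhood weight the d-clique term takes over.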
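Some consequences
Corollary: For asymptotically reliable recovery over $\mathcal{G}_{d,p}$, any algorithm requires at least $n = \Omega(d^2 \log p)$ samples.
- note that the maximum neighborhood weight satisfies $\omega(\theta^*) \geq d\,\theta_{\min}$; since the d-clique term grows like $\exp(\omega(\theta)/4)$, this requires $\theta_{\min} = O(1/d)$
- from the small weight effect, using $\tanh(\theta_{\min}) \approx \theta_{\min}$ for small $\theta_{\min}$:
$$n = \Omega\Big(\frac{\log p}{\theta_{\min} \tanh(\theta_{\min})}\Big) = \Omega\Big(\frac{\log p}{\theta_{\min}^2}\Big) = \Omega(d^2 \log p)$$
- conclude that $\ell_1$-regularized logistic regression (LR), which succeeds with $n = O(d^3 \log p)$ samples, is within a factor $\Theta(d)$ of optimal for general graphs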
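A quick numerical check of the last step, assuming $\theta_{\min} = 1/d$. The small weight term divided by $d^2 \log p$ equals $1/(2 d \tanh(1/d))$, which stays above $1/2$ and tends to $1/2$ as $d$ grows, consistent with the $\Omega(d^2 \log p)$ scaling. The helper name is hypothetical.

```python
import math

def small_weight_ratio(p, d):
    """(log p / (2 theta tanh theta)) / (d^2 log p) with theta_min = 1/d."""
    theta_min = 1.0 / d
    term = math.log(p) / (2 * theta_min * math.tanh(theta_min))
    return term / (d ** 2 * math.log(p))

for d in (2, 5, 10, 100):
    print(d, small_weight_ratio(p=10_000, d=d))
```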
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationExtended Bayesian Information Criteria for Gaussian Graphical Models
Extended Bayesian Information Criteria for Gaussian Graphical Models Rina Foygel University of Chicago rina@uchicago.edu Mathias Drton University of Chicago drton@uchicago.edu Abstract Gaussian graphical
More informationLearning Quadratic Variance Function (QVF) DAG Models via OverDispersion Scoring (ODS)
Journal of Machine Learning Research 18 2018 1-44 Submitted 4/17; Revised 12/17; Published 4/18 Learning Quadratic Variance Function QVF DAG Models via OverDispersion Scoring ODS Gunwoong Park Department
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationStructure Learning of Mixed Graphical Models
Jason D. Lee Institute of Computational and Mathematical Engineering Stanford University Trevor J. Hastie Department of Statistics Stanford University Abstract We consider the problem of learning the structure
More informationDirected and Undirected Graphical Models
Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationarxiv: v6 [math.st] 3 Feb 2018
Submitted to the Annals of Statistics HIGH-DIMENSIONAL CONSISTENCY IN SCORE-BASED AND HYBRID STRUCTURE LEARNING arxiv:1507.02608v6 [math.st] 3 Feb 2018 By Preetam Nandy,, Alain Hauser and Marloes H. Maathuis,
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationExact MAP estimates by (hyper)tree agreement
Exact MAP estimates by (hyper)tree agreement Martin J. Wainwright, Department of EECS, UC Berkeley, Berkeley, CA 94720 martinw@eecs.berkeley.edu Tommi S. Jaakkola and Alan S. Willsky, Department of EECS,
More informationOn the optimality of tree-reweighted max-product message-passing
Appeared in Uncertainty on Artificial Intelligence, July 2005, Edinburgh, Scotland. On the optimality of tree-reweighted max-product message-passing Vladimir Kolmogorov Microsoft Research Cambridge, UK
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationUndirected graphical models
Undirected graphical models Kevin P. Murphy Last updated November 16, 2006 * Denotes advanced sections that may be omitted on a first reading. 1 Introduction We have seen that conditional independence
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationEstimation of Graphical Models with Shape Restriction
Estimation of Graphical Models with Shape Restriction BY KHAI X. CHIONG USC Dornsife INE, Department of Economics, University of Southern California, Los Angeles, California 989, U.S.A. kchiong@usc.edu
More informationUC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory
UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics EECS 281A / STAT 241A Statistical Learning Theory Solutions to Problem Set 2 Fall 2011 Issued: Wednesday,
More informationHigh-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 10-014 High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation Po-Ling Loh University
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 10 Undirected Models CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due this Wednesday (Nov 4) in class Project milestones due next Monday (Nov 9) About half
More informationNon-Asymptotic Analysis for Relational Learning with One Network
Peng He Department of Automation Tsinghua University Changshui Zhang Department of Automation Tsinghua University Abstract This theoretical paper is concerned with a rigorous non-asymptotic analysis of
More informationHigh Dimensional Inverse Covariate Matrix Estimation via Linear Programming
High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationEstimating Latent Variable Graphical Models with Moments and Likelihoods
Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods
More informationInconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting
Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting Martin J. Wainwright Department of Statistics, and Department of Electrical Engineering and Computer
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationSemidefinite relaxations for approximate inference on graphs with cycles
Semidefinite relaxations for approximate inference on graphs with cycles Martin J. Wainwright Electrical Engineering and Computer Science UC Berkeley, Berkeley, CA 94720 wainwrig@eecs.berkeley.edu Michael
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More information1 Undirected Graphical Models. 2 Markov Random Fields (MRFs)
Machine Learning (ML, F16) Lecture#07 (Thursday Nov. 3rd) Lecturer: Byron Boots Undirected Graphical Models 1 Undirected Graphical Models In the previous lecture, we discussed directed graphical models.
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationIndependencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)
(Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationNew ways of dimension reduction? Cutting data sets into small pieces
New ways of dimension reduction? Cutting data sets into small pieces Roman Vershynin University of Michigan, Department of Mathematics Statistical Machine Learning Ann Arbor, June 5, 2012 Joint work with
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Representation Conditional Random Fields Log-Linear Models Readings: KF
More informationLinear Response for Approximate Inference
Linear Response for Approximate Inference Max Welling Department of Computer Science University of Toronto Toronto M5S 3G4 Canada welling@cs.utoronto.ca Yee Whye Teh Computer Science Division University
More information