High-dimensional Ising model selection using ℓ1-regularized logistic regression


1 High-dimensional Ising model selection using ℓ1-regularized logistic regression. Department of Statistics, Pennsylvania State University. 597 presentation.

2 Outline. 1. Introduction: Background and Motivation; Problem Formulation. 2. Theoretical results: Primal-dual witness for graph recovery; Analysis under sample Fisher matrix assumptions; Extensions to general discrete MRFs.

3 Background. Markov random fields (MRFs)/undirected graphical models. An MRF is defined on an undirected graph G = (V, E) with vertex set V = {1, 2, ..., p} and edge set E ⊆ V × V. The structure of the graph encodes conditional independence assumptions among subsets of X = (X_1, ..., X_p), where each X_s is associated with a vertex s ∈ V. One important problem: estimate the underlying graph from n i.i.d. samples {x^(1), x^(2), ..., x^(n)} drawn from the distribution specified by the MRF.

4 Previous work on MRF structure estimation. Constraint-based approaches: estimate conditional independence relations from the data using hypothesis testing. Score-based approaches: combine a metric for the complexity of the graph with a measure of goodness of fit, e.g., the log-likelihood at the maximum likelihood parameters. Drawback: computing the partition function associated with the MRF is computationally intractable.

5 Basic approach. Perform a separate ℓ1-regularized logistic regression for each node, then use the sparsity pattern of the regression vector to infer the underlying neighborhood structure. Notation: sample size n; model dimension p; maximum neighborhood size d. Both p and d may tend to infinity as functions of the sample size n. We focus on the binary Ising model and then generalize the method to general discrete MRFs.

6 Background. Pairwise Markov random fields. X = (X_1, X_2, ..., X_p) is a random vector, and G = (V, E) is an undirected graph with vertex set V = {1, ..., p} and edge set E. The pairwise MRF is the family of distributions of X of the form

P(x) ∝ exp{ Σ_{(s,t) ∈ E} φ_st(x_s, x_t) }.

Ising model. We focus on the special case of the Ising model, in which X_s ∈ {−1, +1} for each vertex s ∈ V and φ_st(x_s, x_t) = θ*_st x_s x_t with θ*_st ∈ R, so that

P_θ*(x) = (1/Z(θ*)) exp{ Σ_{(s,t) ∈ E} θ*_st x_s x_t }.   (1)
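As a concrete illustration of model (1), the sketch below enumerates all 2^p configurations of a small Ising model to compute the partition function Z(θ*) and the joint probabilities by brute force. The coupling matrix `theta` and the chain graph are made-up toy values, not taken from the slides.

```python
import itertools
import numpy as np

def ising_joint(theta):
    """Brute-force joint distribution P(x) = exp(sum_{s<t} theta[s,t] x_s x_t) / Z."""
    p = theta.shape[0]
    configs = np.array(list(itertools.product([-1, 1], repeat=p)))
    # Unnormalized log-probability of each configuration: sum over edges (s < t).
    energy = np.array([sum(theta[s, t] * x[s] * x[t]
                           for s in range(p) for t in range(s + 1, p))
                       for x in configs])
    weights = np.exp(energy)
    Z = weights.sum()                 # partition function Z(theta)
    return configs, weights / Z, Z

# Toy 4-node chain graph with coupling strength 0.5 (hypothetical values).
theta = np.zeros((4, 4))
for s, t in [(0, 1), (1, 2), (2, 3)]:
    theta[s, t] = theta[t, s] = 0.5

configs, probs, Z = ising_joint(theta)
print(Z, probs.sum())  # probs.sum() should be 1.0
```

Brute-force enumeration is only feasible for tiny p; it also makes concrete why the partition function Z(θ*) is the computational bottleneck noted on the previous slide.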

7 Graphical model selection. Goal: given X_1^n := {x^(1), ..., x^(n)}, a set of n samples where each x^(i) ∈ {−1, +1}^p is drawn i.i.d. from a distribution P_θ*, the goal of graphical model selection is to infer the edge set E (equivalently, to determine which θ*_st are nonzero). Stronger criterion of signed edge recovery: given a graphical model with parameter θ*, define the edge sign vector

E*_st = sign(θ*_st) if (s, t) ∈ E, and E*_st = 0 otherwise.   (2)

8 Motivation. The classical notion of statistical consistency concerns the limiting behavior of the estimator as n → ∞ with p fixed. The analysis in this paper is high-dimensional in nature: as n → ∞, both p = p(n) and d = d(n) also scale as functions of n. Our goal is to establish sufficient conditions on the scaling of the triple (n, p, d) such that the proposed estimator is consistent:

P[Ê_n = E*] → 1 as n → +∞.

9 Neighborhood-based logistic regression. Recovering E* is equivalent to recovering the signed neighborhood of each vertex r ∈ V,

N_±(r) := {sign(θ*_rt) t : t ∈ N(r)},   (3)

where N(r) is the neighborhood set of r, i.e., N(r) := {t ∈ V : (r, t) ∈ E}. N_±(r) can be recovered from the sign-sparsity pattern of the subvector θ*_\r := {θ*_ru, u ∈ V \ r}. The conditional distribution of X_r given all other variables X_\r is

P_θ*(x_r | x_\r) = exp(2 x_r Σ_{t ∈ V\r} θ*_rt x_t) / [exp(2 x_r Σ_{t ∈ V\r} θ*_rt x_t) + 1].   (4)
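A minimal sketch of the conditional distribution (4): for fixed x_\r it is exactly a logistic (sigmoid) function of the linear form Σ_t θ*_rt x_t, which is what justifies treating X_r as the response in a logistic regression. The variable names and coupling values below are illustrative.

```python
import numpy as np

def conditional_prob(theta_r, x_rest, x_r):
    """P(X_r = x_r | X_\\r = x_rest) for the Ising model, per equation (4)."""
    a = 2.0 * x_r * np.dot(theta_r, x_rest)   # 2 * x_r * sum_t theta_rt x_t
    return np.exp(a) / (np.exp(a) + 1.0)       # logistic sigmoid of a

# Example with made-up couplings to three other nodes; the two probabilities sum to 1.
theta_r = np.array([0.5, -0.3, 0.0])
x_rest = np.array([1, -1, 1])
print(conditional_prob(theta_r, x_rest, +1) + conditional_prob(theta_r, x_rest, -1))
```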

10 Neighborhood-based logistic regression. The conditional distribution (4) shows that X_r can be viewed as the response variable in a logistic regression in which the variables X_\r play the role of covariates. Estimation procedure: compute an ℓ1-regularized logistic regression of X_r on X_\r. Given a sample X_1^n, the regularized regression problem is the convex program

min_{θ_\r ∈ R^{p−1}} { (1/n) Σ_{i=1}^n f(θ; x^(i)) − Σ_{u ∈ V\r} θ_ru μ̂_ru + λ_n ||θ_\r||_1 },   (5)

where μ̂_ru := (1/n) Σ_{i=1}^n x_r^(i) x_u^(i) and

f(θ; x) := log[ exp(Σ_{t ∈ V\r} θ_rt x_t) + exp(−Σ_{t ∈ V\r} θ_rt x_t) ].   (6)
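The program (5) is an ordinary ℓ1-penalized logistic regression of X_r on X_\r, so a minimal sketch can lean on scikit-learn's LogisticRegression as an off-the-shelf stand-in for the exact objective. The mapping between sklearn's C parameter and λ_n, and the factor of 2 relating sklearn's coefficients to θ (from the ±1 encoding in (4)), are assumptions of this sketch, not the paper's prescription.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_fit(X, r, lam):
    """l1-regularized logistic regression of node r on all other nodes (cf. program (5)).

    X: n x p array with entries in {-1, +1}; lam: regularization level lambda_n.
    Returns an estimate of theta_{\\r}, the couplings of node r to the other nodes.
    """
    n, _ = X.shape
    y = X[:, r]
    Z = np.delete(X, r, axis=1)
    # sklearn minimizes C * sum_i log(1 + exp(-y_i w.z_i)) + ||w||_1.  Under the
    # reparameterization w = 2*theta suggested by (4), C = 2/(n*lam) roughly matches
    # objective (5); treat this mapping as an assumption.
    clf = LogisticRegression(penalty="l1", C=2.0 / (n * lam), solver="liblinear",
                             fit_intercept=False)
    clf.fit(Z, y)
    return clf.coef_.ravel() / 2.0

def signed_neighborhood(theta_hat, r, tol=1e-8):
    """Estimated signed neighborhood from the sign-sparsity pattern, as in (7)."""
    others = [u for u in range(len(theta_hat) + 1) if u != r]
    return {u: int(np.sign(c)) for u, c in zip(others, theta_hat) if abs(c) > tol}
```

Running `neighborhood_fit` for every r ∈ V and collecting the nonzero signed coefficients yields the node-wise neighborhood estimates used in the next slide.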

11 Neighborhood-based logistic regression. By Lagrangian duality, problem (5) can be re-cast as a constrained problem over the ball ||θ_\r||_1 ≤ C(λ_n). After obtaining the minimizer θ̂_\r^n, estimate the signed neighborhood N_±(r) according to

N̂_±(r) := {sign(θ̂_ru) u : u ∈ V \ r, θ̂_ru ≠ 0}.   (7)

The full graph G is estimated consistently, {Ê_n = E*}, if N̂_±(r) = N_±(r) for all r ∈ V.

12 Theoretical results. Our analysis proceeds in two steps. First, we establish sufficient conditions for correct signed neighborhood recovery for a fixed node r ∈ V. Second, we use a union bound over all p nodes of the graph to conclude that consistent graph selection is also achieved.

13 Assumptions. We next state the main result on the performance of neighborhood logistic regression for graphical model selection. Definitions. Population Fisher information matrix (the Hessian of the conditional log-likelihood):

Q*_r := E_θ*{ ∇² log P_θ*[X_r | X_\r] }.

S := {(r, t) : t ∈ N(r)} denotes the index set of the true neighborhood, and Q*_SS denotes the corresponding d × d sub-matrix of Q*. The minimum edge weight θ*_min := min_{(r,t) ∈ E} |θ*_rt| determines the limits of model selection.

14 Assumptions.

(A1) Dependency condition: Λ_min(Q*_SS) ≥ C_min and Λ_max(E_θ*[X_\r X_\r^T]) ≤ D_max.   (8)

(A2) Incoherence condition: ||Q*_{S^c S} (Q*_SS)^{−1}||_∞ ≤ 1 − α for some α ∈ (0, 1].   (9)

15 Theorem 1. Consider an Ising graphical model with parameter vector θ* and associated edge set E* such that conditions (A1) and (A2) are satisfied by Q*, and let X_1^n be a set of n i.i.d. samples from the model specified by θ*. Suppose that the regularization parameter λ_n is selected to satisfy

λ_n ≥ (16(2 − α)/α) √(log p / n).   (10)

Then there exist positive constants L and K, independent of (n, p, d), such that if

n > L d³ log p,   (11)

16 Theorem 1 (continued). Then the following properties hold with probability at least 1 − 2 exp(−K λ_n² n):
(a) For each node r ∈ V, the ℓ1-regularized logistic regression (5), given data X_1^n, has a unique solution, and so uniquely specifies a signed neighborhood N̂_±(r).
(b) For each r ∈ V, the estimated signed neighborhood N̂_±(r) correctly excludes all edges not in the true neighborhood. Moreover, it correctly includes all edges (r, t) for which |θ*_rt| ≥ (10/C_min) √d λ_n.
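The sketch below simply evaluates the regularization level required by (10) and the sample-size requirement (11) for given (n, p, d). The incoherence parameter α and the constant L are unknown in practice, so the values used here are placeholders.

```python
import numpy as np

def lambda_n(n, p, alpha):
    """Lower bound on the regularization parameter from condition (10)."""
    return 16.0 * (2.0 - alpha) / alpha * np.sqrt(np.log(p) / n)

def sample_size_ok(n, p, d, L=1.0):
    """Scaling condition (11): n > L * d^3 * log p, with L a placeholder constant."""
    return n > L * d**3 * np.log(p)

print(lambda_n(n=5000, p=100, alpha=0.5))   # regularization level for illustrative values
print(sample_size_ok(n=5000, p=100, d=4))   # whether the (placeholder) scaling holds
```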

17 Corollary 1. Consider a sequence of Ising models satisfying conditions (A1) and (A2), and suppose that (n, p(n), d(n)) satisfies the scaling condition (11). Suppose that the sequence {λ_n} of regularization parameters satisfies condition (10) as well as λ_n² n → ∞, and that the minimum parameter weights satisfy

min_{(r,t) ∈ E} |θ*_rt| ≥ (10/C_min) √d λ_n   (12)

for sufficiently large n. Then the method is model selection consistent: if Ê_{p(n)} is the graph structure estimated by the method, then P[Ê_{p(n)} = E*_{p(n)}] → 1.

18 Analysis under sample Fisher matrix assumptions. The analysis required to prove Theorem 1 divides naturally into two parts. First, if (A1) and (A2) hold for the sample Fisher information matrix

Q^n := Ê[η(X; θ*) X_\r X_\r^T] = (1/n) Σ_{i=1}^n η(x^(i); θ*) x_\r^(i) (x_\r^(i))^T,   (13)

then the conditions of Theorem 1 are sufficient to ensure that the graph is recovered with high probability. Second, under the scaling condition, imposing the assumptions on Q* guarantees (with high probability) that analogous conditions hold for Q^n.
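A sketch of the sample Fisher information matrix (13). The explicit variance weight η(x; θ) = 4 p (1 − p), with p = P(X_r = +1 | x_\r) from the logistic form of (4), is written out here as an assumption rather than copied from the slides; variable names are illustrative.

```python
import numpy as np

def sample_fisher(X, r, theta_r):
    """Sample Fisher information matrix Q^n from equation (13).

    X: n x p data in {-1,+1}; theta_r: length p-1 parameter vector for node r.
    """
    Z = np.delete(X, r, axis=1)                 # covariates X_\r
    a = 2.0 * Z @ theta_r                       # 2 * sum_t theta_rt x_t
    prob = 1.0 / (1.0 + np.exp(-a))             # P(X_r = +1 | x_\r), from (4)
    eta = 4.0 * prob * (1.0 - prob)             # conditional variance weights (assumed form)
    return (Z.T * eta) @ Z / X.shape[0]         # (1/n) sum_i eta_i z_i z_i^T
```

Checking the minimum eigenvalue of the S x S block and the incoherence quantity of this matrix is how one would test (A1) and (A2) empirically on a given sample.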

19 Primal-dual witness for graph recovery. Our proof involves the explicit construction of an optimal primal-dual pair. The zero sub-gradient optimality conditions take the form

∇l(θ̂) + λ_n ẑ = 0,   (14)

where the dual vector ẑ ∈ R^{p−1} must satisfy the properties

ẑ_rt = sign(θ̂_rt) if θ̂_rt ≠ 0, and |ẑ_rt| ≤ 1 otherwise.   (15)

A pair (θ̂, ẑ) is a primal-dual optimal solution to the convex program and its dual iff (14) and (15) are satisfied. Such an optimal primal-dual pair correctly specifies the signed neighborhood of node r iff

sign(ẑ_rt) = sign(θ*_rt) for all (r, t) ∈ S := {(r, t) : t ∈ N(r)}, and   (16)

θ̂_ru = 0 for all (r, u) ∈ S^c, the complement of S.   (17)
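A sketch that checks the zero-subgradient conditions (14)-(15) for a candidate solution: on the support the dual ẑ = −∇l(θ̂)/λ_n must equal sign(θ̂), and off the support it must be bounded by 1 in absolute value. The gradient below is written from the loss in (5)-(6); the tolerances and names are illustrative.

```python
import numpy as np

def check_kkt(X, r, theta_hat, lam, tol=1e-6):
    """Verify the zero-subgradient (KKT) conditions (14)-(15) for a candidate theta_hat."""
    n = X.shape[0]
    y = X[:, r]
    Z = np.delete(X, r, axis=1)
    # Gradient of the loss l(theta) in (5): (1/n) sum_i [tanh(theta . z_i) - y_i] z_i.
    grad = Z.T @ (np.tanh(Z @ theta_hat) - y) / n
    z_hat = -grad / lam                          # candidate dual vector from (14)
    on_support = np.abs(theta_hat) > tol
    ok_support = np.allclose(z_hat[on_support], np.sign(theta_hat[on_support]), atol=1e-3)
    ok_off = np.all(np.abs(z_hat[~on_support]) <= 1.0 + tol)
    return ok_support, ok_off
```

In the proof, strict dual feasibility off the support (|ẑ_ru| < 1 on S^c) is exactly the condition that triggers Lemma 1 on the next slide.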

20 Lemma 1. Shared sparsity among optimal solutions and uniqueness of the optimal solution. Lemma: Suppose that there exists an optimal primal solution θ̂ with associated optimal dual vector ẑ such that ||ẑ_{S^c}||_∞ < 1. Then any optimal primal solution θ̃ must have θ̃_{S^c} = 0. Moreover, if the Hessian sub-matrix [∇² l(θ̂)]_SS is strictly positive definite, then θ̂ is the unique optimal solution.

21 Proposition 1. Model selection consistency when the assumptions are imposed directly on Q^n. Define the good event

M(X_1^n) := {X_1^n ∈ {−1, +1}^{n×p} : Q^n satisfies (A1) and (A2)}.

Proposition 1 (fixed design). If the event M(X_1^n) holds, n > L d² log p, and λ_n ≥ (16(2 − α)/α) √(log p / n), then with probability at least 1 − 2 exp(−K λ_n² n) → 1, the following properties hold.
(a) For each node r ∈ V, the ℓ1-regularized logistic regression has a unique solution, and so uniquely specifies N̂_±(r).
(b) For each r ∈ V, the estimated signed neighborhood N̂_±(r) correctly excludes all edges not in the true neighborhood. Moreover, it correctly includes all edges with |θ*_rt| ≥ (10/C_min) √d λ_n.

22 Extensions to general discrete Markov random fields. Each variable X_s takes values in X = {1, 2, ..., m}. Define the indicator functions I[x_s = j] = 1 if x_s = j and 0 otherwise, and similarly the pairwise indicator functions I[x_s = j, x_t = k]. Define a matrix Θ_\r ∈ R^{(m−1)² × (p−1)}, where column u is given by the vector θ_ru. Solve the following convex program:

min_{Θ_\r ∈ R^{(m−1)² × (p−1)}} { l(Θ_\r; X_1^n) + λ_n ||Θ_\r||_{2,1} }.   (18)
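In the extension (18), the penalty is the block ℓ1/ℓ2 norm ||Θ_\r||_{2,1}: the sum over the other nodes u of the ℓ2 norm of the block of parameters linking r and u. A minimal sketch of that norm, with block shapes following the (m−1)² indicator parameterization described above; the toy matrix is illustrative.

```python
import numpy as np

def group_l21_norm(Theta):
    """||Theta||_{2,1}: sum over columns (one block per node u) of the column l2 norms."""
    return np.sum(np.linalg.norm(Theta, axis=0))

# Toy example: m = 3 states and p - 1 = 4 other nodes, so each column has (m-1)^2 = 4 entries.
Theta = np.arange(16.0).reshape(4, 4)
print(group_l21_norm(Theta))
```

Penalizing whole columns at once zeroes out all parameters linking r and u together, which is what makes the sparsity pattern interpretable as an estimated neighborhood in the multi-state case.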

23 Three different classes of graphs: four-nearest-neighbor lattices, eight-nearest-neighbor lattices, and star-shaped graphs.

24 Generate random data sets {x^(1), ..., x^(n)} by Gibbs sampling for the lattice models, and by exact sampling for the star graph. Examine the performance for models with mixed couplings θ*_st = ±ω or with positive couplings θ*_st = ω. The regularization parameter is chosen as λ_n ∝ √(log p / n), consistent with condition (10). Perform simulations for sample sizes n = 10 β d log p, where the control parameter β ranges from 0.1 to 3, depending on the graph type.
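A sketch of the Gibbs sampler used to generate data for the lattice models: each site is resampled in turn from its conditional distribution (4) given its (up to) four grid neighbors. The grid size, coupling ω, and number of sweeps below are placeholders, not the exact simulation settings of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sample_grid(k, omega, sweeps=200):
    """One sample from a k x k four-nearest-neighbor Ising model with coupling omega."""
    x = rng.choice([-1, 1], size=(k, k))
    for _ in range(sweeps):
        for i in range(k):
            for j in range(k):
                # Sum of couplings to the (up to) four grid neighbors.
                s = 0.0
                for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < k and 0 <= nj < k:
                        s += omega * x[ni, nj]
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * s))   # P(X_ij = +1 | neighbors), from (4)
                x[i, j] = 1 if rng.random() < p_plus else -1
    return x.ravel()

# Example: a few samples from a p = 64 node grid (k = 8) with positive coupling omega = 0.25.
X = np.array([gibbs_sample_grid(8, 0.25) for _ in range(5)])
```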

25 Results for the 4-nearest-neighbor grid model. Three different graph sizes p ∈ {64, 100, 225}. Each curve plots the success probability versus the control parameter β; each point is the average of N = 200 trials.

26 Results for the 8-nearest-neighbor grid model.

27 Results for star-shaped graphs. Construct star-shaped graphs with p vertices: one node serves as the hub and is connected to d < p − 1 of the remaining nodes. For linear sparsity we choose d = ⌈0.1 p⌉; for logarithmic sparsity we choose d = ⌈log p⌉. A triple of graph sizes p ∈ {64, 100, 225}.
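A sketch of the star-graph construction just described: node 0 serves as the hub and is connected to d of the remaining p − 1 nodes, with d = ⌈0.1 p⌉ for linear sparsity or d = ⌈log p⌉ for logarithmic sparsity. The coupling value is a placeholder.

```python
import numpy as np

def star_graph_theta(p, sparsity="linear", omega=0.25):
    """Coupling matrix for a star graph: hub node 0 connected to d of the other nodes."""
    d = int(np.ceil(0.1 * p)) if sparsity == "linear" else int(np.ceil(np.log(p)))
    theta = np.zeros((p, p))
    for t in range(1, d + 1):          # connect the hub to the first d non-hub nodes
        theta[0, t] = theta[t, 0] = omega
    return theta, d

theta, d = star_graph_theta(64, "logarithmic")
print(d)  # ceil(log 64) = 5 hub neighbors
```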

28 Conclusion. A technique based on ℓ1-regularized logistic regression can be used to perform consistent model selection in binary Ising graphical models. Our analysis applies to the high-dimensional setting, in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Simulation results confirm the accuracy of these theoretical predictions. The methods of this paper can be extended to general discrete graphical models with a larger number of states.

29 Future work. Our experimental results are consistent with the conjecture that the logistic regression procedure fails with high probability for sample sizes n smaller than O(d log p); it would be interesting to prove such a converse result. Another direction is the case of samples drawn in a non-i.i.d. manner from some unknown Markov random field; we suspect that similar results would hold for weakly dependent sampling schemes.
