High-dimensional Ising model selection using ℓ1-regularized logistic regression
1 High-dimensional Ising model selection using ℓ1-regularized logistic regression
Department of Statistics, Pennsylvania State University. 597 Presentation.
2 Outline
1. Introduction: background and motivation; problem formulation.
2. Theoretical results: primal-dual witness for graph recovery; analysis under sample Fisher matrix assumptions; extensions to general discrete MRFs.
3 Background
Markov random fields (MRFs), also known as undirected graphical models. An MRF is an undirected graph $G = (V, E)$ with vertex set $V = \{1, 2, \dots, p\}$ and edge set $E \subseteq V \times V$. The structure of the graph encodes conditional independence assumptions among subsets of $X = (X_1, \dots, X_p)$, where each $X_s$ is associated with a vertex $s \in V$.
One important problem: estimate the underlying graph from $n$ i.i.d. samples $\{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$ drawn from the distribution specified by the MRF.
4 Previous work on MRF structure estimation
Constraint-based approaches: estimate conditional independence relations from the data using hypothesis testing.
Score-based approaches: combine a metric for the complexity of the graph with a measure of goodness of fit, e.g., the log-likelihood at the maximum likelihood parameters.
Drawback: computing the partition function associated with the MRF is computationally intractable.
5 Basic approach
Perform a sequence of ℓ1-regularized logistic regressions, one per node, then use the sparsity pattern of each regression vector to infer the underlying neighborhood structure.
Notation: sample size $n$; model dimension $p$; maximum neighborhood size $d$. Both $p$ and $d$ may tend to infinity as functions of $n$.
We focus on the binary Ising model and then generalize the method to general discrete MRFs.
6 Background: pairwise Markov random fields
Let $X = (X_1, X_2, \dots, X_p)$ be a random vector and $G = (V, E)$ an undirected graph with vertex set $V = \{1, \dots, p\}$ and edge set $E$. The pairwise MRF associated with $G$ is the family of distributions
$$P(x) \propto \exp\Big\{ \sum_{(s,t) \in E} \phi_{st}(x_s, x_t) \Big\}.$$
Ising model. We focus on the special case of the Ising model, in which $X_s \in \{-1, +1\}$ for each vertex $s \in V$ and $\phi_{st}(x_s, x_t) = \theta^*_{st} x_s x_t$ with $\theta^*_{st} \in \mathbb{R}$, so that
$$P(x) = \frac{1}{Z(\theta^*)} \exp\Big\{ \sum_{(s,t) \in E} \theta^*_{st} x_s x_t \Big\}. \tag{1}$$
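To make (1) concrete, the following minimal sketch (Python with NumPy; all names are illustrative) evaluates the Ising probability of a small model by brute-force enumeration of the partition function $Z(\theta^*)$. The $2^p$ sum is exactly the intractability noted on the previous slide, which motivates the neighborhood-based approach.

```python
import itertools
import numpy as np

def ising_log_weight(x, theta):
    """Unnormalized log-probability sum_{(s,t) in E} theta_st * x_s * x_t.
    theta is a symmetric (p, p) coupling matrix (zero diagonal, zero for
    non-edges); x is a vector in {-1, +1}^p."""
    return 0.5 * x @ theta @ x  # each unordered pair counted twice, hence 0.5

def ising_prob(x, theta):
    """Exact probability under model (1); feasible only for small p, since
    the partition function Z(theta) sums over all 2^p states."""
    p = theta.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=p)))
    log_z = np.logaddexp.reduce([ising_log_weight(s, theta) for s in states])
    return np.exp(ising_log_weight(np.asarray(x), theta) - log_z)

# Example: a 3-cycle with coupling 0.5 on every edge.
theta = 0.5 * (np.ones((3, 3)) - np.eye(3))
print(ising_prob([1, 1, 1], theta))
```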
7 Graphical model selection
Goal. Given $X_1^n := \{x^{(1)}, \dots, x^{(n)}\}$, a set of $n$ samples where each $x^{(i)} \in \{-1,+1\}^p$ is drawn i.i.d. from a distribution $P_{\theta^*}$, the goal of graphical model selection is to infer the edge set $E$ (equivalently, to estimate which $\theta^*_{st}$ are nonzero).
Stronger criterion of signed edge recovery. Given a graphical model with parameter $\theta^*$, define the edge sign vector
$$E^*_{st} := \begin{cases} \operatorname{sign}(\theta^*_{st}) & \text{if } (s,t) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
8 Motivation
The classical notion of statistical consistency concerns the limiting behavior of the estimator as $n \to \infty$ with $p$ fixed. The analysis in this paper is high-dimensional in nature: as $n \to \infty$, both $p = p(n)$ and $d = d(n)$ also scale as functions of $n$.
Our goal is to establish sufficient conditions on the scaling of the triple $(n, p, d)$ such that the proposed estimator is consistent:
$$P[\hat{E}_n = E^*] \to 1 \quad \text{as } n \to +\infty.$$
9 Neighborhood-based logistic regression
Recovering $E^*$ is equivalent to recovering the signed neighborhood of each node $r \in V$:
$$\mathcal{N}_\pm(r) := \{ \operatorname{sign}(\theta^*_{rt})\, t \mid t \in \mathcal{N}(r) \}, \tag{3}$$
where $\mathcal{N}(r) := \{ t \in V \mid (r,t) \in E \}$ is the neighborhood set of $r$. The set $\mathcal{N}_\pm(r)$ can be recovered from the sign-sparsity pattern of $\theta^*_{\setminus r} := \{ \theta^*_{ru},\ u \in V \setminus r \}$.
The conditional distribution of $X_r$ given all the other variables $X_{\setminus r}$ takes the logistic form
$$P_{\theta^*}(x_r \mid x_{\setminus r}) = \frac{\exp\big(2 x_r \sum_{t \in V \setminus r} \theta^*_{rt} x_t\big)}{\exp\big(2 x_r \sum_{t \in V \setminus r} \theta^*_{rt} x_t\big) + 1}. \tag{4}$$
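A one-function sketch of the conditional probability (4), useful for sanity checks and for the Gibbs sampler used later in the experiments (an illustrative helper, not from the slides):

```python
import numpy as np

def conditional_prob(x_r, x_rest, theta_r):
    """P(X_r = x_r | x_rest) from equation (4); x_r is +1 or -1 and
    theta_r holds the parameters {theta_rt : t in V \ r}."""
    a = 2.0 * x_r * np.dot(theta_r, x_rest)
    return 1.0 / (1.0 + np.exp(-a))  # same as exp(a) / (exp(a) + 1)
```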
10 Neighborhood-based logistic regression (continued)
The conditional distribution (4) suggests that $X_r$ can be viewed as the response variable in a logistic regression in which $X_{\setminus r}$ are the covariates. Estimation procedure: compute an ℓ1-regularized logistic regression of $X_r$ on $X_{\setminus r}$. Given a sample $X_1^n$, the regularized regression problem is a convex program of the form
$$\min_{\theta_{\setminus r} \in \mathbb{R}^{p-1}} \Big\{ \frac{1}{n} \sum_{i=1}^n f(\theta; x^{(i)}) - \sum_{u \in V \setminus r} \theta_{ru}\, \hat{\mu}_{ru} + \lambda_n \|\theta_{\setminus r}\|_1 \Big\}, \tag{5}$$
where
$$f(\theta; x) := \log\Big( \exp\Big(\sum_{t \in V \setminus r} \theta_{rt} x_t\Big) + \exp\Big(-\sum_{t \in V \setminus r} \theta_{rt} x_t\Big) \Big) \tag{6}$$
and $\hat{\mu}_{ru} := \frac{1}{n}\sum_{i=1}^n x^{(i)}_r x^{(i)}_u$ is the empirical moment, so that the first two terms equal the rescaled negative conditional log-likelihood.
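Because (5) is an ordinary ℓ1-penalized logistic regression, an off-the-shelf solver applies. Below is a hedged sketch using scikit-learn (using this library is an assumption of the example, not something the slides prescribe). scikit-learn minimizes $\|w\|_1 + C \sum_i \log(1 + e^{-y_i w^\top z_i})$; since the logit in (4) is $2\,\theta^\top x_{\setminus r}$, the fitted $w$ corresponds to $2\theta$, and matching the two objectives gives $C = 2/(n\lambda_n)$. The sign-sparsity pattern needed for (7) is unaffected by the factor of 2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_estimate(X, r, lam):
    """Signed-neighborhood estimate (7) for node r via program (5).
    X is an (n, p) array with entries in {-1, +1}; lam is lambda_n.

    The fitted weights w correspond to 2*theta (see lead-in), so the
    sign-sparsity pattern -- all that (7) uses -- is unaffected."""
    n, p = X.shape
    y = X[:, r]
    Z = np.delete(X, r, axis=1)  # covariates X_{\r}
    clf = LogisticRegression(penalty="l1", C=2.0 / (n * lam),
                             solver="liblinear", fit_intercept=False)
    clf.fit(Z, y)
    w = clf.coef_.ravel()
    others = np.delete(np.arange(p), r)
    return {int(u): int(np.sign(wu)) for u, wu in zip(others, w) if wu != 0}
```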
11 Neighborhood-based logistic regression (continued)
By Lagrangian duality, problem (5) can be re-cast as a constrained problem over the ball $\|\theta_{\setminus r}\|_1 \le C(\lambda_n)$. After obtaining the minimizer $\hat{\theta}^n_{\setminus r}$, estimate the signed neighborhood according to
$$\hat{\mathcal{N}}_\pm(r) := \{ \operatorname{sign}(\hat{\theta}_{ru})\, u \mid u \in V \setminus r,\ \hat{\theta}_{ru} \ne 0 \}. \tag{7}$$
The full graph $G$ is estimated consistently, $\{\hat{E}_n = E^*\}$, if $\hat{\mathcal{N}}_\pm(r) = \mathcal{N}_\pm(r)$ for all $r \in V$.
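In finite samples the $p$ per-node estimates can disagree on individual edges. A common remedy, borrowed from neighborhood selection for Gaussian graphical models rather than prescribed by these slides, is an AND/OR combination rule; a sketch:

```python
def combine_neighborhoods(nbrs, rule="and"):
    """Combine the p per-node signed neighborhoods into one edge estimate.
    nbrs maps each node r to its estimate {u: sign}.  'and' keeps an edge
    only if both endpoints propose it with the same sign; 'or' keeps it
    if either endpoint proposes it."""
    edges = {}
    for r, nb in nbrs.items():
        for u, s in nb.items():
            key = (min(r, u), max(r, u))
            if rule == "and":
                if nbrs.get(u, {}).get(r) == s:
                    edges[key] = s
            else:
                edges.setdefault(key, s)
    return edges
```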
12 Theoretical results: proof strategy
Our analysis proceeds in two steps. First, we establish sufficient conditions for correct signed neighborhood recovery for a fixed node $r \in V$. Second, we use a union bound over all $p$ nodes of the graph to conclude that consistent graph selection is achieved.
13 Assumptions: definitions
We next state the main result on the performance of neighborhood logistic regression for graphical model selection. Definitions:
Population Fisher information matrix (the Hessian of the conditional log-likelihood):
$$Q^* := \mathbb{E}_{\theta^*}\big[\nabla^2 \log P_{\theta^*}(X_r \mid X_{\setminus r})\big].$$
$S := \{(r,t) \mid t \in \mathcal{N}(r)\}$, and $Q^*_{SS}$ denotes the $d \times d$ sub-matrix of $Q^*$ indexed by $S$.
$\theta^*_{\min} := \min_{(r,t) \in E} |\theta^*_{rt}|$ determines the limits of model selection.
14 Assumptions
(A1) Dependency condition:
$$\Lambda_{\min}(Q^*_{SS}) \ge C_{\min}, \qquad \Lambda_{\max}\big(\mathbb{E}_{\theta^*}[X_{\setminus r} X_{\setminus r}^{T}]\big) \le D_{\max}. \tag{8}$$
(A2) Incoherence condition:
$$\big\|Q^*_{S^c S}\,(Q^*_{SS})^{-1}\big\|_{\infty} \le 1 - \alpha \tag{9}$$
for some $\alpha \in (0,1]$, where $\|\cdot\|_{\infty}$ is the ℓ∞ matrix operator norm (maximum absolute row sum).
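Both conditions are easy to evaluate numerically for a candidate Fisher matrix (population or sample) once the support $S$ is known; a minimal sketch, with all names illustrative:

```python
import numpy as np

def check_conditions(Q, S, Sigma):
    """Evaluate the quantities bounded in (8) and (9).
    Q:     (p-1, p-1) Fisher information matrix for node r.
    S:     integer indices of the true neighbors among the covariates.
    Sigma: second-moment matrix E[X_rest X_rest^T]."""
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(Q.shape[0]), S)
    C_min = np.linalg.eigvalsh(Q[np.ix_(S, S)]).min()    # (A1) lower bound
    D_max = np.linalg.eigvalsh(Sigma).max()              # (A1) upper bound
    M = Q[np.ix_(Sc, S)] @ np.linalg.inv(Q[np.ix_(S, S)])
    incoherence = np.abs(M).sum(axis=1).max()            # |||.|||_inf for (A2)
    return C_min, D_max, incoherence
```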
15 Theorem 1
Theorem 1. Consider an Ising graphical model with parameter vector $\theta^*$ and associated edge set $E$ such that conditions (A1) and (A2) are satisfied by $Q^*$, and let $X_1^n$ be a set of $n$ i.i.d. samples from the model specified by $\theta^*$. Suppose that the regularization parameter $\lambda_n$ is selected to satisfy
$$\lambda_n \ge \frac{16(2-\alpha)}{\alpha} \sqrt{\frac{\log p}{n}}. \tag{10}$$
Then there exist positive constants $L$ and $K$, independent of $(n, p, d)$, such that if
$$n > L\, d^3 \log p, \tag{11}$$
16 Theorem 1 (continued)
then the following properties hold with probability at least $1 - 2\exp(-K \lambda_n^2 n)$:
(a) For each node $r \in V$, the ℓ1-regularized logistic regression (5), given data $X_1^n$, has a unique solution, and so uniquely specifies a signed neighborhood $\hat{\mathcal{N}}_\pm(r)$.
(b) For each $r \in V$, the estimated signed neighborhood $\hat{\mathcal{N}}_\pm(r)$ correctly excludes all edges not in the true neighborhood. Moreover, it correctly includes all edges $(r,t)$ for which $|\theta^*_{rt}| \ge \frac{10}{C_{\min}} \sqrt{d}\,\lambda_n$.
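As a plug-in aid, the quantities in (10), (11), and part (b) can be tabulated directly; note that the constant $L$ is unspecified in the theorem, so the default below is a placeholder, not a usable calibration:

```python
import numpy as np

def theorem1_quantities(n, p, d, alpha, C_min, L=1.0):
    """Plug-in evaluation of the bounds in Theorem 1.
    L is an unspecified constant; L=1.0 is purely illustrative."""
    lam = 16.0 * (2.0 - alpha) / alpha * np.sqrt(np.log(p) / n)  # bound (10)
    sample_size_ok = n > L * d**3 * np.log(p)                    # bound (11)
    theta_min = 10.0 / C_min * np.sqrt(d) * lam                  # part (b) threshold
    return lam, sample_size_ok, theta_min
```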
17 Corollary 1
Corollary 1. Consider a sequence of Ising models satisfying conditions (A1) and (A2), and suppose that $(n, p(n), d(n))$ satisfies the scaling condition (11). Suppose the sequence $\{\lambda_n\}$ of regularization parameters satisfies condition (10) with $\lambda_n^2 n \to \infty$, and that the minimum parameter weights satisfy
$$\min_{(r,t) \in E} |\theta^*_{rt}| \ge \frac{10}{C_{\min}} \sqrt{d}\,\lambda_n \tag{12}$$
for sufficiently large $n$. Then the method is model selection consistent: if $\hat{E}_{p(n)}$ is the graph structure estimated by the method, then $P[\hat{E}_{p(n)} = E_{p(n)}] \to 1$.
18 Two-part analysis
The analysis required to prove Theorem 1 divides naturally into two parts.
First, if (A1) and (A2) hold for the sample Fisher information matrix
$$Q^n := \hat{\mathbb{E}}\big[\eta(X; \theta^*)\, X_{\setminus r} X_{\setminus r}^{T}\big] = \frac{1}{n} \sum_{i=1}^n \eta(x^{(i)}; \theta^*)\, x^{(i)}_{\setminus r} \big(x^{(i)}_{\setminus r}\big)^{T}, \tag{13}$$
where $\eta(x; \theta)$ is the variance function $\operatorname{Var}_\theta(X_r \mid x_{\setminus r})$, then the conditions of Theorem 1 are sufficient to ensure that the graph is recovered with high probability.
Second, under the scaling condition, imposing the assumptions on the population matrix $Q^*$ guarantees (with high probability) that the analogous conditions hold for $Q^n$.
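Equation (13) can be formed directly from data once $\eta$ is written out: for the Ising model, $\eta(x;\theta) = 1 - \tanh^2\!\big(\sum_t \theta_{rt} x_t\big)$, the conditional variance of a $\pm 1$ variable. A sketch (illustrative names):

```python
import numpy as np

def sample_fisher(X, r, theta_r):
    """Sample Fisher information matrix Q^n from equation (13).
    eta(x; theta) = Var(X_r | x_rest) = 1 - tanh^2(sum_t theta_rt x_t)
                  = 1 / cosh(sum_t theta_rt x_t)^2."""
    Z = np.delete(X, r, axis=1)               # (n, p-1) covariates
    eta = 1.0 / np.cosh(Z @ theta_r) ** 2     # per-sample variance weights
    return (Z * eta[:, None]).T @ Z / X.shape[0]
```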
19 Primal-dual witness for graph recovery
Our proof involves the explicit construction of an optimal primal-dual pair. The zero sub-gradient optimality conditions take the form
$$\nabla \ell(\hat{\theta}) + \lambda_n \hat{z} = 0, \tag{14}$$
where the dual vector $\hat{z} \in \mathbb{R}^{p-1}$ must satisfy
$$\hat{z}_{rt} = \operatorname{sign}(\hat{\theta}_{rt}) \ \text{if } \hat{\theta}_{rt} \ne 0, \quad \text{and} \quad |\hat{z}_{rt}| \le 1 \ \text{otherwise}. \tag{15}$$
A pair $(\hat{\theta}, \hat{z})$ is a primal-dual optimal solution to the convex program and its dual if and only if (14) and (15) are satisfied. Such an optimal primal-dual pair correctly specifies the signed neighborhood of node $r$ if and only if
$$\operatorname{sign}(\hat{z}_{rt}) = \operatorname{sign}(\theta^*_{rt}) \quad \text{for all } (r,t) \in S := \{(r,t) \in E\}, \tag{16}$$
$$\hat{\theta}_{ru} = 0 \quad \text{for all } (r,u) \in S^c, \tag{17}$$
where $S^c$ denotes the complement of $S$, i.e., the pairs $(r,u)$ with $u \notin \mathcal{N}(r)$.
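Given a fitted $\hat\theta$, the witness conditions are easy to verify numerically: recover $\hat z = -\nabla\ell(\hat\theta)/\lambda_n$ from (14) and test strict dual feasibility off the support. A sketch, assuming $\hat\theta$ minimizes the direct objective (5):

```python
import numpy as np

def check_witness(X, r, theta_hat, lam, S, tol=1e-6):
    """Recover z_hat from the stationarity condition (14) and test the
    dual feasibility (15) and support conditions (16)-(17).
    The gradient of the loss in (5) is (1/n) sum_i (tanh(theta.z_i) - y_i) z_i."""
    Z = np.delete(X, r, axis=1)
    y = X[:, r]
    grad = Z.T @ (np.tanh(Z @ theta_hat) - y) / X.shape[0]
    z_hat = -grad / lam                                  # from (14)
    Sc = np.setdiff1d(np.arange(Z.shape[1]), np.asarray(S))
    strict_dual = np.abs(z_hat[Sc]).max() < 1.0 - tol    # ||z_hat_Sc||_inf < 1
    zero_off_support = np.allclose(theta_hat[Sc], 0.0)   # condition (17)
    return z_hat, strict_dual, zero_off_support
```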
20 Lemma 1
Shared sparsity among optimal solutions and uniqueness of the optimal solution.
Lemma 1. Suppose that there exists an optimal primal solution $\hat{\theta}$ with associated optimal dual vector $\hat{z}$ such that $\|\hat{z}_{S^c}\|_\infty < 1$. Then any optimal primal solution $\tilde{\theta}$ must have $\tilde{\theta}_{S^c} = 0$. Moreover, if the Hessian sub-matrix $[\nabla^2 \ell(\hat{\theta})]_{SS}$ is strictly positive definite, then $\hat{\theta}$ is the unique optimal solution.
21 Consistency under sample Fisher matrix assumptions
Model selection consistency when the assumptions are imposed directly on $Q^n$. Define the good event
$$\mathcal{M}(X_1^n) := \big\{ X_1^n \in \{-1,+1\}^{n \times p} \mid Q^n \text{ satisfies (A1) and (A2)} \big\}.$$
Proposition 1 (fixed design). If the event $\mathcal{M}(X_1^n)$ holds, $n > L\, d^2 \log p$, and $\lambda_n \ge \frac{16(2-\alpha)}{\alpha}\sqrt{\frac{\log p}{n}}$, then with probability at least $1 - 2\exp(-K \lambda_n^2 n) \to 1$, the following properties hold:
(a) For each node $r \in V$, the ℓ1-regularized logistic regression has a unique solution, and so uniquely specifies $\hat{\mathcal{N}}_\pm(r)$.
(b) For each $r \in V$, the estimated signed neighborhood $\hat{\mathcal{N}}_\pm(r)$ correctly excludes all edges not in the true neighborhood. Moreover, it correctly includes all edges with $|\theta^*_{rt}| \ge \frac{10}{C_{\min}} \sqrt{d}\,\lambda_n$.
22 Extensions to general discrete Markov random fields
Each variable $X_s$ takes values in $\mathcal{X} = \{1, 2, \dots, m\}$. Define the indicator functions
$$I[x_s = j] = \begin{cases} 1 & \text{if } x_s = j, \\ 0 & \text{otherwise,} \end{cases}$$
and similarly the pairwise indicator functions $I[x_s = j,\, x_t = k]$.
Define a matrix $\Theta_{\setminus r} \in \mathbb{R}^{(m-1)^2 \times (p-1)}$ whose column $u$ is given by the vector $\theta_{ru}$, and solve the convex program with a block ℓ2,1 (group) penalty:
$$\min_{\Theta_{\setminus r} \in \mathbb{R}^{(m-1)^2 \times (p-1)}} \big\{ \ell(\Theta_{\setminus r}; X_1^n) + \lambda_n \|\Theta_{\setminus r}\|_{2,1} \big\}. \tag{18}$$
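The penalty in (18) is a group lasso over the columns of $\Theta_{\setminus r}$. Below is a sketch of the norm and of the proximal (group soft-thresholding) step that a first-order solver for (18) would iterate; the solver itself is not shown, and the names are illustrative:

```python
import numpy as np

def l21_norm(Theta):
    """||Theta||_{2,1}: the sum of Euclidean norms of the columns,
    one group per potential neighbor u in V \ r."""
    return np.linalg.norm(Theta, axis=0).sum()

def prox_l21(Theta, t):
    """Proximal operator of t * ||.||_{2,1} (group soft-thresholding):
    each column is shrunk toward zero and entire groups are zeroed out,
    which is what drives edge selection in (18)."""
    norms = np.linalg.norm(Theta, axis=0)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return Theta * scale[None, :]
```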
23 Three different classes of graphs
- Four-nearest-neighbor lattices
- Eight-nearest-neighbor lattices
- Star-shaped graphs
24 Experimental setup
Generate random data sets $\{x^{(1)}, \dots, x^{(n)}\}$ by Gibbs sampling for the lattice models, and by exact sampling for the star graph. Examine performance for models with mixed couplings $\theta^*_{st} = \pm\omega$ and with positive couplings $\theta^*_{st} = \omega$. The regularization parameter is set proportionally to $\sqrt{\log p / n}$, in line with (10). Simulations are performed for sample sizes $n = 10\,\beta\, d \log p$, where the control parameter $\beta$ ranges from 0.1 to 3, depending on the graph type.
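A minimal Gibbs sampler for model (1), with single-site updates drawn from the conditional distribution (4); the burn-in and thinning defaults are illustrative, not tuned:

```python
import numpy as np

def gibbs_sample_ising(theta, n_samples, burn_in=1000, thin=10, seed=None):
    """Approximate samples from model (1) by single-site Gibbs updates.
    theta: symmetric (p, p) coupling matrix with zero diagonal."""
    rng = np.random.default_rng(seed)
    p = theta.shape[0]
    x = rng.choice([-1, 1], size=p)
    out = []
    for it in range(burn_in + n_samples * thin):
        for s in range(p):
            a = theta[s] @ x - theta[s, s] * x[s]    # sum_{t != s} theta_st x_t
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))  # P(X_s = +1 | rest), eq. (4)
            x[s] = 1 if rng.random() < p_plus else -1
        if it >= burn_in and (it - burn_in) % thin == 0:
            out.append(x.copy())
    return np.array(out)
```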
25 Results for the four-nearest-neighbor grid model
Three different graph sizes $p \in \{64, 100, 225\}$. Each curve plots the success probability versus the control parameter $\beta$; each point is an average over $N = 200$ trials.
26 Results for the eight-nearest-neighbor grid model
27 Results for star-shaped graphs
Construct star-shaped graphs with $p$ vertices by designating one node as the hub and connecting it to $d < p-1$ of the remaining nodes. For linear sparsity, choose $d = \lceil 0.1\,p \rceil$; for logarithmic sparsity, choose $d = \lceil \log p \rceil$. A triple of graph sizes $p \in \{64, 100, 225\}$ is used.
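A sketch of the star-graph construction described above (hub at node 0; the function name and defaults are illustrative):

```python
import numpy as np

def star_graph_theta(p, d, omega, mixed=False, seed=None):
    """Coupling matrix for a star graph: node 0 is the hub, connected to
    d of the remaining p-1 nodes, with couplings +/-omega (mixed) or +omega."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((p, p))
    spokes = rng.choice(np.arange(1, p), size=d, replace=False)
    signs = rng.choice([-1, 1], size=d) if mixed else np.ones(d)
    theta[0, spokes] = signs * omega
    theta[spokes, 0] = signs * omega
    return theta
```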
28 Conclusion
A technique based on ℓ1-regularized logistic regression can be used to perform consistent model selection in binary Ising graphical models. Our analysis applies to the high-dimensional setting, in which both the number of nodes $p$ and the maximum neighborhood size $d$ are allowed to grow as functions of the number of observations $n$. Simulation results show the accuracy of these theoretical predictions. The methods of this paper can be extended to general discrete graphical models with more than two states.
29 Future work
Our experimental results are consistent with the conjecture that the logistic regression procedure fails with high probability for sample sizes $n$ smaller than $O(d \log p)$; it would be interesting to prove such a converse result. Another direction is the case of samples drawn in a non-i.i.d. manner from some unknown Markov random field; we suspect that similar results would hold for weakly dependent sampling schemes.