Probabilistic Graphical Models

School of Computer Science, Carnegie Mellon University
Probabilistic Graphical Models
Gaussian graphical models and Ising models: modeling networks
Eric Xing, February 7, 2014
Reading: see class website

Where do networks come from? Example: the Jesus network.

Evolving networks. Can I get his vote? Cooperativity, antagonism, cliques over time? Snapshots: March 2005, January 2006, August 2006.

Evolving networks: a sequence of graphs over time points t = 1, 2, 3, ..., T.

Recall: the Multivariate Gaussian
Multivariate Gaussian density:
p(\mathbf{x} \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Big\{ -\tfrac{1}{2} (\mathbf{x}-\mu)^T \Sigma^{-1} (\mathbf{x}-\mu) \Big\}
WOLG let \mu = 0 and write Q = \Sigma^{-1}; then
p(x_1, \dots, x_n \mid \mu = 0, Q) = \frac{|Q|^{1/2}}{(2\pi)^{n/2}} \exp\Big\{ -\tfrac{1}{2} \sum_i q_{ii} x_i^2 - \sum_{i<j} q_{ij} x_i x_j \Big\}
We can view this as a continuous Markov random field with potentials defined on every node and edge.

Gaussian Graphical Model. Example: microarray samples from a given cell type; the graph encodes dependencies among genes.

The Precision Matrix Encodes the Non-Zero Edges of a Gaussian Graphical Model. An edge between nodes i and j corresponds to a nonzero element q_{ij} of the precision matrix Q = \Sigma^{-1}.

Markov network versus correlation network. A correlation network is based on the covariance matrix; a GGM is a Markov network based on the precision matrix. Conditional independence / partial correlation coefficients are a more sophisticated dependence measure than marginal correlation. With small sample size, the empirical covariance matrix cannot be inverted.
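As an illustration of these points (not from the lecture), here is a minimal numpy sketch: the chain-structured precision matrix is made up for the demo. It shows that marginal correlations are dense while partial correlations mirror the sparse graph, and that the empirical covariance is singular when n < p.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse precision matrix (a chain graph): only adjacent genes interact.
p = 5
Q = np.eye(p)
for i in range(p - 1):
    Q[i, i + 1] = Q[i + 1, i] = 0.4
Sigma = np.linalg.inv(Q)

# Marginal correlations (the basis of a correlation network) are dense,
# even though the underlying graph is a sparse chain.
marginal = Sigma / np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))

# Partial correlations come from the precision matrix: rho_ij = -q_ij / sqrt(q_ii * q_jj).
partial = -Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
np.fill_diagonal(partial, 1.0)

print(np.round(marginal, 2))   # dense
print(np.round(partial, 2))    # zero exactly where there is no edge

# With n < p samples, the empirical covariance is rank-deficient and cannot be inverted.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=3)
S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S), "<", p)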

Sparsity. One common assumption: the network is sparse. Makes empirical sense: a gene is assumed to interact with only a small group of other genes. Makes statistical sense: learning becomes feasible in high dimensions with small sample size.

Network Learning with the LASSO. Assume the network is a Gaussian graphical model. Perform a LASSO regression of a target node on all the other nodes.

Network Learning with the LASSO. The LASSO can select the neighborhood of each node.

L1 Regularization (LASSO): a convex relaxation of subset selection.
Constrained form: \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 subject to \|\beta\|_1 \le C.
Lagrangian form: \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.
The L1 penalty enforces sparsity!

Theoretical Guarantees. Assumptions:
Dependency condition: relevant covariates are not overly dependent.
Incoherence condition: the large number of irrelevant covariates cannot be too correlated with the relevant covariates.
Strong concentration bounds: sample quantities converge to their expected values quickly.
If these assumptions are met, the LASSO asymptotically recovers the correct subset of relevant covariates.

Network Learning with the LASSO. Repeat the regression for every node, then form the total edge set from the selected neighborhoods.

Consistent Structure Recovery [Meinshausen and Bühlmann 2006, Wainwright 2009]. If the sample size and the regularization parameter satisfy the stated conditions, then with high probability the estimated graph structure equals the true one.

Outline: Why does this algorithm work? What is the intuition behind graphical regression? Continuous nodal attributes; discrete nodal attributes. Are there other algorithms? More general scenarios: non-i.i.d. samples and evolving networks. Case study.

Multivariate Gaussian
Multivariate Gaussian density:
p(\mathbf{x} \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Big\{ -\tfrac{1}{2} (\mathbf{x}-\mu)^T \Sigma^{-1} (\mathbf{x}-\mu) \Big\}
A joint Gaussian over a partitioned vector \mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2):
p(\mathbf{x}_1, \mathbf{x}_2) = \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
How do we write down p(\mathbf{x}_2), p(\mathbf{x}_1 \mid \mathbf{x}_2) or p(\mathbf{x}_2 \mid \mathbf{x}_1) using the block elements of \mu and \Sigma? Formulas to remember:
Marginal: p(\mathbf{x}_2) = \mathcal{N}(\mathbf{x}_2 \mid \mathbf{m}_2^m, V_2^m), with \mathbf{m}_2^m = \mu_2 and V_2^m = \Sigma_{22}.
Conditional: p(\mathbf{x}_1 \mid \mathbf{x}_2) = \mathcal{N}(\mathbf{x}_1 \mid \mathbf{m}_{1|2}, V_{1|2}), with
\mathbf{m}_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{x}_2 - \mu_2), \qquad V_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.

The matrix inverse lemma
Consider a block-partitioned matrix:
M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}
First we diagonalize M:
\begin{pmatrix} I & -F H^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & 0 \\ -H^{-1} G & I \end{pmatrix} = \begin{pmatrix} E - F H^{-1} G & 0 \\ 0 & H \end{pmatrix}
Schur complement: M/H = E - F H^{-1} G.
Then we invert, using (XYZ)^{-1} = Z^{-1} Y^{-1} X^{-1}:
M^{-1} = \begin{pmatrix} (M/H)^{-1} & -(M/H)^{-1} F H^{-1} \\ -H^{-1} G (M/H)^{-1} & H^{-1} + H^{-1} G (M/H)^{-1} F H^{-1} \end{pmatrix}
Matrix inverse lemma: (E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H - G E^{-1} F)^{-1} G E^{-1}.

The covariance and the precision matrices
Applying the block inversion formula to the covariance matrix gives the precision matrix Q = \Sigma^{-1} block by block:
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad Q = \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & -(\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} & \Sigma_{22}^{-1} + \Sigma_{22}^{-1} \Sigma_{21} (\Sigma/\Sigma_{22})^{-1} \Sigma_{12} \Sigma_{22}^{-1} \end{pmatrix}
where \Sigma/\Sigma_{22} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} is the Schur complement. In particular, the upper-left block of Q is the inverse of the conditional covariance of \mathbf{x}_1 given \mathbf{x}_2.
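A quick numerical check of the block relations above; the 5x5 covariance matrix here is arbitrary and used only for the demo.

import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive-definite covariance, partitioned with a 2x2 upper-left block.
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

Q = np.linalg.inv(Sigma)
schur = S11 - S12 @ np.linalg.inv(S22) @ S21           # Sigma / Sigma_22

# Upper-left block of Q is the inverse of the Schur complement ...
print(np.allclose(Q[:2, :2], np.linalg.inv(schur)))    # True
# ... and the off-diagonal block matches -(Sigma/Sigma_22)^{-1} Sigma_12 Sigma_22^{-1}.
print(np.allclose(Q[:2, 2:], -np.linalg.inv(schur) @ S12 @ np.linalg.inv(S22)))  # True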

Single-node conditional
The conditional distribution of a single node i given the rest of the nodes can be written as
p(x_i \mid \mathbf{x}_{-i}) = \mathcal{N}(x_i \mid m_{i|-i}, V_{i|-i}).
WOLG let \mu = 0, take i = 1, and partition the precision matrix as
Q = \begin{pmatrix} q_{11} & \mathbf{q}_1^T \\ \mathbf{q}_1 & Q_{22} \end{pmatrix}.
Then the block formulas give
m_{1|-1} = -\frac{1}{q_{11}} \mathbf{q}_1^T \mathbf{x}_{-1}, \qquad V_{1|-1} = \frac{1}{q_{11}}.

Conditional auto-regression
From the single-node conditional, we can write the following conditional auto-regression function for each node:
x_i = \sum_{j \ne i} \beta_{ij} x_j + \epsilon_i, \qquad \beta_{ij} = -\frac{q_{ij}}{q_{ii}}, \qquad \epsilon_i \sim \mathcal{N}\!\left(0, \tfrac{1}{q_{ii}}\right).
Since \beta_{ij} \ne 0 exactly when q_{ij} \ne 0, the neighborhood of node i can be estimated from its auto-regression coefficients.
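A small numerical check of the auto-regression claim (illustration only; the precision matrix is made up): regressing one node on the rest by least squares recovers coefficients close to -q_{ij}/q_{ii}.

import numpy as np

rng = np.random.default_rng(0)

# Made-up precision matrix on 4 nodes (a chain).
Q = np.array([[2.0, 0.8, 0.0, 0.0],
              [0.8, 2.0, 0.8, 0.0],
              [0.0, 0.8, 2.0, 0.8],
              [0.0, 0.0, 0.8, 2.0]])
Sigma = np.linalg.inv(Q)

# Regress node 0 on the remaining nodes by ordinary least squares on a large sample.
X = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)
beta_ols, *_ = np.linalg.lstsq(X[:, 1:], X[:, 0], rcond=None)

# Theory: the auto-regression coefficients are beta_j = -q_{0j} / q_{00}.
beta_theory = -Q[0, 1:] / Q[0, 0]
print(np.round(beta_ols, 3))
print(np.round(beta_theory, 3))   # -0.4, 0.0, 0.0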

Conditional independence
From the auto-regression above, given an estimate of the neighborhood S_i of node i, we have
x_i \perp \mathbf{x}_{V \setminus (S_i \cup \{i\})} \mid \mathbf{x}_{S_i},
i.e., node i is conditionally independent of all non-neighbors given its neighbors. Thus the neighborhood S_i defines the Markov blanket of node i.

Recent trends in GGM learning:
Covariance selection (the classical method). Dempster [1972]: sequentially prune the smallest elements of the precision matrix. Drton and Perlman [2008]: improved statistical tests for pruning. Serious limitation in practice: breaks down when the covariance matrix is not invertible.
L1-regularization based methods (hot!). Meinshausen and Bühlmann [Ann. Statist. 2006]: LASSO regression for neighborhood selection. Banerjee et al. [JMLR 2008]: block sub-gradient algorithm for finding the precision matrix. Friedman et al. [Biostatistics 2008]: efficient fixed-point equations based on a sub-gradient algorithm (the graphical lasso). With these methods, structure learning is possible even when # variables > # samples.

The Meinshausen-Bühlmann (MB) algorithm: solve a separate Lasso for every single variable.
Step 1: pick one variable, say the k-th.
Step 2: think of it as the response y, and the rest as the predictors z.
Step 3: solve the Lasso regression problem between y and z. (The resulting coefficients do not correspond to the entries of Q value-wise.)
Step 4: connect the k-th node to those nodes having nonzero weight in w.
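A minimal sketch of the MB procedure using scikit-learn's Lasso; the toy chain-structured data, the regularization level lam, and the OR rule for symmetrizing the neighborhoods are choices made for this illustration, not prescribed by the slide.

import numpy as np
from sklearn.linear_model import Lasso

def mb_neighborhood_selection(X, lam=0.1):
    """Estimate an undirected graph by Lasso-regressing each node on all the others."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for k in range(p):                        # Step 1: pick one variable
        y = X[:, k]                           # Step 2: treat it as the response y ...
        Z = np.delete(X, k, axis=1)           # ... and the rest as predictors z
        w = Lasso(alpha=lam).fit(Z, y).coef_  # Step 3: solve the Lasso problem
        others = np.delete(np.arange(p), k)
        adj[k, others[w != 0]] = True         # Step 4: connect k to nonzero-weight nodes
    return adj | adj.T                        # symmetrize the neighborhoods (OR rule)

# Toy data drawn from a chain-structured Gaussian graphical model.
rng = np.random.default_rng(0)
p = 10
Q = np.eye(p)
for i in range(p - 1):
    Q[i, i + 1] = Q[i + 1, i] = 0.4
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Q), size=500)
print(mb_neighborhood_selection(X).astype(int))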

L1-regularized maximum likelihood learning
Input: sample covariance matrix S (assumes standardized data: mean 0, variance 1). S is generally rank-deficient, so its inverse does not exist.
Output: a sparse precision matrix Q. Ideally Q is the inverse of S, but S is not directly invertible, so we instead look for a sparse matrix that can be thought of as an inverse of S.
Approach: solve an L1-regularized maximum likelihood problem,
\hat{Q} = \arg\max_{Q \succ 0} \; \log\det Q - \mathrm{tr}(S Q) - \lambda \|Q\|_1,
where the first two terms form the Gaussian log-likelihood and the last term is the sparsity regularizer (an elementwise L1 norm, often taken over the off-diagonal entries).
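For reference, scikit-learn ships an implementation of this estimator (GraphicalLasso); a minimal usage sketch with placeholder data and an arbitrary regularization level alpha:

import numpy as np
from sklearn.covariance import GraphicalLasso

# Placeholder data: 50 samples of 20 variables, standardized (mean 0, variance 1).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# L1-regularized maximum-likelihood estimate of a sparse precision matrix Q.
model = GraphicalLasso(alpha=0.2).fit(X)
Q_hat = model.precision_

# Nonzero off-diagonal entries of Q_hat are the estimated edges of the network.
edges = np.argwhere(np.triu(np.abs(Q_hat) > 1e-8, k=1))
print(len(edges), "edges selected")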

From matrix optimization to vector optimization: a coupled Lasso for every single variable.
Focus on one row (column) of Q at a time, keeping the others constant. The optimization problem for that row/column can be shown to be a Lasso problem (an L1-regularized quadratic program).
Difference from MB: the resulting Lasso problems are coupled. The part held "constant" is not actually constant; it changes after each Lasso problem is solved, because it is the optimum of the single overall objective in Q that is sought, whereas in MB each Lasso has its own loss function. This coupling is essential for stability under noise.

Learning an Ising model (i.e., a pairwise MRF)
Assume the nodes are discrete (e.g., the voting outcome of a person) and the edges are weighted; then for a sample x we have a pairwise exponential-family (Ising) model. It can be shown that the pseudo-conditional likelihood of node k given the remaining nodes takes the form of a logistic regression.
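A sketch of the two formulas the slide alludes to, written here for the common {-1, +1} encoding with node weights \theta_i and edge weights \theta_{ij} (this parameterization is assumed; the slide's exact notation is not visible in the transcript):

p(\mathbf{x} \mid \theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_i \theta_i x_i + \sum_{i<j} \theta_{ij}\, x_i x_j \Big)

and the pseudo-conditional likelihood of node k given the rest is a logistic regression,

p(x_k \mid \mathbf{x}_{-k}, \theta) = \frac{\exp\big( 2 x_k (\theta_k + \sum_{j \ne k} \theta_{kj} x_j) \big)}{1 + \exp\big( 2 x_k (\theta_k + \sum_{j \ne k} \theta_{kj} x_j) \big)},

so the neighborhood of node k can be estimated by an L1-regularized logistic regression of x_k on \mathbf{x}_{-k}.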

New problem: evolving social networks. Can I get his vote? Cooperativity, antagonism, cliques over time? Snapshots: March 2005, January 2006, August 2006.

Reverse engineering time-specific "rewiring" networks. At each time point t* between T_0 and T_N we observe only n = 1 or some small number of samples (example: Drosophila development).

Inference I [Song, Kolar and Xing, Bioinformatics 2009]
KELLER: Kernel-weighted L1-regularized logistic regression, a constrained convex (Lasso-type) optimization. Estimate the time-specific networks one by one, based on "virtual i.i.d." samples. Could scale to ~10^4 genes, but under stronger smoothness assumptions.

Algorithm: nonparametric neighborhood selection
Conditional likelihood and neighborhood selection via a time-specific graph regression: the neighborhood of each node at time t is estimated by an L1-penalized regression of that node on all others, in which each observation is kernel-weighted according to how close its time stamp is to t; a sketch of the objective follows.
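A sketch of the kernel-reweighted neighborhood objective described above, in generic notation (the weights w_t(s), kernel K_h, bandwidth h, and loss \ell are named here for illustration; the slide's own symbols are not recoverable):

\hat{\theta}_u^t = \arg\min_{\theta} \sum_{s=1}^{T} w_t(s)\, \ell\big(\theta;\, x_u^s, \mathbf{x}_{-u}^s\big) + \lambda \|\theta\|_1, \qquad w_t(s) = \frac{K_h(t - s)}{\sum_{s'=1}^{T} K_h(t - s')},

where \ell is the node-conditional (logistic) negative log-likelihood and K_h is a symmetric kernel with bandwidth h. The estimated neighborhood of node u at time t is the support of \hat{\theta}_u^t; observations near time t receive larger weights, which is the "virtual i.i.d. sample" idea.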

Structural consistency of KELLER. Assumptions: A1: dependency condition. A2: incoherence condition. A3: smoothness condition. A4: bounded kernel.

Theorem [Kolar and Xing, 2009]: under assumptions A1-A4, the time-varying neighborhoods estimated by KELLER are structurally consistent (recovered correctly with high probability).

Inference II [Ahmed and Xing, PNAS 2009; AOAS 2009]
TESLA: Temporally Smoothed L1-regularized logistic regression, a constrained convex optimization. Scales to ~5000 nodes, does not need a smoothness assumption, and can accommodate abrupt changes.

Temporally smoothed graph regression. TESLA estimates the regressions at all time points jointly, coupling adjacent time points through a total-variation penalty; a sketch of the objective follows.
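A sketch of a TESLA-style objective for the neighborhood of node u: per-time-point losses plus an L1 sparsity penalty and a total-variation penalty that couples adjacent time points (the notation \ell, \lambda_1, \lambda_2 is chosen here for illustration):

\hat{\theta}_u^{1:T} = \arg\min_{\theta^1, \dots, \theta^T} \sum_{t=1}^{T} \ell\big(\theta^t;\, x_u^t, \mathbf{x}_{-u}^t\big) + \lambda_1 \sum_{t=1}^{T} \|\theta^t\|_1 + \lambda_2 \sum_{t=2}^{T} \|\theta^t - \theta^{t-1}\|_1.

The total-variation term encourages the estimated neighborhoods to change at only a few time points, which is why no smoothness assumption is needed and abrupt changes can be accommodated.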

Modified estimation procedure (two steps):
(*) estimate the block partition on which the coefficient functions are constant;
(**) estimate the coefficient functions on each block of the partition.

Structural consistency of TESLA [Kolar and Xing, 2009]
I. It can be shown that, by applying the model-selection results for the Lasso to a temporal-difference transformation of (*), the blocks are estimated consistently.
II. It can then be shown that, by applying the Lasso to (**), the neighborhood of each node on each of the estimated blocks is recovered consistently.
Further advantages of the two-step procedure: choosing the tuning parameters is easier, and the optimization procedure is faster.

Senate network: 109th Congress. Voting records from the 109th Congress (2005-2006). There are 100 senators whose votes were recorded on the 542 bills; each vote is a binary outcome.

Senate network, 109th Congress: snapshots from March 2005, January 2006 and August 2006.

Senator Chafee.

Senator Ben Nelson, at T = 0.2 and T = 0.8.

Progression and reversion of breast cancer cells. Cell types: S1 (normal), T4 (malignant), and the revertants T4R1, T4R2, T4R3.

Estimate neighborhoods jointly across all cell types (S1, T4, T4R1, T4R2, T4R3). How can information be shared across the cell types?

Sparsity of difference: penalize differences between the networks of adjacent cell types in the lineage (S1, T4, T4R1, T4R2, T4R3).

Tree-Guided Graphical Lasso (Treegl). The objective combines the RSS over all cell types, a sparsity penalty on each network, and a sparsity-of-difference penalty between adjacent cell types; a sketch follows.
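A sketch of a Treegl-style objective for node u, with one coefficient vector \beta^c per cell type c and E the set of adjacent cell-type pairs in the lineage tree (the symbols \beta^c, X^c, \lambda_1, \lambda_2 are chosen here for illustration):

\min_{\{\beta^c\}} \sum_{c} \big\| X_u^c - X_{-u}^c \beta^c \big\|_2^2 + \lambda_1 \sum_{c} \|\beta^c\|_1 + \lambda_2 \sum_{(c, c') \in E} \|\beta^c - \beta^{c'}\|_1,

where the first term is the RSS over all cell types, the second enforces sparsity of each network, and the third enforces sparsity of the difference between networks of adjacent cell types.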

Network overview: estimated networks for S1, T4, and the EGFR-ITGB1, MMP and PI3K-MAPKK revertants.

Interactions between biological processes, S1 cells → T4 cells: increased cell proliferation, growth, signaling and locomotion.

Interactions between biological processes, T4 cells → MMP-T4R cells: significantly reduced interactions.

Interactions between biological processes, T4 cells → PI3K-MAPKK-T4R cells: reduced growth, locomotion and signaling.

Summary
Gaussian graphical model: the precision matrix encodes the structure; it is not estimable when p >> n.
Neighborhood selection: based on the conditional distribution of a node under a GGM/MRF.
Graphical lasso and sparsistency.
Time-varying Markov networks: kernel-reweighting estimators and total-variation estimators.