Graphical models. Christophe Giraud. Lecture Notes on High-Dimensional Statistics : Université Paris-Sud and Ecole Polytechnique Maths Department


1 Graphical Modeling. Université Paris-Sud and Ecole Polytechnique Maths Department. Lecture Notes on High-Dimensional Statistics: giraud/msv/lecturenotes.pdf 1/73

2 Please ask questions!

3 Topic of the lecture: graphical modeling is a convenient representation of conditional dependences among random variables. It is a powerful tool for exploring direct effects between variables and for fast computations in complex models. It is popular in many fields, including bioinformatics, computer vision, speech recognition, environmental statistics, economics, the social sciences, etc.

4 Warning. Two important topics: learning graphical models; learning with graphical models.


6 Example 1. Figure: Learning gene-gene regulatory networks from microarrays. Seminal reference: Friedman [6].



9 Example 2. Figure: Learning brain connectivity networks.

10 Example 3. Figure: Learning direct effects in the social sciences.

11 Conditional independence

12 The concept of conditional dependence is better suited than the concept of dependence for capturing direct dependences between variables. Traffic jams and snowmen are correlated; but conditionally on the snow falls, the size of the traffic jams and the number of snowmen are independent. Figure: Difference between dependence and conditional dependence.
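The snow example can be checked numerically. Below is a small simulation with hypothetical coefficients: X and Y are both driven by Z, so they are strongly correlated; regressing out Z (which, for jointly Gaussian variables, amounts to conditioning on it) leaves uncorrelated residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)             # snow falls
x = 2.0 * z + rng.normal(size=n)   # size of the traffic jams (hypothetical model)
y = 1.5 * z + rng.normal(size=n)   # number of snowmen (hypothetical model)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# X and Y are strongly correlated through Z...
print(abs(corr(x, y)) > 0.5)   # True
# ...but the residuals after regressing out Z are uncorrelated,
# reflecting X independent of Y given Z.
rx = x - np.polyfit(z, x, 1)[0] * z
ry = y - np.polyfit(z, y, 1)[0] * z
print(abs(corr(rx, ry)) < 0.05)  # True
```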

13 Conditional independence. Random variables X and Y are independent conditionally on a variable Z (we write X ⊥ Y | Z) if
law((X, Y) | Z) = law(X | Z) ⊗ law(Y | Z).
Characterisation: when the distribution of (X, Y, Z) has a positive density f, then
X ⊥ Y | Z ⇔ f(x, y | z) = f(x | z) f(y | z) ⇔ f(x, y, z) = f(x, z) f(y, z) / f(z) ⇔ f(x | y, z) = f(x | z).

14 Directed acyclic model

15 Terminology.
Directed graph: g = a set of nodes and arrows.
Acyclic: no sequence of arrows forms a loop in the graph.
Parents: the parents of a are the set pa(a) of nodes b such that b → a.
Descendants: the descendants of a are the set de(a) of nodes that can be reached from a by following some sequence of arrows.

16 X_a ⊥ {X_b : b ∉ de(a)} conditionally on {X_c : c ∈ pa(a)}.
Directed acyclic graphical model: the law of the random variable X = (X_1, ..., X_p) is a graphical model according to the directed acyclic graph g if for all a,
X_a ⊥ {X_b : b ∉ de(a)} | {X_c : c ∈ pa(a)}.
We write L(X) ~ g.
Remark: if g ⊂ g' and L(X) ~ g, then L(X) ~ g'.

17 Warning. There is no unique minimal graph in general! Be careful with the interpretation of directed graphical models!
Example: X_{i+1} = α X_i + ε_i with ε_i independent of X_1, ..., X_i. Then the two chain graphs 1 → 2 → ... → p and 1 ← 2 ← ... ← p are both minimal graphs for this model.

18 The issue of estimating the minimal g is ill-posed in this context. Yet:
1. it is very useful for defining / computing laws (next slides);
2. it can be used for exploring causal effects (last part of the talk).

19 Bayesian networks / DAG models. Here, we assume that g is known (from expert knowledge).
Factorization formula: if L(X) ~ g, we have
f(x_1, ..., x_p) = ∏_{b=1}^p f(x_b | x_{pa(b)}).
Proof: for a leaf p,
f(x_1, ..., x_p) = f(x_p | x_1, ..., x_{p-1}) f(x_1, ..., x_{p-1}) = f(x_p | x_{pa(p)}) f(x_1, ..., x_{p-1}),
and we iterate on f(x_1, ..., x_{p-1}).
Very useful for defining / computing f! Sampling with the Gibbs sampler.
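The factorization formula can be illustrated on a minimal example. The sketch below assumes a hypothetical 3-node chain DAG 1 → 2 → 3 with Gaussian conditionals, and checks that the product of the conditional densities equals the joint N(0, Σ) density.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Hypothetical chain DAG 1 -> 2 -> 3 with
# X1 ~ N(0,1), X2 | X1 ~ N(a*X1, 1), X3 | X2 ~ N(b*X2, 1).
a, b = 0.8, -0.5

def f_factorized(x1, x2, x3):
    # f(x1, x2, x3) = f(x1) * f(x2 | x_pa(2)) * f(x3 | x_pa(3))
    return norm.pdf(x1) * norm.pdf(x2, loc=a * x1) * norm.pdf(x3, loc=b * x2)

# The joint covariance implied by the structural equations above:
Sigma = np.array([
    [1.0,   a,              a * b],
    [a,     a**2 + 1,       b * (a**2 + 1)],
    [a * b, b * (a**2 + 1), b**2 * (a**2 + 1) + 1],
])
joint = multivariate_normal(mean=np.zeros(3), cov=Sigma)

x = np.array([0.3, -1.0, 0.7])
print(np.isclose(f_factorized(*x), joint.pdf(x)))  # True: the two densities agree
```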

20 Learning with graphical models. Examples of applications: speech recognition, computer vision, ecological monitoring, decision making, diagnosis, environmental statistics, etc.

21 Examples.
Seals monitoring (Ver Hoef and Frost [17]).
Ice streams (Berliner et al.): sses/collab ice.html

22 Non-directed model

23 Terminology.
Non-directed graph: a set of nodes and edges; the nodes are labelled by 1, ..., p.
Neighbors: the neighbors of a are the nodes in ne(a) = { b : b ~ a in g }.
Class of a: we set cl(a) = ne(a) ∪ {a}.

24 X_a independent from {X_b : b ≁ a} conditionally on {X_c : c ~ a}.
Non-directed graphical model: the law of the random variable X = (X_1, ..., X_p) is a graphical model according to the non-directed graph g if for all a:
X_a ⊥ {X_b : b ∉ cl(a)} | {X_c : c ∈ ne(a)}.
We write L(X) ~ g.
Remark: if g ⊂ g' and L(X) ~ g, then L(X) ~ g'.

25 Minimal graph. When X has a positive density, there exists a unique minimal graph g* such that L(X) ~ g*. In the following, we simply call the graph of X the minimal graph g* such that L(X) ~ g*. Our main goal will be to learn g* from data.

26 Connection with directed acyclic graphs.
Moral graph: the moral graph g^m associated to a directed acyclic graph g is obtained by setting an edge between the parents of each node, and replacing arrows by edges.
Proposition: L(X) ~ g ⇒ L(X) ~ g^m.
Proof:
1. From the factorization formula, there exist g_1, g_2 such that
f(x) = g_1(x_a, x_{ne^m(a)}) g_2(x_{nn^m(a)}, x_{ne^m(a)}), where nn^m(a) = {1, ..., p} \ cl^m(a).
2. This ensures X_a ⊥ X_{nn^m(a)} given X_{ne^m(a)}.

27 Factorization in undirected graphical models.
Hammersley-Clifford formula: for a random variable X with positive density f,
L(X) ~ g ⇔ f(x) ∝ exp( Σ_{c ∈ cliques(g)} g_c(x_c) ).
Proof: based on the Möbius inversion formula.

28 Questions so far?

29 Gaussian graphical models. In the remainder, X ~ N(0, Σ) with Σ non-singular.

30 Reminder on Gaussian distribution (1/2).
Proposition 1: Gaussian conditioning. We consider two sets A = {1, ..., k} and B = {1, ..., p} \ A, and a Gaussian random vector X = (X_A, X_B) ∈ R^p with N(0, Σ) distribution. We write
K = [ K_AA  K_AB ; K_BA  K_BB ]
for Σ^{-1}, and K_AA^{-1} for the inverse (K_AA)^{-1} of K_AA. Then we have
Law(X_A | X_B) = N( −K_AA^{-1} K_AB X_B , K_AA^{-1} ),
which means X_A = −K_AA^{-1} K_AB X_B + ε_A, with ε_A ~ N(0, K_AA^{-1}) independent of X_B.
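Proposition 1 can be verified numerically against the classical Schur-complement formulas for Gaussian conditioning. The sketch below uses an arbitrary positive-definite Σ (hypothetical values) with A = {0, 1}, B = {2, 3} in 0-based indexing.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)   # arbitrary positive-definite covariance
K = np.linalg.inv(Sigma)          # precision matrix

A, B = [0, 1], [2, 3]
K_AA, K_AB = K[np.ix_(A, A)], K[np.ix_(A, B)]
S_AA, S_AB, S_BB = Sigma[np.ix_(A, A)], Sigma[np.ix_(A, B)], Sigma[np.ix_(B, B)]

# Schur-complement form of Law(X_A | X_B) ...
mean_coef = S_AB @ np.linalg.inv(S_BB)                  # E[X_A | X_B] = mean_coef @ X_B
cond_cov = S_AA - S_AB @ np.linalg.inv(S_BB) @ S_AB.T
# ... agrees with the precision-matrix form of Proposition 1:
print(np.allclose(mean_coef, -np.linalg.inv(K_AA) @ K_AB))  # True
print(np.allclose(cond_cov, np.linalg.inv(K_AA)))           # True
```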

31 Proof: explicit computation. We have, for some function f(x_B) that we do not need to make explicit,
g(x_A | x_B) = g(x_A, x_B) / g(x_B) = exp( −(1/2) x_A^T K_AA x_A − x_A^T K_AB x_B ) f(x_B).
As a consequence,
g(x_A | x_B) ∝ exp( −(1/2) (x_A + K_AA^{-1} K_AB x_B)^T K_AA (x_A + K_AA^{-1} K_AB x_B) ),
where the factor of proportionality does not depend on x_A. We recognize the density of the N( −K_AA^{-1} K_AB x_B , K_AA^{-1} ) law.
Ref: Lauritzen [11]

32 Reminder on Gaussian distribution (2/2).
Partial correlation: for any a, b ∈ {1, ..., p}, we have
cor(X_a, X_b | X_c : c ≠ a, b) = −K_ab / (K_aa K_bb)^{1/2}.
Proof: the previous proposition with A = {a, b} and B = A^c gives
cov(X_A | X_B) = [ K_aa  K_ab ; K_ab  K_bb ]^{-1} = 1/(K_aa K_bb − K_ab²) [ K_bb  −K_ab ; −K_ab  K_aa ].
Plugging this formula into the definition of the partial correlation gives the result.
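A quick numerical sanity check of this identity, with a hypothetical Σ: the partial correlation computed from the conditional covariance (Schur complement) matches the precision-matrix formula.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)   # arbitrary positive-definite covariance
K = np.linalg.inv(Sigma)

a, b = 0, 1
rest = [2, 3]
# cov(X_{a,b} | X_rest) via the Schur complement
S = Sigma[np.ix_([a, b], [a, b])] - (
    Sigma[np.ix_([a, b], rest)]
    @ np.linalg.inv(Sigma[np.ix_(rest, rest)])
    @ Sigma[np.ix_(rest, [a, b])]
)
rho_from_cond = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
rho_from_K = -K[a, b] / np.sqrt(K[a, a] * K[b, b])
print(np.allclose(rho_from_cond, rho_from_K))  # True
```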

33 Reading the graph g on K.
From K to g: we set K = Σ^{-1} and define the graph g by
a ~ b ⇔ K_ab ≠ 0.   (1)
GGM and precision matrix: for the graph g defined by (1), we have:
1. L(X) ~ g and g is minimal;
2. there exists ε_a ~ N(0, K_aa^{-1}) independent of {X_b : b ≠ a} such that
X_a = −Σ_{b ∈ ne(a)} (K_ab / K_aa) X_b + ε_a.

34 Proof. We apply Proposition 1:
1. We set A = {a} ∪ nn(a) and B = ne(a), where nn(a) = {1, ..., p} \ cl(a). The precision matrix restricted to A is
K_AA = [ K_aa  0 ; 0  K_{nn(a) nn(a)} ],
so its inverse is
(K_AA)^{-1} = [ K_aa^{-1}  0 ; 0  (K_{nn(a) nn(a)})^{-1} ].
Proposition 1 ensures that the law of X_{{a} ∪ nn(a)} given X_{ne(a)} is Gaussian with covariance matrix (K_AA)^{-1}, so X_a and X_{nn(a)} are independent conditionally on X_{ne(a)}.
2. The second point is obtained with A = {a} and B = A^c.

35 Estimation strategies.
Goal: from an n-sample X_1, ..., X_n i.i.d. with distribution N(0, Σ), we want to estimate the (minimal) graph g such that L(X) ~ g.
The above results suggest 3 estimation strategies:
1. by estimating the partial correlations + multiple testing;
2. by a sparse estimation of K;
3. by a regression approach.

36 Estimation with partial correlation (1/3).
Reminder 1: a ~ b ⇔ ρ_{a,b} := cor(X_a, X_b | X_c : c ≠ a, b) ≠ 0.
Reminder 2: ρ_{a,b} = −K_ab / (K_aa K_bb)^{1/2}.
Partial correlation estimation: for n > p, we estimate ρ_{a,b} by
ρ̂_ab = −[Σ̂^{-1}]_ab / ( [Σ̂^{-1}]_aa [Σ̂^{-1}]_bb )^{1/2},
where Σ̂ is the empirical covariance matrix.

37 Estimation with partial correlation (2/3).
Under the null hypothesis, i.e. when ρ_{a,b} = 0, and for n > p + 2, we have
t_{a,b} := (n − p − 2)^{1/2} ρ̂_ab / (1 − ρ̂²_ab)^{1/2} ~ Student(n − p − 2).
Estimation procedure: 1. compute the t_{a,b}; 2. apply a multiple testing thresholding.
Weakness: when p > n the procedure cannot be applied; when n > p but n − p is small, t_{a,b} has a large variance and the procedure is powerless.
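A minimal sketch of the resulting test, using the statistic above with n − p − 2 degrees of freedom as reconstructed from the slide (conventions for the degrees of freedom vary); the data, sample size, and tested pair are hypothetical.

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(7)
n, p = 200, 5
X = rng.normal(size=(n, p))         # independent coordinates: true rho_ab = 0
K_hat = np.linalg.inv(np.cov(X.T))  # empirical precision matrix (n > p)

a, b = 0, 1
rho_hat = -K_hat[a, b] / np.sqrt(K_hat[a, a] * K_hat[b, b])
df = n - p - 2
t_ab = np.sqrt(df) * rho_hat / np.sqrt(1 - rho_hat**2)
p_value = 2 * student_t.sf(abs(t_ab), df)   # two-sided p-value for H0: rho_ab = 0
```

In the multiple-testing step, one such p-value is computed for every pair (a, b) and thresholded jointly (e.g. with a Bonferroni or FDR correction).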

38 Estimation with partial correlation (3/3).
Solution 1: shrinking the conditioning. Work with ĉor(X_a, X_b | X_c : c ∈ S) with S small.
Ref: Wille and Bühlmann [19], Castelo and Roverato [2], Spirtes et al. [16] or Kalisch and Bühlmann [8].
+ ĉor(X_a, X_b | X_c : c ∈ S) is stable when S is small;
− it is unclear what we estimate at the end (in general).
Solution 2: sparse estimation of K. The instability for large p comes from the instability of Σ̂^{-1} as an estimator of K. ⇒ Build a more stable estimator of K by capitalizing on its sparsity.

39 Sparse estimation of K (1/2).
The likelihood of a p × p positive symmetric matrix K ∈ S_p^+ is
Likelihood(K) = ∏_{i=1}^n det(K)^{1/2} (2π)^{−p/2} exp( −(1/2) X_i^T K X_i ).
Negative log-likelihood: the map
K ↦ −(n/2) log(det(K)) + (n/2) ⟨K, Σ̂⟩_F
is convex.
Graphical Lasso: sparse estimation of K:
K̂_λ = argmin_{K ∈ S_p^+} { −(n/2) log(det(K)) + (n/2) ⟨K, Σ̂⟩_F + λ Σ_{a≠b} |K_ab| }.
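A sketch of the graphical lasso in practice, using scikit-learn's GraphicalLassoCV as one readily available implementation (not the specific algorithms of [5, 1]); the chain-shaped true K and the sample size are hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(3)
p = 5
# Hypothetical sparse truth: a tridiagonal (chain) precision matrix.
K_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=2000)

model = GraphicalLassoCV().fit(X)   # cross-validated choice of the penalty lambda
K_hat = model.precision_            # sparse estimate of K
edges = np.abs(K_hat) > 1e-3        # estimated edge set: a ~ b iff K_hat[a, b] != 0
```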

40 Sparse estimation of K (2/2).
+ Efficient optimization algorithms. Ref: Friedman et al. [5], Banerjee et al. [1].
− Poor empirical results reported by Villers et al. [18].
− Theoretical guarantees only under some compatibility conditions that are hard to check/interpret (Ravikumar et al. [14]).
⇒ Keep the (good) idea of exploiting the sparsity, but move to the more classical regression framework.

41 Regression approach (1/4).
Definitions: Θ = the set of p × p matrices with zero diagonal; θ = the matrix in Θ defined by θ_ab = −K_ab / K_bb for a ≠ b.
Characterization:
E[X_a | X_b : b ≠ a] = Σ_{b ≠ a} θ_ba X_b and θ = argmin_{θ' ∈ Θ} ||Σ^{1/2} (I − θ')||_F².
Proof: since X_a = Σ_{b ≠ a} θ_ba X_b + ε_a, we have
θ = argmin_{θ' ∈ Θ} Σ_{a=1}^p E[ (X_a − Σ_{b ≠ a} θ'_ba X_b)² ] = argmin_{θ' ∈ Θ} E[ ||X − θ'^T X||² ] = argmin_{θ' ∈ Θ} ||Σ^{1/2} (I − θ')||_F².
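The relation θ_ab = −K_ab / K_bb can be checked at the population level: the regression coefficients of X_a on the other coordinates coincide with −K_{a,·} / K_{aa}. A numerical sketch with an arbitrary (hypothetical) Σ:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)   # arbitrary positive-definite covariance
K = np.linalg.inv(Sigma)

a = 0
rest = [1, 2, 3]
# Population regression coefficients of X_a on the other coordinates:
beta = np.linalg.inv(Sigma[np.ix_(rest, rest)]) @ Sigma[np.ix_(rest, [a])]
# They match -K[a, b] / K[a, a], i.e. the theta coefficients above:
print(np.allclose(beta.ravel(), -K[a, rest] / K[a, a]))  # True
```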

42 Regression approach (2/4).
Replacing Σ by Σ̂, we obtain
⟨(I − θ), Σ̂ (I − θ)⟩_F = (1/n) ||X (I − θ)||_F².
Estimation procedure:
θ̂_λ = argmin_{θ ∈ Θ} { (1/n) ||X − X θ||_F² + λ Ω(θ) },
with Ω(θ) enforcing coordinate sparsity. Examples:
1. ℓ1 penalty: Ω(θ) = Σ_{a ≠ b} |θ_ab|;
2. ℓ1/ℓ2 penalty: Ω(θ) = Σ_{a < b} (θ_ab² + θ_ba²)^{1/2}.

43 Regression approach (3/4).
With the ℓ1 penalty (Meinshausen and Bühlmann [13]):
+ we can split the minimization into p problems in R^{p−1}:
[θ̂^{ℓ1}_ba]_{b ≠ a} = argmin_{β ∈ R^{p−1}} { (1/n) ||X_a − Σ_{b ≠ a} β_b X_b||² + λ ||β||_{ℓ1} };
+ very efficient algorithms by coordinate descent;
− no constraint enforces that θ̂^{ℓ1}_ab ≠ 0 when θ̂^{ℓ1}_ba ≠ 0 ⇒ choose an arbitrary decision rule to build ĝ from θ̂^{ℓ1}. Examples:
1. set an edge a ~ b in ĝ when either θ̂^{ℓ1}_ab ≠ 0 or θ̂^{ℓ1}_ba ≠ 0;
2. set an edge a ~ b in ĝ when both θ̂^{ℓ1}_ab ≠ 0 and θ̂^{ℓ1}_ba ≠ 0.
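A sketch of this nodewise approach with an off-the-shelf lasso solver (scikit-learn): one lasso regression per node, combined with the "or" rule of Example 1. The penalty level alpha and the simulated chain graph are hypothetical; in practice λ is tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso_graph(X, alpha=0.1):
    """Meinshausen-Buhlmann-style graph estimate with the 'or' rule."""
    n, p = X.shape
    theta = np.zeros((p, p))
    for a in range(p):
        rest = [b for b in range(p) if b != a]
        fit = Lasso(alpha=alpha).fit(X[:, rest], X[:, a])
        theta[rest, a] = fit.coef_   # column a: lasso coefficients of X_a's regression
    support = theta != 0
    return support | support.T       # "or" rule: edge a ~ b if either coefficient is nonzero

rng = np.random.default_rng(5)
p = 5
K = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))  # hypothetical chain graph
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K), size=1000)
adj = nodewise_lasso_graph(X)
```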

44 Regression approach (4/4).
With the ℓ1/ℓ2 penalty:
+ symmetric zeros ⇒ no ambiguity in defining ĝ from θ̂^{ℓ1/ℓ2};
− computational cost: the minimization problem cannot be split into p subproblems, and it is less easy to minimize in large dimensions.
Algorithm: iterate on couples (a, b) until convergence:
1. set (Δ_ab, Δ_ba) with Δ_ab = (1/n) X_a^T (X_b − Σ_{k ≠ a,b} θ_kb X_k);
2. set (θ_ab, θ_ba) = ( 1 − λ / (Δ_ab² + Δ_ba²)^{1/2} )_+ (Δ_ab, Δ_ba).

45 Bayesian approach. A series of papers [20, 4, 15] investigates the Bayesian approach.
Issues: 1. design of sensible priors; 2. efficient posterior sampling.
To the best of my knowledge, it cannot handle large-dimensional problems.

46 Conclusion(?). We have the choice between multiple procedures, and for each procedure there is at least one (non-scale-free) tuning parameter to choose ⇒ we need a selection criterion.

47 Classical selection criteria. We have a collection G of graphs.
Unbiased risk estimation: AIC = −2 log(L(g)) + 2 |g|.
Bayesian criterion: −2 log(P(g | X)) ≈_{n→∞} BIC = −2 log(L(g)) + |g| log(n) − 2 log(P(g)).
Only mathematically grounded in the asymptotic setting: p fixed and n → ∞.

48 Resampling criteria.
Cross-validation schemes. Figure: recursive data splitting for 5-fold cross-validation (in each of the 5 rounds, four blocks are used for training and one for testing).
No guarantee in high-dimensional settings: p ≈ n or p ≫ n.

49 GGMselect. GGMselect: an R package (available online) which:
1. generates a collection Ĝ of candidate graphs according to the above procedures (+ some variants);
2. selects the best graph among Ĝ.

50 GGMselect.
Quality criterion: for a graph g,
MSEP(g) = mean square error of prediction related to g = bias(g) + variance(g),
where bias(g) quantifies how important the missing edges are, and variance(g) is roughly proportional to the number of edges of g divided by n.
Why MSEP? It is a way to quantify the importance of each edge.

51 GGMselect.
Ideal selection: g* = argmin{ MSEP(g) : g ∈ Ĝ } — but g* is unknown!
Selection criterion: select the ĝ which minimizes some penalized empirical MSEP, where the penalty term: roughly penalizes each node of ĝ according to its degree (number of edges); is based on quantiles of Fisher random variables.

52 GGMselect.
Theorem: oracle-like inequality for GGMselect. If
max_{g ∈ Ĝ} deg(g) ≤ ρ n / (2 log p), for some ρ < 1,
then the estimated graph ĝ fulfills
MSEP(ĝ) ≤ c_ρ E[ inf_{g ∈ Ĝ} { bias(g) + log(p) var(g) } ] + R_n,
where R_n = O( Tr(Σ) e^{−c'_ρ n} + CVar(Σ) log(p)/n ) with CVar(Σ) = Σ_a 1/(Σ^{-1})_aa.
Ref: Giraud et al. [7]


54 GGMselect. Optimality?
Optimal selection criterion? Minimal size of the penalty to avoid overfitting; minimax estimation rates when Ĝ contains good graphs.
What about the condition on the degree? deg(g) ≲ n / (2 log p) is unavoidable; otherwise the estimation rate gets worse.

55 Practical issues: Gaussianity.
Gaussianity? Recall the Hammersley-Clifford formula: for a random variable X with positive density f,
L(X) ~ g ⇔ f(x) ∝ exp( Σ_{c ∈ cliques(g)} g_c(x_c) ).

56 Practical issues: Gaussianity.
Data transformation: f_j(X_j) := Φ^{-1}(F_j(X_j)) ~ N(0, 1).
Assumption: (f_1(X_1), ..., f_p(X_p)) ~ N(0, Σ).
Key point: graph(X_1, ..., X_p) = graph(f_1(X_1), ..., f_p(X_p)).
Estimation: work with f̂_j(X_j) = Φ^{-1}(F̂_j(X_j)) for some estimator F̂_j.
Ref: data transformations proposed by Lafferty et al. [10]
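A minimal sketch of the estimation step, taking F̂_j to be a rescaled empirical distribution function, i.e. rank-based normal scores; this is one simple choice of estimator, not necessarily the one used in [10].

```python
import numpy as np
from scipy.stats import norm

def normal_scores(X):
    """Nonparanormal-style transform: column-wise Phi^{-1}(F_hat_j)."""
    n, p = X.shape
    Z = np.empty_like(X, dtype=float)
    for j in range(p):
        ranks = np.argsort(np.argsort(X[:, j])) + 1   # ranks in 1..n
        F_hat = ranks / (n + 1)                        # rescaled so values stay in (0, 1)
        Z[:, j] = norm.ppf(F_hat)                      # normal scores
    return Z

rng = np.random.default_rng(6)
X = np.exp(rng.normal(size=(500, 3)))   # heavily non-Gaussian (log-normal) data
Z = normal_scores(X)                     # each column is now approximately N(0, 1)
```

The transform is strictly increasing in each coordinate, so it preserves the ordering of each variable, which is the key point behind graph(X) = graph(f(X)).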

57 Practical issues: hidden variables.
We may only observe part of the relevant variables:
X = (X_O, X_H) ~ N( 0, [ Σ_OO  Σ_OH ; Σ_HO  Σ_HH ] ),
with X_O observed and X_H unobserved. We only have access to
(Σ_OO)^{-1} = K_OO − K_OH (K_HH)^{-1} K_HO.
Ref: Chandrasekaran et al. [3] propose a sparse + low-rank estimation to recover K_OO when H is small.

58 Back to directed models

59 Reminder. X_a ⊥ {X_b : b ∉ de(a)} conditionally on {X_c : c ∈ pa(a)}.

60 Causal inference.
Setting: we have p covariables X_1, ..., X_p and a variable of interest Y; [X, Y] is a Gaussian graphical model according to a DAG g; there are no arrows from Y to X_1, ..., X_p.
Example: Y is the end product of a metabolic network and X_1, ..., X_p are protein abundances.

61 Causal inference. Direct versus causal effect.
Direct effect: given by θ_a in E[Y | X_1, ..., X_p] = Σ_b θ_b X_b.
Causal effect (relative to g): given by β_a in E[Y | X_a, X_b : b ∈ pa(a)] = β_a X_a + Σ_{b ∈ pa(a)} β_b X_b.

62 Causal inference.
Main issue: causal effects are defined relative to g, and there is no unique minimal directed graph...
⇒ find all the DAGs g^(1), ..., g^(m) such that L(X) ~ g^(k), and compute a lower bound on the causal effect: β = min{ β^(1), ..., β^(m) }.

63 PCalg. PCalg: an R package (available online) which estimates the DAGs g^(1), ..., g^(m) from the data, and estimates β^(1), ..., β^(m) and β.
Ref: Kalisch et al. [9], Maathuis et al. [12]

64 PC algorithm. Principle of the PC algorithm:
Init: g = complete graph.
Iterate:
for a = 1, ..., p, for b ∈ ne(a): remove a ~ b if |ĉor(X_a, X_b)| < t_0;
for a = 1, ..., p, for b ∈ ne(a): remove a ~ b if there exists c_1 ∈ ne(a) such that |ĉor(X_a, X_b | X_{c_1})| < t_1;
for a = 1, ..., p, for b ∈ ne(a): remove a ~ b if there exist c_1, c_2 ∈ ne(a) such that |ĉor(X_a, X_b | X_{c_1}, X_{c_2})| < t_2;
...
Output: skeleton of the DAGs.
Last step: compute g^(1), ..., g^(m) from the skeleton.
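A heavily simplified sketch of the skeleton search: fixed hypothetical thresholds t_k instead of calibrated statistical tests, conditioning sets drawn from the current neighbors of a, and no orientation step.

```python
import numpy as np
from itertools import combinations

def partial_corr(S, a, b, cond):
    """Partial correlation of (a, b) given cond, from a covariance matrix S."""
    idx = [a, b] + list(cond)
    K = np.linalg.inv(S[np.ix_(idx, idx)])   # precision of the sub-vector
    return -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])

def pc_skeleton(X, thresholds=(0.1, 0.1, 0.1)):
    S = np.cov(X.T)
    p = S.shape[0]
    adj = ~np.eye(p, dtype=bool)             # init: complete graph
    for k, t in enumerate(thresholds):       # conditioning sets of size k = 0, 1, 2, ...
        for a in range(p):
            for b in range(p):
                if not adj[a, b]:
                    continue
                nbrs = [c for c in range(p) if adj[a, c] and c != b]
                for cond in combinations(nbrs, k):
                    if abs(partial_corr(S, a, b, cond)) < t:
                        adj[a, b] = adj[b, a] = False   # remove edge a ~ b
                        break
    return adj

rng = np.random.default_rng(8)
p = 4
K_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))  # hypothetical chain 0-1-2-3
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=5000)
adj = pc_skeleton(X)   # recovers the chain skeleton on this easy example
```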

65 Riboflavin prediction. Figure: important genes for riboflavin production.
Ref: Kalisch et al. [9]

66 Warning. Be aware of over-interpretation: we cannot reliably infer causal networks from i.i.d. observational data, and the approach relies on uncheckable assumptions. But it seems promising for ranking covariables.

67 Statistics in the high-dimensional setting. Despite the theorems, do not trust too much the statistical inferences in a high-dimensional setting n ≪ p (e.g. after gene pre-selection, metagenes, etc.). It is not a validation tool, but rather a tool for providing good hints; it requires experimental validation.
References on high-dimensional statistics:
Lecture Notes on High-Dimensional Statistics: giraud/msv/lecturenotes.pdf
The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman: www-stat.stanford.edu/ tibs/elemstatlearn/

68 Thank you!

69 References I
[1] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res., 9.
[2] R. Castelo and A. Roverato. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. J. Mach. Learn. Res., 7.
[3] V. Chandrasekaran, P. Parrilo, and A. Willsky. Latent variable graphical model selection via convex optimization. Annals of Statistics.

70 References II
[4] P. Dellaportas, P. Giudici, and G. Roberts. Bayesian inference for nondecomposable graphical Gaussian models. Sankhyā, 65(1):43-55.
[5] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3).
[6] Nir Friedman. Inferring cellular networks using probabilistic graphical models. Science, 303(5659).
[7] C. Giraud, S. Huet, and N. Verzelen. Graph selection with GGMselect. Stat. Appl. Genet. Mol. Biol., 11(3):1-50.

71 References III
[8] M. Kalisch and P. Bühlmann. Robustification of the PC-algorithm for directed acyclic graphs. J. Comput. Graph. Statist., 17(4).
[9] Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1-26.
[10] John Lafferty, Han Liu, and Larry Wasserman. Sparse nonparametric graphical models. Statistical Science, 27.
[11] Steffen L. Lauritzen. Graphical Models. Oxford University Press.

72 References IV
[12] Marloes H. Maathuis, Markus Kalisch, and Peter Bühlmann. Estimating high-dimensional intervention effects from observational data. Ann. Statist., 37(6A).
[13] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Ann. Statist., 34(3).
[14] Pradeep Ravikumar, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5.

73 References V
[15] J. G. Scott and C. M. Carvalho. Feature-inclusion stochastic search for Gaussian graphical models. J. Comp. Graph. Statist., 17.
[16] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, second edition.
[17] J. Ver Hoef and K. Frost. A Bayesian hierarchical model for monitoring harbor seal changes in Prince William Sound, Alaska. Environmental and Ecological Statistics, 10.

74 References VI
[18] Fanny Villers, Brigitte Schaeffer, Caroline Bertin, and Sylvie Huet. Assessing the validity domains of graphical Gaussian models in order to infer relationships among components of complex biological systems. Statistical Applications in Genetics and Molecular Biology, 7.
[19] A. Wille and P. Bühlmann. Low-order conditional independence graphs for inferring genetic networks. Stat. Appl. Genet. Mol. Biol., 5:Art. 1, 34 pp. (electronic).
[20] F. Wong, C. K. Carter, and R. Kohn. Efficient estimation of covariance selection models. Biometrika, 90(4).


High dimensional sparse covariance estimation via directed acyclic graphs Electronic Journal of Statistics Vol. 3 (009) 1133 1160 ISSN: 1935-754 DOI: 10.114/09-EJS534 High dimensional sparse covariance estimation via directed acyclic graphs Philipp Rütimann and Peter Bühlmann

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

Structure estimation for Gaussian graphical models

Structure estimation for Gaussian graphical models Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of

More information

Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data

Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data Peter Bühlmann joint work with Jonas Peters Nicolai Meinshausen ... and designing new perturbation

More information

Multivariate Bernoulli Distribution 1

Multivariate Bernoulli Distribution 1 DEPARTMENT OF STATISTICS University of Wisconsin 1300 University Ave. Madison, WI 53706 TECHNICAL REPORT NO. 1170 June 6, 2012 Multivariate Bernoulli Distribution 1 Bin Dai 2 Department of Statistics University

More information

Causality in Econometrics (3)

Causality in Econometrics (3) Graphical Causal Models References Causality in Econometrics (3) Alessio Moneta Max Planck Institute of Economics Jena moneta@econ.mpg.de 26 April 2011 GSBC Lecture Friedrich-Schiller-Universität Jena

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Proximity-Based Anomaly Detection using Sparse Structure Learning

Proximity-Based Anomaly Detection using Sparse Structure Learning Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /

More information

An Introduction to Graphical Lasso

An Introduction to Graphical Lasso An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents

More information

Learning With Bayesian Networks. Markus Kalisch ETH Zürich

Learning With Bayesian Networks. Markus Kalisch ETH Zürich Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal

More information

Latent Variable Graphical Model Selection Via Convex Optimization

Latent Variable Graphical Model Selection Via Convex Optimization Latent Variable Graphical Model Selection Via Convex Optimization The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Directed acyclic graphs and the use of linear mixed models

Directed acyclic graphs and the use of linear mixed models Directed acyclic graphs and the use of linear mixed models Siem H. Heisterkamp 1,2 1 Groningen Bioinformatics Centre, University of Groningen 2 Biostatistics and Research Decision Sciences (BARDS), MSD,

More information

Supplementary material to Structure Learning of Linear Gaussian Structural Equation Models with Weak Edges

Supplementary material to Structure Learning of Linear Gaussian Structural Equation Models with Weak Edges Supplementary material to Structure Learning of Linear Gaussian Structural Equation Models with Weak Edges 1 PRELIMINARIES Two vertices X i and X j are adjacent if there is an edge between them. A path

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics Biometrics 63, 1?? March 2014 DOI: 10.1111/j.1541-0420.2005.00454.x PenPC: A Two-step Approach to Estimate the Skeletons of High Dimensional Directed Acyclic Graphs Min Jin Ha Department of Biostatistics,

More information

Learning Gaussian Graphical Models with Unknown Group Sparsity

Learning Gaussian Graphical Models with Unknown Group Sparsity Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation

More information

Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features

Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features Eun Yong Kang Department

More information

Genetic Networks. Korbinian Strimmer. Seminar: Statistical Analysis of RNA-Seq Data 19 June IMISE, Universität Leipzig

Genetic Networks. Korbinian Strimmer. Seminar: Statistical Analysis of RNA-Seq Data 19 June IMISE, Universität Leipzig Genetic Networks Korbinian Strimmer IMISE, Universität Leipzig Seminar: Statistical Analysis of RNA-Seq Data 19 June 2012 Korbinian Strimmer, RNA-Seq Networks, 19/6/2012 1 Paper G. I. Allen and Z. Liu.

More information

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM 1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul

More information

arxiv: v3 [stat.me] 10 Mar 2016

arxiv: v3 [stat.me] 10 Mar 2016 Submitted to the Annals of Statistics ESTIMATING THE EFFECT OF JOINT INTERVENTIONS FROM OBSERVATIONAL DATA IN SPARSE HIGH-DIMENSIONAL SETTINGS arxiv:1407.2451v3 [stat.me] 10 Mar 2016 By Preetam Nandy,,

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

High-dimensional Covariance Estimation Based On Gaussian Graphical Models

High-dimensional Covariance Estimation Based On Gaussian Graphical Models High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix

More information

High-dimensional learning of linear causal networks via inverse covariance estimation

High-dimensional learning of linear causal networks via inverse covariance estimation High-dimensional learning of linear causal networks via inverse covariance estimation Po-Ling Loh Department of Statistics, University of California, Berkeley, CA 94720, USA Peter Bühlmann Seminar für

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Learning Quadratic Variance Function (QVF) DAG Models via OverDispersion Scoring (ODS)

Learning Quadratic Variance Function (QVF) DAG Models via OverDispersion Scoring (ODS) Journal of Machine Learning Research 18 2018 1-44 Submitted 4/17; Revised 12/17; Published 4/18 Learning Quadratic Variance Function QVF DAG Models via OverDispersion Scoring ODS Gunwoong Park Department

More information

Robustification of the PC-algorithm for Directed Acyclic Graphs

Robustification of the PC-algorithm for Directed Acyclic Graphs Robustification of the PC-algorithm for Directed Acyclic Graphs Markus Kalisch, Peter Bühlmann May 26, 2008 Abstract The PC-algorithm ([13]) was shown to be a powerful method for estimating the equivalence

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Interpreting and using CPDAGs with background knowledge

Interpreting and using CPDAGs with background knowledge Interpreting and using CPDAGs with background knowledge Emilija Perković Seminar for Statistics ETH Zurich, Switzerland perkovic@stat.math.ethz.ch Markus Kalisch Seminar for Statistics ETH Zurich, Switzerland

More information

GRAPHICAL MODELS FOR HIGH DIMENSIONAL GENOMIC DATA. Min Jin Ha

GRAPHICAL MODELS FOR HIGH DIMENSIONAL GENOMIC DATA. Min Jin Ha GRAPHICAL MODELS FOR HIGH DIMENSIONAL GENOMIC DATA Min Jin Ha A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results David Prince Biostat 572 dprince3@uw.edu April 19, 2012 David Prince (UW) SPICE April 19, 2012 1 / 11 Electronic

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Comment on Article by Scutari

Comment on Article by Scutari Bayesian Analysis (2013) 8, Number 3, pp. 543 548 Comment on Article by Scutari Hao Wang Scutari s paper studies properties of the distribution of graphs ppgq. This is an interesting angle because it differs

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R

The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Likelihood Analysis of Gaussian Graphical Models

Likelihood Analysis of Gaussian Graphical Models Faculty of Science Likelihood Analysis of Gaussian Graphical Models Ste en Lauritzen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 2 Slide 1/43 Overview of lectures Lecture 1 Markov Properties

More information

Statistical Machine Learning for Structured and High Dimensional Data

Statistical Machine Learning for Structured and High Dimensional Data Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,

More information

Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure

Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Inverse Covariance Estimation with Missing Data using the Concave-Convex Procedure Jérôme Thai 1 Timothy Hunter 1 Anayo Akametalu 1 Claire Tomlin 1 Alex Bayen 1,2 1 Department of Electrical Engineering

More information

Arrowhead completeness from minimal conditional independencies

Arrowhead completeness from minimal conditional independencies Arrowhead completeness from minimal conditional independencies Tom Claassen, Tom Heskes Radboud University Nijmegen The Netherlands {tomc,tomh}@cs.ru.nl Abstract We present two inference rules, based on

More information

Inferring Brain Networks through Graphical Models with Hidden Variables

Inferring Brain Networks through Graphical Models with Hidden Variables Inferring Brain Networks through Graphical Models with Hidden Variables Justin Dauwels, Hang Yu, Xueou Wang +, Francois Vialatte, Charles Latchoumane ++, Jaeseung Jeong, and Andrzej Cichocki School of

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables for a Million Variables Cho-Jui Hsieh The University of Texas at Austin NIPS Lake Tahoe, Nevada Dec 8, 2013 Joint work with M. Sustik, I. Dhillon, P. Ravikumar and R. Poldrack FMRI Brain Analysis Goal:

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example

Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example Type-II Errors of Independence Tests Can Lead to Arbitrarily Large Errors in Estimated Causal Effects: An Illustrative Example Nicholas Cornia & Joris M. Mooij Informatics Institute University of Amsterdam,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models David Sontag New York University Lecture 4, February 16, 2012 David Sontag (NYU) Graphical Models Lecture 4, February 16, 2012 1 / 27 Undirected graphical models Reminder

More information

arxiv: v1 [stat.ml] 23 Oct 2016

arxiv: v1 [stat.ml] 23 Oct 2016 Formulas for counting the sizes of Markov Equivalence Classes Formulas for Counting the Sizes of Markov Equivalence Classes of Directed Acyclic Graphs arxiv:1610.07921v1 [stat.ml] 23 Oct 2016 Yangbo He

More information