Introduction to graphical models: Lecture III


Martin Wainwright
UC Berkeley, Departments of Statistics and EECS
Some introductory lectures, January 2013

Introduction

Markov random fields (undirected graphical models) are central to many application areas of science and engineering. Some fundamental problems:
- counting/integrating: computing marginal distributions and partition functions
- optimization: computing most probable configurations (or the top M configurations)
- model selection: fitting and selecting models on the basis of data

Graph structure and factorization

Markov random field: a random vector (X_1, ..., X_p) with distribution factoring according to a graph G = (V, E).

[Figure: example undirected graph on nodes A, B, C, D.]

Hammersley-Clifford theorem: factorization over the set of cliques \mathcal{C},

    Q(x_1, \dots, x_p; \theta) = \frac{1}{Z(\theta)} \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C(x_C) \Big\}

Graphical model selection

- let G = (V, E) be an undirected graph on p = |V| vertices
- a pairwise graphical model factorizes over the edges of the graph:

      Q(x_1, \dots, x_p; \theta) \propto \exp\Big\{ \sum_{(s,t) \in E} \theta_{st}(x_s, x_t) \Big\}

- given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure
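As a concrete illustration of the pairwise factorization, the following minimal sketch (a hypothetical 3-node binary model with made-up edge weights, not an example from the lecture) evaluates Q(x; θ) by brute-force computation of the normalizing constant Z(θ):

    import itertools
    import numpy as np

    # Hypothetical 3-node binary pairwise model on a chain with edges (0,1), (1,2);
    # x_s in {-1, +1} and Ising-style potentials theta_st(x_s, x_t) = theta_st * x_s * x_t.
    edges = {(0, 1): 0.8, (1, 2): -0.5}

    def log_potential(x):
        """Sum of pairwise potentials over the edge set."""
        return sum(w * x[s] * x[t] for (s, t), w in edges.items())

    # Brute-force partition function Z(theta): sum over all 2^3 configurations.
    configs = list(itertools.product([-1, 1], repeat=3))
    Z = sum(np.exp(log_potential(x)) for x in configs)

    # Probability of one configuration under the pairwise factorization.
    x = (1, 1, -1)
    print("Q(x; theta) =", np.exp(log_potential(x)) / Z)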

Various classes of methods

1. Exact solutions
   - Chow-Liu algorithm for trees (Chow & Liu, 1967)
   - computationally intractable for hypertrees (Srebro & Karger, 2001)

2. Testing-based approaches
   - PC algorithm (Spirtes et al., 2000; Kalisch & Bühlmann, 2008)
   - thresholding (Bresler et al., 2008; Anandkumar et al., 2010)

3. Penalized forms of the global likelihood
   - combinatorial penalties (AIC, BIC, GIC, etc.)
   - l1 and related penalties
   - classical analysis of the penalized Gaussian MLE (Yuan & Lin, 2006)
   - some fast algorithms (d'Aspremont et al., 2007; Friedman et al., 2008)

4. Pseudolikelihoods and neighborhood regression
   - pseudolikelihood consistency for Gaussians (Besag, 1977)
   - pseudolikelihood and BIC criterion (Csiszár & Talata, 2006)
   - neighborhood regression for Gaussian MRFs (e.g., Meinshausen & Bühlmann, 2006; Wainwright, 2006; Zhao & Yu, 2006)
   - logistic regression for Ising models (Ravikumar et al., 2010)

1. Global maximum likelihood

- given i.i.d. samples X_1^n := {X_1, ..., X_n}, one might consider methods based on the global likelihood

      \ell(\theta; X_1^n) := \frac{1}{n} \sum_{i=1}^n \log Q(X_i; \theta)

- maximum likelihood for a graphical model in exponential form:

      \hat{\theta} = \arg\max_{\theta} \Big\{ \underbrace{\sum_{(s,t) \in E} \hat{\mathbb{E}}[\theta_{st}(X_s, X_t)]}_{\text{empirical moments}} - \log Z(\theta) \Big\}

- the exact likelihood involves the log partition function log Z(θ):
  - can be computed for Gaussian MRFs (log-determinant)
  - intractable for Ising models (binary pairwise MRFs) (Welsh, 1993)
- possible solutions:
  - MCMC methods
  - stochastic approximation methods
  - variational approximations (mean field, Bethe and belief propagation)
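The coupling between estimation and inference can be seen directly on a toy model: the gradient of log Z(θ) consists of the model moments, which is exactly what the MLE must match to the empirical moments. A minimal sketch (hypothetical 4-node Ising model with random weights; brute force is feasible only because 2^p is tiny here):

    import itertools
    import numpy as np

    # Hypothetical 4-node Ising model with random pairwise weights.
    p = 4
    rng = np.random.default_rng(0)
    theta = {(s, t): rng.normal(scale=0.5) for s in range(p) for t in range(s + 1, p)}

    # Enumerate all 2^p configurations and compute log Z(theta) by brute force.
    configs = np.array(list(itertools.product([-1, 1], repeat=p)))
    scores = np.array([sum(w * x[s] * x[t] for (s, t), w in theta.items()) for x in configs])
    log_Z = np.log(np.sum(np.exp(scores)))
    probs = np.exp(scores - log_Z)

    # Model moments E[X_s X_t]: the partial derivatives of log Z with respect to theta_st,
    # i.e. the quantities that make exact maximum likelihood require inference.
    moments = {st: float(np.sum(probs * configs[:, st[0]] * configs[:, st[1]])) for st in theta}
    print("log Z =", log_Z)
    print("E[X_0 X_1] =", moments[(0, 1)])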

Gaussian graphs: sparse inverse covariances

[Figure: a 5-node graph alongside the zero pattern of its inverse covariance matrix.]

A Gaussian graphical model is specified by a sparse inverse covariance (precision) matrix Θ:

    Q(x_1, \dots, x_p; \Theta) = \frac{\sqrt{\det(\Theta)}}{(2\pi)^{p/2}} \exp\Big( -\frac{1}{2} x^T \Theta x \Big)
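A minimal sketch of this correspondence between graph and precision matrix (a hypothetical 5-node chain with made-up edge weights): build a sparse Θ, sample from N(0, Θ^{-1}), and observe that the empirical inverse covariance is large only on the chain edges.

    import numpy as np

    # Precision matrix of a hypothetical 5-node chain graph (diagonally dominant,
    # hence positive definite).
    p = 5
    Theta = 1.5 * np.eye(p)
    for s in range(p - 1):
        Theta[s, s + 1] = Theta[s + 1, s] = 0.4

    # Sample from the Gaussian graphical model N(0, Theta^{-1}).
    Sigma = np.linalg.inv(Theta)
    rng = np.random.default_rng(1)
    X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=2000)

    # Empirical precision matrix: entries for non-edges, e.g. (0, 2), should be near zero.
    Theta_hat = np.linalg.inv(np.cov(X, rowvar=False))
    print(np.round(Theta_hat, 2))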

Gaussian l1-penalized MLE

Estimator: the l1-regularized log-determinant program

    \hat{\Theta} = \arg\min_{\Theta \succ 0} \Big\{ \underbrace{-\log\det\Theta + \langle\!\langle \hat{\Sigma}_n, \Theta \rangle\!\rangle}_{\text{negative Gaussian log-likelihood}} + \underbrace{\lambda_n \sum_{i \neq j} |\Theta_{ij}|}_{\text{regularization}} \Big\}

Results on this method:
- analysis under classical scaling (n → ∞ with p fixed) (Yuan & Lin, 2006)
- some fast algorithms (d'Aspremont et al., 2007; Friedman et al., 2008)
- high-dimensional analysis of Frobenius norm error (Rothman et al., 2008)
- high-dimensional variable selection and l∞ bounds (Ravikumar et al., 2011)
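This is the objective that scikit-learn's GraphicalLasso solves; a minimal sketch on samples from the chain model above (the regularization level alpha is an arbitrary illustrative choice, not a tuned or theoretically prescribed value):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    # Chain-graph precision matrix as before.
    p = 5
    Theta = 1.5 * np.eye(p)
    for s in range(p - 1):
        Theta[s, s + 1] = Theta[s + 1, s] = 0.4

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=1000)

    # l1-penalized log-determinant estimate of the precision matrix.
    model = GraphicalLasso(alpha=0.05).fit(X)
    Theta_hat = model.precision_

    # Read off the estimated edge set from the support of the off-diagonal entries.
    edges = [(i, j) for i in range(p) for j in range(i + 1, p) if abs(Theta_hat[i, j]) > 1e-3]
    print("estimated edge set:", edges)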

High-dimensional analysis

- classical analysis: dimension p fixed, sample size n → +∞
- high-dimensional analysis: allow the dimension p, the sample size n, and the maximum degree d all to increase at arbitrary rates
- take n i.i.d. samples from the MRF defined by G_{p,d}
- study the probability of success as a function of the three parameters:

      Success(n, p, d) = Q[Method recovers graph G_{p,d} from n samples]

- the theory is non-asymptotic: explicit probabilities for finite (n, p, d)
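The success-probability curves on the following slides come from simulations of exactly this kind. A rough sketch of such an experiment for a chain graph, using the graphical lasso as the recovery method (the trial count, regularization constant, and support threshold are illustrative choices, not those used in the lecture):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    def chain_precision(p, weight=0.4):
        Theta = 1.5 * np.eye(p)
        for s in range(p - 1):
            Theta[s, s + 1] = Theta[s + 1, s] = weight
        return Theta

    def success_probability(n, p, trials=20, seed=0):
        """Monte Carlo estimate of Success(n, p, d) for a chain graph (d = 2)."""
        rng = np.random.default_rng(seed)
        Sigma = np.linalg.inv(chain_precision(p))
        true_edges = {(s, s + 1) for s in range(p - 1)}
        wins = 0
        for _ in range(trials):
            X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
            lam = 0.5 * np.sqrt(np.log(p) / n)      # lambda_n proportional to sqrt(log p / n)
            Theta_hat = GraphicalLasso(alpha=lam).fit(X).precision_
            est = {(i, j) for i in range(p) for j in range(i + 1, p)
                   if abs(Theta_hat[i, j]) > 1e-3}
            wins += (est == true_edges)
        return wins / trials

    for n in [100, 200, 400]:
        print(n, success_probability(n, p=32))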

Empirical behavior: Unrescaled plots

[Figure: chain graph; probability of success versus the raw sample size n, for p ∈ {64, 100, 225, 375}.]

Empirical behavior: Appropriately rescaled

[Figure: chain graph; probability of success versus the rescaled sample size n/log p, for p ∈ {64, 100, 225, 375}.]

Sufficient conditions for consistent model selection

- graph sequences G_{p,d} = (V, E) with p vertices and maximum degree d
- suitable regularity conditions on the Hessian of the log-determinant, Γ* := (Θ*)^{-1} ⊗ (Θ*)^{-1}

Theorem: For the multivariate Gaussian, with sample size n > c_1 τ d^2 log p and regularization parameter λ_n ≥ c_2 sqrt(τ log p / n), then with probability greater than 1 − 2 exp(−c_3 (τ − 2) log p):

(a) No false inclusions: the regularized log-determinant estimate Θ̂ returns an edge set Ê ⊆ E.

(b) l∞-control: the estimate satisfies max_{i,j} |Θ̂_ij − Θ*_ij| ≤ 2 c_4 sqrt(τ log p / n).

(c) Model selection consistency: if θ_min ≥ c_4 sqrt(τ log p / n), then Ê = E.
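A rough empirical companion to part (b): the elementwise error of the penalized estimate should shrink at roughly the sqrt(log p / n) rate. The constants and the regularization choice below are illustrative, not those of the theorem.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    p = 30
    Theta = 1.5 * np.eye(p)
    for s in range(p - 1):
        Theta[s, s + 1] = Theta[s + 1, s] = 0.4
    Sigma = np.linalg.inv(Theta)

    rng = np.random.default_rng(5)
    for n in [200, 800, 3200]:
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        Theta_hat = GraphicalLasso(alpha=0.5 * np.sqrt(np.log(p) / n)).fit(X).precision_
        err = np.max(np.abs(Theta_hat - Theta))
        print(f"n={n:5d}  elementwise error={err:.3f}  sqrt(log p / n)={np.sqrt(np.log(p) / n):.3f}")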

Some other graphs

[Figure: (a) a four-nearest-neighbor grid, with degree d = 4; (b) a star graph, with hub degree d ∈ {O(log p), αp}.]

Results for 4-grid graphs

[Figure: success probability Q[Ê = E] versus the rescaled sample size n/log p for the 4-nearest-neighbor grid, with p ∈ {64, 100, 225, 375}.]

Results for star graphs

[Figure: success probability Q[Ê = E] versus the rescaled sample size n/log p for star graphs, with p ∈ {64, 100, 225, 375}.]

Proof sketch: primal-dual certificate

Construct a candidate primal-dual pair (Θ̃, ẑ) ∈ R^{p×p} × R^{p×p}. This is a proof technique, not a practical algorithm!

(A) Solve the restricted log-determinant program

        \tilde{\Theta} = \arg\min_{\Theta \succ 0,\; \Theta_{S^c} = 0} \Big\{ -\log\det\Theta + \langle\!\langle \hat{\Sigma}_n, \Theta \rangle\!\rangle + \lambda_n \sum_{i \neq j} |\Theta_{ij}| \Big\},

    thereby obtaining the candidate solution Θ̃ = (Θ̃_S, 0_{S^c}).

(B) Choose ẑ_S as an element of the subdifferential of the l1 norm at Θ̃_S.

(C) Using the optimality conditions of the original convex program, solve for ẑ_{S^c} and check whether strict dual feasibility |ẑ_j| < 1 holds for all j ∈ S^c.

Lemma: The full convex program recovers the correct support ⟺ the primal-dual witness construction succeeds.
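For intuition, the witness construction can be carried out numerically on a small instance. The sketch below (illustration only, with arbitrary problem sizes and an arbitrary λ_n; it assumes the CVXPY package) solves the S-restricted program and then checks strict dual feasibility on S^c via the stationarity condition Θ̃^{-1} = Σ̂_n + λ_n Z on the off-support entries.

    import cvxpy as cp
    import numpy as np

    # Ground truth: a 5-node chain; S is the true edge support.
    p = 5
    S = [(0, 1), (1, 2), (2, 3), (3, 4)]
    Theta_true = 1.5 * np.eye(p)
    for i, j in S:
        Theta_true[i, j] = Theta_true[j, i] = 0.4

    rng = np.random.default_rng(3)
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_true), size=500)
    Sigma_n = np.cov(X, rowvar=False)
    lam = 0.5 * np.sqrt(np.log(p) / X.shape[0])

    # (A) Restricted log-determinant program: off-diagonal entries outside S forced to zero.
    support = set(S) | {(j, i) for i, j in S}
    Theta = cp.Variable((p, p), symmetric=True)
    constraints = [Theta >> 1e-6 * np.eye(p)]
    constraints += [Theta[i, j] == 0 for i in range(p) for j in range(p)
                    if i != j and (i, j) not in support]
    off_diag_l1 = cp.sum(cp.abs(Theta - cp.diag(cp.diag(Theta))))
    objective = cp.Minimize(-cp.log_det(Theta) + cp.trace(Sigma_n @ Theta) + lam * off_diag_l1)
    cp.Problem(objective, constraints).solve()

    # (C) Dual variable implied by stationarity; strict feasibility means |Z_ij| < 1 on S^c.
    Z = (np.linalg.inv(Theta.value) - Sigma_n) / lam
    off_support = [(i, j) for i in range(p) for j in range(p) if i != j and (i, j) not in support]
    print("max |Z_ij| over S^c:", max(abs(Z[i, j]) for i, j in off_support))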

2. Pseudolikelihood and neighborhood approaches

Markov properties encode neighborhood structure:

    (X_s \mid X_{V \setminus s}) \;\stackrel{d}{=}\; (X_s \mid X_{N(s)})

where conditioning on the full graph (left) is equal in distribution to conditioning on the Markov blanket N(s) (right).

[Figure: node X_s with neighborhood N(s) = {t, u, v, w}, i.e. neighbors X_t, X_u, X_v, X_w.]

- basis of the pseudolikelihood method (Besag, 1974)
- basis of many graph learning algorithms (Friedman et al., 1999; Csiszár & Talata, 2006; Abbeel et al., 2006; Meinshausen & Bühlmann, 2006)

Graph selection via neighborhood regression

[Figure: a binary data matrix with the column X_s highlighted and the remaining columns X_{\s}.]

Predict X_s based on X_{\s} := {X_t, t ≠ s}.

1. For each node s ∈ V, compute the (regularized) maximum likelihood estimate

       \hat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p-1}} \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^n \mathcal{L}(\theta; X_{i \setminus s})}_{\text{local log-likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}

2. Estimate the local neighborhood N̂(s) as the support of the regression vector θ̂[s] ∈ R^{p−1}.
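A minimal sketch of this recipe for binary data, using l1-regularized logistic regression per node (in the spirit of the Ravikumar et al. approach, but with synthetic chain-structured data and an arbitrary regularization level C, both hypothetical choices for illustration):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary data whose columns form a Markov chain, so the true graph is a chain.
    rng = np.random.default_rng(4)
    n, p = 500, 8
    X = np.zeros((n, p), dtype=int)
    X[:, 0] = rng.integers(0, 2, size=n)
    for s in range(1, p):
        flip = rng.random(n) < 0.3
        X[:, s] = np.where(flip, 1 - X[:, s - 1], X[:, s - 1])

    # Step 1: per-node l1-regularized logistic regression of X_s on X_{\setminus s}.
    neighborhoods = {}
    for s in range(p):
        y, Z = X[:, s], np.delete(X, s, axis=1)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z, y)
        support = np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-6)
        # Step 2: neighborhood = support of the regression vector, mapped back to node labels.
        neighborhoods[s] = [t if t < s else t + 1 for t in support]

    # Combine the per-node neighborhoods into an edge set (OR rule).
    edges = {(min(s, t), max(s, t)) for s, nbrs in neighborhoods.items() for t in nbrs}
    print("estimated edges:", sorted(edges))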

Empirical behavior: Unrescaled plots

[Figure: star graph with a linear fraction of neighbors; probability of success versus the number of samples, for p ∈ {64, 100, 225}.]

Empirical behavior: Appropriately rescaled

[Figure: star graph with a linear fraction of neighbors; probability of success versus the rescaled control parameter, for p ∈ {64, 100, 225}.]

Sufficient conditions for consistent Ising selection

- graph sequences G_{p,d} = (V, E) with p vertices and maximum degree d
- edge weights |θ_st| ≥ θ_min for all (s, t) ∈ E
- draw n i.i.d. samples, and analyze the probability of success indexed by (n, p, d)

Theorem (Ravikumar, W. & Lafferty, 2010): Under incoherence conditions, with sample size n > c_1 d^3 log p and regularization parameter λ_n ≥ c_2 sqrt(log p / n), then with probability greater than 1 − 2 exp(−c_3 λ_n^2 n):

(a) Correct exclusion: the estimated sign neighborhood N̂(s) correctly excludes all edges not in the true neighborhood.

(b) Correct inclusion: for θ_min ≥ c_4 sqrt(d) λ_n, the method selects the correct signed neighborhood.
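To get a feel for the scalings in the two sufficient conditions, here is a small worked comparison of the sample-size orders d^2 log p (Gaussian graphical lasso) and d^3 log p (Ising logistic regression); the leading constants are unspecified in the statements, so they are set to 1 below purely for illustration.

    import numpy as np

    # Order-of-magnitude sample sizes (placeholder constants c_1 = 1, tau = 1).
    for p, d in [(100, 3), (1000, 3), (1000, 10)]:
        n_gauss = d**2 * np.log(p)     # Gaussian: n > c_1 * tau * d^2 * log p
        n_ising = d**3 * np.log(p)     # Ising:    n > c_1 * d^3 * log p
        print(f"p={p:5d}, d={d:2d}:  d^2 log p ~ {n_gauss:7.0f}   d^3 log p ~ {n_ising:8.0f}")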

US Senate network (2004-2006 voting)

3. Info. theory: Graph selection as channel coding

Graphical model selection is an unorthodox channel coding problem:
- codewords/codebook: a graph G in some graph class 𝒢
- channel use: draw a sample X_i = (X_{i1}, ..., X_{ip}) from the Markov random field Q_{θ(G)}
- decoding problem: use the n samples {X_1, ..., X_n} to correctly distinguish the codeword G

    G  →  Q(X | G)  →  X_1, ..., X_n

Channel capacity for graph decoding is determined by the balance between:
- the log number of models
- the relative distinguishability of different models

Necessary conditions for G_{d,p}

- G ∈ G_{d,p}: graphs with p nodes and maximum degree d
- Ising models with:
  - minimum edge weight: |θ_st| ≥ θ_min for all edges
  - maximum neighborhood weight: ω(θ) := max_{s ∈ V} Σ_{t ∈ N(s)} |θ_st|

Theorem (Santhanam & W., 2012): If the sample size n is upper bounded by

    n \;<\; \max\Big\{ \frac{d}{8}\,\log\frac{p}{8d},\;\; \frac{\exp(\omega(\theta)/4)\, d\, \theta_{\min}\, \log(pd/8)}{128\, \exp(3\theta_{\min}/2)},\;\; \frac{\log p}{2\,\theta_{\min}\tanh(\theta_{\min})} \Big\},

then the probability of error of any algorithm over G_{d,p} is at least 1/2.

Interpretation:
- naive bulk effect: arises from the log cardinality log |G_{d,p}|
- d-clique effect: difficulty of separating models that contain a near d-clique
- small weight effect: difficulty of detecting edges with small weights
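As a worked example of reading off the bound (using the formula as reconstructed above, with illustrative parameter values and ω(θ) taken at its extreme value d·θ_min):

    import numpy as np

    # Illustrative parameter values; omega is set to its extreme value d * theta_min.
    p, d, theta_min = 1000, 10, 0.1
    omega = d * theta_min

    bulk   = (d / 8) * np.log(p / (8 * d))
    clique = np.exp(omega / 4) * d * theta_min * np.log(p * d / 8) / (128 * np.exp(1.5 * theta_min))
    weak   = np.log(p) / (2 * theta_min * np.tanh(theta_min))

    print("naive bulk effect   :", round(bulk, 1))
    print("d-clique effect     :", round(clique, 1))
    print("small weight effect :", round(weak, 1))
    print("no algorithm can succeed reliably unless n >", round(max(bulk, clique, weak), 1))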

Some consequences

Corollary: For asymptotically reliable recovery over G_{d,p}, any algorithm requires at least n = Ω(d^2 log p) samples.

- the maximum neighborhood weight satisfies ω(θ) ≥ d θ_min, so keeping ω(θ) bounded (and hence avoiding the exponential d-clique effect) requires θ_min = O(1/d)
- from the small weight effect, n = Ω(log p / (θ_min tanh(θ_min))) = Ω(log p / θ_min^2)
- conclude that l1-regularized logistic regression (LR) is within Θ(d) of optimal for general graphs
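Putting the two bullets together gives the corollary's d^2 log p scaling; a compact version of the substitution (with generic constants, and using θ tanh(θ) ≍ θ^2 for small θ):

    \omega(\theta) = O(1) \;\text{ together with }\; \omega(\theta) \ge d\,\theta_{\min}
    \;\Longrightarrow\; \theta_{\min} = O(1/d),
    \qquad
    n = \Omega\!\Big(\frac{\log p}{\theta_{\min}\tanh(\theta_{\min})}\Big)
      = \Omega\!\Big(\frac{\log p}{\theta_{\min}^2}\Big)
      = \Omega\big(d^2 \log p\big).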