Learning and message-passing in graphical models
1 Learning and message-passing in graphical models
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
Martin Wainwright (UC Berkeley) Graphical models and message-passing 1 / 41
2 Introduction
graphical model: graph G = (V, E) with p vertices; random vector (X_1, X_2, ..., X_p)
(a) Markov chain (b) Multiscale quadtree (c) Two-dimensional grid
useful in many statistical and computational fields:
statistical signal processing
statistical image processing
statistical machine learning
computational biology, bioinformatics
statistical physics
communication and information theory
3 Graphs and factorization
clique C is a fully connected subset of vertices
compatibility function \psi_C defined on the variables x_C = {x_s, s \in C}
factorization over all cliques:
  Q(x_1, ..., x_p) = (1/Z) \prod_{C \in \mathcal{C}} \psi_C(x_C)
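As a minimal sketch of this factorization (a binary 3-node chain with made-up compatibility values, not an example from the talk), the following brute-forces the normalization constant Z and checks that the resulting Q is a probability distribution:

```python
from itertools import product

# Sketch: 3-node chain X1 - X2 - X3 with binary states, factorizing over the
# edge cliques {1,2} and {2,3}; compatibility values are made up.
psi_12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
psi_23 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def unnormalized(x):
    """Product of clique compatibility functions for configuration x."""
    x1, x2, x3 = x
    return psi_12[(x1, x2)] * psi_23[(x2, x3)]

# Partition function Z: brute-force sum over all 2^3 configurations.
Z = sum(unnormalized(x) for x in product([0, 1], repeat=3))

def Q(x):
    """Normalized distribution Q(x) = (1/Z) * prod_C psi_C(x_C)."""
    return unnormalized(x) / Z

total = sum(Q(x) for x in product([0, 1], repeat=3))
```

Brute-force enumeration is exponential in p, which is exactly why the message-passing recursions in the following slides matter.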
4 Example: Optical digit/character recognition
Goal: correctly label digits/characters based on noisy versions
E.g., mail sorting; document scanning; handwriting recognition systems
5 Example: Optical digit/character recognition
Goal: correctly label digits/characters based on noisy versions
strong sequential dependencies captured by (hidden) Markov chain
message-passing spreads information along the chain (Baum & Petrie, 1966; Viterbi, 1967, and many others)
7 Example: Image denoising with grid models
8-bit digital image: matrix of intensity values {0, 1, ..., 255}
pairwise interaction between neighboring pixels, e.g. for three states:
  \psi_{uv}(x_u, x_v) = [ 1 \gamma \gamma ; \gamma 1 \gamma ; \gamma \gamma 1 ],  \gamma \in (0, 1)
enforces smoothness: {x_u = x_v} more likely than {x_u \neq x_v}
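This smoothness potential is easy to materialize; a small sketch (the state count d and the value of gamma are arbitrary choices here, not from the talk):

```python
# Smoothness potential from the slide: 1 on the diagonal (x_u == x_v),
# gamma in (0, 1) off the diagonal.
def smoothness_potential(d, gamma):
    """d x d matrix psi with psi[i][j] = 1 if i == j else gamma."""
    return [[1.0 if i == j else gamma for j in range(d)] for i in range(d)]

psi = smoothness_potential(3, 0.4)

# Agreement (x_u = x_v) gets strictly more weight than disagreement,
# which is what "enforces smoothness" means here.
favors_agreement = all(
    psi[i][i] > psi[i][j] for i in range(3) for j in range(3) if i != j
)
```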
10 Example: Multi-scale modeling
often useful to move beyond pixel-based representations in image modelling
compute a multi-scale transform (e.g., wavelets, Gabor filters, etc.)
model coefficients using a multi-scale autoregressive process on a tree (e.g., Willsky, 2002)
11 Example: Depth estimation in computer vision Stereo pairs: two images taken from horizontally-offset cameras
12 Modeling depth with a graphical model
introduce a variable x_{ab} at each pixel location (a, b): the offset between images at position (a, b)
node potentials \psi_{ab}(x_{ab}), \psi_{cd}(x_{cd}); edge potentials \psi_{ab,cd}(x_{ab}, x_{cd})
[Figure: left image, right image]
use message-passing algorithms to estimate the most likely offset/depth map (Szeliski et al., 2005)
13 Many other examples
natural language processing (e.g., parsing, translation)
computational biology (gene sequences, protein folding, phylogenetic reconstruction)
sensor network deployments (e.g., distributed detection, estimation, fault monitoring)
social network analysis (e.g., politics, Facebook, terrorism)
communication theory and error-control decoding (e.g., turbo codes, LDPC codes)
satisfiability problems (3-SAT, MAX-XORSAT, graph colouring)
robotics (path planning, tracking, navigation)
...
14 Remainder of talk
1. Message-passing in graphical models
   marginalization and applications
   basic form of sum-product algorithm (belief propagation)
   stochastic belief propagation
2. Learning graphical structure from data
   applications
   neighborhood-based regression
   application to social network data
15 1. Marginalization and message-passing
Define the marginal distribution at vertex u \in V:
  Q(x_u | y) = \sum_{x_v, v \in V \setminus \{u\}} Q(x_1, x_2, ..., x_N | y)
Useful in many applications:
optical character recognition
computer vision: disparity computation Q(x_u | y_1, ..., y_N), where x_u is the depth at pixel u and y_1, ..., y_N are multiple images
18 Divide and conquer
Goal: compute the marginal distribution at node 1 of the chain 1 - 2 - 3:
  Q(x_1) \propto \sum_{x_2, x_3} \psi_1(x_1) \psi_2(x_2) \psi_3(x_3) \psi_{12}(x_1, x_2) \psi_{23}(x_2, x_3)
            = \psi_1(x_1) \sum_{x_2} \psi_2(x_2) \psi_{12}(x_1, x_2) [ \sum_{x_3} \psi_3(x_3) \psi_{23}(x_2, x_3) ]
the inner sum over x_3 defines the message M_{32}(x_2); the remaining sum over x_2 defines M_{21}(x_1)
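This two-step elimination can be sketched directly (binary states, made-up potentials) and checked against brute-force summation over all configurations:

```python
from itertools import product

# Sketch of divide-and-conquer marginalization on the 3-chain 1 - 2 - 3
# (binary states; node and edge potentials are made up for illustration).
psi1 = [1.0, 2.0]
psi2 = [1.5, 0.5]
psi3 = [1.0, 1.0]
psi12 = [[2.0, 1.0], [1.0, 2.0]]
psi23 = [[3.0, 1.0], [1.0, 3.0]]

# Inner sum: M_32(x2) = sum_{x3} psi3(x3) psi23(x2, x3)
M32 = [sum(psi3[x3] * psi23[x2][x3] for x3 in (0, 1)) for x2 in (0, 1)]
# Outer sum: M_21(x1) = sum_{x2} psi2(x2) psi12(x1, x2) M_32(x2)
M21 = [sum(psi2[x2] * psi12[x1][x2] * M32[x2] for x2 in (0, 1)) for x1 in (0, 1)]
# Marginal: Q(x1) proportional to psi1(x1) M_21(x1)
marg = [psi1[x1] * M21[x1] for x1 in (0, 1)]
Z = sum(marg)
marg = [m / Z for m in marg]

# Brute force over all 2^3 configurations for comparison.
def joint(x1, x2, x3):
    return psi1[x1] * psi2[x2] * psi3[x3] * psi12[x1][x2] * psi23[x2][x3]

Zb = sum(joint(*x) for x in product((0, 1), repeat=3))
brute = [sum(joint(x1, x2, x3) for x2 in (0, 1) for x3 in (0, 1)) / Zb
         for x1 in (0, 1)]
```

The message-passing answer matches the brute-force answer exactly; the saving is that the chain computation is linear in the number of nodes rather than exponential.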
21 Iterating the procedure
Decompose:
  Q(x_1) \propto \psi_1(x_1) \sum_{x_2} \psi_{12}(x_1, x_2) \prod_{u \in N(2) \setminus \{1\}} M_{u2}(x_2)
Update messages:
  M_{32}(x_2) = \sum_{x_3} \psi_3(x_3) \psi_{23}(x_2, x_3) \prod_{v \in N(3) \setminus \{2\}} M_{v3}(x_3)
[Figure: 5-node tree with messages M_21, M_32, M_43, M_53]
22 Ordinary belief propagation
Algorithm:
1. At time t = 0, for all edges (v, u) \in E, initialize the message vectors:
   M^0_{vu}(k) = 1 for all states k \in {1, ..., d}.
2. At times t = 0, 1, 2, ..., for each directed edge (v \to u), update the message:
   M^{t+1}_{vu}(k) \propto \sum_{j=1}^d \psi_v(j) \psi_{uv}(k, j) \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j).
3. Upon convergence, compute the (approximate) marginal distributions:
   Q[X_u = k] \propto \psi_u(k) \prod_{w \in N(u)} M_{wu}(k).
Computational complexity: \Theta(d^2) operations per edge per round.
Communication complexity: requires passing d - 1 real numbers per edge.
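The three steps above can be sketched compactly (a made-up star-shaped tree with d = 3 states; on a tree BP is exact, so the marginals can be checked by enumeration):

```python
from itertools import product

# Sketch of ordinary BP on a small tree: star centred at node 1 with leaves
# 0, 2, 3; d = 3 states; potentials are made up for illustration.
d = 3
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (1, 3)]
neighbors = {s: [t for (a, b) in edges for t in (a, b)
                 if s in (a, b) and t != s] for s in nodes}

psi = {0: [1.0, 2.0, 1.0], 1: [1.5, 1.0, 0.5],
       2: [1.0, 1.0, 2.0], 3: [2.0, 1.0, 1.0]}

def edge_pot(k, j):
    """Symmetric pairwise potential psi_uv(k, j): favors agreement."""
    return 2.0 if k == j else 1.0

# Step 1: initialize all directed messages to all-ones vectors.
M = {(v, u): [1.0] * d for v in nodes for u in neighbors[v]}

# Step 2: synchronous updates of
#   M_vu(k) proportional to sum_j psi_v(j) psi_uv(k,j) prod_{w in N(v)\{u}} M_wv(j)
for _ in range(5):                      # more rounds than the tree diameter
    new = {}
    for (v, u) in M:
        vec = []
        for k in range(d):
            total = 0.0
            for j in range(d):
                term = psi[v][j] * edge_pot(k, j)
                for w in neighbors[v]:
                    if w != u:
                        term *= M[(w, v)][j]
                total += term
            vec.append(total)
        s = sum(vec)
        new[(v, u)] = [x / s for x in vec]   # normalize for stability
    M = new

# Step 3: marginals Q[X_u = k] proportional to psi_u(k) prod_{w in N(u)} M_wu(k).
def marginal(u):
    vec = []
    for k in range(d):
        m = psi[u][k]
        for w in neighbors[u]:
            m *= M[(w, u)][k]
        vec.append(m)
    s = sum(vec)
    return [x / s for x in vec]

# Brute-force marginals for comparison.
def joint(x):
    val = 1.0
    for s_ in nodes:
        val *= psi[s_][x[s_]]
    for (a, b) in edges:
        val *= edge_pot(x[a], x[b])
    return val

Z = sum(joint(x) for x in product(range(d), repeat=len(nodes)))
brute = {u: [sum(joint(x) for x in product(range(d), repeat=len(nodes))
                 if x[u] == k) / Z for k in range(d)] for u in nodes}
```

Note the inner double loop over (k, j): that is the Theta(d^2) per-edge cost quoted on the slide.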
26 Some reduced complexity methods
faster methods for problems with algebraic structure:
low-density parity check codes (e.g., Gallager, 1963; Richardson & Urbanke, 2008)
Toeplitz matrices in computer vision (Felzenszwalb & Huttenlocher, 2006)
exploiting symmetries (Kersting et al., 2009; McAuley & Caetano, 2011)
quantized forms of belief propagation (Gallager, 1963; Coughlan & Shen, 2007):
lower complexity (computation and communication)
do not converge to exact BP fixed point
particle filtering and non-parametric belief propagation (Doucet et al., 2001; Sudderth et al., 2003):
designed for continuous state spaces
high fidelity requires taking the number of particles to infinity
29 Stochastic belief propagation
Off-line phase: pre-compute the normalized columns
  \Gamma_{uv}(:, j) = \psi_{uv}(:, j) \psi_v(j) / \beta_{uv}(j), where \beta_{uv}(j) = \sum_{k=1}^d \psi_{uv}(k, j) \psi_v(j).
Stochastic belief propagation (SBP):
1. Initialize message vectors M^0_{vu}(k) = 1 for all k \in {1, ..., d}.
2. At times t = 0, 1, 2, ..., for each directed edge (v \to u):
   (A) Compute the product of incoming messages: M_{v \setminus u}(j) := \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j).
   (B) Pick a random index J^t_{vu} according to the probability distribution
       p^t_{vu}(j) \propto M_{v \setminus u}(j) \beta_{uv}(j) for j \in {1, ..., d}.
   (C) Update the message vector M^{t+1}_{vu}(:) \in R^d with step size \lambda^t \in (0, 1):
       M^{t+1}_{vu}(:) = (1 - \lambda^t) M^t_{vu}(:) + \lambda^t \Gamma_{uv}(:, J^t_{vu}).
34 Stochastic belief propagation
Computational complexity: \Theta(d) operations per edge per round.
Communication complexity: requires passing \log(d) bits per edge.
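A sketch of SBP on a 3-node chain (made-up potentials; the step size is simplified to lambda_t = 1/(t+1) rather than the 1/(mu(t+1)) used in the theory). The randomized messages are compared against the sum-to-one-normalized BP fixed point, which is what the updates converge to:

```python
import random
from itertools import product

random.seed(0)

# Sketch of SBP on the chain 0 - 1 - 2 with d = 3 states (made-up potentials).
d = 3
nodes = [0, 1, 2]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
psi = {0: [1.0, 2.0, 1.0], 1: [1.5, 1.0, 0.5], 2: [1.0, 1.0, 2.0]}

def psi_edge(k, j):
    """Symmetric smoothness potential psi_uv(k, j)."""
    return 2.0 if k == j else 1.0

# Off-line phase: beta_uv(j) = sum_k psi_uv(k,j) psi_v(j) and the normalized
# columns Gamma_uv(:, j); keyed here by the message direction (v, u).
beta, Gamma = {}, {}
for v in nodes:
    for u in neighbors[v]:
        beta[(v, u)] = [sum(psi_edge(k, j) for k in range(d)) * psi[v][j]
                        for j in range(d)]
        Gamma[(v, u)] = [[psi_edge(k, j) * psi[v][j] / beta[(v, u)][j]
                          for j in range(d)] for k in range(d)]

# SBP main loop.
M = {(v, u): [1.0] * d for v in nodes for u in neighbors[v]}
for t in range(20000):
    lam = 1.0 / (t + 1)
    new = {}
    for (v, u) in M:
        # (A) product of incoming messages, excluding the recipient u
        Mprod = [1.0] * d
        for w in neighbors[v]:
            if w != u:
                Mprod = [a * b for a, b in zip(Mprod, M[(w, v)])]
        # (B) sample an index J with probability proportional to Mprod(j) beta(j)
        weights = [Mprod[j] * beta[(v, u)][j] for j in range(d)]
        J = random.choices(range(d), weights=weights)[0]
        # (C) convex-combination update toward the sampled column of Gamma
        new[(v, u)] = [(1 - lam) * M[(v, u)][k] + lam * Gamma[(v, u)][k][J]
                       for k in range(d)]
    M = new

# Exact BP fixed point with the same sum-to-one normalization, for comparison.
B = {(v, u): [1.0] * d for v in nodes for u in neighbors[v]}
for _ in range(10):
    newB = {}
    for (v, u) in B:
        vec = []
        for k in range(d):
            tot = 0.0
            for j in range(d):
                p = psi[v][j] * psi_edge(k, j)
                for w in neighbors[v]:
                    if w != u:
                        p *= B[(w, v)][j]
                tot += p
            vec.append(tot)
        s = sum(vec)
        newB[(v, u)] = [x / s for x in vec]
    B = newB

max_err = max(abs(M[e][k] - B[e][k]) for e in M for k in range(d))
```

Each inner update touches a single sampled column of Gamma, which is the Theta(d) per-edge cost; communicating the sampled index J costs log(d) bits.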
35 Sample paths on grid-structured graph
[Figure: normalized squared error vs. number of iterations]
36 Some intuition
the stochastic BP update is unbiased at each round:
  E[M^{t+1}_{vu} | M^t_{vu}] = (1 - \lambda^t) M^t_{vu}(:) + \lambda^t \sum_{j=1}^d p^t_{vu}(j) \Gamma_{uv}(:, j),
and the last sum is proportional to \sum_{j=1}^d \psi_{uv}(:, j) \psi_v(j) \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j), i.e., the ordinary BP update
the variance will depend on the step sizes {\lambda^t}:
need step size \lambda^t \to 0 to smooth out fluctuations
need \sum_{t=1}^\infty \lambda^t = +\infty for infinite travel
Challenges:
1. Need a stability condition ==> average path is well-behaved.
2. Control on the spread of noise through the network.
40 SBP on graphs with cycles
Assume that ordinary BP is \mu-contractive:
  ||F(M) - F(M')||_2 \leq (1 - \mu/2) ||M - M'||_2.
Theorem. With step size \lambda^t = 1/(\mu(t+1)), the SBP iterates have the following guarantees:
(a) The SBP messages {M^t} converge almost surely to the unique BP fixed point M* as t \to +\infty.
(b) The expected squared error is bounded as
  E[ ||M^t - M*||_2^2 ] / ||M*||_2^2 \leq K(\psi) / (c_0 \mu^2 t) + c_1 ( ||M^0 - M*||_2^2 / ||M*||_2^2 ) / t^2,   (1)
where K(\psi) is a constant depending on the potential functions \psi.
(c) For every q \geq 1, with probability at least 1 - 2 t^{-q}, we have
  ||M^t - M*||_2^2 / ||M*||_2^2 \leq (1 + 2q) (32 K(\psi) / \mu^2) (\log t / t^{3/4}).
44 Complexity comparison
  Algorithm        Communication          Computation     Iterations
  Ordinary BP      d - 1 real numbers     \Theta(d^2)     \log(1/\delta)
  Stochastic BP    \log d bits            \Theta(d)       (1/\delta)
45 Denoising
[Figures: denoising results for ordinary BP and SBP]
47 Comparison of running times
[Figure: mean squared error vs. running time (sec) for BP and SBP]
48 Application to optical flow computation
[Figures]
50 2. Learning graph structure from data
up to now, we have been discussing a forward problem: computing marginals in a graphical model
inverse problems concern learning the parameters and structure of graphs from data
many instances of such graph learning problems:
fitting graphs to politicians' voting behavior
modeling diseases with epidemiological networks
traffic flow modeling
interactions between different genes
and so on...
52 Example: US Senate network (voting)
(Banerjee et al., 2008; Ravikumar, W. & Lafferty, 2010)
53 Example: Biological networks
gene networks during the Drosophila life cycle (Ahmed & Xing, PNAS, 2009)
many other examples: protein networks, phylogenetic trees
54 Learning for pairwise models
we have drawn n samples from
  Q(x_1, ..., x_p; \Theta) = (1/Z(\Theta)) exp{ \sum_{s \in V} \theta_s x_s^2 + \sum_{(s,t) \in E} \theta_{st} x_s x_t }
the graph G and the matrix [\Theta]_{st} = \theta_{st} of edge weights are unknown
data matrix X_1^n:
Ising model (binary variables): X_1^n \in {0, 1}^{n x p}
Gaussian model: X_1^n \in R^{n x p}
estimator: X_1^n -> \hat{\Theta}
various loss functions are possible:
graph selection: supp[\hat{\Theta}] = supp[\Theta]?
bounds on the Kullback-Leibler divergence D(Q_{\hat{\Theta}} || Q_{\Theta})
bounds on ||\hat{\Theta} - \Theta||_op
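For intuition, the Ising special case (binary variables, so x_s^2 = x_s) can be enumerated exactly when p is small; the weights below are made up, with positive couplings along a chain:

```python
import math
from itertools import product

# Sketch: pairwise Ising distribution on p = 4 binary variables; theta is a
# symmetric matrix with node weights on the diagonal (values made up).
p = 4
theta = [[0.2, 0.8, 0.0, 0.0],
         [0.8, -0.1, 0.8, 0.0],
         [0.0, 0.8, 0.3, 0.8],
         [0.0, 0.0, 0.8, -0.2]]   # chain graph 0 - 1 - 2 - 3

def energy(x):
    """theta_s x_s^2 terms plus theta_st x_s x_t over s < t."""
    e = sum(theta[s][s] * x[s] * x[s] for s in range(p))
    e += sum(theta[s][t] * x[s] * x[t]
             for s in range(p) for t in range(s + 1, p))
    return e

Z = sum(math.exp(energy(x)) for x in product((0, 1), repeat=p))

def Q(x):
    return math.exp(energy(x)) / Z

total = sum(Q(x) for x in product((0, 1), repeat=p))
```

The positive couplings make agreeing configurations more probable, and Z here is exactly the quantity whose intractability for large p motivates the work-arounds on the next slides.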
57 Challenges in graph selection
For pairwise models, the negative log-likelihood takes the form:
  \ell(\Theta; X_1^n) := -(1/n) \sum_{i=1}^n \log Q(x_{i1}, ..., x_{ip}; \Theta)
for Gaussian graphical models, this is a log-determinant program
for discrete graphical models, exact computation of the likelihood is intractable (curse of dimensionality)
various work-arounds are possible:
Markov chain Monte Carlo and stochastic gradient
variational approximations to the likelihood
pseudo-likelihoods or neighborhood-based approaches
60 Methods for graph selection
for Gaussian graphical models:
l1-regularized neighborhood regression for Gaussian MRFs (e.g., Meinshausen & Buhlmann, 2005; Wainwright, 2006; Zhao & Yu, 2006)
l1-regularized log-determinant (e.g., Yuan & Lin, 2006; d'Aspremont et al., 2007; Friedman, 2008; Rothman et al., 2008; Ravikumar et al., 2008)
methods for discrete MRFs:
exact solution for trees (Chow & Liu, 1967)
local testing (e.g., Spirtes et al., 2000; Kalisch & Buhlmann, 2008)
various other methods:
distribution fits by KL-divergence (Abbeel et al., 2005)
l1-regularized logistic regression (Ravikumar, W. & Lafferty, 2008, 2010)
approximate max. entropy approach and thinned graphical models (Johnson et al., 2007)
neighborhood-based thresholding method (Bresler, Mossel & Sly, 2008)
information-theoretic analysis:
pseudolikelihood and BIC criterion (Csiszar & Talata, 2006)
information-theoretic limitations (Santhanam & W., 2008, 2012)
63 Markov property and neighborhood structure
Markov properties encode neighborhood structure:
  (X_s | X_{V \setminus s}) =_d (X_s | X_{N(s)})
i.e., conditioning on the full graph is distributionally equivalent to conditioning on the Markov blanket N(s)
[Figure: vertex s with neighborhood N(s) = {t, u, v, w}]
basis of the pseudolikelihood method (Besag, 1974)
basis of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abbeel et al., 2006; Meinshausen & Buhlmann, 2006)
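The equivalence of the two conditionings can be verified numerically on a small example (a 4-node chain with made-up edge potentials): conditioning X_0 on its single neighbor X_1 gives exactly the same distribution as conditioning on all of X_1, X_2, X_3:

```python
from itertools import product

# Numerical sketch of the Markov property on the chain 0 - 1 - 2 - 3 with
# binary states; N(0) = {1} is the Markov blanket of node 0.
def phi(a, b):
    return 2.0 if a == b else 1.0

def joint(x):
    return phi(x[0], x[1]) * phi(x[1], x[2]) * phi(x[2], x[3])

states = list(product((0, 1), repeat=4))

def p_cond_full(x0, rest):
    """P(X0 = x0 | X1, X2, X3 = rest): condition on the full complement."""
    num = joint((x0,) + rest)
    den = sum(joint((a,) + rest) for a in (0, 1))
    return num / den

def p_cond_blanket(x0, x1):
    """P(X0 = x0 | X1 = x1): condition only on the Markov blanket."""
    num = sum(joint((x0, x1) + tail) for tail in product((0, 1), repeat=2))
    den = sum(joint((a, x1) + tail) for a in (0, 1)
              for tail in product((0, 1), repeat=2))
    return num / den

max_gap = max(abs(p_cond_full(x0, (x1, x2, x3)) - p_cond_blanket(x0, x1))
              for x0, x1, x2, x3 in states)
```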
64 Graph selection via neighborhood regression
Predict X_s based on X_{\setminus s} := {X_t, t \neq s}.
1. For each node s \in V, compute the (regularized) max. likelihood estimate:
   \hat{\theta}[s] := arg min_{\theta \in R^{p-1}} { (1/n) \sum_{i=1}^n L(\theta; X_{i, \setminus s}) + \lambda_n ||\theta||_1 }
   where the first term is the local log-likelihood and the second is the regularization.
2. Estimate the local neighborhood \hat{N}(s) as the support of the regression vector \hat{\theta}[s] \in R^{p-1}.
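A deterministic sketch of this two-step estimator (not the paper's implementation): for a 0/1 Ising model, P(X_s = 1 | x_{\setminus s}) is exactly logistic with weights theta_st, so we can run l1-regularized logistic regression against the exact population distribution of a made-up chain model, using proximal gradient (ISTA), and read node 0's neighborhood off the support:

```python
import math
from itertools import product

# Made-up chain Ising model 0 - 1 - 2 - 3; the true neighborhood of node 0
# is {1}, with coupling theta_01 = 0.8.
p = 4
theta_diag = [0.2, -0.1, 0.3, -0.2]
coupling = {(0, 1): 0.8, (1, 2): 0.8, (2, 3): 0.8}

def energy(x):
    e = sum(theta_diag[s] * x[s] for s in range(p))
    e += sum(th * x[s] * x[t] for (s, t), th in coupling.items())
    return e

configs = list(product((0, 1), repeat=p))
Z = sum(math.exp(energy(x)) for x in configs)
weights = [math.exp(energy(x)) / Z for x in configs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(b, w):
    """Gradient of the population-weighted logistic loss for predicting x[0]."""
    gb, gw = 0.0, [0.0] * (p - 1)
    for q, x in zip(weights, configs):
        feats = x[1:]
        err = sigmoid(b + sum(wj * f for wj, f in zip(w, feats))) - x[0]
        gb += q * err
        for j, f in enumerate(feats):
            gw[j] += q * err * f
    return gb, gw

# Step 1: l1-regularized fit via ISTA (gradient step + soft-thresholding).
lam, step = 0.01, 0.5
b, w = 0.0, [0.0] * (p - 1)
for _ in range(3000):
    gb, gw = grad(b, w)
    b -= step * gb                      # intercept is left unpenalized
    w = [wj - step * gj for wj, gj in zip(w, gw)]
    w = [math.copysign(max(abs(wj) - step * lam, 0.0), wj) for wj in w]

# Step 2: the estimated neighborhood is the support of the regression vector.
estimated_neighbors = {j + 1 for j, wj in enumerate(w) if abs(wj) > 0.05}
```

With real data the population expectation is replaced by the empirical average over n samples, which is where the sample-size conditions on the following slides come in.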
67 High-dimensional analysis
classical analysis: graph size p fixed, sample size n \to +\infty
high-dimensional analysis: allow the dimension p, sample size n, and maximum degree d to increase at arbitrary rates
take n i.i.d. samples from an MRF defined by G_{p,d}
study the probability of success as a function of three parameters:
  Success(n, p, d) = Q[Method recovers graph G_{p,d} from n samples]
theory is non-asymptotic: explicit probabilities for finite (n, p, d)
68 Empirical behavior: Unrescaled plots
[Figure: prob. of success vs. number of samples; star graph, linear fraction of neighbors; p = 64, 100, ...]
69 Empirical behavior: Appropriately rescaled
[Figure: prob. of success vs. control parameter; star graph, linear fraction of neighbors; p = 64, 100, ...]
70 Rescaled plots (2-D lattice graphs)
[Figure: prob. of success vs. control parameter; 4-nearest-neighbor grid (attractive); p = 64, 100, ...]
71 Sufficient conditions for consistent Ising selection
graph sequences G_{p,d} = (V, E) with p vertices and maximum degree d
edge weights |\theta_{st}| \geq \theta_min for all (s, t) \in E
draw n i.i.d. samples, and analyze the prob. of success indexed by (n, p, d)
Theorem (Ravikumar, W. & Lafferty, 2006, 2010)
Under incoherence conditions, suppose the rescaled sample size satisfies
  \gamma_{LR}(n, p, d) := n / (d^3 \log p) > \gamma_{crit}
and the regularization parameter satisfies \lambda_n \geq c_1 \sqrt{\log p / n}. Then with probability greater than 1 - 2 exp(-c_2 \lambda_n^2 n):
(a) Correct exclusion: the estimated sign neighborhood \hat{N}(s) correctly excludes all edges not in the true neighborhood.
(b) Correct inclusion: for \theta_min \geq c_3 \lambda_n, the method selects the correct signed neighborhood.
74 Some related work
thresholding estimator (poly-time for bounded degree) works with n \gtrsim 2^d \log p samples (Bresler et al., 2008)
information-theoretic lower bound over the family G_{p,d}: any method requires at least n = \Omega(d^2 \log p) samples (Santhanam & W., 2008, 2012)
l1-based method: sharper achievable rates, but also failure for \theta large enough to violate incoherence (Bento & Montanari, 2009)
empirical study: l1-based method can succeed beyond the phase transition of the Ising model (Aurell & Ekeberg, 2011)
simpler neighborhood-based methods: thresholding, mutual information, greedy methods (Anandkumar, Tan & Willsky, 2010a, 2010b; Netrapalli et al., 2010)
79 Info. theory: Graph selection as channel coding
graphical model selection is an unorthodox channel coding problem:
codewords/codebook: graph G in some graph class \mathcal{G}
channel use: draw sample X_i = (X_{i1}, ..., X_{ip}) from the Markov random field Q_{\theta(G)}
decoding problem: use n samples {X_1, ..., X_n} to correctly distinguish the codeword
  G -> Q(X | G) -> X_1, ..., X_n
Channel capacity for graph decoding is determined by the balance between:
log number of models
relative distinguishability of different models
82 Necessary conditions for G_{d,p}
G \in G_{d,p}: graphs with p nodes and max. degree d
Ising models with:
Minimum edge weight: |\theta_{st}| \geq \theta_min for all edges
Maximum neighborhood weight: \omega(\theta) := max_{s \in V} \sum_{t \in N(s)} |\theta_{st}|
Theorem (Santhanam & W., 2008)
If the sample size n is upper bounded by
  n < max{ (d/8) \log(p/(8d)),  (\exp(\omega(\theta)/4) / (d \theta_min)) \log(pd/8),  (128 \exp(3\theta_min/2) \log p) / (2 \theta_min \tanh(\theta_min)) }
then the probability of error of any algorithm over G_{d,p} is at least 1/2.
Interpretation:
Naive bulk effect: arises from the log cardinality \log |G_{d,p}|
d-clique effect: difficulty of separating models that contain a near d-clique
Small weight effect: difficult to detect edges with small weights.
85 Some consequences
Corollary
For asymptotically reliable recovery over G_{d,p}, any algorithm requires at least n = \Omega(d^2 \log p) samples.
note that the maximum neighborhood weight satisfies \omega(\theta*) \geq d \theta_min ==> require \theta_min = O(1/d)
from the small weight effect:
  n = \Omega( \log p / (\theta_min \tanh(\theta_min)) ) = \Omega( \log p / \theta_min^2 ) = \Omega( d^2 \log p ) for \theta_min = \Theta(1/d)
conclude that l1-regularized logistic regression (LR) is optimal up to a factor O(d) (Ravikumar, W. & Lafferty, 2010)
89 Summary
1. Stochastic belief propagation:
   adaptively randomized version of ordinary message-passing
   finite-length convergence analysis
2. Learning structure of graphical models:
   polynomial-time method for learning neighborhood structure
   natural extensions (using block regularization) to higher-order models
   matches information-theoretic limits
Some papers:
Noorshams & W. (2011). Stochastic belief propagation: Low-complexity message-passing with guarantees. Proceedings of the Allerton Conference.
Ravikumar, W. & Lafferty (2010). High-dimensional Ising model selection using l1-regularized logistic regression. Annals of Statistics.
Santhanam & W. (2012). Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory.
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationLarge-Deviations and Applications for Learning Tree-Structured Graphical Models
Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of Information and Decision Systems, Massachusetts Institute of Technology Thesis
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More informationInferning with High Girth Graphical Models
Uri Heinemann The Hebrew University of Jerusalem, Jerusalem, Israel Amir Globerson The Hebrew University of Jerusalem, Jerusalem, Israel URIHEI@CS.HUJI.AC.IL GAMIR@CS.HUJI.AC.IL Abstract Unsupervised learning
More informationSparse Graph Learning via Markov Random Fields
Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationInformation-Theoretic Limits of Selecting Binary Graphical Models in High Dimensions
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 58, NO. 7, JULY 2012 4117 Information-Theoretic Limits of Selecting Binary Graphical Models in High Dimensions Narayana P. Santhanam, Member, IEEE, and Martin
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationWhich graphical models are difficult to learn?
Which graphical models are difficult to learn? Jose Bento Department of Electrical Engineering Stanford University jbento@stanford.edu Andrea Montanari Department of Electrical Engineering and Department
More informationLatent Tree Approximation in Linear Model
Latent Tree Approximation in Linear Model Navid Tafaghodi Khajavi Dept. of Electrical Engineering, University of Hawaii, Honolulu, HI 96822 Email: navidt@hawaii.edu ariv:1710.01838v1 [cs.it] 5 Oct 2017
More information5. Density evolution. Density evolution 5-1
5. Density evolution Density evolution 5-1 Probabilistic analysis of message passing algorithms variable nodes factor nodes x1 a x i x2 a(x i ; x j ; x k ) x3 b x4 consider factor graph model G = (V ;
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More informationImproved Greedy Algorithms for Learning Graphical Models
Improved Greedy Algorithms for Learning Graphical Models Avik Ray, Sujay Sanghavi Member, IEEE and Sanjay Shakkottai Fellow, IEEE Abstract We propose new greedy algorithms for learning the structure of
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationExact MAP estimates by (hyper)tree agreement
Exact MAP estimates by (hyper)tree agreement Martin J. Wainwright, Department of EECS, UC Berkeley, Berkeley, CA 94720 martinw@eecs.berkeley.edu Tommi S. Jaakkola and Alan S. Willsky, Department of EECS,
More informationIntroduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah
Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationEstimators based on non-convex programs: Statistical and computational guarantees
Estimators based on non-convex programs: Statistical and computational guarantees Martin Wainwright UC Berkeley Statistics and EECS Based on joint work with: Po-Ling Loh (UC Berkeley) Martin Wainwright
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationGraphical models and message-passing Part II: Marginals and likelihoods
Graphical models and message-passing Part II: Marginals and likelihoods Martin Wainwright UC Berkeley Departments of Statistics, and EECS Tutorial materials (slides, monograph, lecture notes) available
More informationMarkov Random Fields for Computer Vision (Part 1)
Markov Random Fields for Computer Vision (Part 1) Machine Learning Summer School (MLSS 2011) Stephen Gould stephen.gould@anu.edu.au Australian National University 13 17 June, 2011 Stephen Gould 1/23 Pixel
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More information11 : Gaussian Graphic Models and Ising Models
10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood
More informationCS Lecture 4. Markov Random Fields
CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Hidden Markov Models Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Additional References: David
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationLecture 2: August 31
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Representation Conditional Random Fields Log-Linear Models Readings: KF
More informationLecture 18 Generalized Belief Propagation and Free Energy Approximations
Lecture 18, Generalized Belief Propagation and Free Energy Approximations 1 Lecture 18 Generalized Belief Propagation and Free Energy Approximations In this lecture we talked about graphical models and
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationEfficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials by Phillip Krahenbuhl and Vladlen Koltun Presented by Adam Stambler Multi-class image segmentation Assign a class label to each
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationDoes Better Inference mean Better Learning?
Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract
More informationUndirected graphical models
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationExpectation propagation for signal detection in flat-fading channels
Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA
More informationA Gaussian Tree Approximation for Integer Least-Squares
A Gaussian Tree Approximation for Integer Least-Squares Jacob Goldberger School of Engineering Bar-Ilan University goldbej@eng.biu.ac.il Amir Leshem School of Engineering Bar-Ilan University leshema@eng.biu.ac.il
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More information9 Forward-backward algorithm, sum-product on factor graphs
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous
More informationFactor Graphs and Message Passing Algorithms Part 1: Introduction
Factor Graphs and Message Passing Algorithms Part 1: Introduction Hans-Andrea Loeliger December 2007 1 The Two Basic Problems 1. Marginalization: Compute f k (x k ) f(x 1,..., x n ) x 1,..., x n except
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationLatent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs Animashree Anandkumar UC Irvine a.anandkumar@uci.edu Ragupathyraj Valluvan UC Irvine rvalluva@uci.edu Abstract Graphical
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationMessage passing and approximate message passing
Message passing and approximate message passing Arian Maleki Columbia University 1 / 47 What is the problem? Given pdf µ(x 1, x 2,..., x n ) we are interested in arg maxx1,x 2,...,x n µ(x 1, x 2,..., x
More informationInformation Geometric view of Belief Propagation
Information Geometric view of Belief Propagation Yunshu Liu 2013-10-17 References: [1]. Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic reasoning, Free energy and Information Geometry, Neural
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationMessage-Passing Algorithms for GMRFs and Non-Linear Optimization
Message-Passing Algorithms for GMRFs and Non-Linear Optimization Jason Johnson Joint Work with Dmitry Malioutov, Venkat Chandrasekaran and Alan Willsky Stochastic Systems Group, MIT NIPS Workshop: Approximate
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationEstimating Latent Variable Graphical Models with Moments and Likelihoods
Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods
More informationarxiv:cs/ v1 [cs.it] 15 Aug 2005
To appear in International Symposium on Information Theory; Adelaide, Australia Lossy source encoding via message-passing and decimation over generalized codewords of LDG codes artin J. Wainwright Departments
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationGraphical Models Another Approach to Generalize the Viterbi Algorithm
Exact Marginalization Another Approach to Generalize the Viterbi Algorithm Oberseminar Bioinformatik am 20. Mai 2010 Institut für Mikrobiologie und Genetik Universität Göttingen mario@gobics.de 1.1 Undirected
More informationMax$Sum(( Exact(Inference(
Probabilis7c( Graphical( Models( Inference( MAP( Max$Sum(( Exact(Inference( Product Summation a 1 b 1 8 a 1 b 2 1 a 2 b 1 0.5 a 2 b 2 2 a 1 b 1 3 a 1 b 2 0 a 2 b 1-1 a 2 b 2 1 Max-Sum Elimination in Chains
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationMinimizing D(Q,P) def = Q(h)
Inference Lecture 20: Variational Metods Kevin Murpy 29 November 2004 Inference means computing P( i v), were are te idden variables v are te visible variables. For discrete (eg binary) idden nodes, exact
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationIntroduction To Graphical Models
Peter Gehler Introduction to Graphical Models Introduction To Graphical Models Peter V. Gehler Max Planck Institute for Intelligent Systems, Tübingen, Germany ENS/INRIA Summer School, Paris, July 2013
More information5. Sum-product algorithm
Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider
More informationMAP Examples. Sargur Srihari
MAP Examples Sargur srihari@cedar.buffalo.edu 1 Potts Model CRF for OCR Topics Image segmentation based on energy minimization 2 Examples of MAP Many interesting examples of MAP inference are instances
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More information13 : Variational Inference: Loopy Belief Propagation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 13 : Variational Inference: Loopy Belief Propagation Lecturer: Eric P. Xing Scribes: Rajarshi Das, Zhengzhong Liu, Dishan Gupta 1 Introduction
More informationALARGE class of codes, including turbo codes [3] and
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 3, MARCH 2010 1351 Lossy Source Compression Using Low-Density Generator Matrix Codes: Analysis and Algorithms Martin J. Wainwright, Elitza Maneva,
More informationLearning MN Parameters with Approximation. Sargur Srihari
Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More information