ISIT Tutorial: Information Theory and Machine Learning, Part II
1 ISIT Tutorial Information theory and machine learning Part II Emmanuel Abbe Princeton University Martin Wainwright UC Berkeley 1
2 Inverse problems on graphs. A large variety of machine learning and data-mining problems are about inferring global properties of a collection of agents by observing local noisy interactions of these agents. Examples:
- community detection in social networks
- image segmentation
- data classification and information retrieval
- object matching, synchronization
- page sorting
- protein-to-protein interactions
- haplotype assembly
6 Inverse problems on graphs. In each case: we observe information on the edges of a network that has been generated from hidden attributes on the nodes, and we try to infer these attributes back. Dual to the graphical model learning problem (previous part).
7 What about graph-based codes? [Diagram: codeword bits X_1, ..., X_n from a code C pass through a channel W to outputs Y_1, ..., Y_N] Different: the code is a design parameter and typically involves specific non-local interactions of the bits (e.g., random, LDPC, polar codes). (1) What are the relevant types of codes and channels behind machine learning problems? (2) What are the fundamental limits for these?
8 Outline of the talk 1. Community detection and clustering 2. Stochastic block models : fundamental limits and capacity-achieving algorithms 3. Open problems 4. Graphical channels and low-rank matrix recovery 8
9 Community detection and clustering 9
10 Networks provide local interactions among agents; one often wants to infer global similarity classes. Social networks: friendship. Call graphs: calls. Biological networks: protein interactions. Genome HiC networks: DNA contacts.
12 The challenges of community detection. A long-studied and notoriously hard problem. What is a good clustering? Assortative and disassortative relations; work with models. How to get a good clustering? Computationally hard; many heuristics... Tutorial motto: can one establish a clear line-of-sight, as in communications with the Shannon capacity? [Plot: data efficiency vs. SNR for WAN wireless technologies (HSDPA, EV-DO) against the Shannon capacity and its infeasible region]
13 The Stochastic Block Model 13
14 The stochastic block model. SBM(n, p, W): p = (p_1, ..., p_k) is a probability vector giving the relative sizes of the k communities, and W is a symmetric k x k matrix with entries in [0,1], W_ij being the probability of connecting a node of community i to a node of community j; write P = diag(p). The DMC of clustering..? [Diagram: k = 4 communities of sizes np_1, ..., np_4 with connection probabilities W_11, W_12, ..., W_44]
15 The (exact) recovery problem. Let X^n = [X_1, ..., X_n] represent the community variables of the nodes (drawn under p). Definition. An algorithm X̂^n(·) solves (exact) recovery in the SBM if, for a random graph G under the model, lim_{n→∞} P(X^n = X̂^n(G)) = 1. We will see weaker recovery requirements later. Starting point: progress in science often comes from understanding special cases...
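Before the formal limits, it helps to have the generative model in executable form. A minimal sampler sketch, assuming nothing beyond the standard library (function and variable names are ours, not from the tutorial):

```python
import random

def sample_sbm(n, p, W, seed=0):
    """Sample from SBM(n, p, W): draw each node's community from the
    probability vector p, then connect each pair u < v independently
    with probability W[community[u]][community[v]]."""
    rng = random.Random(seed)
    k = len(p)
    community = [rng.choices(range(k), weights=p)[0] for _ in range(n)]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < W[community[u]][community[v]]]
    return community, edges

# 2-SBM: two balanced communities, intra-probability 0.8, inter 0.1
community, edges = sample_sbm(60, [0.5, 0.5], [[0.8, 0.1], [0.1, 0.8]])
```

Exact recovery asks whether the `community` list can be reconstructed from `edges` alone (up to a global relabeling of the communities).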
16 SBM with 2 symmetric communities: 2-SBM 16
17 2-SBM: two communities of n/2 nodes each, i.e., p_1 = p_2 = 1/2, with intra-community edge probability W_11 = W_22 = p and inter-community edge probability W_12 = q. [Diagram: two groups of n/2 nodes connected with probability p within and q across]
18 Some history for 2-SBM. Recovery problem: P(X̂^n = X^n) → 1.
- Bui, Chaudhuri, Leighton, Sipser '84: maxflow-mincut, p = Ω(1/n), q = o(n^{-1-4/((p+q)n)})
- Boppana '87: spectral method, (p-q)/√(p+q) = Ω(√(log(n)/n))
- Dyer, Frieze '89: min-cut via degrees, p-q = Ω(1)
- Snijders, Nowicki '97: EM algorithm, p-q = Ω(1)
- Jerrum, Sorkin '98: Metropolis algorithm, p-q = Ω(n^{-1/6+ε})
- Condon, Karp '99: augmentation algorithm, p-q = Ω(n^{-1/2+ε})
- Carson, Impagliazzo '01: hill-climbing algorithm, p-q = Ω(n^{-1/2} log^4(n))
- McSherry '01: spectral method, (p-q)/√p ≥ Ω(√(log(n)/n))
- Bickel, Chen '09: N-G modularity, (p-q)/√(p+q) = Ω(log(n)/√n)
- Rohe, Chatterjee, Yu '11: spectral method, p-q = Ω(1)
Algorithms-driven... Instead of how, when can we recover the clusters (IT)?
19 Information-theoretic view of clustering. Classical coding: a codeword X^n from a code C is sent at rate R = n/N through a memoryless channel W = BSC(ε), giving Y^N; reliable communication iff R < 1 - H(ε). 2-SBM: an unorthodox code! The community bits X^n determine the N = n(n-1)/2 edge observations Y^N (rate R ≈ 2/n), each pair being observed through the kernel W with P(edge | same community) = p and P(edge | different communities) = q; reliable communication (= exact recovery) iff ???
20 Some history for 2-SBM (continued). [Same history table as slide 18] Sharp answer for p = a log(n)/n, q = b log(n)/n [Abbe-Bandeira-Hall]: recovery iff (a+b)/2 ≥ 1 + √(ab), and this is efficiently achievable. [Plot: feasible vs. infeasible regions in the (a, b) plane]
21 Some history for 2-SBM (1983-). Recovery problem: P(X̂^n = X^n) → 1. Detection problem: ∃ε > 0 : P(d(X̂^n, X^n)/n < 1/2 - ε) → 1. [Same history table as slide 18] Recovery, p = a log(n)/n, q = b log(n)/n: solvable iff (a+b)/2 ≥ 1 + √(ab), efficiently achievable [Abbe-Bandeira-Hall]. Detection, p = a/n, q = b/n: solvable iff (a-b)^2 > 2(a+b) [conjectured by Decelle-Krzakala-Moore-Zdeborova; proved by Massoulié and Mossel-Neeman-Sly; see also Coja-Oghlan]. What about multiple/asymmetric communities? Conjecture: detection changes with 5 or more communities.
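The two thresholds summarized on this slide are closed-form conditions on (a, b); a small sketch checking them (helper names are ours):

```python
import math

def exact_recovery_solvable(a, b):
    """Exact recovery in the symmetric 2-SBM with p = a*log(n)/n,
    q = b*log(n)/n is solvable iff (a+b)/2 >= 1 + sqrt(a*b)
    [Abbe-Bandeira-Hall]."""
    return (a + b) / 2 >= 1 + math.sqrt(a * b)

def detection_solvable(a, b):
    """Detection in the 2-SBM with p = a/n, q = b/n is solvable iff
    (a-b)^2 > 2*(a+b) [Massoulie, Mossel-Neeman-Sly]."""
    return (a - b) ** 2 > 2 * (a + b)
```

For example (a, b) = (5, 1) is above the detection threshold (16 > 12) but below the exact-recovery one (3 < 1 + √5).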
22 Recovery in the 2-SBM: IT limit. Take p = a log(n)/n, q = b log(n)/n, two communities of size n/2. Converse [Abbe-Bandeira-Hall]: if (a+b)/2 - √(ab) < 1 then ML fails w.h.p. What is ML? Min-bisection. ML fails if two nodes can be swapped to reduce the cut. Writing B_i for the number of neighbors of node i inside its own community and R_i for the number across: P(∃i : B_i ≤ R_i) ≈ n P(B_1 ≤ R_1) (weak correlations), and P(B_1 ≤ R_1) = n^{-((a+b)/2 - √(ab)) + o(1)}. [Diagram: a node on each side with in-community degree B_i and cross-community degree R_i]
23 Recovery in the 2-SBM: efficient algorithms.
- spectral: max x^T A x s.t. ||x|| = 1, 1^T x = 0
- ML-decoding: max x^T A x s.t. x_i = ±1, 1^T x = 0 (NP-hard)
- lifting: X = x x^T
- SDP: max tr(AX) s.t. X_ii = 1, 1^T X 1 = 0, X ⪰ 0 (dropping rank(X) = 1)
[Abbe-Bandeira-Hall 14]
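The spectral relaxation above takes a few lines with numpy; a sketch (our naming) on an idealized two-block adjacency matrix, where the eigenvector for the second-largest eigenvalue is exactly the ±1 bisection vector:

```python
import numpy as np

def spectral_bisection(A):
    """Spectral relaxation of min-bisection: drop x_i = +/-1, keep
    ||x|| = 1 and 1^T x = 0. When the top eigenvector is (close to) the
    all-ones direction, the solution is the eigenvector of the
    second-largest eigenvalue; rounding its signs gives two clusters."""
    vals, vecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    x = vecs[:, -2]                  # second-largest eigenvalue's eigenvector
    return (x > 0).astype(int)

# Expected adjacency of a 2-SBM with intra-probability 1 and inter 0.2
m = 6
J = np.ones((m, m))
A = np.block([[J - np.eye(m), 0.2 * J],
              [0.2 * J, J - np.eye(m)]])
labels = spectral_bisection(A)   # constant on each block, up to a flip
```

On a random SBM sample (rather than this expectation matrix) the same rounding works once the parameters are above the thresholds discussed on the surrounding slides.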
24 Recovery in the 2-SBM: efficient algorithms. Theorem. The SDP solves recovery if 2 L_SBM + 11^T + I_n ⪰ 0, where L_SBM = D_{G+} - D_{G-} - A. -> Analyze the spectral norm of a random matrix. [Abbe-Bandeira-Hall 14]: Bernstein, slightly loose; [Hajek-Wu-Xu 15]: Seginer bound; [Bandeira, Bandeira-Van Handel 15]: tight bound. Note that SDP can be expensive...
25 The general SBM 25
26 SBM(n, p, W). Quiz: if a node is in community i, how many neighbors does it have in expectation in community j? 1. np_j 2. np_j W_ij 3. np_i W_ij. Answer: np_j W_ij, so each community has a degree profile given by a row of n WP (equal to n PW for symmetric W): a degree profile matrix. [Diagram: k = 4 communities of sizes np_1, ..., np_4 with connection probabilities W_ij]
27 Back to the information-theoretic view of clustering. Coding: reliable communication iff R < 1 - H(ε) for the BSC, and in general iff R < max_p I(p, W) (a KL-divergence characterization). Clustering: reliable communication iff 1 < (a+b)/2 - √(ab) for the 2-SBM, and for the general SBM iff 1 < J(p, W)???
28 Main results. Theorem 1 [Abbe-Sandon 15]. Recovery is solvable in SBM(n, p, Q log(n)/n) if and only if J(p, Q) := min_{i<j} D_+((PQ)_i, (PQ)_j) ≥ 1, where D_+(μ, ν) := max_{t∈[0,1]} Σ_{ℓ∈[k]} [ t μ_ℓ + (1-t) ν_ℓ - μ_ℓ^t ν_ℓ^{1-t} ] =: max_{t∈[0,1]} D_t(μ, ν). D_t is an f-divergence Σ_ℓ ν_ℓ f(μ_ℓ/ν_ℓ) with f(x) = 1 - t + tx - x^t; D_{1/2}(μ, ν) = (1/2) ||√μ - √ν||_2^2 is the Hellinger divergence (distance), and the maximization over t, as in max_t -log Σ_ℓ μ_ℓ^t ν_ℓ^{1-t}, is that of the Chernoff divergence. We call D_+ the CH-divergence. For two symmetric communities this reduces to (1/2)(√a - √b)^2 ≥ 1 [Abbe-Bandeira-Hall 14].
29 Main results. Theorem 1. Recovery is solvable in SBM(n, p, Q log(n)/n) if and only if J(p, Q) := min_{i<j} D_+((PQ)_i, (PQ)_j) ≥ 1. Is recovery in the general SBM solvable efficiently down to the information-theoretic threshold? YES! Theorem 2. The degree-profiling algorithm achieves the threshold and runs in quasi-linear time. [Abbe-Sandon 15]
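The CH-divergence and J(p, Q) are computable by a direct grid search over t; a numerical sketch (the grid resolution and the profile convention — rows of PQ, valid for symmetric Q — are our simplifications), which reproduces the two-community special case (1/2)(√a - √b)^2:

```python
def D_t(mu, nu, t):
    """D_t(mu, nu) = sum_l [t*mu_l + (1-t)*nu_l - mu_l^t * nu_l^(1-t)]."""
    return sum(t * m + (1 - t) * v - (m ** t) * (v ** (1 - t))
               for m, v in zip(mu, nu))

def ch_divergence(mu, nu, grid=1000):
    """CH-divergence D_+(mu, nu) = max_{t in [0,1]} D_t(mu, nu),
    approximated by a grid search over t."""
    return max(D_t(mu, nu, i / grid) for i in range(grid + 1))

def J(p, Q, grid=1000):
    """J(p, Q) = min over community pairs i < j of D_+ between their
    degree profiles; recovery in SBM(n, p, Q*log(n)/n) iff J(p, Q) >= 1."""
    k = len(p)
    prof = [[p[j] * Q[i][j] for j in range(k)] for i in range(k)]
    return min(ch_divergence(prof[i], prof[j], grid)
               for i in range(k) for j in range(i + 1, k))

# 2 symmetric communities with a = 9, b = 1: J = (1/2)*(3 - 1)^2 = 2 >= 1
val = J([0.5, 0.5], [[9.0, 1.0], [1.0, 9.0]])
```

For symmetric two-community parameters the maximum over t sits at t = 1/2 (the Hellinger point), which is why the grid search lands exactly on the closed-form value.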
30 When can we extract a specific community? Theorem [Abbe-Sandon 15]. If community i has a profile (PQ)_i at D_+-distance at least 1 from all other profiles (PQ)_j, j ≠ i, then it can be extracted w.h.p. What if we do not know the parameters? We can learn them on the fly: [Abbe-Sandon 15] (second paper).
31 Proof techniques and algorithms 31
32 A key step: testing degree profiles. Hypothesis 1: the degree profile d_v of node v is ~ P(log(n)(PQ)_1); Hypothesis 2: d_v ~ P(log(n)(PQ)_2). Theorem [Abbe-Sandon 15]. For any θ_1, θ_2 ∈ (R_+ \ {0})^k with θ_1 ≠ θ_2 and priors p_1, p_2 ∈ R_+ \ {0}, Σ_{x ∈ Z_+^k} min(P_{ln(n)θ_1}(x) p_1, P_{ln(n)θ_2}(x) p_2) = n^{-D_+(θ_1, θ_2) - o(1)}, where P_λ is the multivariate Poisson distribution with mean λ and D_+ is the CH-divergence.
33 How to use this? Plan: put effort in recovering most of the nodes and then finish greedily with local improvements [Abbe-Sandon 15] 33
34 The degree-profiling algorithm (capacity-achieving). (1) Split the edges of G into two graphs. (2) Run Sphere-comparison on the first graph -> classifies a fraction 1-o(1) of the nodes (see next). (3) In the second graph, reclassify each node v given that clustering by testing its degree profile d_v: Hypothesis i: d_v ~ P(log(n)(PQ)_i); the test error is P_e = n^{-D_+((PQ)_1, (PQ)_2) + o(1)}. [Abbe-Sandon 15]
35 How do we get most nodes correctly? 35
36 Other recovery requirements. Partial recovery: an algorithm solves partial recovery in the SBM with accuracy c if it produces a clustering which is correct on a fraction c of the nodes with high probability. Weak recovery, or detection (for the symmetric k-SBM): c = 1/k + ε for some ε > 0. Almost exact recovery: c = 1 - o(1). Exact recovery: c = 1. For all the above: what are the efficient vs. information-theoretic fundamental limits?
37 Partial recovery in SBM(n, p, Q/n). What is a good notion of SNR? Proposed notion: SNR = λ_min^2 / λ_max, the squared smallest over the largest eigenvalue of PQ (think of separating Bin(n/2, a/n) from Bin(n/2, b/n)). This gives (a-b)^2 / (2(a+b)) for 2 symmetric communities and (a-b)^2 / (k(a+(k-1)b)) for k symmetric communities. Theorem (informal) [Abbe-Sandon 15]. In the sparse SBM(n, p, Q/n), the Sphere-comparison algorithm recovers a fraction of nodes which approaches 1 when the SNR diverges. Note that the SNR scales if Q scales!
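The proposed SNR is a closed-form function of the parameters; a short sketch (our naming), also illustrating the last remark that the SNR scales when Q scales:

```python
def snr_symmetric(a, b, k):
    """SNR for the symmetric k-SBM(n, a/n, b/n): the squared smallest
    over the largest eigenvalue of PQ, i.e. (a-b)^2 / (k*(a+(k-1)*b));
    for k = 2 this is (a-b)^2 / (2*(a+b))."""
    return (a - b) ** 2 / (k * (a + (k - 1) * b))

# Scaling Q (i.e. both a and b) by s scales the SNR by s
s = 3.0
ratio = snr_symmetric(s * 5, s * 1, 2) / snr_symmetric(5, 1, 2)
```

The scaling property is immediate from homogeneity: the numerator is quadratic in (a, b) while the denominator is linear.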
38 A node neighborhood in SBM(n, p, Q/n): Sphere-comparison. For a node v in community i, the expected number of depth-1 neighbors in community k is about n p_k Q_ik, so the expected community profile of the neighborhood is (PQ)_i, and that of the depth-r sphere N_r(v) grows like ((PQ)^r)_i. [Abbe-Sandon 15]
39 Sphere-comparison: take now two nodes, v in community i and v' in community j. Compare v and v' from N_r(v) ∩ N_{r'}(v')? Hard to analyze... [Abbe-Sandon 15]
40 Sphere-comparison, decorrelated. Subsample the edges of G with probability c to get E, and compare v and v' through N_{r,r'}[E](v · v') := the number of edges of E crossing between N_{r[G\E]}(v) and N_{r'[G\E]}(v'). Then E[N_{r,r'}[E](v · v')] ≈ N_{r[G\E]}(v)^T (cQ/n) N_{r'[G\E]}(v') ≈ (((1-c)PQ)^r e_v)^T (cQ/n) (((1-c)PQ)^{r'} e_{v'}) = c(1-c)^{r+r'} e_v^T Q (PQ)^{r+r'} e_{v'} / n. Additional steps: 1. look at several depths -> Vandermonde system; 2. use anchor nodes. [Abbe-Sandon 15]
41 A real data example 41
42 The political blogs network: 1222 blogs (left- and right-leaning) [Adamic and Glance 05]; edge = hyperlink between blogs. The CH-divergence is close to 1; we can recover 95% of the nodes correctly.
43 Some open problems in community detection 43
44 Some open problems in community detection. I. The SBM: a. Recovery. Growing number of communities? Sub-linear communities? [Abbe-Sandon 15] should extend to k = o(log(n)); [Chen-Xu 14] let k, p, q scale with n polynomially.
45 Some open problems in community detection. I. The SBM: a. Recovery; b. Detection and broadcasting on trees. Converse for detection in the 2-SBM with p = a/n, q = b/n [Mossel-Neeman-Sly 13]: the local neighborhood of a node is a Galton-Watson tree with Poisson(c) offspring, c = (a+b)/2, on which the community bit is broadcast through BSC(b/(a+b)) edges. If (a+b)/2 <= 1 the tree dies w.p. 1; if (a+b)/2 > 1 the tree survives w.p. > 0. Unorthodox broadcasting problem: when can we detect the root bit X_r? If and only if c(1-2ε)^2 > 1 [Evans-Kenyon-Peres-Schulman 00], which with c = (a+b)/2 and ε = b/(a+b) is equivalent to (a-b)^2 / (2(a+b)) > 1.
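The equivalence at the end of the slide — the Kesten-Stigum condition c(1-2ε)^2 > 1 specializing to (a-b)^2/(2(a+b)) > 1 — can be checked mechanically (function names are ours):

```python
def kesten_stigum(c, eps):
    """Root-bit detection on a Poisson(c) Galton-Watson tree with BSC(eps)
    broadcast on the edges is possible iff c*(1 - 2*eps)^2 > 1
    [Evans-Kenyon-Peres-Schulman 00]."""
    return c * (1 - 2 * eps) ** 2 > 1

def detection_2sbm(a, b):
    """Detection condition for the 2-SBM via its local tree approximation:
    offspring mean c = (a+b)/2, flip probability eps = b/(a+b)."""
    return kesten_stigum((a + b) / 2, b / (a + b))
```

Expanding c(1-2ε)^2 with these substitutions gives (a+b)/2 · ((a-b)/(a+b))^2 = (a-b)^2/(2(a+b)), matching the detection threshold from the earlier slides.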
46 Some open problems in community detection. I. The SBM: b. Detection and broadcasting on trees. For k clusters? SNR = (a-b)^2 / (k(a+(k-1)b)). [Diagram: impossible / possible / efficient regimes on the SNR axis, with thresholds c_k and 1] Conjecture [Decelle-Krzakala-Zdeborova-Moore 11]. For the symmetric k-SBM(n, a, b), there exists c_k s.t. (1) if SNR < c_k, then detection cannot be solved; (2) if c_k < SNR < 1, then detection can be solved information-theoretically but not efficiently; (3) if SNR > 1, then detection can be solved efficiently. Moreover c_k = 1 for k ∈ {2, 3, 4} and c_k < 1 for k ≥ 5.
47 Some open problems in community detection. I. The SBM: c. Partial recovery and the SNR-distortion curve. Conjecture. For the symmetric k-SBM(n, a, b) and accuracy α ∈ (1/k, 1), there exist thresholds (an information-theoretic one and a computational one) s.t. partial recovery of accuracy α is solvable if and only if SNR exceeds the first, and efficiently solvable iff SNR exceeds the second. [Plot: accuracy α vs. SNR (the distortion curve), from α = 1/2 at low SNR up to α = 1, sketched for k = 2]
48 Some open problems in community detection. II. Other block models: censored block models, labelled block models. 2-CBM(n, p, ε) [Abbe-Bandeira-Bracher-Singer 14; see also Xu-Lelarge-Massoulié, Saad-Krzakala-Lelarge-Zdeborova, Chin-Rao-Vu, Hajek-Wu-Xu]: each node pair is observed independently with probability p, and the observed binary label is flipped with probability ε relative to the community relation of the pair. Related: correlation clustering [Bansal, Blum, Chawla 04]; LDGM codes [Kumar, Pakzad, Salavati, Shokrollahi 12]; labelled block model [Heimlicher, Lelarge, Massoulié 12]; soft CSPs [Abbe, Montanari 13]; pairwise measurements [Chen, Suh, Goldsmith 14 and 15]; bounded-size correlation clustering [Puleo, Milenkovic 14].
49 Some open problems in community detection. II. Other block models: censored block models, labelled block models, degree-corrected block models [Karrer-Newman 11], mixed-membership block models [Airoldi-Blei-Fienberg-Xing 08], overlapping block models [Abbe-Sandon 15]. OSBM(n, p, f): p is a distribution on {0,1}^s; connect (u, v) with probability f(X_u, X_v), where f : {0,1}^s × {0,1}^s -> [0,1].
50 Some open problems in community detection. II. Other block models: censored block models, labelled block models, degree-corrected block models, mixed-membership block models, overlapping block models, planted community [Deshpande-Montanari 14, Montanari 15].
51 Some open problems in community detection II. Other block models - Censored block models - Labelled block models - Degree-corrected block models - Mixed-membership block models - Overlapping block models - Planted community - Hypergraph models 51
52 Some open problems in community detection II. Other block models - Censored block models - Labelled block models - Degree-corrected block models - Mixed-membership block models - Overlapping block models - Planted community - Hypergraph models For all the above: Is there a CH-divergence behind? A generalized notion of SNR? Detection gaps? Efficient algorithms? 52
53 Some open problems in community detection. III. Beyond block models: a. Exchangeable random arrays and graphons: w : [0,1]^2 -> [0,1], P(E_ij = 1 | X_i = x_i, X_j = x_j) = w(x_i, x_j) [Lovász]. SBMs can approximate graphons [Choi-Wolfe, Airoldi-Costa-Chan]. b. Graphical channels: a general information-theoretic model?
54 Graphical channels [Abbe-Montanari 13]. A family of channels motivated by inference-on-graph problems: G = (V, E) a k-hypergraph and Q : X^k -> Y a channel (kernel); for x ∈ X^V, y ∈ Y^E, P(y|x) = Π_{e ∈ E(G)} Q(y_e | x[e]). How much information do graphical channels carry? Theorem. lim_{n→∞} (1/n) I(X^n; G) exists for ER graphs and some kernels. -> Why not always? What is the limit?
55 Connection: sparse PCA and clustering 55
56 SBM and low-rank Gaussian model. Spiked Wigner model: Y = √(λ/n) X X^T + Z, with Z Gaussian symmetric. With X = (X_1, ..., X_n) i.i.d. Bernoulli(ε) this is sparse-PCA [Amini-Wainwright 09, Deshpande-Montanari 14]; with X i.i.d. Rademacher(1/2) it mirrors the SBM. Theorem [Deshpande-Abbe-Montanari 15]. If λ_n = n(p_n - q_n)^2 / (2(p_n + q_n)) -> λ (finite SNR) with np_n, nq_n -> ∞ (large degrees), then lim_{n→∞} (1/n) I(X; G(n, p_n, q_n)) = lim_{n→∞} (1/n) I(X; Y), and this limit has a single-letter expression in terms of the fixed point γ* solving γ = λ(1 - MMSE(γ)), where MMSE(γ) is the minimum mean-square error of estimating X_1 from Y_1(γ) = √γ X_1 + Z_1 (via I-MMSE [Guo-Shamai-Verdú]).
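The fixed-point equation γ = λ(1 - MMSE(γ)) can be iterated numerically. A sketch for the Rademacher(1/2) prior (the scalar MMSE is written via the standard Gaussian-channel identity MMSE(γ) = 1 - E[tanh(γ + √γ Z)]; the integration grid and iteration count are our choices), showing the transition at λ = 1:

```python
import math

def mmse_rademacher(g, grid=2001, zmax=8.0):
    """Scalar MMSE for X ~ Rademacher(1/2) observed through
    Y = sqrt(g)*X + Z, Z ~ N(0,1):
    MMSE(g) = 1 - E[tanh(g + sqrt(g)*Z)] (Nishimori identity).
    The expectation is a Riemann sum against the Gaussian density."""
    if g <= 0:
        return 1.0
    dz = 2 * zmax / (grid - 1)
    e = sum(math.tanh(g + math.sqrt(g) * z) * math.exp(-z * z / 2)
            for z in (-zmax + i * dz for i in range(grid)))
    return 1.0 - e * dz / math.sqrt(2 * math.pi)

def state_evolution(lam, iters=60):
    """Iterate gamma <- lam * (1 - MMSE(gamma)) starting from gamma = lam:
    the iterates collapse to 0 for lam <= 1 and settle at a positive
    fixed point for lam > 1."""
    g = lam
    for _ in range(iters):
        g = max(lam * (1.0 - mmse_rademacher(g)), 0.0)
    return g
```

Below λ = 1 the only fixed point is γ = 0 (trivial estimation); above it a positive γ* appears, and it is this γ* that enters the single-letter formula for lim (1/n) I(X; Y).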
57 Conclusion. Community detection couples naturally with the channel view of information theory, and more specifically with graph-based codes, f-divergences, broadcasting problems, I-MMSE, ... (all in unorthodox versions). More generally, the problem of inferring global similarity classes in data sets from noisy local interactions is at the center of many problems in ML, and an information-theoretic view of these problems seems needed and powerful.
58 Questions?
More information5. Density evolution. Density evolution 5-1
5. Density evolution Density evolution 5-1 Probabilistic analysis of message passing algorithms variable nodes factor nodes x1 a x i x2 a(x i ; x j ; x k ) x3 b x4 consider factor graph model G = (V ;
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationAdvanced Machine Learning & Perception
Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 6 Standard Kernels Unusual Input Spaces for Kernels String Kernels Probabilistic Kernels Fisher Kernels Probability Product Kernels
More informationCS369N: Beyond Worst-Case Analysis Lecture #4: Probabilistic and Semirandom Models for Clustering and Graph Partitioning
CS369N: Beyond Worst-Case Analysis Lecture #4: Probabilistic and Semirandom Models for Clustering and Graph Partitioning Tim Roughgarden April 25, 2010 1 Learning Mixtures of Gaussians This lecture continues
More informationTHE PHYSICS OF COUNTING AND SAMPLING ON RANDOM INSTANCES. Lenka Zdeborová
THE PHYSICS OF COUNTING AND SAMPLING ON RANDOM INSTANCES Lenka Zdeborová (CEA Saclay and CNRS, France) MAIN CONTRIBUTORS TO THE PHYSICS UNDERSTANDING OF RANDOM INSTANCES Braunstein, Franz, Kabashima, Kirkpatrick,
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationSHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe
SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/40 Acknowledgement Praneeth Boda Himanshu Tyagi Shun Watanabe 3/40 Outline Two-terminal model: Mutual
More informationEE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes
EE229B - Final Project Capacity-Approaching Low-Density Parity-Check Codes Pierre Garrigues EECS department, UC Berkeley garrigue@eecs.berkeley.edu May 13, 2005 Abstract The class of low-density parity-check
More informationAlgorithms for Graph Partitioning on the Planted Partition Model
Algorithms for Graph Partitioning on the Planted Partition Model Anne Condon, 1,* Richard M. Karp 2, 1 Computer Sciences Department, University of Wisconsin, 1210 West Dayton St., Madison, WI 53706; e-mail:
More informationPhase transitions in low-rank matrix estimation
Phase transitions in low-rank matrix estimation Florent Krzakala http://krzakala.github.io/lowramp/ arxiv:1503.00338 & 1507.03857 Thibault Lesieur Lenka Zeborová Caterina Debacco Cris Moore Jess Banks
More informationTwo-sample hypothesis testing for random dot product graphs
Two-sample hypothesis testing for random dot product graphs Minh Tang Department of Applied Mathematics and Statistics Johns Hopkins University JSM 2014 Joint work with Avanti Athreya, Vince Lyzinski,
More informationImpact of regularization on Spectral Clustering
Impact of regularization on Spectral Clustering Antony Joseph and Bin Yu December 5, 2013 Abstract The performance of spectral clustering is considerably improved via regularization, as demonstrated empirically
More informationELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications
ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications Professor M. Chiang Electrical Engineering Department, Princeton University March
More informationSHARED INFORMATION. Prakash Narayan with. Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe
SHARED INFORMATION Prakash Narayan with Imre Csiszár, Sirin Nitinawarat, Himanshu Tyagi, Shun Watanabe 2/41 Outline Two-terminal model: Mutual information Operational meaning in: Channel coding: channel
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationA Potpourri of Nonlinear Algebra
a a 2 a 3 + c c 2 c 3 =2, a a 3 b 2 + c c 3 d 2 =0, a 2 a 3 b + c 2 c 3 d =0, a 3 b b 2 + c 3 d d 2 = 4, a a 2 b 3 + c c 2 d 3 =0, a b 2 b 3 + c d 2 d 3 = 4, a 2 b b 3 + c 2 d 3 d =4, b b 2 b 3 + d d 2
More informationLinear and conic programming relaxations: Graph structure and message-passing
Linear and conic programming relaxations: Graph structure and message-passing Martin Wainwright UC Berkeley Departments of EECS and Statistics Banff Workshop Partially supported by grants from: National
More informationMixed Membership Stochastic Blockmodels
Mixed Membership Stochastic Blockmodels Journal of Machine Learning Research, 2008 by E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing as interpreted by Ted Westling STAT 572 Intro Talk April 22, 2014
More informationMixed Membership Stochastic Blockmodels
Mixed Membership Stochastic Blockmodels (2008) Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg and Eric P. Xing Herrissa Lamothe Princeton University Herrissa Lamothe (Princeton University) Mixed
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More informationOn the low-rank approach for semidefinite programs arising in synchronization and community detection
JMLR: Workshop and Conference Proceedings vol 49:1, 016 On the low-rank approach for semidefinite programs arising in synchronization and community detection Afonso S. Bandeira Department of Mathematics,
More informationPart 1: Les rendez-vous symétriques
Part 1: Les rendez-vous symétriques Cris Moore, Santa Fe Institute Joint work with Alex Russell, University of Connecticut A walk in the park L AND NL-COMPLETENESS 313 there are n locations in a park you
More informationConvex Relaxation Methods for Community Detection
Convex Relaxation Methods for Community Detection Xiaodong Li, Yudong Chen and Jiaming Xu University of California, Davis, Cornell University and Duke University arxiv:1810.00315v1 [math.st] 30 Sep 2018
More informationSpectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking Yuxin Chen Electrical Engineering, Princeton University Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang Ranking A fundamental
More informationLinear inverse problems on Erdős-Rényi graphs: Information-theoretic limits and efficient recovery
Linear inverse probles on Erdős-Rényi graphs: Inforation-theoretic liits and efficient recovery Eanuel Abbe, Afonso S. Bandeira, Annina Bracher, Ait Singer Princeton University Abstract This paper considers
More informationDraft. On Limiting Expressions for the Capacity Region of Gaussian Interference Channels. Mojtaba Vaezi and H. Vincent Poor
Preliminaries Counterexample Better Use On Limiting Expressions for the Capacity Region of Gaussian Interference Channels Mojtaba Vaezi and H. Vincent Poor Department of Electrical Engineering Princeton
More informationRapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization
Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Shuyang Ling Department of Mathematics, UC Davis Oct.18th, 2016 Shuyang Ling (UC Davis) 16w5136, Oaxaca, Mexico Oct.18th, 2016
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationDATA MINING LECTURE 9. Minimum Description Length Information Theory Co-Clustering
DATA MINING LECTURE 9 Minimum Description Length Information Theory Co-Clustering MINIMUM DESCRIPTION LENGTH Occam s razor Most data mining tasks can be described as creating a model for the data E.g.,
More information18.2 Continuous Alphabet (discrete-time, memoryless) Channel
0-704: Information Processing and Learning Spring 0 Lecture 8: Gaussian channel, Parallel channels and Rate-distortion theory Lecturer: Aarti Singh Scribe: Danai Koutra Disclaimer: These notes have not
More informationMatrix Completion: Fundamental Limits and Efficient Algorithms
Matrix Completion: Fundamental Limits and Efficient Algorithms Sewoong Oh PhD Defense Stanford University July 23, 2010 1 / 33 Matrix completion Find the missing entries in a huge data matrix 2 / 33 Example
More informationLecture 16 Oct 21, 2014
CS 395T: Sublinear Algorithms Fall 24 Prof. Eric Price Lecture 6 Oct 2, 24 Scribe: Chi-Kit Lam Overview In this lecture we will talk about information and compression, which the Huffman coding can achieve
More informationDeciphering and modeling heterogeneity in interaction networks
Deciphering and modeling heterogeneity in interaction networks (using variational approximations) S. Robin INRA / AgroParisTech Mathematical Modeling of Complex Systems December 2013, Ecole Centrale de
More informationCoding into a source: an inverse rate-distortion theorem
Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University
More informationProbability Models of Information Exchange on Networks Lecture 3
Probability Models of Information Exchange on Networks Lecture 3 Elchanan Mossel UC Berkeley All Right Reserved The Bayesian View of the Jury Theorem Recall: we assume 0/1 with prior probability (0.5,0.5).
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationInformation-theoretic Limits for Community Detection in Network Models
Information-theoretic Limits for Community Detection in Network Models Chuyang Ke Department of Computer Science Purdue University West Lafayette, IN 47907 cke@purdue.edu Jean Honorio Department of Computer
More informationCommunity Detection by L 0 -penalized Graph Laplacian
Community Detection by L 0 -penalized Graph Laplacian arxiv:1706.10273v1 [stat.me] 30 Jun 2017 Chong Chen School of Mathematical Science, Peking University and Ruibin Xi School of Mathematical Science,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationMixed Membership Stochastic Blockmodels
Mixed Membership Stochastic Blockmodels Journal of Machine Learning Research, 2008 by E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing as interpreted by Ted Westling STAT 572 Final Talk May 8, 2014 Ted
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More information