Information Recovery from Pairwise Measurements

1 Information Recovery from Pairwise Measurements: A Shannon-Theoretic Approach
Yuxin Chen, Changho Suh, Andrea Goldsmith
Stanford University, KAIST
Page 1

2 Recovering data from correlation measurements
A large collection of data instances
In many applications, it is
  difficult/infeasible to measure each variable directly
  feasible to measure pairwise correlation
Page 2

3 Motivating application: multi-image alignment
Structure from motion: estimate 3D structures from 2D image sequences
Key step: joint alignment
  input: (noisy) estimates of relative camera poses
  goal: jointly recover all camera poses
Page 3

4 Motivating application: graph clustering
Real-world networks exhibit community structures
  input: pairwise similarities between members
  goal: uncover hidden clusters
Page 4

8 This talk: recovery from pairwise difference measurements
Goal: recover a collection of variables $\{x_i\}$
Can only measure several pairwise differences $x_i - x_j$ (broadly defined)
Examples:
  joint alignment
    $x_i$: (angle $\theta_i$, position $z_i$)
    relative rotation/translation: $(\theta_i - \theta_j,\ z_i - z_j)$
  graph partition
    $x_i$: membership (which partition it belongs to)
    cluster agreement: $x_i - x_j = \begin{cases} 1, & \text{if } i, j \text{ are in the same partition} \\ 0, & \text{else} \end{cases}$
  pairwise maps, parity reads, ...
Page 5

11 A fundamental-limit perspective?
A flurry of activity in recovery algorithm design: convex programs, combinatorial methods, spectral methods
What are the fundamental recovery limits?
  min. sample complexity?
  how noisy can the measurements be?
So far mostly studied in a model-specific manner
Seek a more unified framework
Page 6

12 Problem setup: a Shannon-theoretic framework
[Figure: information network with vertices $x_1, \dots, x_7$]
Information network
  $n$ vertices
  discrete inputs $x_i \in \{0, \dots, M-1\}$; the alphabet size $M$ could scale with $n$
Page 7

13 Problem setup: a Shannon-theoretic framework
[Figure: measurement graph $G$ on $x_1, \dots, x_7$ with edge measurements $y_{12}, y_{13}, y_{15}, y_{24}, y_{26}, y_{27}, y_{34}, y_{35}, y_{67}$; for example, $y_{12}, y_{13}, y_{15}$ are measurements of $x_1 - x_2$, $x_1 - x_3$, $x_1 - x_5$]
Pairwise difference measurements
  truth: $x_i - x_j$
  measurements: $y_{ij}$ (arbitrary alphabet), which can be corrupted by noise, distortion, ...
Graphical representation
  observe $y_{ij}$ for $(i, j) \in G$
Page 7

14 Problem setup: a Shannon-theoretic framework
[Figure: each difference ($x_1 - x_2$, $x_1 - x_5$, $x_2 - x_7$, $x_6 - x_7$) passes through a channel $p(y_{ij} \mid x_i - x_j)$ and produces $y_{12}$, $y_{15}$, $y_{27}$, $y_{67}$]
Channel-decoding perspective
  each measurement is modeled by an i.i.d. channel
  transition prob. $P(y_{ij} \mid x_i - x_j)$
Page 7

15 Problem setup: a Shannon-theoretic framework
[Figure: channel diagram mapping each difference $x_i - x_j$ to its measurement $y_{ij}$ through $p(y_{ij} \mid x_i - x_j)$]
Goal: recover $\{x_i\}$ exactly (up to a global offset)
Unified framework for decoding: the model captures similarities among various applications
Page 7
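The setup above can be simulated in a few lines. This is a minimal sketch, not the authors' code: it assumes the "broadly defined" difference is subtraction modulo $M$, and it uses a hypothetical outlier channel in which each measurement is replaced by a uniformly random symbol with probability `eps`. The names `measure` and `eps` are illustrative only.

```python
import random

def measure(x, edges, M, eps, rng):
    """Simulate y_ij for every edge (i, j) of the measurement graph.

    Assumptions (not from the slides): the difference is taken mod M, and the
    channel replaces the true difference by a uniform symbol w.p. eps.
    """
    y = {}
    for i, j in edges:
        truth = (x[i] - x[j]) % M
        y[(i, j)] = rng.randrange(M) if rng.random() < eps else truth
    return y

rng = random.Random(0)
M = 5
x = [3, 1, 4, 1, 0, 2, 2]                                  # hidden inputs, n = 7
edges = [(0, 1), (0, 2), (0, 4), (1, 3), (1, 5), (1, 6),
         (2, 3), (2, 4), (5, 6)]                           # measurement graph G
clean = measure(x, edges, M, eps=0.0, rng=rng)

# Shifting every x_i by the same constant leaves all differences unchanged,
# which is why recovery is only possible up to a global offset.
shifted = [(xi + 2) % M for xi in x]
assert measure(shifted, edges, M, eps=0.0, rng=rng) == clean
```

With `eps=0.0` the channel is noiseless; raising `eps` models the corruption that the channel view is meant to capture.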

16 What factors dictate hardness of recovery?
[Figure: the channel output $y_{12}$ has distribution $P_1$ when $x_1 - x_2 = 1$ and $P_2$ when $x_1 - x_2 = 2$]
$P_l := P(y_{ij} \mid x_i - x_j = l)$
Channel distance/resolution
  captured by $\mathrm{KL}(P_l \,\|\, P_k)$ or $\mathrm{Hellinger}(P_l \,\|\, P_k)$ or ...
Page 8

17 What factors dictate hardness of recovery?
[Figure: the channel output $y_{12}$ has distribution $P_1$ when $x_1 - x_2 = 1$ and $P_3$ when $x_1 - x_2 = 3$]
$P_l := P(y_{ij} \mid x_i - x_j = l)$
Minimum channel distance/resolution
  $\mathrm{KL}_{\min} := \min_{l \neq k} \mathrm{KL}(P_l \,\|\, P_k)$, or
  $\mathrm{Hellinger}_{\min} := \min_{l \neq k} \mathrm{Hellinger}(P_l \,\|\, P_k)$, or ...
Uncoded input
Page 8
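The two minimum distances are straightforward to compute for a finite output alphabet. The sketch below is illustrative and rests on assumptions: `hellinger` uses the unnormalized squared form $\sum_y (\sqrt{P_l(y)} - \sqrt{P_k(y)})^2$ (the slides do not state the normalization), logarithms are natural, and `outlier_channel` is a hypothetical channel chosen only for the example.

```python
import math
from itertools import permutations

def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def hellinger(p, q):
    """Unnormalized squared Hellinger distance: sum (sqrt p - sqrt q)^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def min_distance(P, dist):
    """Minimum of dist(P[l], P[k]) over all ordered pairs l != k."""
    return min(dist(P[l], P[k]) for l, k in permutations(range(len(P)), 2))

def outlier_channel(M, eps):
    """Hypothetical channel: output the true difference l w.p. 1 - eps,
    otherwise a uniform symbol in {0, ..., M-1}. Row l is P_l."""
    return [[(1 - eps) * (y == l) + eps / M for y in range(M)] for l in range(M)]

P = outlier_channel(M=4, eps=0.9)
kl_min = min_distance(P, kl)
hel_min = min_distance(P, hellinger)
print(kl_min, hel_min, kl_min / hel_min)
```

For this very noisy channel the printed ratio is close to 2, since all the $P_l$ are nearly identical.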

18 What factors dictate hardness of recovery?
Graph connectivity (measurement graph $G$)
  Impossible to recover isolated vertices
Page 9

19 What factors dictate hardness of recovery?
Graph connectivity (measurement graph $G$)
  Over-sparse connectivity is fragile
Page 9

20 What factors dictate hardness of recovery?
Graph connectivity (measurement graph $G$)
  Sufficient connectivity removes fragility!
Page 9

21 Agenda Page 10

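25 Main result: Erdős-Rényi random graph
Erdős-Rényi graph $G(n, p_{\mathrm{obs}})$: each edge $(i, j)$ is present independently w.p. $p_{\mathrm{obs}}$
[Figure: example graphs with $p_{\mathrm{obs}} = 1$ and $p_{\mathrm{obs}} = 0.3$]
ML decoding works if
  $\mathrm{Hellinger}_{\min} > \dfrac{2 \log n + 4 \log M}{p_{\mathrm{obs}}\, n}$
Converse: no method works if
  $\mathrm{KL}_{\min} < \dfrac{\log n}{p_{\mathrm{obs}}\, n}$
Both conditions are non-asymptotic!
Page 11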
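Both conditions are simple enough to evaluate directly. The following sketch assumes natural logarithms (the slides do not state the base); the function names are illustrative.

```python
import math

def ml_succeeds(hellinger_min, n, M, p_obs):
    """Sufficient condition from the slide: ML decoding works."""
    return hellinger_min > (2 * math.log(n) + 4 * math.log(M)) / (p_obs * n)

def recovery_impossible(kl_min, n, p_obs):
    """Converse from the slide: no method works."""
    return kl_min < math.log(n) / (p_obs * n)

n, M, p_obs = 1000, 2, 0.3
print(ml_succeeds(0.1, n, M, p_obs))          # True: 0.1 exceeds about 0.0553
print(recovery_impossible(0.01, n, p_obs))    # True: 0.01 is below about 0.0230
```

Values of the channel distance that fall between the two thresholds are not settled by either condition.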

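28 Main result: Erdős-Rényi random graph
[Figure: example graphs with $p_{\mathrm{obs}} = 1$ and $p_{\mathrm{obs}} = 0.3$]
In the hard regime where $\mathrm{d}P_l / \mathrm{d}P_k \approx 1$: $\mathrm{KL}_{\min} \approx 2\, \mathrm{Hellinger}_{\min}$
Recovery conditions
  ML works if $\mathrm{Hellinger}_{\min} > \dfrac{2 \log n + 4 \log M}{p_{\mathrm{obs}}\, n}$
  Impossible if $\mathrm{Hellinger}_{\min} < \dfrac{\log n}{2\, p_{\mathrm{obs}}\, n}$
Page 12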

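30 Main result: Erdős-Rényi random graph
[Figure: example graphs with $p_{\mathrm{obs}} = 1$ and $p_{\mathrm{obs}} = 0.3$]
In the hard regime where $\mathrm{d}P_l / \mathrm{d}P_k \approx 1$: $\mathrm{KL}_{\min} \approx 2\, \mathrm{Hellinger}_{\min}$
Fundamental recovery condition (assuming $M \lesssim \mathrm{poly}(n)$)
  $\mathrm{Hellinger}_{\min} \gtrsim \dfrac{\log n}{p_{\mathrm{obs}}\, n}$
  equivalently, since avg-degree $\approx p_{\mathrm{obs}}\, n$: $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
Page 12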
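The factor 2 between the two divergences can be checked with a second-order expansion. The sketch below assumes that $\mathrm{Hellinger}(P \,\|\, Q)$ denotes the unnormalized squared Hellinger distance $\int (\sqrt{\mathrm{d}P} - \sqrt{\mathrm{d}Q})^2$; this convention is an assumption, and under the $\tfrac{1}{2}$-normalized convention the factor would be 4. Write $\mathrm{d}P = (1 + \varepsilon)\, \mathrm{d}Q$ with $\varepsilon$ small, so that $\int \varepsilon\, \mathrm{d}Q = 0$.

```latex
\begin{align*}
\mathrm{KL}(P \,\|\, Q)
  &= \int (1+\varepsilon)\log(1+\varepsilon)\,\mathrm{d}Q
   \approx \int \Big(\varepsilon + \tfrac{1}{2}\varepsilon^2\Big)\,\mathrm{d}Q
   = \tfrac{1}{2}\int \varepsilon^2\,\mathrm{d}Q, \\
\mathrm{Hellinger}(P \,\|\, Q)
  &= \int \big(\sqrt{1+\varepsilon}-1\big)^2\,\mathrm{d}Q
   \approx \tfrac{1}{4}\int \varepsilon^2\,\mathrm{d}Q .
\end{align*}
```

To second order, then, $\mathrm{KL} \approx 2\, \mathrm{Hellinger}$, and a converse stated as $\mathrm{KL}_{\min} < \log n / (p_{\mathrm{obs}}\, n)$ translates into $\mathrm{Hellinger}_{\min} < \log n / (2\, p_{\mathrm{obs}}\, n)$.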

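32 Intuition
Fundamental recovery condition (Erdős-Rényi graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
[Figure: matrix of pairwise differences $[x_i - x_j]_{1 \le i, j \le n}$ with a highlighted region]
Hypotheses: $H_0: x = [0, 0, \dots, 0]$ and $H_1: x = [1, 0, \dots, 0]$
$H_0$ and $H_1$ differ only at the highlighted region (about avg-degree pieces of info)
Page 13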

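33 Intuition
Fundamental recovery condition (Erdős-Rényi graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
[Figure: matrix of pairwise differences $[x_i - x_j]_{1 \le i, j \le n}$ with a highlighted region]
Hypotheses: $H_0: x = [0, 0, \dots, 0]$ and $H_2: x = [0, 1, \dots, 0]$
$H_0$ and $H_2$ differ only at the highlighted region (about avg-degree pieces of info)
Page 13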

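34 Intuition
Fundamental recovery condition (Erdős-Rényi graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$    (1)
[Figure: matrix of pairwise differences $[x_i - x_j]_{1 \le i, j \le n}$]
Hypotheses: $H_0: x = [0, 0, \dots, 0]$, ..., $H_n: x = [0, 0, \dots, 1]$
$n$ minimally-separated hypotheses
  telling them apart needs at least $\log n$ bits
  this is the consequence of uncoded inputs
Page 13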
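A heuristic calculation makes the $\log n$ requirement concrete. This is a sketch under assumptions, not the proof from the talk: it uses the standard Bhattacharyya bound for testing two hypotheses, takes $\mathrm{Hellinger}(P \,\|\, Q) = \int (\sqrt{\mathrm{d}P} - \sqrt{\mathrm{d}Q})^2$ (unnormalized, an assumed convention), and supposes that $H_0$ and each $H_i$ differ on $d \approx$ avg-degree independent measurements, whose distributions under the two hypotheses are $P_0$ and some $P_l$ with $l \neq 0$.

```latex
\begin{align*}
\Pr(\text{ML confuses } H_0 \text{ with } H_i)
  &\le \Big(\int \sqrt{\mathrm{d}P_0\,\mathrm{d}P_l}\Big)^{d}
   = \Big(1 - \tfrac{1}{2}\,\mathrm{Hellinger}(P_0 \,\|\, P_l)\Big)^{d}
   \le \exp\!\Big(-\tfrac{d}{2}\,\mathrm{Hellinger}_{\min}\Big), \\
\Pr(\text{ML confuses } H_0 \text{ with some } H_i,\ 1 \le i \le n)
  &\le n \exp\!\Big(-\tfrac{d}{2}\,\mathrm{Hellinger}_{\min}\Big).
\end{align*}
```

The union bound is small once $d \cdot \mathrm{Hellinger}_{\min}$ exceeds $2 \log n$, which agrees in order with the condition $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$.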

36 Minimal sample complexity
Fundamental recovery condition (Erdős-Rényi graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
Sample complexity: $\asymp n \cdot \text{avg-degree}$
Min sample complexity: $\asymp \dfrac{n \log n}{\mathrm{Hellinger}_{\min}}$
Page 14

37 How general is this limit?
Fundamental recovery condition (Erdős-Rényi graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
Can we go beyond Erdős-Rényi graphs?
Page 15

39 Main results: homogeneous graphs
[Figure: random geometric graph; (generalized) ring]
Fundamental recovery condition (various homogeneous graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
Homogeneous graphs:
  min-degree $\asymp$ max-degree $\asymp$ mincut
  balanced cut-set distributions
Page 16

40 Main results: homogeneous graphs
[Figure: random geometric graph; (generalized) ring]
Fundamental recovery condition (various homogeneous graphs): $\text{avg-degree} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$
The limit depends almost only on graph sparsity
Page 16

42 Main results: general graphs
[Figure: example graph with mincut = 5]
Information across the minimum cut set: $\mathrm{mincut} \cdot \mathrm{Hellinger}_{\min}$
Page 17

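43 Main results: general graphs
[Figure: example graph with mincut = 5]
Recovery conditions
  ML works if $\mathrm{mincut} \cdot \mathrm{Hellinger}_{\min} \gtrsim \tau_{\mathrm{cut}} + \log n + \log M$
  Impossible if $\mathrm{mincut} \cdot \mathrm{Hellinger}_{\min} \lesssim \tau_{\mathrm{cut}} + \dfrac{\mathrm{mincut}}{\text{max-degree}} \log n$
Page 17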

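45 Cut-homogeneity exponent $\tau_{\mathrm{cut}}$
$\tau_{\mathrm{cut}}$ captures
  the growth rate of the cut-set distribution
  the ratio mincut / avg-degree
In general: $\tau_{\mathrm{cut}} \gtrsim 1$
For homogeneous graphs: $\tau_{\mathrm{cut}} \lesssim \log n$
Definition: $\tau_{\mathrm{cut}} := \max_{k \ge 1} \frac{1}{k} \log N(k \cdot \mathrm{mincut})$, where $N(K) := |\{\text{cut} : \text{cut-size} \le K\}|$
Page 18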
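For a small graph the exponent can be computed by brute force. The sketch below is illustrative only and exponential in $n$. It assumes the definition reads $\tau_{\mathrm{cut}} = \max_{k \ge 1} \frac{1}{k} \log N(k \cdot \mathrm{mincut})$ with a natural logarithm, that $N(K)$ counts the cuts of size at most $K$, and that each cut $(S, V \setminus S)$ is counted once. The function names are hypothetical.

```python
import math
from itertools import combinations

def cut_sizes(n, edges):
    """Sizes of all cuts (S, V \\ S), each counted once, by brute force."""
    sizes = []
    others = list(range(1, n))
    for r in range(0, n - 1):                  # S = {0} plus r other vertices, S != V
        for subset in combinations(others, r):
            s = set(subset) | {0}
            sizes.append(sum((u in s) != (v in s) for u, v in edges))
    return sizes

def cut_homogeneity_exponent(n, edges):
    """Return (mincut, tau_cut) under the assumed definition."""
    sizes = cut_sizes(n, edges)
    mincut = min(sizes)
    if mincut == 0:
        raise ValueError("graph is disconnected")
    max_k = math.ceil(max(sizes) / mincut)     # N(.) is constant beyond this k
    def count(K):
        return sum(s <= K for s in sizes)
    tau = max(math.log(count(k * mincut)) / k for k in range(1, max_k + 1))
    return mincut, tau

ring = [(i, (i + 1) % 6) for i in range(6)]    # 6-vertex ring
print(cut_homogeneity_exponent(6, ring))
```

On the 6-vertex ring the minimum cut is 2 and there are 15 such cuts (one per pair of removed edges), so the maximum is attained at $k = 1$.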

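46 Summary of main results
General graphs: threshold on $\mathrm{mincut} \cdot \mathrm{Hellinger}_{\min}$ between $1$ and $\log n$; gap: $\log n$
Homogeneous graphs: $\text{avg-deg} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$; gap: $1$
Erdős-Rényi graphs: $\text{avg-deg} \cdot \mathrm{Hellinger}_{\min} \gtrsim \log n$; gap: $4\left(1 + \dfrac{2 \log M}{\log n}\right)$
Page 19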

48 Concrete application: stochastic block model
Stochastic block model: 2 clusters
Edge densities:
  within-cluster: $p = \alpha \dfrac{\log n}{n}$
  across-cluster: $q = \beta \dfrac{\log n}{n}$ $(q < p)$
[Figure: adjacency matrix]
Our theory:
  feasible if $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$
  impossible if $\sqrt{\alpha} - \sqrt{\beta} < \sqrt{1/2}$
Fundamental limit (Abbe et al. and Mossel et al.): $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$
Page 20
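One way to see how the block model connects to the Erdős-Rényi result is to treat it as a fully observed pairwise measurement problem. The sketch below rests on assumptions that the slides do not spell out: $M = 2$, $p_{\mathrm{obs}} = 1$, each adjacency entry is Bernoulli($p$) for same-cluster pairs and Bernoulli($q$) otherwise, Hellinger is the unnormalized squared distance, and logarithms are natural. Under these assumptions the distance is approximately $(\sqrt{\alpha} - \sqrt{\beta})^2 \log n / n$.

```python
import math

def bernoulli_hellinger(p, q):
    """Unnormalized squared Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    return ((math.sqrt(p) - math.sqrt(q)) ** 2
            + (math.sqrt(1 - p) - math.sqrt(1 - q)) ** 2)

def sbm_ml_sufficient(alpha, beta, n):
    """Check Hellinger_min > (2 log n + 4 log M) / (p_obs n) with M = 2, p_obs = 1."""
    p = alpha * math.log(n) / n
    q = beta * math.log(n) / n
    return bernoulli_hellinger(p, q) > (2 * math.log(n) + 4 * math.log(2)) / n

n = 10**6
alpha, beta = 9.0, 1.0
approx = (math.sqrt(alpha) - math.sqrt(beta)) ** 2 * math.log(n) / n
exact = bernoulli_hellinger(alpha * math.log(n) / n, beta * math.log(n) / n)
print(exact / approx)                       # close to 1 for large n
print(sbm_ml_sufficient(alpha, beta, n))    # True
```

Dropping the lower-order $4 \log 2$ term, the sufficient condition reads $(\sqrt{\alpha} - \sqrt{\beta})^2 > 2$ as $n$ grows.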

49 Concluding remarks
A unified framework to determine recovery limits
Interplay between information theory (IT) and graph theory
Tighten the pre-constants?
Arxiv:
Page 21

Information Recovery from Pairwise Measurements: A Shannon-Theoretic Approach
Yuxin Chen (Statistics, Stanford University; email: yxchen@stanford.edu), Changho Suh (EE, KAIST; email: chsuh@kaist.ac.kr), Andrea J. Goldsmith