The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences
Yuxin Chen and Emmanuel Candès
Department of Statistics, Stanford University, Sep. 2016
Nonconvex optimization is everywhere

For instance, maximum likelihood estimation is nonconvex in numerous problems:

    maximize_x  l(x; y)    subject to  x ∈ S

Examples: matrix completion, phase retrieval, dictionary learning, blind deconvolution, robust PCA, ...
Recent flurry of research in nonconvex procedures

(Keshavan et al. '08; Netrapalli et al. '13; Candès et al. '14; Soltanolkotabi '14; Jain et al. '14; Sun et al. '14; Chen et al. '15; Cai et al. '15; Tu et al. '15; Sun et al. '15; White et al. '15; Li et al. '16; Yi et al. '16; Zhang et al. '16; Wang et al. '16; ...)

Nice geometry within a neighborhood around x (basin of attraction) suggests two-stage paradigms:
1. Start from an appropriate initial point
2. Proceed via some iterative updates
This talk: a discrete nonconvex problem
Joint alignment from pairwise differences

n unknown variables: x_1, ..., x_n
m possible states: x_i ∈ {1, 2, ..., m}
Measurements: pairwise differences

    y_{i,j} = x_i - x_j + η_{i,j}  (mod m),    i ≠ j,

where η_{i,j} denotes noise.

(Bandeira, Charikar, Singer, Zhu '13; Chen, Guibas, Huang '14)
e.g. random corruption model:

    y_{i,j} = x_i - x_j (mod m)    with prob. π_0
    y_{i,j} ~ Uniform(m)           else

where π_0 is the non-corruption rate.

Goal: recover {x_i} (up to a global offset)
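To make the measurement model concrete, here is a minimal simulation sketch of the random corruption model (function and variable names are mine, not from the talk):

```python
import numpy as np

def generate_measurements(x, m, pi0, rng):
    """y[i, j] = x[i] - x[j] (mod m) w.p. pi0; Uniform{0, ..., m-1} otherwise."""
    n = len(x)
    y = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if rng.random() < pi0:
                y[i, j] = (x[i] - x[j]) % m
            else:
                y[i, j] = rng.integers(m)
    return y

rng = np.random.default_rng(0)
x = rng.integers(5, size=20)                    # n = 20 variables, m = 5 states
y = generate_measurements(x, m=5, pi0=0.8, rng=rng)
```

Setting pi0 = 1 recovers the noiseless pairwise differences, which makes a handy sanity check.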
Motivation: multi-image alignment (computer vision/graphics)

Jointly align a collection of images/shapes of the same physical object; x_i is the angle of rotation associated with each shape.

Step 1: compute pairwise estimates of the relative angles of rotation
Step 2: aggregate this pairwise information for joint alignment
Maximum likelihood estimation (MLE)

    maximize_{x_i}  Σ_{i,j} l(x_i, x_j; y_{i,j})
    subject to      x_i ∈ {1, ..., m},  1 ≤ i ≤ n

The log-likelihood function l may be complicated, and the input space is discrete. Looks daunting.
Another look in lifted space

Encode each discrete variable as an indicator vector in a higher-dimensional space:

    x_i = 1  ⟺  x_i = e_1
    x_i = 2  ⟺  x_i = e_2
    ...
    x_i = j  ⟺  x_i = e_j
For each pairwise sample y_{i,j}, encode the log-likelihood l(x_i, x_j) by a matrix L_{i,j} ∈ R^{m×m}:

    [L_{i,j}]_{α,β} = l(x_i = α, x_j = β)

e.g. under the random corruption model,

    l(x_i, x_j) = log(π_0 + (1 - π_0)/m)   if x_i - x_j = y_{i,j}
    l(x_i, x_j) = log((1 - π_0)/m)         else

This enables the quadratic representation

    l(x_i, x_j) = x_i^T L_{i,j} x_j
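A sketch of one block L_{i,j} under the random corruption model, checking the quadratic representation against the two log-likelihood values above (helper names are illustrative, not from the talk):

```python
import numpy as np

def loglik_block(y_ij, m, pi0):
    """[L_ij]_{a,b} = l(x_i = a, x_j = b; y_ij), random corruption model."""
    hit = np.log(pi0 + (1 - pi0) / m)   # value when (a - b) mod m == y_ij
    miss = np.log((1 - pi0) / m)        # value otherwise
    a = np.arange(m)
    return np.where((a[:, None] - a[None, :]) % m == y_ij % m, hit, miss)

m, pi0 = 5, 0.8
L_ij = loglik_block(2, m, pi0)
e = np.eye(m)
# Quadratic representation: l(x_i = a, x_j = b) = e_a^T L_ij e_b
assert np.isclose(e[4] @ L_ij @ e[2], np.log(pi0 + (1 - pi0) / m))
```

Here states are indexed 0, ..., m-1 for convenience, whereas the slides use 1, ..., m.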
MLE is equivalent to a binary quadratic program

Collect the blocks into L ∈ R^{nm×nm}:

    L = [ L_{1,1} ... L_{1,n}
          ...
          L_{n,1} ... L_{n,n} ]

Then

    maximize  Σ_{i,j} l(x_i, x_j; y_{i,j})    subject to  x_i ∈ {1, ..., m}

becomes

    maximize_x  x^T L x    subject to  x = [x_1; ...; x_n],  x_i ∈ {e_1, ..., e_m}

This is essentially nonconvex constrained PCA.
How to solve nonconvex constrained PCA?

PCA:
    maximize_x  x^T L x    subject to  ||x|| = 1
Power method: for t = 1, 2, ...
    z^(t) = normalize(L z^(t-1))

Constrained PCA:
    maximize_x  x^T L x    subject to  x_i ∈ {e_1, ..., e_m}
Projected power method: for t = 1, 2, ...
    z^(t) = Project_{Δ^n}(μ L z^(t-1)),    μ: scaling factor
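For contrast with the projected variant, the classical power method alone can be sketched as follows (a minimal illustration, not code from the slides):

```python
import numpy as np

def power_method(L, iters=200, seed=0):
    """Power method: z <- L z / ||L z||, converging to the top eigenvector."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(L.shape[0])
    for _ in range(iters):
        z = L @ z
        z /= np.linalg.norm(z)
    return z

# Sanity check on a small symmetric matrix with a clear spectral gap
A = np.diag([3.0, 1.0, 0.5])
v = power_method(A)
assert np.isclose(abs(v[0]), 1.0, atol=1e-6)   # aligns with the top eigenvector e_1
```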
Projection onto the standard simplex

    z^(t) = Project_{Δ^n}(μ L z^(t-1))

Δ^n is the convex hull of the feasible set, i.e.

    Δ^n = { z = [z_i]_{1 ≤ i ≤ n} : ∀i, 1^T z_i = 1, z_i ≥ 0 },

and the projection is applied block-wise to μz^(t)_1, ..., μz^(t)_n.
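Euclidean projection onto the standard simplex {z : z ≥ 0, 1^T z = 1} admits a classical O(m log m) sort-based solution, which in PPM would be applied to each block μz_i^(t). A sketch assuming that standard algorithm (not code from the talk):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)      # shift so active coords sum to 1
    return np.maximum(v + theta, 0.0)

z = project_simplex(np.array([0.5, 1.2, -0.3]))
assert np.isclose(z.sum(), 1.0) and (z >= 0).all()
```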
Illustration

Projected power method: z^(t+1) = Project_{Δ^n}(μ L z^(t)). Since Lz is the gradient of (1/2) z^T L z, each step is a power iteration (an ascent step along the gradient) followed by a projection.
Initialization?

    L = E[L] (approximately low-rank) + (L - E[L])

Spectral initialization:
1. L̂ ← rank-m approximation of L
2. z^(0) ← Project_{Δ^n}(μ ẑ), where ẑ is a random column of L̂
Summary of the projected power method (PPM)

1. Spectral initialization
2. For t = 1, 2, ...:  z^(t) ← Project_{Δ^n}(μ L z^(t-1))
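The full pipeline might be sketched as follows on a small random-corruption instance. This is a simplified, hedged rendering of the method: the scaling μ, the handling of diagonal blocks, and the initialization details are my choices here, not the authors' exact recipe.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {z : z >= 0, sum(z) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

def block_project(z, n, m):
    """Apply the simplex projection to each length-m block of z."""
    return np.concatenate([project_simplex(z[i*m:(i+1)*m]) for i in range(n)])

def ppm(L, n, m, mu=100.0, iters=100, seed=0):
    """Projected power method: spectral init, then z <- Proj(mu * L z)."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(L)                  # rank-m approximation of L
    Lhat = (U[:, :m] * s[:m]) @ Vt[:m]
    z = block_project(mu * Lhat[:, rng.integers(n * m)], n, m)
    for _ in range(iters):
        z = block_project(mu * (L @ z), n, m)
    return z.reshape(n, m).argmax(axis=1)        # decode each block to a state

# Small random-corruption instance with strong signal
rng = np.random.default_rng(1)
n, m, pi0 = 20, 3, 0.9
x = rng.integers(m, size=n)
hit, miss = np.log(pi0 + (1 - pi0) / m), np.log((1 - pi0) / m)
L = np.zeros((n * m, n * m))
a = np.arange(m)
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        y = (x[i] - x[j]) % m if rng.random() < pi0 else rng.integers(m)
        L[i*m:(i+1)*m, j*m:(j+1)*m] = np.where(
            (a[:, None] - a[None, :]) % m == y, hit, miss)
est = ppm(L, n, m)
# In this high-SNR regime, est should typically match x up to a global offset.
```

Note that the simplex projection is invariant to adding a constant to every entry of a block, so the negative log-likelihood values pose no difficulty.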
Random corruption model

    y_{i,j} = x_i - x_j (mod m)    with prob. π_0
    y_{i,j} ~ Uniform(m)           else

Theorem (Chen–Candès '16). Fix m > 0 and set μ ≍ 1/σ_2(L). With high probability, PPM recovers the truth exactly within O(log n) iterations if the signal-to-noise ratio (SNR) is not too small:

    π_0 > 2 √(log n / (mn))
Implications

PPM succeeds within O(log n) iterations if the non-corruption rate π_0 > 2 √(log n / (mn)).

- PPM succeeds even when most entries (a fraction as large as 1 - O(√(log n / n))) are corrupted
- Nearly linear-time algorithm
- Works for any initialization obeying ||z^(0) - x|| < 0.5 ||x||
Empirical misclassification rate

[Figure: misclassification rate (from 10% down to 0) as the number of variables n and the input corruption rate 1 - π_0 vary, with m = 3 and μ = 10/σ_2(L); the theoretical threshold is overlaid.]
More general noise models

    y_{i,j} = x_i - x_j + η_{i,j} (mod m),    where η_{i,j} ~ i.i.d. P_0

Under the hypothesis x_i - x_j = l, the distribution of y_{i,j} is P_l, a shifted copy of P_0 (e.g. P_0 for x_i - x_j = 0, P_1 for x_i - x_j = 1, ..., P_9 for x_i - x_j = 9). Distinguishability is governed by KL(P_0 ‖ P_1), ..., KL(P_0 ‖ P_9).

Theorem (Chen–Candès '16). Fix m > 0 and set μ ≍ 1/σ_2(L). Under mild conditions, PPM succeeds within O(log n) iterations with high probability, provided that

    KL_min := min_{1 ≤ l < m} KL(P_0 ‖ P_l) > 4 log n / n
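For the random corruption model, KL_min can be computed in closed form. The sketch below (my derivation, not from the slides) shows KL(P_0 ‖ P_l) = π_0 log(1 + π_0 m/(1 - π_0)), which behaves like π_0^2·m for small π_0; plugging that into KL_min > 4 log n / n recovers a √(log n/(mn))-type threshold on π_0.

```python
import numpy as np

def corruption_pmf(l, m, pi0):
    """P_l: distribution of y_{i,j} when x_i - x_j = l (mod m)."""
    p = np.full(m, (1 - pi0) / m)
    p[l % m] += pi0
    return p

def kl(p, q):
    """Discrete KL divergence KL(p || q)."""
    return float(np.sum(p * np.log(p / q)))

m, pi0 = 10, 0.1
P0 = corruption_pmf(0, m, pi0)
kl_min = min(kl(P0, corruption_pmf(l, m, pi0)) for l in range(1, m))
closed_form = pi0 * np.log(1 + pi0 * m / (1 - pi0))
assert np.isclose(kl_min, closed_form)   # every shift l != 0 gives the same KL
```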
Interpretation: why KL_min matters

Suppose x_1 = ... = x_n = 1. Then the peaks of E[L] reveal the ground truth (look at E[L_{i,j}]), and L ≈ E[L] whenever KL_min is sufficiently large.
Empirical misclassification rate: modified Gaussian noise

Modified Gaussian noise model:

    P{η_{i,j} = z} ∝ exp(-z²/(2σ²)),

where σ is the standard deviation of the Gaussian density.

[Figure: misclassification rate (from 1% down to 0) vs. the number of variables n, with m = 9 and μ = 20/σ_m(L); the theoretical threshold is overlaid.]
PPM is information-theoretically optimal

Theorem (Chen–Candès '16). Fix m > 0. No method achieves exact recovery if

    KL_min < 4 log n / n
Large-m case: random corruption model

(Singer '09; Wang & Singer '12; Bandeira et al. '14; Boumal '16; Liu et al. '16; Perry et al.)

    y_{i,j} = x_i - x_j    with prob. π_0
    y_{i,j} ~ Unif(m)      else

Theorem (Chen–Candès '16). Suppose log n ≲ m ≲ poly(n). PPM succeeds if π_0 ≳ 1/√n.

- Spiky model: as m → ∞, the model converges to x_i ∈ [0, 1) with

      y_{i,j} = x_i - x_j    with prob. π_0
      y_{i,j} ~ Unif(0, 1)   else

  PPM succeeds even if a dominant fraction 1 - O(1/√n) of the inputs are corrupted.
- Smooth noise model: if m ≫ √n, PPM recovers each x_i ∈ [0, 1) up to a resolution of 1/m ≪ 1/√n.
Joint shape alignment: Chair dataset from ShapeNet

- 20 representative shapes (out of 50); extra noise is added to each point of the shapes to make the task more challenging
- pairwise cost l_{i,j}(x_i, x_j): average nearest-neighbor squared distance
- [Figure: the aligned shapes]
Joint shape alignment: angular estimation errors

[Figure: cumulative distribution of the absolute angular estimation error, projected power method (PPM) vs. semidefinite relaxation (SDP).]

Runtime: projected power method, 2.4 sec.
Joint graph matching: CMU House dataset

- 111 images of a toy house (thumbnails shown)
- [Figure: input matches vs. optimized matches on 3 representative images]
Dixon imaging in body MRI (Zhang et al., Magn. Reson. Med., 2016)

- 2 phasor candidates for the field inhomogeneity at each voxel (candidate 1, candidate 2)
- recovery by optimizing a pairwise cost function:

      maximize  Σ_{i,j} l(x_i, x_j)    subject to  x_i ∈ {1, 2}

- [Figure: representative cases of water signal recovery, commercial software vs. projected power method]
Things I have not talked about

1. General noise model with large m
2. Incomplete data
Concluding remarks

- A new approach to discrete assignment problems
- Finds the MLE in suitable regimes
- Computationally efficient

Paper: "The projected power method: an efficient algorithm for joint alignment from pairwise differences," Y. Chen and E. Candès, 2016.
Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix
More informationInderjit Dhillon The University of Texas at Austin
Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance
More informationPhase retrieval with random Gaussian sensing vectors by alternating projections
Phase retrieval with random Gaussian sensing vectors by alternating projections arxiv:609.03088v [math.st] 0 Sep 06 Irène Waldspurger Abstract We consider a phase retrieval problem, where we want to reconstruct
More informationFast and Robust Phase Retrieval
Fast and Robust Phase Retrieval Aditya Viswanathan aditya@math.msu.edu CCAM Lunch Seminar Purdue University April 18 2014 0 / 27 Joint work with Yang Wang Mark Iwen Research supported in part by National
More informationCS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu
CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge
More informationOn the local stability of semidefinite relaxations
On the local stability of semidefinite relaxations Diego Cifuentes Department of Mathematics Massachusetts Institute of Technology Joint work with Sameer Agarwal (Google), Pablo Parrilo (MIT), Rekha Thomas
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationProbabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Adrien Todeschini Inria Bordeaux JdS 2014, Rennes Aug. 2014 Joint work with François Caron (Univ. Oxford), Marie
More informationThe role of dimensionality reduction in classification
The role of dimensionality reduction in classification Weiran Wang and Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://eecs.ucmerced.edu
More informationSome Recent Advances. in Non-convex Optimization. Purushottam Kar IIT KANPUR
Some Recent Advances in Non-convex Optimization Purushottam Kar IIT KANPUR Outline of the Talk Recap of Convex Optimization Why Non-convex Optimization? Non-convex Optimization: A Brief Introduction Robust
More informationHigh-Rank Matrix Completion and Subspace Clustering with Missing Data
High-Rank Matrix Completion and Subspace Clustering with Missing Data Authors: Brian Eriksson, Laura Balzano and Robert Nowak Presentation by Wang Yuxiang 1 About the authors
More informationLow-rank Matrix Completion from Noisy Entries
Low-rank Matrix Completion from Noisy Entries Sewoong Oh Joint work with Raghunandan Keshavan and Andrea Montanari Stanford University Forty-Seventh Allerton Conference October 1, 2009 R.Keshavan, S.Oh,
More informationRobust Camera Location Estimation by Convex Programming
Robust Camera Location Estimation by Convex Programming Onur Özyeşil and Amit Singer INTECH Investment Management LLC 1 PACM and Department of Mathematics, Princeton University SIAM IS 2016 05/24/2016,
More informationA new method on deterministic construction of the measurement matrix in compressed sensing
A new method on deterministic construction of the measurement matrix in compressed sensing Qun Mo 1 arxiv:1503.01250v1 [cs.it] 4 Mar 2015 Abstract Construction on the measurement matrix A is a central
More informationCommunication-efficient and Differentially-private Distributed SGD
1/36 Communication-efficient and Differentially-private Distributed SGD Ananda Theertha Suresh with Naman Agarwal, Felix X. Yu Sanjiv Kumar, H. Brendan McMahan Google Research November 16, 2018 2/36 Outline
More informationThresholds for the Recovery of Sparse Solutions via L1 Minimization
Thresholds for the Recovery of Sparse Solutions via L Minimization David L. Donoho Department of Statistics Stanford University 39 Serra Mall, Sequoia Hall Stanford, CA 9435-465 Email: donoho@stanford.edu
More informationAdaptive Affinity Matrix for Unsupervised Metric Learning
Adaptive Affinity Matrix for Unsupervised Metric Learning Yaoyi Li, Junxuan Chen, Yiru Zhao and Hongtao Lu Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering,
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationSum-of-Squares Method, Tensor Decomposition, Dictionary Learning
Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning David Steurer Cornell Approximation Algorithms and Hardness, Banff, August 2014 for many problems (e.g., all UG-hard ones): better guarantees
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04
More informationAdaptive one-bit matrix completion
Adaptive one-bit matrix completion Joseph Salmon Télécom Paristech, Institut Mines-Télécom Joint work with Jean Lafond (Télécom Paristech) Olga Klopp (Crest / MODAL X, Université Paris Ouest) Éric Moulines
More informationBinary matrix completion
Binary matrix completion Yaniv Plan University of Michigan SAMSI, LDHD workshop, 2013 Joint work with (a) Mark Davenport (b) Ewout van den Berg (c) Mary Wootters Yaniv Plan (U. Mich.) Binary matrix completion
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationSolving DC Programs that Promote Group 1-Sparsity
Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014
More informationNon-convex Robust PCA: Provable Bounds
Non-convex Robust PCA: Provable Bounds Anima Anandkumar U.C. Irvine Joint work with Praneeth Netrapalli, U.N. Niranjan, Prateek Jain and Sujay Sanghavi. Learning with Big Data High Dimensional Regime Missing
More informationImplicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion
: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion Cong Ma 1 Kaizheng Wang 1 Yuejie Chi Yuxin Chen 3 Abstract Recent years have seen a flurry of activities in designing provably
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationSymmetric Factorization for Nonconvex Optimization
Symmetric Factorization for Nonconvex Optimization Qinqing Zheng February 24, 2017 1 Overview A growing body of recent research is shedding new light on the role of nonconvex optimization for tackling
More informationLinear dimensionality reduction for data analysis
Linear dimensionality reduction for data analysis Nicolas Gillis Joint work with Robert Luce, François Glineur, Stephen Vavasis, Robert Plemmons, Gabriella Casalino The setup Dimensionality reduction for
More informationBias-free Sparse Regression with Guaranteed Consistency
Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March
More informationGlobally Convergent Levenberg-Marquardt Method For Phase Retrieval
Globally Convergent Levenberg-Marquardt Method For Phase Retrieval Zaiwen Wen Beijing International Center For Mathematical Research Peking University Thanks: Chao Ma, Xin Liu 1/38 Outline 1 Introduction
More informationSpatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood
Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More information