Iterative Projection Methods

1 Iterative Projection Methods for noisy and corrupted systems of linear equations. Deanna Needell, Mathematics, UCLA. February 1, 2018. Joint work with Jamie Haddock and Jesús De Loera, and forthcoming articles.

2 Setup. We are interested in solving highly overdetermined systems of equations $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m \gg n$. Rows of $A$ are denoted $a_i^T$.

3 Projection Methods. If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to an element:
1. Randomized Kaczmarz Method
2. Motzkin's Method(s)
3. Sampling Kaczmarz-Motzkin Methods (SKM)

4 Randomized Kaczmarz Method. Given $x_0 \in \mathbb{R}^n$:
1. Choose $i_k \in [m]$ with probability $\frac{\|a_{i_k}\|^2}{\|A\|_F^2}$.
2. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
3. Repeat.
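As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of this update; the function name and default arguments are my own choices:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=1000, x0=None, rng=None):
    """Minimal sketch of the Randomized Kaczmarz update described on this slide."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = (A ** 2).sum(axis=1)
    probs = row_norms_sq / row_norms_sq.sum()          # P(row i) = ||a_i||^2 / ||A||_F^2
    for _ in range(iters):
        i = rng.choice(m, p=probs)                     # sample a row index
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]  # project onto {x : a_i^T x = b_i}
    return x
```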

5-8 Kaczmarz Method. [Figures: successive iterates $x_0, x_1, x_2, x_3$ obtained by projecting onto the selected hyperplanes.]

9 Convergence Rate. Theorem (Strohmer - Vershynin 2009). Let $x^*$ be the solution to the consistent system of linear equations $Ax = b$. Then the Randomized Kaczmarz method converges to $x^*$ linearly in expectation:
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2.$$

10 Motzkin's Relaxation Method(s). Given $x_0 \in \mathbb{R}^n$:
1. If $x_k$ is feasible, stop.
2. Choose $i_k \in [m]$ as $i_k := \operatorname*{argmax}_{i \in [m]} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
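Purely as illustration, a minimal NumPy sketch of Motzkin's greedy update, with a small feasibility tolerance standing in for step 1 (names and defaults are my own):

```python
import numpy as np

def motzkin(A, b, iters=1000, x0=None, tol=1e-12):
    """Minimal sketch of Motzkin's method: always project onto the most-violated equation."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = (A ** 2).sum(axis=1)
    for _ in range(iters):
        residual = A @ x - b
        i = int(np.argmax(np.abs(residual)))       # index of the largest violation
        if abs(residual[i]) <= tol:                # (approximately) feasible: stop
            break
        x -= residual[i] / row_norms_sq[i] * A[i]  # project onto {x : a_i^T x = b_i}
    return x
```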

11-13 Motzkin's Method. [Figures: successive iterates $x_0, x_1, x_2$ obtained by projecting onto the most-violated hyperplane.]

14 Convergence Rate. Theorem (Agmon 1954). For a consistent, normalized system ($\|a_i\| = 1$ for all $i = 1, \dots, m$), Motzkin's method converges linearly to the solution $x^*$:
$$\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2.$$

15 Our Hybrid Method (SKM). Given $x_0 \in \mathbb{R}^n$:
1. Choose $\tau_k \subseteq [m]$ to be a sample of $\beta$ constraints chosen uniformly at random from among the rows of $A$.
2. From among these $\beta$ rows, choose $i_k := \operatorname*{argmax}_{i \in \tau_k} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
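A minimal NumPy sketch of the SKM iteration described above (illustrative only; the sampling and update follow steps 1-3, and the names are my own):

```python
import numpy as np

def skm(A, b, beta, iters=1000, x0=None, rng=None):
    """Minimal sketch of the Sampling Kaczmarz-Motzkin (SKM) iteration."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = (A ** 2).sum(axis=1)
    for _ in range(iters):
        tau = rng.choice(m, size=beta, replace=False)    # uniform sample of beta rows
        residual = A[tau] @ x - b[tau]
        j = tau[int(np.argmax(np.abs(residual)))]        # most-violated row in the sample
        x += (b[j] - A[j] @ x) / row_norms_sq[j] * A[j]  # Kaczmarz projection onto that row
    return x
```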

16-18 SKM. [Figures: successive iterates $x_0, x_1, x_2$.]

19 SKM Method Convergence Rate. Theorem (De Loera - Haddock - N. 2017). For a consistent, normalized system, the SKM method with samples of size $\beta$ converges to the solution $x^*$ at least linearly in expectation: if $s_{k-1}$ is the number of constraints satisfied by $x_{k-1}$ and $V_{k-1} := \max\{m - s_{k-1},\, m - \beta + 1\}$, then
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{V_{k-1}\,\|A^{-1}\|^2}\right)\|x_{k-1} - x^*\|_2^2,$$
and hence
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2.$$

20 Convergence. [Figure]

21-24 Convergence Rates.
RK: $\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2$.
MM: $\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2$.
SKM: $\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|^2}\right)^k \|x_0 - x^*\|_2^2$.
Why are these all the same? (For a normalized system, $\|A\|_F^2 = m$, so all three contraction factors coincide.)

25 An Accelerated Convergence Rate. Theorem (Haddock - N.). Let $x^*$ denote the solution of the consistent, normalized system $Ax = b$. Motzkin's method exhibits the (possibly highly accelerated) convergence rate
$$\|x_T - x^*\|_2^2 \le \prod_{k=0}^{T-1}\left(1 - \frac{1}{4\gamma_k\,\|A^{-1}\|^2}\right)\|x_0 - x^*\|_2^2.$$
Here $\gamma_k$ bounds the dynamic range of the $k$th residual, $\gamma_k := \frac{\|Ax_k - Ax^*\|_2^2}{\|Ax_k - Ax^*\|_\infty^2}$. This is an improvement over the previous result when $4\gamma_k < m$.

26-27 $\gamma_k$: Gaussian systems. [Figure] For Gaussian systems, $\gamma_k \approx \frac{m}{\log m}$.
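A quick empirical check of this claim, using the dynamic-range definition of $\gamma_k$ reconstructed above; the problem sizes are arbitrary illustrative choices:

```python
import numpy as np

# Rough empirical check of the dynamic-range quantity gamma_k for a Gaussian system.
rng = np.random.default_rng(0)
m, n = 5000, 100
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)        # normalize the rows
x_star = rng.standard_normal(n)
x_k = x_star + rng.standard_normal(n)                # a generic iterate away from x*
r = A @ (x_k - x_star)                               # k-th residual (up to sign)
gamma_k = np.linalg.norm(r, 2) ** 2 / np.linalg.norm(r, np.inf) ** 2
print(gamma_k, m / np.log(m))                        # gamma_k is on the order of m / log m
```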

28 Gaussian Convergence. [Figure]

29-30 Is this the right problem? [Figures: the least-squares solution $x_{LS}$ of a noisy system, and $x_{LS}$ versus the desired solution $x^*$ when the system is corrupted.]

31-32 Noisy Convergence Results.
Theorem (N. 2010). Let $A$ have full column rank, denote the desired solution to the system $Ax = b$ by $x^*$, and define the error term $e = Ax^* - b$. Then the RK iterates satisfy
$$\mathbb{E}\|x_k - x^*\|_2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|^2}\right)^{k/2} \|x_0 - x^*\|_2 + \|A\|_F\,\|A^{-1}\|\,\max_{i \in [m]} \frac{|e_i|}{\|a_i\|_2}.$$
Theorem (Haddock - N.). Let $x^*$ denote the desired solution of the system $Ax = b$ and define the error term $e = b - Ax^*$. If Motzkin's method is run with the stopping criterion $\|Ax_k - b\|_\infty \le 4\|e\|_\infty$, then the iterates satisfy
$$\|x_T - x^*\|_2^2 \le \prod_{k=0}^{T-1}\left(1 - \frac{1}{4\gamma_k\,\|A^{-1}\|^2}\right)\|x_0 - x^*\|_2^2 + 2m\,\|A^{-1}\|^2\,\|e\|_\infty^2.$$

33 Noisy Convergence. [Figure]

34 What about corruption? [Figure: starting from $x_0$, Motzkin iterates $x_1^M, x_2^M, x_3^M$ and RK iterates $x_1^{RK}, x_2^{RK}, x_3^{RK}$ in the presence of a corrupted equation.]

35-37 Problem.
Problem: $Ax = b + e$ (Corrupted). Error ($e$): sparse, arbitrarily large entries. Solution ($x^*$): $x^* \in \{x : Ax = b\}$. Applications: logic programming, error correction in telecommunications.
Problem: $Ax = b + e$ (Noisy). Error ($e$): small, evenly distributed entries. Solution ($x_{LS}$): $x_{LS} := \operatorname*{argmin}_x \|Ax - b - e\|_2$.
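For concreteness, a small illustrative construction of the two error models (all sizes and magnitudes below are arbitrary choices, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, s = 500, 20, 10
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star

e_corrupt = np.zeros(m)                                # corrupted model:
idx = rng.choice(m, size=s, replace=False)             # sparse support ...
e_corrupt[idx] = 100 * rng.standard_normal(s)          # ... arbitrarily large entries

e_noise = 1e-2 * rng.standard_normal(m)                # noisy model: small, spread out

b_corrupted = b + e_corrupt
b_noisy = b + e_noise
```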

38 Why not least-squares? [Figure: the desired solution $x^*$ versus the least-squares solution $x_{LS}$.]

39-41 MAX-FS. MAX-FS: given $Ax = b$, determine the largest feasible subsystem. MAX-FS is NP-hard even when restricted to homogeneous systems with coefficients in $\{-1, 0, 1\}$ (Amaldi - Kann 1995), and admits no PTAS unless P = NP.

42-45 Proposed Method. Goal: Use RK to detect the corrupted equations with high probability.
Lemma (Haddock - N.). Let $\epsilon^* := \min_{i \in \mathrm{supp}(e)} |a_i^T x^* - b_i| = \min_{i \in \mathrm{supp}(e)} |e_i|$ and suppose $|\mathrm{supp}(e)| = s$. If $\|a_i\| = 1$ for all $i \in [m]$ and $\|x - x^*\| < \frac{1}{2}\epsilon^*$, then the $d \le s$ indices of largest-magnitude residual entries are contained in $\mathrm{supp}(e)$. That is, $D \subseteq \mathrm{supp}(e)$, where
$$D = \operatorname*{argmax}_{D \subseteq [m],\, |D| = d} \sum_{i \in D} |a_i^T x - b_i|.$$
[Figure: once an iterate $x_k$ lies within $\epsilon^*/2$ of $x^*$, the largest residuals identify corrupted equations.] We call $\epsilon^*/2$ the detection horizon.
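A small numerical illustration of the detection lemma, under the reconstruction above and with hypothetical parameters: once $x$ is within $\epsilon^*/2$ of $x^*$, the $d$ largest residuals are supported on the corrupted equations.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, s, d = 200, 10, 5, 3
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)             # normalized rows, ||a_i|| = 1
x_star = rng.standard_normal(n)
e = np.zeros(m)
corrupt = rng.choice(m, size=s, replace=False)
e[corrupt] = 10 * rng.standard_normal(s)                  # sparse, large corruption
b = A @ x_star + e                                        # observed (corrupted) right-hand side
eps_star = np.abs(e[corrupt]).min()

direction = rng.standard_normal(n)
x = x_star + 0.4 * eps_star * direction / np.linalg.norm(direction)  # ||x - x*|| < eps*/2

flagged = np.argsort(np.abs(A @ x - b))[-d:]              # d largest-magnitude residuals
print(set(int(i) for i in flagged) <= set(int(i) for i in corrupt))  # True: flagged indices are corrupted
```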

46 Proposed Method. Method 1: Windowed Kaczmarz.
procedure WK(A, b, k, W, d)
    S = ∅
    for i = 1, 2, ..., W do
        $x_k^i$ = $k$th iterate produced by RK with $x_0 = 0$, $A$, $b$
        $D$ = $d$ indices of the largest entries of the residual, $Ax_k^i - b$
        $S = S \cup D$
    return $x$, where $A_{S^C} x = b_{S^C}$
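A minimal NumPy sketch of Method 1 (illustrative only); the remaining subsystem $A_{S^C} x = b_{S^C}$ is solved here by least squares, which is one reasonable choice of solver:

```python
import numpy as np

def windowed_kaczmarz(A, b, k, W, d, rng=None):
    """Minimal sketch of the Windowed Kaczmarz procedure listed above."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    row_norms_sq = (A ** 2).sum(axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    S = set()
    for _ in range(W):                                     # W independent windows
        x = np.zeros(n)                                    # each window restarts RK at x_0 = 0
        for _ in range(k):                                 # k RK iterations
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        residual = np.abs(A @ x - b)
        S.update(int(j) for j in np.argsort(residual)[-d:])   # d largest residual entries
    keep = np.array(sorted(set(range(m)) - S))             # S^C: equations not flagged
    x_hat, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)  # solve A_{S^C} x = b_{S^C}
    return x_hat, sorted(S)
```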

47-56 Example: WK(A, b, k = 2, W = 3, d = 1) on a system with hyperplanes $H_1, \dots, H_7$ and solution $x^*$.
Window i = 1: starting from $x_0^1$, two RK iterations give $x_1^1, x_2^1$; the largest residual entry is equation 7, so $S = \{7\}$.
Window i = 2: starting from $x_0^2$, iterates $x_1^2, x_2^2$; the largest residual entry is equation 5, so $S = \{7, 5\}$.
Window i = 3: starting from $x_0^3$, iterates $x_1^3, x_2^3$; the largest residual entry is equation 6, so $S = \{7, 5, 6\}$.
Finally, solve $A_{S^C} x = b_{S^C}$, i.e., the subsystem given by $H_1, \dots, H_4$, which passes through $x^*$.

57 Theoretical Guarantees. Theorem (Haddock - N.). Assume that $\|a_i\| = 1$ for all $i \in [m]$ and let $0 < \delta < 1$. Suppose $d \ge s = |\mathrm{supp}(e)|$, $W \le \frac{m - n}{d}$, and $k$ is as given in the detection horizon lemma. Then the Windowed Kaczmarz method on $A, b$ will detect the corrupted equations ($\mathrm{supp}(e) \subseteq S$), and the remaining equations given by $A_{[m] \setminus S}, b_{[m] \setminus S}$ will have solution $x^*$, with probability at least
$$p_W := 1 - \left[1 - (1 - \delta)\left(\frac{m - s}{m}\right)^k\right]^W.$$
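A small helper to evaluate this bound; the parameter values below are hypothetical, chosen only to show how $p_W$ behaves:

```python
def p_w_bound(m, s, k, W, delta):
    """Detection-probability bound p_W from the theorem above."""
    return 1.0 - (1.0 - (1.0 - delta) * ((m - s) / m) ** k) ** W

# Hypothetical parameter values, purely for illustration:
print(p_w_bound(m=50000, s=100, k=10, W=100, delta=0.1))
```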

58 Theoretical Guarantee Values (Gaussian $A$). [Table: values of the bound $p_W$ for $s = 1, 10, 50, 100, 200, 300, 400$.]

59-62 Experimental Values (Gaussian $A$). [Figures: success ratio versus $k$ for $s = 100, 200, 500, 750, 1000$.]

63 Conclusions and Future Work.
- Motzkin's method is accelerated even in the presence of noise.
- RK methods may be used to detect corruption.
- Future work: identify useful bounds on $\gamma_k$ for other systems of interest, and reduce the dependence on artificial parameters in the corruption-detection bounds.
