Iterative Projection Methods for Noisy and Corrupted Systems of Linear Equations
Deanna Needell, Mathematics, UCLA
February 1, 2018
Joint with Jamie Haddock and Jesús De Loera
https://arxiv.org/abs/1605.01418 and forthcoming articles
Setup
We are interested in solving highly overdetermined systems of equations, $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m \gg n$. Rows of $A$ are denoted $a_i^T$.
Projection Methods
If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to an element:
1. Randomized Kaczmarz method
2. Motzkin's method(s)
3. Sampling Kaczmarz-Motzkin methods (SKM)
Randomized Kaczmarz Method
Given $x_0 \in \mathbb{R}^n$:
1. Choose $i_k \in [m]$ with probability $\frac{\|a_{i_k}\|^2}{\|A\|_F^2}$.
2. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
3. Repeat.
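Each update is a projection of the current iterate onto the hyperplane of the sampled row. A minimal sketch in Python/NumPy (the function name and defaults are mine, not from the talk):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, x0=None, seed=0):
    """Randomized Kaczmarz: sample row i_k with probability
    ||a_{i_k}||^2 / ||A||_F^2, then project the current iterate onto
    the hyperplane a_{i_k}^T x = b_{i_k}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```

Sampling rows proportionally to $\|a_i\|^2$ matches the selection rule above; for a consistent system the iterates converge to the solution.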
Kaczmarz Method
[Figure: iterates $x_0, x_1, x_2, x_3$ produced by successive projections onto the hyperplanes of the system.]
Convergence Rate
Theorem (Strohmer - Vershynin 2009). Let $x^*$ be the solution to the consistent system of linear equations $Ax = b$. Then the randomized Kaczmarz method converges to $x^*$ linearly in expectation:
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2 \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2.$$
Motzkin's Relaxation Method(s)
Given $x_0 \in \mathbb{R}^n$:
1. If $x_{k-1}$ is feasible, stop.
2. Choose $i_k \in [m]$ as $i_k := \operatorname{argmax}_{i \in [m]} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
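Motzkin's greedy selection can be sketched the same way (again a hypothetical implementation; the tolerance parameter is my stand-in for the feasibility check):

```python
import numpy as np

def motzkin(A, b, iters=1000, tol=1e-12):
    """Motzkin's method: at each step, project onto the hyperplane of
    the equation with the largest absolute residual; stop once the
    residual is (numerically) zero, i.e. x is feasible."""
    m, n = A.shape
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    x = np.zeros(n)
    for _ in range(iters):
        r = A @ x - b
        i = int(np.argmax(np.abs(r)))
        if abs(r[i]) <= tol:  # all equations satisfied up to tol
            break
        x -= r[i] / row_norms_sq[i] * A[i]
    return x
```

Unlike RK, every step requires a full residual computation $Ax_{k-1} - b$ to find the maximal violation.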
Motzkin's Method
[Figure: iterates $x_0, x_1, x_2$ produced by projecting onto the most-violated hyperplane at each step.]
Convergence Rate
Theorem (Agmon 1954). For a consistent, normalized system ($\|a_i\| = 1$ for all $i = 1, \dots, m$), Motzkin's method converges linearly to the solution $x^*$:
$$\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2.$$
Our Hybrid Method (SKM)
Given $x_0 \in \mathbb{R}^n$:
1. Choose $\tau_k \subseteq [m]$ to be a sample of $\beta$ constraints chosen uniformly at random from among the rows of $A$.
2. From among these $\beta$ rows, choose $i_k := \operatorname{argmax}_{i \in \tau_k} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
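A sketch of SKM under the same assumptions as the earlier snippets; note that $\beta = 1$ reduces to a uniform-sampling Kaczmarz step, while $\beta = m$ recovers Motzkin's method:

```python
import numpy as np

def skm(A, b, beta, iters=1000, seed=0):
    """Sampling Kaczmarz-Motzkin: draw beta rows uniformly at random,
    then take a Kaczmarz step on the sampled row whose residual has
    the largest magnitude."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    x = np.zeros(n)
    for _ in range(iters):
        tau = rng.choice(m, size=beta, replace=False)
        r = A[tau] @ x - b[tau]
        j = tau[np.argmax(np.abs(r))]
        x += (b[j] - A[j] @ x) / row_norms_sq[j] * A[j]
    return x
```

The sample size $\beta$ trades the per-iteration cost of computing residuals against the progress made by the greedier selection.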
SKM
[Figure: iterates $x_0, x_1, x_2$ produced by SKM, projecting onto the most-violated hyperplane within each random sample.]
SKM Method Convergence Rate
Theorem (De Loera - Haddock - N. 2017). For a consistent, normalized system, the SKM method with samples of size $\beta$ converges to the solution $x^*$ at least linearly in expectation: if $s_{k-1}$ is the number of constraints satisfied by $x_{k-1}$ and $V_{k-1} = \max\{m - s_{k-1},\, m - \beta + 1\}$, then
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{V_{k-1} \|A^{-1}\|_2^2}\right) \|x_{k-1} - x^*\|_2^2 \le \left(1 - \frac{1}{m \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2.$$
Convergence
[Figure: empirical convergence comparison.]
Convergence Rates
RK: $\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2 \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2$.
MM: $\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2$.
SKM: $\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{m \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2$.
(For a normalized system, $\|A\|_F^2 = m$, so all three bounds coincide.) Why are these all the same?
An Accelerated Convergence Rate
Theorem (Haddock - N. 2018+). Let $x^*$ denote the solution of the consistent, normalized system $Ax = b$. Motzkin's method exhibits the (possibly highly accelerated) convergence rate:
$$\|x_T - x^*\|_2^2 \le \prod_{k=0}^{T-1} \left(1 - \frac{1}{4\gamma_k \|A^{-1}\|_2^2}\right) \|x_0 - x^*\|_2^2.$$
Here $\gamma_k$ bounds the dynamic range of the $k$th residual,
$$\gamma_k := \frac{\|Ax_k - Ax^*\|_2^2}{\|Ax_k - Ax^*\|_\infty^2}.$$
This is an improvement over the previous result when $4\gamma_k < m$.
$\gamma_k$: Gaussian systems
[Figure: empirical $\gamma_k$ along Motzkin iterations.] For Gaussian systems, $\gamma_k \lesssim m / \log m$.
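As a rough numerical sanity check (not from the talk), one can track $\gamma_k = \|Ax_k - Ax^*\|_2^2 / \|Ax_k - Ax^*\|_\infty^2$ along Motzkin iterations on a row-normalized Gaussian system; $\gamma_k \le m$ always holds, and for Gaussian data it should sit far below that worst case, on the order of $m/\log m$:

```python
import numpy as np

# Track gamma_k along Motzkin iterations on a normalized Gaussian system.
rng = np.random.default_rng(0)
m, n = 2000, 10
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # normalize rows
x_star = rng.standard_normal(n)
b = A @ x_star

x = np.zeros(n)
gammas = []
for _ in range(20):
    r = A @ x - b                    # = A(x - x*) for this consistent system
    gammas.append(np.sum(r**2) / np.max(np.abs(r))**2)
    i = int(np.argmax(np.abs(r)))    # Motzkin step on the largest residual
    x -= r[i] * A[i]                 # rows are unit-norm

print(max(gammas), m / np.log(m))    # observed gamma_k vs the m/log m scale
```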
Gaussian Convergence
[Figure: convergence on Gaussian systems.]
Is this the right problem?
[Figure: a noisy system, where the hyperplanes nearly intersect near $x_{LS}$, versus a corrupted system, where most hyperplanes meet at $x^*$ while the corrupted equations pull the least-squares solution $x_{LS}$ away.]
Noisy Convergence Results
Theorem (N. 2010). Let $A$ have full column rank, denote the desired solution to the system $Ax = b$ by $x^*$, and define the error term $e = Ax^* - b$. Then RK iterates satisfy
$$\mathbb{E}\|x_k - x^*\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2 \|A^{-1}\|_2^2}\right)^k \|x_0 - x^*\|_2^2 + \|A\|_F^2 \|A^{-1}\|_2^2 \|e\|_\infty^2.$$
Theorem (Haddock - N. 2018+). Let $x^*$ denote the desired solution of the system $Ax = b$ and define the error term $e = b - Ax^*$. If Motzkin's method is run with stopping criterion $\|Ax_k - b\|_\infty > 4\|e\|_\infty$, then the iterates satisfy
$$\|x_T - x^*\|_2^2 \le \prod_{k=0}^{T-1} \left(1 - \frac{1}{4\gamma_k \|A^{-1}\|_2^2}\right) \|x_0 - x^*\|_2^2 + 2m \|A^{-1}\|_2^2 \|e\|_\infty^2.$$
Noisy Convergence
[Figure: convergence in the noisy setting.]
What about corruption?
[Figure: on a system with a corrupted equation, Motzkin iterates $x_1^M, x_2^M, x_3^M$ are repeatedly drawn toward the corrupted hyperplane, while RK iterates $x_1^{RK}, x_2^{RK}, x_3^{RK}$ from $x_0$ mostly project onto uncorrupted equations.]
Problem
Corrupted: $Ax = b + e$, where the error $e$ is sparse with arbitrarily large entries; the desired solution is $x^* \in \{x : Ax = b\}$.
Applications: logic programming, error correction in telecommunications.
Noisy: $Ax = b + e$, where the error $e$ is small with evenly distributed entries; the desired solution is $x_{LS} \in \operatorname{argmin}_x \|Ax - b - e\|_2$.
Why not least-squares?
[Figure: the least-squares solution $x_{LS}$ is pulled far from $x^*$ by corrupted equations.]
MAX-FS
MAX-FS: Given $Ax = b$, determine the largest feasible subsystem.
MAX-FS is NP-hard, even when restricted to homogeneous systems with coefficients in $\{-1, 0, 1\}$ (Amaldi - Kann 1995), and admits no PTAS unless P = NP.
Proposed Method
Goal: Use RK to detect the corrupted equations with high probability.
Lemma (Haddock - N. 2018+). Let $\epsilon^* = \min_{i \in \operatorname{supp}(e)} |(Ax^* - b)_i| = \min_{i \in \operatorname{supp}(e)} |e_i|$ and suppose $|\operatorname{supp}(e)| = s$. If $\|a_i\| = 1$ for $i \in [m]$ and $\|x - x^*\| < \frac{1}{2}\epsilon^*$, then the $d \ge s$ indices of largest-magnitude residual entries are contained in... rather, contain $\operatorname{supp}(e)$. That is, we have $\operatorname{supp}(e) \subseteq D$, where
$$D = \operatorname{argmax}_{D \subseteq [m],\, |D| = d} \sum_{i \in D} |(Ax - b)_i|.$$
Proposed Method
Goal: Use RK to detect the corrupted equations with high probability.
[Figure: iterate $x_k$ within distance $\epsilon^*/2$ of $x^*$.] We call $\epsilon^*/2$ the detection horizon: once $x_k$ is within this distance of $x^*$, the lemma guarantees the largest residuals identify the corrupted equations.
Proposed Method
Method 1: Windowed Kaczmarz
1: procedure WK(A, b, k, W, d)
2:   S = ∅
3:   for i = 1, 2, ..., W do
4:     x_k^i = kth iterate produced by RK with x_0 = 0 on A, b
5:     D = d indices of the largest entries of the residual, |Ax_k^i - b|
6:     S = S ∪ D
7:   return x, where A_{S^C} x = b_{S^C}
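A runnable sketch of Method 1 in Python/NumPy (helper names are mine; the final solve on the retained rows is done here via least squares, which is exact when the retained subsystem is consistent):

```python
import numpy as np

def windowed_kaczmarz(A, b, k, W, d, seed=0):
    """Windowed Kaczmarz: run W windows of k RK iterations, each
    restarted from x0 = 0; after each window, add the indices of the
    d largest residual entries to S; finally solve the system
    restricted to the rows outside S."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum('ij,ij->i', A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    S = set()
    for _ in range(W):
        x = np.zeros(n)
        for _ in range(k):  # one RK window
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        resid = np.abs(A @ x - b)
        S |= set(np.argsort(resid)[-d:].tolist())
    keep = [i for i in range(m) if i not in S]
    x_final, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
    return x_final, S
```

Discarding a few uncorrupted rows is harmless as long as enough equations remain to determine $x$; what matters is that every corrupted index lands in $S$ in at least one window.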
Example
WK(A, b, k = 2, W = 3, d = 1) on a system of seven hyperplanes $H_1, \dots, H_7$. Each window $i = 1, 2, 3$ runs $k = 2$ RK iterations from $x_0^i = 0$ and adds the index of the largest residual entry to $S$:
Window 1: iterates $x_1^1, x_2^1$; $S = \{7\}$.
Window 2: iterates $x_1^2, x_2^2$; $S = \{7, 5\}$.
Window 3: iterates $x_1^3, x_2^3$; $S = \{7, 5, 6\}$.
Finally, solve $A_{S^C} x = b_{S^C}$ using the remaining hyperplanes $H_1, \dots, H_4$.
Theoretical Guarantees
Theorem (Haddock - N. 2018+). Assume that $\|a_i\| = 1$ for all $i \in [m]$ and let $0 < \delta < 1$. Suppose $d \ge s = |\operatorname{supp}(e)|$, $W \le \frac{m - n}{d}$, and $k$ is as given in the detection horizon lemma. Then the Windowed Kaczmarz method on $A, b$ will detect the corrupted equations ($\operatorname{supp}(e) \subseteq S$) and the remaining equations given by $A_{[m] \setminus S}, b_{[m] \setminus S}$ will have solution $x^*$ with probability at least
$$p_W := 1 - \left[1 - (1 - \delta)\left(\frac{m - s}{m}\right)^k\right]^W.$$
Theoretical Guarantee Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
[Figure: $p_W := 1 - \left[1 - (1 - \delta)\left(\frac{m - s}{m}\right)^k\right]^W$ plotted for $s = 1, 10, 50, 100, 200, 300, 400$.]
Experimental Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
[Figures: empirical success ratio for $s = 100, 200, 500, 750, 1000$, including success ratio as a function of $k$ up to 2000.]
Conclusions and Future Work
- Motzkin's method is accelerated even in the presence of noise
- RK methods may be used to detect corruption
Future work:
- identify useful bounds on $\gamma_k$ for other useful systems
- reduce dependence on artificial parameters in corruption detection bounds