Iterative Projection Methods

Iterative Projection Methods for noisy and corrupted systems of linear equations
Deanna Needell, Mathematics, UCLA
February 1, 2018
Joint with Jamie Haddock and Jesús De Loera
https://arxiv.org/abs/1605.01418 and forthcoming articles

Setup
We are interested in solving highly overdetermined systems of equations, $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $m \gg n$. Rows of $A$ are denoted $a_i^T$.

Projection Methods
If $\{x \in \mathbb{R}^n : Ax = b\}$ is nonempty, these methods construct an approximation to an element:
1. Randomized Kaczmarz Method
2. Motzkin's Method(s)
3. Sampling Kaczmarz-Motzkin Methods (SKM)

Randomized Kaczmarz Method
Given $x_0 \in \mathbb{R}^n$:
1. Choose $i_k \in [m]$ with probability $\frac{\|a_{i_k}\|^2}{\|A\|_F^2}$.
2. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
3. Repeat.
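To make the update concrete, here is a minimal NumPy sketch of this rule (illustrative only; the function name, the synthetic system, and the iteration count are our choices, not from the talk):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=1000, x0=None, rng=None):
    """Randomized Kaczmarz: sample row i with probability ||a_i||^2 / ||A||_F^2,
    then orthogonally project the current iterate onto the hyperplane a_i^T x = b_i."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()            # ||a_i||^2 / ||A||_F^2
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]  # projection step
    return x

# Consistent, highly overdetermined system: b = A @ x_true.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b, iters=2000, rng=rng)
print(np.linalg.norm(x_hat - x_true))  # small
```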

Kaczmarz Method
[Figure: successive iterates $x_0, x_1, x_2, x_3$, each obtained by projecting onto a constraint hyperplane.]

Convergence Rate
Theorem (Strohmer - Vershynin 2009). Let $x$ be the solution to the consistent system of linear equations $Ax = b$. Then the Randomized Kaczmarz method converges to $x$ linearly in expectation:
$$\mathbb{E}\,\|x_k - x\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2^2.$$

Motzkin's Relaxation Method(s)
Given $x_0 \in \mathbb{R}^n$:
1. If $x_{k-1}$ is feasible, stop.
2. Choose $i_k := \operatorname{argmax}_{i \in [m]} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
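A matching sketch of the greedy rule, under the same illustrative conventions as the RK sketch above (the feasibility tolerance is our own choice):

```python
import numpy as np

def motzkin(A, b, max_iters=1000, tol=1e-10, x0=None):
    """Motzkin's method: at each step project onto the hyperplane of the most
    violated constraint, i.e., the largest-magnitude residual entry."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = np.sum(A**2, axis=1)
    for _ in range(max_iters):
        residual = A @ x - b
        if np.max(np.abs(residual)) < tol:       # x is (numerically) feasible
            break
        i = int(np.argmax(np.abs(residual)))     # greedy row choice
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```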

Motzkin's Method
[Figure: iterates $x_0, x_1, x_2$ under the greedy projection rule.]

Convergence Rate
Theorem (Agmon 1954). For a consistent, normalized system ($\|a_i\| = 1$ for all $i = 1, \dots, m$), Motzkin's method converges linearly to the solution $x$:
$$\|x_k - x\|_2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2.$$

Our Hybrid Method (SKM)
Given $x_0 \in \mathbb{R}^n$:
1. Choose $\tau_k \subseteq [m]$ to be a sample of $\beta$ constraints chosen uniformly at random from among the rows of $A$.
2. From among these $\beta$ rows, choose $i_k := \operatorname{argmax}_{i \in \tau_k} |a_i^T x_{k-1} - b_i|$.
3. Define $x_k := x_{k-1} + \frac{b_{i_k} - a_{i_k}^T x_{k-1}}{\|a_{i_k}\|^2}\, a_{i_k}$.
4. Repeat.
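The hybrid rule is a small change to the sketches above: sample $\beta$ rows uniformly, then project onto the most violated constraint within the sample. Taking $\beta = m$ recovers Motzkin's method, and $\beta = 1$ gives a uniformly sampled Kaczmarz step. A sketch under the same illustrative conventions:

```python
import numpy as np

def skm(A, b, beta, max_iters=1000, tol=1e-10, x0=None, rng=None):
    """Sampling Kaczmarz-Motzkin: draw beta rows uniformly at random, then
    project onto the hyperplane of the most violated constraint in the sample."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    row_norms_sq = np.sum(A**2, axis=1)
    for _ in range(max_iters):
        if np.max(np.abs(A @ x - b)) < tol:                    # stop if feasible
            break
        tau = rng.choice(m, size=beta, replace=False)          # sampled constraints
        i = tau[int(np.argmax(np.abs(A[tau] @ x - b[tau])))]   # most violated in sample
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```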

SKM
[Figure: iterates $x_0, x_1, x_2$ under the sampled greedy rule.]

SKM Method Convergence Rate
Theorem (De Loera - Haddock - N. 2017). For a consistent, normalized system, the SKM method with samples of size $\beta$ converges to the solution $x$ at least linearly in expectation: if $s_{k-1}$ is the number of constraints satisfied by $x_{k-1}$ and $V_{k-1} = \max\{m - s_{k-1},\, m - \beta + 1\}$, then
$$\mathbb{E}\,\|x_k - x\|_2 \le \left(1 - \frac{1}{V_{k-1}\,\|A^{-1}\|_2^2}\right)\|x_{k-1} - x\|_2, \quad\text{and hence}\quad \mathbb{E}\,\|x_k - x\|_2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2.$$

Convergence
[Figure.]

Convergence Rates
RK: $\mathbb{E}\,\|x_k - x\|_2^2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2^2$.
MM: $\|x_k - x\|_2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2$.
SKM: $\mathbb{E}\,\|x_k - x\|_2 \le \left(1 - \frac{1}{m\,\|A^{-1}\|_2^2}\right)^k \|x_0 - x\|_2$.
Why are these all the same? (For a normalized system, $\|A\|_F^2 = m$, so all three contraction factors coincide.)

An Accelerated Convergence Rate
Theorem (Haddock - N. 2018+). Let $x$ denote the solution of the consistent, normalized system $Ax = b$. Motzkin's method exhibits the (possibly highly accelerated) convergence rate:
$$\|x_T - x\|_2 \le \prod_{k=0}^{T-1}\left(1 - \frac{1}{4\gamma_k\,\|A^{-1}\|_2^2}\right) \|x_0 - x\|_2.$$
Here $\gamma_k$ bounds the dynamic range of the $k$th residual, $\gamma_k := \frac{\|Ax_k - Ax\|_2^2}{\|Ax_k - Ax\|_\infty^2}$. This is an improvement over the previous result when $4\gamma_k < m$.
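A small helper making the dynamic-range quantity concrete, assuming the reconstruction of $\gamma_k$ above (ratio of squared 2-norm to squared $\infty$-norm of the residual); the test vector is our own:

```python
import numpy as np

def dynamic_range(residual):
    """gamma = ||r||_2^2 / ||r||_inf^2: equals 1 when one entry carries all the
    mass, and m when all m entries have equal magnitude."""
    r = np.abs(np.asarray(residual, dtype=float))
    return float(np.sum(r**2) / np.max(r)**2)

# For iid Gaussian entries this is on the order of m / (2 log m).
rng = np.random.default_rng(1)
print(dynamic_range(rng.standard_normal(10000)))
```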

$\gamma_k$: Gaussian systems
[Figure; roughly $\gamma_k \lesssim m / \log m$.]

Gaussian Convergence
[Figure.]

Is this the right problem?
[Figure: for a noisy system the least-squares solution $x_{LS}$ is the natural target; for a corrupted system the desired solution $x$ and $x_{LS}$ can be far apart.]

Noisy Convergence Results
Theorem (N. 2010). Let $A$ have full column rank, denote the desired solution to the system $Ax = b$ by $x$, and define the error term $e = Ax - b$. Then the RK iterates satisfy
$$\mathbb{E}\,\|x_k - x\|_2 \le \left(1 - \frac{1}{\|A\|_F^2\,\|A^{-1}\|_2^2}\right)^{k/2} \|x_0 - x\|_2 + \|A\|_F\,\|A^{-1}\|_2\,\|e\|_\infty.$$

Theorem (Haddock - N. 2018+). Let $x$ denote the desired solution of the system $Ax = b$ and define the error term $e = b - Ax$. If Motzkin's method is run with stopping criterion $\|Ax_k - b\|_\infty \le 4\|e\|_\infty$, then the iterates satisfy
$$\|x_T - x\|_2 \le \prod_{k=0}^{T-1}\left(1 - \frac{1}{4\gamma_k\,\|A^{-1}\|_2^2}\right) \|x_0 - x\|_2 + 2m\,\|A^{-1}\|_2^2\,\|e\|_\infty^2.$$

Noisy Convergence
[Figure.]

What about corruption?
[Figure: Motzkin iterates $x_1^M, x_2^M, x_3^M$ versus RK iterates $x_1^{RK}, x_2^{RK}, x_3^{RK}$ starting from $x_0$.]

Problem
Problem (Corrupted): $Ax = b + e$.
Error ($e$): sparse, arbitrarily large entries.
Solution ($x^*$): $x^* \in \{x : Ax = b\}$.
Applications: logic programming, error correction in telecommunications.

Problem (Noisy): $Ax = b + e$.
Error ($e$): small, evenly distributed entries.
Solution ($x_{LS}$): $x_{LS} = \operatorname{argmin}_x \|Ax - b - e\|_2$.

Why not least-squares?
[Figure: the least-squares solution $x_{LS}$ is pulled away from the desired solution $x$ by the corrupted equations.]

MAX-FS
MAX-FS: Given $Ax = b$, determine the largest feasible subsystem.
MAX-FS is NP-hard even when restricted to homogeneous systems with coefficients in $\{-1, 0, 1\}$ (Amaldi - Kann 1995), and it admits no PTAS unless P = NP.

Proposed Method
Goal: Use RK to detect the corrupted equations with high probability.

Lemma (Haddock - N. 2018+). Let $\epsilon^* = \min_{i \in \operatorname{supp}(e)} |(Ax^* - b)_i| = \min_{i \in \operatorname{supp}(e)} |e_i|$ and suppose $|\operatorname{supp}(e)| = s$. If $\|a_i\| = 1$ for $i \in [m]$ and $\|x - x^*\| < \tfrac{1}{2}\epsilon^*$, then the $d \le s$ indices of largest-magnitude residual entries are contained in $\operatorname{supp}(e)$. That is, we have $D \subseteq \operatorname{supp}(e)$, where
$$D = \operatorname{argmax}_{D \subseteq [m],\ |D| = d} \sum_{i \in D} |(Ax - b)_i|.$$

We call $\epsilon^*/2$ the detection horizon: once an iterate $x_k$ lies within $\epsilon^*/2$ of $x^*$, its largest residual entries identify corrupted equations.

Proposed Method
Method 1: Windowed Kaczmarz
procedure WK(A, b, k, W, d):
  S = ∅
  for i = 1, 2, ..., W:
    x_k^i = kth iterate produced by RK with x_0 = 0 on A, b
    D = indices of the d largest entries of the residual |A x_k^i − b|
    S = S ∪ D
  return x, where A_{S^C} x = b_{S^C}
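A self-contained NumPy sketch of the procedure (parameter names mirror the pseudocode; the synthetic corrupted system, the choice of k, W, d, and the least-squares solve on the kept rows are our illustrative choices):

```python
import numpy as np

def windowed_kaczmarz(A, b, k, W, d, rng=None):
    """Windowed Kaczmarz sketch: run W windows of k RK iterations from x0 = 0,
    flag the d largest-residual rows in each window, then solve on the kept rows."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    S = set()
    for _ in range(W):
        x = np.zeros(n)
        for _ in range(k):                                    # k RK iterations
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        residual = np.abs(A @ x - b)
        S.update(int(j) for j in np.argsort(residual)[-d:])   # d largest residuals
    keep = np.array(sorted(set(range(m)) - S))                # S^C: rows we keep
    x_hat, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
    return x_hat, S

# Corrupted system: a few entries of b are wildly wrong.
rng = np.random.default_rng(2)
A = rng.standard_normal((2000, 20))
A /= np.linalg.norm(A, axis=1, keepdims=True)                 # normalized rows
x_true = rng.standard_normal(20)
b = A @ x_true
corrupt = rng.choice(2000, size=10, replace=False)
b[corrupt] += 5 + 10 * rng.random(10)                         # large sparse corruptions
x_hat, S = windowed_kaczmarz(A, b, k=150, W=30, d=10, rng=rng)
print(np.linalg.norm(x_hat - x_true), set(corrupt).issubset(S))
```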

Example
WK(A, b, k = 2, W = 3, d = 1) on a small system with hyperplanes $H_1, \dots, H_7$.
[Figures: each window $i = 1, 2, 3$ runs $k = 2$ RK iterations $x_1^i, x_2^i$ from $x_0^i = 0$; the flagged set grows $S = \{7\}$, then $\{7, 5\}$, then $\{7, 5, 6\}$. Finally, solve $A_{S^C} x = b_{S^C}$ on the remaining hyperplanes $H_1, \dots, H_4$.]

Theoretical Guarantees
Theorem (Haddock - N. 2018+). Assume that $\|a_i\| = 1$ for all $i \in [m]$ and let $0 < \delta < 1$. Suppose $d \ge s = |\operatorname{supp}(e)|$, $W \le \frac{m - n}{d}$, and $k$ is as given by the detection horizon lemma (so that a window of $k$ uncorrupted RK iterations lands within the horizon with probability at least $1 - \delta$). Then the Windowed Kaczmarz method on $A, b$ will detect the corrupted equations ($\operatorname{supp}(e) \subseteq S$) and the remaining equations given by $A_{[m] - S}, b_{[m] - S}$ will have solution $x^*$ with probability at least
$$p_W := 1 - \left[1 - (1 - \delta)\left(\frac{m - s}{m}\right)^{k}\right]^{W}.$$

Theoretical Guarantee Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
$$p_W := 1 - \left[1 - (1 - \delta)\left(\frac{m - s}{m}\right)^{k}\right]^{W}$$
[Plot of $p_W$ for $s = 1, 10, 50, 100, 200, 300, 400$.]
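A tiny helper evaluating this bound, as reconstructed above, for illustrative parameter values of our own choosing:

```python
def p_W(m, s, k, W, delta):
    """Lower bound on the probability that Windowed Kaczmarz flags every corrupted
    equation: each window succeeds with probability at least (1 - delta) * ((m - s) / m)**k."""
    per_window = (1 - delta) * ((m - s) / m) ** k
    return 1 - (1 - per_window) ** W

# Illustrative values only (m, s, k, W, delta chosen arbitrarily).
print(p_W(m=50000, s=100, k=1000, W=100, delta=0.5))
```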

Experimental Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
[Plot of success ratio for $s = 100, 200, 500, 750, 1000$.]

Experimental Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
[Plot of success ratio versus $k$ (0 to 2000) for $s = 100, 200, 500, 750, 1000$.]

Experimental Values (Gaussian $A \in \mathbb{R}^{50000 \times 100}$)
[Two additional figures.]

Conclusions and Future Work
- Motzkin's method is accelerated even in the presence of noise.
- RK methods may be used to detect corruption.
- Identify useful bounds on $\gamma_k$ for other systems of interest.
- Reduce the dependence on artificial parameters in the corruption detection bounds.