Lecture 16: Compressed Sensing

Lecture 16: Compressed Sensing
Introduction to Learning and Analysis of Big Data
Kontorovich and Sabato (BGU)

Review of Johnson-Lindenstrauss

- Unsupervised learning technique. Key insight: for any m points {x_i} in R^d and any ε ∈ (0, 3), there is a W ∈ R^{k×d} with k = O(log(m)/ε²) such that
  1 − ε ≤ ‖Wx_i‖₂ / ‖x_i‖₂ ≤ 1 + ε for all 1 ≤ i ≤ m.   (*)
- Further, if we draw W_ij ~ N(0, 1/k) independently, then (*) holds with high probability for any m-point set!
- J-L is a data-oblivious technique (unlike PCA).
- Applications: nearest neighbors (NN). (1 + O(ε))-approximate nearest neighbors at a query point x: if the true NN is at some distance r, the approximate NN is at distance at most (1 + O(ε))·r. Speedup from O(md) to O(mk).
- Applications: SVM. If the data is separable with margin γ, a random projection to dimension O(1/γ²) results, with high probability, in data separable with margin γ/2. The effective dimension is O(1/γ²), independent of d.
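The review above can be checked numerically. Below is a minimal numpy sketch (ours, not from the slides): it draws W_ij ~ N(0, 1/k) and checks that the norm ratios for a random point set stay inside [1 − ε, 1 + ε]; the sizes m, d, ε and the constant inside k are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, eps = 200, 10_000, 0.2
k = int(np.ceil(6 * np.log(m) / eps**2))   # k = O(log(m)/eps^2); the constant 6 is arbitrary

X = rng.standard_normal((m, d))                       # m arbitrary points in R^d
W = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))    # W_ij ~ N(0, 1/k)

ratios = np.linalg.norm(X @ W.T, axis=1) / np.linalg.norm(X, axis=1)
print(ratios.min(), ratios.max())   # typically inside [1 - eps, 1 + eps]
```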

This lecture: sparsity

- For x ∈ R^d, define the pseudo-norm ‖x‖₀ := |{i ∈ [d] : x_i ≠ 0}|, the number of non-zero components of x.
- If ‖x‖₀ ≤ s, we can compress x using s (index, value) pairs. Compression is lossless: exact reconstruction is possible.
- Consider cases where measuring/transmitting/storing x is costly: field sensors, MRI, wireless ultrasound.
- We measure the whole x ∈ R^d but only need 2s numbers!
- Can we use few measurements to find x? We don't know the locations of the non-zero coordinates!
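A tiny sketch (ours) of the lossless (index, value) compression just described; for an s-sparse x it stores exactly s pairs plus the dimension d, and reconstruction is exact.

```python
import numpy as np

def compress_sparse(x):
    """Store an s-sparse vector as its dimension plus s (index, value) pairs."""
    idx = np.flatnonzero(x)
    return x.size, idx, x[idx]

def decompress_sparse(d, idx, vals):
    """Exact (lossless) reconstruction from the (index, value) pairs."""
    x = np.zeros(d)
    x[idx] = vals
    return x

x = np.zeros(1000)
x[[3, 17, 512]] = [2.0, -1.5, 0.7]          # s = 3 non-zeros
d, idx, vals = compress_sparse(x)
assert np.array_equal(decompress_sparse(d, idx, vals), x)
```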

Compressed Sensing

- Compressed sensing: recovering the sparse signal x while making only O(‖x‖₀ log(d)) measurements.
- Measurement: a linear map of x into R. Equivalently, a measurement is ⟨u, x⟩ for some u ∈ R^d.
- Linear measurements are easy to implement in many physical devices. In return, reconstruction will be more expensive.
- Useful when weak sensing devices transmit to powerful decoders, or when measurements are expensive.
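To make the measurement model concrete, a small illustrative snippet (ours): a single measurement is an inner product ⟨u, x⟩, and taking k measurements amounts to multiplying x by a matrix W whose rows are the measurement vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 1000, 50

x = np.zeros(d)
x[rng.choice(d, size=5, replace=False)] = rng.standard_normal(5)   # a 5-sparse signal

u = rng.standard_normal(d)
one_measurement = u @ x            # a single linear measurement <u, x>

W = rng.standard_normal((k, d))    # k measurement vectors stacked as rows
y = W @ x                          # all k measurements at once
```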

Restricted Isometry Property (RIP)

- We will define a property of matrices called RIP.
- Any matrix W ∈ R^{k×d} compresses x to the k-dimensional vector Wx.
- If W has the RIP, then x can be recovered from Wx.
- The recovery can be done efficiently.
- We will show: a random k × d matrix with i.i.d. Gaussian entries and k = O(‖x‖₀ log(d)) has the RIP with high probability.

Restricted Isometry Property (RIP): definition

- Definition: a matrix W ∈ R^{k×d} is (ε, s)-RIP if for all x ∈ R^d with x ≠ 0 and ‖x‖₀ ≤ s,
  1 − ε ≤ ‖Wx‖₂ / ‖x‖₂ ≤ 1 + ε.
- Looks familiar? Like J-L, W approximately preserves the norm of x.
- Unlike J-L:
  - RIP holds for all s-sparse vectors.
  - J-L holds for a finite set of exp(kε²) vectors.
  - J-L does not require s-sparsity.
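The RIP cannot be certified by sampling, but a quick empirical illustration (ours; the sizes are arbitrary) shows that a Gaussian matrix keeps the norms of random s-sparse vectors close to their original values:

```python
import numpy as np

rng = np.random.default_rng(2)
d, s, k = 500, 5, 120
W = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))   # candidate (eps, s)-RIP matrix

ratios = []
for _ in range(2000):
    x = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)   # random support of size s
    x[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(W @ x) / np.linalg.norm(x))

print(min(ratios), max(ratios))   # empirically close to 1 for these s-sparse inputs
```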

RIP and lossless compression

Reminder: W ∈ R^{k×d} is (ε, s)-RIP if for all x ∈ R^d with x ≠ 0 and ‖x‖₀ ≤ s, 1 − ε ≤ ‖Wx‖₂ / ‖x‖₂ ≤ 1 + ε.

Theorem: Let ε ∈ (0, 1) and let W be an (ε, 2s)-RIP matrix. If x ∈ R^d is s-sparse and y = Wx, then
  x = argmin_{v ∈ R^d : Wv = y} ‖v‖₀.

Proof: Suppose (for contradiction) that some s-sparse x̃ ≠ x satisfies y = Wx̃. Then ‖x − x̃‖₀ ≤ 2s. Apply the RIP to x − x̃:
  ‖W(x − x̃)‖₂ / ‖x − x̃‖₂ ≥ 1 − ε.
But ‖x − x̃‖₂ > 0 and ‖W(x − x̃)‖₂ = 0, so 1 − ε ≤ 0; contradiction. Hence x is the only s-sparse vector in {v : Wv = y}, and since any minimizer of ‖v‖₀ over this set has ‖v‖₀ ≤ ‖x‖₀ ≤ s, the minimizer must be x.

RIP and efficient reconstruction

- By the previous theorem, x = argmin_{v ∈ R^d : Wv = y} ‖v‖₀.
- So to recover x, choose the sparsest element of {v : Wv = y}.
- This recovery procedure is not efficient (why?). Let's do something else.

Theorem: Let ε ∈ (0, 1) and let W be an (ε, 2s)-RIP matrix. If ε < 1/(1 + √2), then
  x = argmin_{v ∈ R^d : Wv = y} ‖v‖₀ = argmin_{v ∈ R^d : Wv = y} ‖v‖₁.

- Why is this good news? Convex optimization! (A concrete formulation follows below.)
- Again, ℓ₁ regularization encourages sparsity, as in the LASSO (ℓ₁-regularized regression).
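The ℓ₁ problem above is convex; one standard way to solve it exactly is to rewrite min ‖v‖₁ subject to Wv = y as a linear program with auxiliary variables t ≥ |v|. A sketch using scipy.optimize.linprog (the helper name l1_recover is ours, not from the lecture):

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(W, y):
    """min ||v||_1  subject to  W v = y, written as a linear program over
    z = [v, t] with the standard constraints -t <= v <= t and t >= 0."""
    k, d = W.shape
    c = np.concatenate([np.zeros(d), np.ones(d)])              # minimize sum(t)
    A_eq = np.hstack([W, np.zeros((k, d))])                     # W v = y
    I = np.eye(d)
    A_ub = np.vstack([np.hstack([I, -I]),                       #  v - t <= 0
                      np.hstack([-I, -I])])                     # -v - t <= 0
    b_ub = np.zeros(2 * d)
    bounds = [(None, None)] * d + [(0, None)] * d               # v free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    assert res.success
    return res.x[:d]
```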

Constructing RIP matrices

- Explicit constructions are not known.
- Instead, use an efficient random construction which is likely to work.

Theorem: Fix ε, δ ∈ (0, 1) and s ∈ [d]. Set
  k ≥ 100 s log(40d/(δε)) / ε²
and draw W ∈ R^{k×d} via W_ij ~ N(0, 1/k) independently. Then, with probability at least 1 − δ, the matrix W is (ε, s)-RIP.

- Same random matrix as J-L! (k is different.)
- Full compressed sensing process (see the sketch below):
  - Generate a random W ∈ R^{k×d}.
  - Get the k measurements y = Wx.
  - Find x := argmin_{v ∈ R^d : Wv = y} ‖v‖₁.
- Used Õ(s log(d)/ε²) measurements instead of d.
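Putting the three steps together, a small end-to-end sketch (ours), reusing the l1_recover helper defined in the previous snippet; d, s, k below are arbitrary small values chosen to keep the LP cheap, not the theorem's bound.

```python
import numpy as np

rng = np.random.default_rng(3)
d, s, k = 200, 4, 60            # small illustrative sizes, not the theorem's bound

# an s-sparse signal
x = np.zeros(d)
x[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)

# 1. generate a random measurement matrix with W_ij ~ N(0, 1/k)
W = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))

# 2. take the k measurements
y = W @ x

# 3. recover by l1 minimization (l1_recover is the helper from the previous sketch)
x_hat = l1_recover(W, y)
print(np.max(np.abs(x_hat - x)))   # typically ~0 up to solver tolerance
```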

Latent sparsity

- Recall ‖x‖₀ := |{i ∈ [d] : x_i ≠ 0}|; if ‖x‖₀ ≤ s, we can compress using s (index, value) pairs.
- Sometimes the sparsity of x is "hidden": x = Uα, where U ∈ R^{d×d} is orthogonal and ‖α‖₀ ≤ s. U is fixed and assumed known.
- x has a sparse representation in the basis U. This holds for many natural signals; e.g., JPEG image compression exploits sparsity in the DCT basis (and JPEG 2000 in a wavelet basis).
- Can we still compress x using O(s log(d)) measurements? (See the sketch below.)
  - Find W such that W' := WU is RIP.
  - Measure y = Wx = WUα = W'α.
  - Compute α = argmin_{v ∈ R^d : W'v = y} ‖v‖₀.
  - Compute x = Uα.
- How can we get W such that WU is RIP?
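A sketch of the latent-sparsity pipeline (ours), with a randomly generated orthogonal matrix standing in for the known basis U and ℓ₁ minimization in place of ℓ₀, as justified two slides earlier; it again reuses the l1_recover helper.

```python
import numpy as np

rng = np.random.default_rng(4)
d, s, k = 200, 4, 60

U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # a fixed, known orthogonal basis

alpha = np.zeros(d)                                # sparse representation
alpha[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
x = U @ alpha                                      # x itself is dense

W = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))
y = W @ x                                          # = (W U) alpha

alpha_hat = l1_recover(W @ U, y)                   # recover alpha using W' = W U
x_hat = U @ alpha_hat
print(np.max(np.abs(x_hat - x)))                   # typically ~0 up to solver tolerance
```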

Latent sparsity

- How can we get W ∈ R^{k×d} such that W' = WU is RIP?
- Claim: if W_{ij} ~ N(0, 1/k) independently, then so are the entries W'_{ij}!
- Proof: W'_{ij} = Σ_{t=1}^{d} W_{it} U_{tj} ~ N(0, Σ_{t=1}^{d} U_{tj}² / k) = N(0, 1/k), since the columns of U have unit norm.
- Independence. Fact: zero-mean jointly Gaussian X, Y are independent iff E[XY] = 0. Check independence of the entries of W': for (i, j) ≠ (i', j'),
  E[W'_{ij} W'_{i'j'}] = E[Σ_{t,t'=1}^{d} W_{it} U_{tj} W_{i't'} U_{t'j'}] = Σ_{t,t'=1}^{d} U_{tj} U_{t'j'} E[W_{it} W_{i't'}] = 0,
  using the independence of the entries of W (when i ≠ i') and the orthogonality of the columns of U (when i = i').
- Conclusion: compressed sensing with latent sparsity can use the same W, regardless of U!
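The claim can also be sanity-checked empirically (an illustration, not a proof): over many draws of W, the entries of W' = WU have variance about 1/k and are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, trials = 50, 20, 5000

U, _ = np.linalg.qr(rng.standard_normal((d, d)))   # fixed orthogonal U

samples = []
for _ in range(trials):
    W = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))
    samples.append((W @ U)[0, :2])                 # two entries of W' = W U
samples = np.asarray(samples)

print(samples.var(axis=0))                         # both close to 1/k = 0.05
print(np.mean(samples[:, 0] * samples[:, 1]))      # close to 0 (uncorrelated)
```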

Compressed Sensing: summary

- Assumes the data x ∈ R^d has an s-sparse representation α in some known basis U ∈ R^{d×d}.
- Goal: recover x from fewer measurements than d.
- Method: measure using a random W ∈ R^{k×d} with i.i.d. Gaussian entries, so that WU is an RIP matrix with high probability. Set k ∝ s log(d), where ‖α‖₀ ≤ s.
  - Get the k measurements Wx = WUα.
  - Recover the original α efficiently using convex optimization.
  - Get x = Uα.
- RIP matrices W can be constructed efficiently: the construction uses i.i.d. Gaussian entries and produces an RIP matrix with high probability.