Statistical Issues in Searches: Photon Science Response. Rebecca Willett, Duke University



Photon science seen this week. Applications: tomography, ptychography, nanocrystallography, coherent diffraction imaging. Challenges: high-dimensional signals, underdetermined problems, limited number of photons, complex forward models. 2 / 34

Generic problem statement. f* is the signal of interest. Before hitting our detector, the signal is transformed, f* -> A(f*) (tomographic projections, diffraction patterns, coded aperture). We observe y ~ Poisson(A(f*)). Goal: infer f* from y. 3 / 34

Key questions. We infer f* via

    f̂ = arg min_{f in F} { −log p(y | A(f)) + pen(f) }

where −log p(y | A(f)) is the negative Poisson log-likelihood, pen(f) is a penalty / regularizer / negative log-prior, and F is a class of candidate estimates. We would like to understand the following: How does performance depend on A? What should we use for pen(f)? How do we solve the optimization problem? 4 / 34
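As a concrete toy instance of this estimator, the sketch below builds a small nonnegative forward operator A, draws y ~ Poisson(A f*), and evaluates the penalized negative log-likelihood with an l1 sparsity penalty standing in for pen(f). All names and sizes are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: m-pixel signal, n detectors (illustrative values).
m, n = 50, 30
f_true = np.zeros(m)
f_true[rng.choice(m, 5, replace=False)] = rng.uniform(10, 50, 5)  # sparse, nonnegative

A = rng.uniform(0.0, 1.0, (n, m))
A /= A.sum(axis=0).max()       # nonnegative entries, column sums <= 1 (flux-preserving)
y = rng.poisson(A @ f_true)    # observations y ~ Poisson(A f*)

def objective(f, lam=1.0, eps=1e-12):
    """Negative Poisson log-likelihood (up to terms constant in f) plus an l1 penalty."""
    mu = A @ f
    return np.sum(mu - y * np.log(mu + eps)) + lam * np.sum(np.abs(f))
```

Minimizing this objective over a candidate class F (e.g., by proximal gradient methods) would yield the estimate f̂.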

First, a case study. The design space: {linear, nonlinear} operator x {convex, nonconvex} objective. For now, consider the case where A(f) = Af is a (possibly underdetermined) linear operator and pen(f) is convex and measures sparsity. 5 / 34

An oracle inequality¹

Theorem. If y ~ Poisson(f*), where ||f*||_1 = I, and

    f̂ ∈ arg min_{f in F} { −log p(y | f) + l(f) }

where l(f) is a prefix codelength for f, then (omitting constants)

    E[H²(f*/I, f̂/I)] ≲ min_{f in F} [ ||f*/I − f/I||² + 2 l(f)/I ]

(left side: risk; first term: approximation error; second term: estimation error). (Similar oracle inequalities exist without the codelength perspective.) Codelengths correspond to generalized notions of sparsity.

¹ Li & Barron 2000; Massart 2003; Willett & Nowak 2005; Baraud & Birgé 2006-2011; Bunea, Tsybakov & Wegkamp 2007. 6 / 34

A myriad of tradeoffs. We want to choose F and l(·) to make this bound as small as possible for large families of possible f*. Choosing a small model class F gives short codelengths but large approximation errors (bias). Choosing a rich model class makes the choice of coding scheme challenging: we want l(·) convex so we can perform the search quickly, and we want shorter codes for estimators that conform better to prior information. 7 / 34

Codes and sparsity Encode which pixels contain stars and the brightness of those stars 8 / 34

Codes and sparsity Encode which wavelet coefficients are non-zero and their magnitudes 8 / 34

Codes and sparsity Encode image patches as linear combination of a small number of representative patches encode which representatives used and their weights 8 / 34

Codes and sparsity. Encode location of object along a low-dimensional manifold.² ² Fu et al. 2007. 8 / 34

An oracle inequality

Theorem. If y ~ Poisson(f*), where ||f*||_1 = I, and

    f̂ ∈ arg min_{f in F} { −log p(y | f) + l(f) }

where l(f) is a prefix codelength for f, then (omitting constants)

    E[H²(f*/I, f̂/I)] ≲ min_{f in F} [ ||f*/I − f/I||² + 2 l(f)/I ]

(left side: risk; first term: approximation error; second term: estimation error). (Similar oracle inequalities exist without the codelength perspective.) This framework leads to guarantees on estimation performance for all f* with sparse approximations. 9 / 34

Scheffé's identity. We focus on the squared Hellinger distance:

    H²(f₁, f₂) := ∫ ( √f₁(x) − √f₂(x) )² dx

The squared Hellinger distance helps bound the L¹ error:

    H²(f₁, f₂) ≤ ||f₁ − f₂||_1 ≤ 2 H(f₁, f₂),

which is important due to Scheffé's identity:

    sup_B | ∫_B f₁ − ∫_B f₂ | = (1/2) ||f₁ − f₂||_1.

The L¹ distance therefore bounds the absolute difference between the probabilities the two densities assign to any Borel-measurable subset of their domain. 10 / 34
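Both facts on this slide are easy to check numerically for discrete densities; the sketch below verifies the Hellinger/L1 sandwich and Scheffé's identity on random normalized histograms (a demonstration, not a proof).

```python
import numpy as np

rng = np.random.default_rng(1)

# Two densities on a discrete 100-point domain (normalized histograms).
f1 = rng.uniform(size=100)
f1 /= f1.sum()
f2 = rng.uniform(size=100)
f2 /= f2.sum()

H2 = np.sum((np.sqrt(f1) - np.sqrt(f2)) ** 2)   # squared Hellinger distance
L1 = np.sum(np.abs(f1 - f2))                    # L1 distance

# Sandwich bound from the slide: H^2 <= ||f1 - f2||_1 <= 2 H.
assert H2 <= L1 <= 2 * np.sqrt(H2)

# Scheffe's identity: the largest discrepancy in probability over all events
# is half the L1 distance, attained at B = {x : f1(x) > f2(x)}.
B = f1 > f2
assert np.isclose(f1[B].sum() - f2[B].sum(), 0.5 * L1)
```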

Poisson inverse problems. A_{j,i} corresponds to the probability of a photon originating at location i hitting a detector at location j. Hardware systems can only aggregate events / photons, not measure differences: A_{j,i} ≥ 0. The total number of observed events / photons cannot exceed the total number of events:

    Σ_{j=1}^n (Af)_j ≤ Σ_{i=1}^m f_i.  11 / 34

Oracle inequality revisited

Theorem. Assume c ||f₁ − f₂||₂² ≤ ||Af₁ − Af₂||₂² ≤ C ||f₁ − f₂||₂² for all f₁, f₂ ∈ F ∪ {f*}. If y ~ Poisson(Af*) and

    f̂ ∈ arg min_{f in F} { −log p(y | Af) + l(f) }

where l(f) is a prefix codelength, then (omitting constants)

    E ||f*/I − f̂/I||² ≲ (C/c) min_{f in F} [ ||f*/I − f/I||² + 2 l(f)/(cI) ]

(left side: risk; first term: approximation error; second term: estimation error). Similar results hold for l1 norms. 12 / 34

Big questions 1. If we have complete control over A, can we design our measurement system to get accurate high-resolution estimates from relatively few photodetectors? (compressed sensing) 2. When we have limited control over A, what does the theory tell us about various design tradeoffs? 13 / 34

Random compressed sensing matrices. Let Ã be an n × m Bernoulli ensemble matrix, so each Ã_{i,j} is +1/√n or −1/√n with equal probability. Ã then satisfies the restricted isometry property (RIP, so C/c ≈ 1) with high probability for n sufficiently large. We would like to observe Ãf*. 14 / 34
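A Bernoulli ensemble is straightforward to generate; the sketch below spot-checks the near-isometry on a few random k-sparse vectors. This is only an empirical check under illustrative sizes: the RIP is a uniform statement over all k-sparse vectors, which sampling cannot certify.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 200, 1000, 10   # detectors, signal length, sparsity (illustrative)

# Bernoulli ensemble: i.i.d. entries equal to +1/sqrt(n) or -1/sqrt(n).
A_tilde = rng.choice([-1.0, 1.0], size=(n, m)) / np.sqrt(n)

# Spot-check near-isometry on random k-sparse vectors.
for _ in range(5):
    x = np.zeros(m)
    x[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
    ratio = np.linalg.norm(A_tilde @ x) ** 2 / np.linalg.norm(x) ** 2
    assert 0.5 < ratio < 1.5   # ||A x||^2 concentrates around ||x||^2
```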

Incorporating physical constraints. Elements of A must be nonnegative => we shift Ã so all elements are nonnegative. Total measured intensity ||Af||_1 must not exceed the total incident intensity ||f||_1 => we must scale Ã to ensure flux preservation. 15 / 34

Shifted and scaled measurements. Original sensing matrix Ã ∈ {−1/√n, +1/√n}^{n×m}. Our shifted and scaled version is

    A = (1/(2√n)) ( Ã + 1/√n ),

so we measure

    Af = Ãf/(2√n) + I/(2n),

where Ãf is the ideal signal and I := ||f||_1 is the total signal intensity. 16 / 34
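The shift-and-scale construction can be checked directly: the entries of A land in {0, 1/n}, so A is physically realizable, and the measurements decompose into the ideal signal plus a known DC offset. A minimal sketch (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 100, 400

A_tilde = rng.choice([-1.0, 1.0], size=(n, m)) / np.sqrt(n)

# Shift and scale: entries of A land in {0, 1/n}, hence nonnegative.
A = (A_tilde + 1.0 / np.sqrt(n)) / (2.0 * np.sqrt(n))
assert A.min() >= 0.0

f = rng.uniform(0.0, 10.0, m)   # a nonnegative test signal
I = f.sum()                     # total intensity I = ||f||_1

# Measurements decompose as the ideal signal plus a known DC offset,
# and the total measured intensity never exceeds the incident intensity.
assert np.allclose(A @ f, (A_tilde @ f) / (2 * np.sqrt(n)) + I / (2 * n))
assert (A @ f).sum() <= I
```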

Compressible signals. Assume f* is compressible in some basis Φ; that is, f* = Φθ* and ||θ* − θ*_(k)||₂ / I = O(k^{−α}), where θ*_(k) is the best k-term approximation to θ*.

Theorem. There exist a finite set of candidate estimators F_I and c′ ∈ (0, I/n) with the property (Af)_j ≥ c′ for all f ∈ F_I, and a prefix code with l(f) ∝ ||Φᵀf||₀ · log(m), such that for n sufficiently large

    E ||f*/I − f̂/I||₂² = O( ( (n log m)/I )^{2α/(2α+1)} + log(m/n)/n ).

(Recall m is the length of f*, n is the number of detectors, and I is the total intensity.) 17 / 34

Simulation results, m = 10^5. [Figure: relative error vs. number of measurements (10-trial average), for I ∈ {10^5, 10^4} and k ∈ {100, 500, 1000}.] 18 / 34

Exercising limited control over A These bounds can also provide insight when we have a fixed signal intensity or photon budget, and want to choose the measurement system A to optimize the tradeoff between measurement diversity and photon scarcity. 19 / 34

Measurement diversity and photon scarcity. In limited- or sparse-angle tomography, how many and which different angles should be used?³ ³ Cederlund et al. 2009. 20 / 34

Measurement diversity and photon scarcity. [Figure: representative observations and true vs. estimated Raman spectra (intensity vs. spectral band index) for 2, 10, and 40 excitation frequencies.] In Shifted Excitation Raman Spectroscopy, how many and which different excitation frequencies should be used to separate the Raman spectrum from the fluorescent background? 20 / 34

Measurement diversity and photon scarcity In coded aperture or structured illumination imaging, which mask patterns best facilitate high-resolution image reconstruction? 20 / 34

The design space again: {linear, nonlinear} operator x {convex, nonconvex} objective. We know a lot about linear systems and convex objectives. A good strategy for the other regimes is to find a problem relaxation such that (a) the relaxed problem is linear and convex and (b) the solution to the relaxed problem closely corresponds to the solution to the original problem. 21 / 34

The following slides are derived from slides made by Thomas Strohmer, UC Davis 22 / 34

Phase retrieval. X-ray diffraction of DNA (Photo 51 by Rosalind Franklin). X-ray crystallography. Optical alignment (e.g., James Webb Space Telescope). 23 / 34

At the core of phase retrieval lies the problem: we want to recover a function f(t) from intensity measurements of its Fourier transform, |f̂(ω)|². Without further information about f, the phase retrieval problem is ill-posed. We can either impose additional properties of f or take more measurements (or both). 24 / 34

At the core of phase retrieval lies the problem: we want to recover a function f(t) from intensity measurements of its Fourier transform, |f̂(ω)|². Without further information about f, the phase retrieval problem is ill-posed. We can either impose additional properties of f or take more measurements (or both). We want an efficient phase retrieval algorithm based on a rigorous mathematical framework, for which we can guarantee exact recovery and which is also provably stable in the presence of noise. We want a flexible framework that does not require any prior information about the function (signal, image, ...) yet can incorporate additional information if available. 24 / 34
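The ill-posedness is easy to see numerically: a global phase, a circular shift, or a conjugate reflection of f all leave the Fourier magnitudes unchanged. A short demonstration:

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.standard_normal(64) + 1j * rng.standard_normal(64)
mags = np.abs(np.fft.fft(f))    # the intensity data determine only these

# Global phase: e^{i*phi} f has identical Fourier magnitudes.
assert np.allclose(np.abs(np.fft.fft(np.exp(1j * 0.7) * f)), mags)

# Circular shift: a shifted signal has identical Fourier magnitudes.
assert np.allclose(np.abs(np.fft.fft(np.roll(f, 13))), mags)

# Conjugate reflection ("twin image"): also identical Fourier magnitudes.
assert np.allclose(np.abs(np.fft.fft(np.conj(f[::-1]))), mags)
```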

General phase retrieval problem Suppose we have f C m or C m 1 m 2 quadratic measurements of the form about which we have y = A(f ) = { a k, f 2 : k = 1, 2,..., m}. Phase retrieval: find f obeying A(f) = A(f ) := y. Goals: Find measurement vectors {a k } k I such that f is uniquely determined by { a k, f } k I. Find an algorithm that reconstructs f from { a k, f } k I. 25 / 34

Standard approach: oversampling. Sufficient oversampling of the diffraction pattern in the Fourier domain gives uniqueness for generic two- or higher-dimensional signals in combination with support constraints. Alternating projections [Gerchberg-Saxton, Fienup]: alternately enforce the object constraint in the spatial domain and the measurement constraint in the Fourier domain. Seems to work in certain cases but has many limitations and problems. No proof of convergence to the actual solution! 26 / 34
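A minimal error-reduction sketch in the Gerchberg-Saxton/Fienup spirit, for a real nonnegative signal with known support, is below. Consistent with the slide's caveat, the only thing it guarantees is that the Fourier-domain residual does not grow; convergence to the true signal is not guaranteed. Sizes, support, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 128
support = np.zeros(m, dtype=bool)
support[:32] = True                        # assume the object support is known

f_true = np.zeros(m)
f_true[support] = rng.uniform(0.0, 1.0, support.sum())
mags = np.abs(np.fft.fft(f_true))          # measured Fourier magnitudes

x = rng.uniform(0.0, 1.0, m)               # random initialization
r0 = np.linalg.norm(np.abs(np.fft.fft(x)) - mags) / np.linalg.norm(mags)

for _ in range(500):
    # Fourier-domain projection: keep current phases, impose measured magnitudes.
    X = np.fft.fft(x)
    x = np.real(np.fft.ifft(mags * np.exp(1j * np.angle(X))))
    # Object-domain projection: enforce support and nonnegativity.
    x[~support] = 0.0
    x = np.maximum(x, 0.0)

residual = np.linalg.norm(np.abs(np.fft.fft(x)) - mags) / np.linalg.norm(mags)
```

The final iterate satisfies the object constraints exactly and the magnitude constraint only approximately; stagnation far from f_true is possible.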

Lifting. Following [Balan, Bodmann, Casazza, Edidin, 2007], we will interpret quadratic measurements of f as linear measurements of the rank-one matrix F := ffᴴ:

    |⟨a_k, f⟩|² = Tr(fᴴ a_k a_kᴴ f) = Tr(A_k F),

where A_k is the rank-one matrix a_k a_kᴴ. Define the linear operator A that maps a positive semidefinite matrix F to {Tr(A_k F)}_{k=1}^n. 27 / 34

Lifting. Following [Balan, Bodmann, Casazza, Edidin, 2007], we will interpret quadratic measurements of f as linear measurements of the rank-one matrix F := ffᴴ:

    |⟨a_k, f⟩|² = Tr(fᴴ a_k a_kᴴ f) = Tr(A_k F),

where A_k is the rank-one matrix a_k a_kᴴ. Define the linear operator A that maps a positive semidefinite matrix F to {Tr(A_k F)}_{k=1}^n. Now, the phase retrieval problem is equivalent to

    find F  subject to  A(F) = y,  F ⪰ 0,  rank(F) = 1.   (RANKMIN)

Having found F, we factorize F as ffᴴ to obtain the phase retrieval solution (up to a global phase factor). 27 / 34
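The lifting identity |⟨a_k, f⟩|² = Tr(A_k F) with F = ffᴴ can be verified directly on random complex vectors (toy sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 8
f = rng.standard_normal(m) + 1j * rng.standard_normal(m)
F = np.outer(f, f.conj())              # the lifted rank-one matrix F = f f^H

for _ in range(5):
    a = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    quad = np.abs(np.vdot(a, f)) ** 2  # quadratic in f: |<a, f>|^2
    A_k = np.outer(a, a.conj())        # rank-one measurement matrix a a^H
    lin = np.trace(A_k @ F).real       # linear in F: Tr(A_k F)
    assert np.isclose(quad, lin)

assert np.linalg.matrix_rank(F) == 1   # the lift is rank-one by construction
```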

Phase retrieval as a convex problem? We need to solve:

    minimize rank(F)  subject to  A(F) = y,  F ⪰ 0.   (RANKMIN)

Note that A(F) = y is highly underdetermined, so we cannot simply invert A to get F. Rank minimization problems are typically NP-hard. 28 / 34

Phase retrieval as a convex problem? We need to solve:

    minimize rank(F)  subject to  A(F) = y,  F ⪰ 0.   (RANKMIN)

Note that A(F) = y is highly underdetermined, so we cannot simply invert A to get F. Rank minimization problems are typically NP-hard. Use the trace norm as a convex surrogate for the rank functional [Beck 98, Mesbahi 97], giving the semidefinite program:

    minimize trace(F)  subject to  A(F) = y,  F ⪰ 0.   (TRACEMIN) 28 / 34

A new methodology for phase retrieval Lift up the problem of recovering a vector from quadratic constraints into that of recovering a rank-one matrix from affine constraints, and relax the combinatorial problem into a convenient convex program. 29 / 34

A new methodology for phase retrieval Lift up the problem of recovering a vector from quadratic constraints into that of recovering a rank-one matrix from affine constraints, and relax the combinatorial problem into a convenient convex program. PhaseLift 29 / 34

A new methodology for phase retrieval Lift up the problem of recovering a vector from quadratic constraints into that of recovering a rank-one matrix from affine constraints, and relax the combinatorial problem into a convenient convex program. PhaseLift But when (if ever) is the trace minimization problem equivalent to the rank minimization problem? 29 / 34

When is phase retrieval a convex problem?

Theorem [Candès-Strohmer-Voroninski 11]: Let f be in Rᵐ or Cᵐ and suppose we choose the measurement vectors {a_k}_{k=1}^n independently and uniformly at random on the unit sphere of Cᵐ or Rᵐ. If n ≥ c · m log m, where c is a constant, then PhaseLift recovers f exactly from {|⟨a_k, f⟩|²}_{k=1}^n with probability at least 1 − 3e^{−γ n/m}, where γ is an absolute constant.

Note that the oversampling factor log m is rather minimal! This is the first result of its kind: phase retrieval can provably be accomplished via convex optimization. 30 / 34

Stability in the presence of noise. Assume we observe y_i = |⟨f, a_i⟩|² + ν_i, where ν_i is a noise term with ||ν||₂ ≤ ε. Consider the solution to

    minimize trace(F)  subject to  ||A(F) − y||₂ ≤ ε,  F ⪰ 0.

Theorem [Candès-Strohmer-Voroninski 11]: Under the same assumptions as in the previous theorem, the solution f̂ computed via PhaseLift from the noisy data obeys

    ||f̂ − f||₂ ≤ C₀ min( ||f||₂, ε/||f||₂ ). 31 / 34

Noise-aware framework. Suppose we observe y ~ Poisson(A(f*)). Our convex formulation suggests solving

    minimize −log p(y | A(F)) + λ · trace(F)  subject to  F ⪰ 0.

We can easily include additional constraints frequently used in phase retrieval, such as support constraints or positivity. 32 / 34
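A sketch of evaluating this noise-aware objective on the lifted matrix, with random rank-one sensing matrices A_k = a_k a_kᵀ (real case; sizes and λ are illustrative, and the solver itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 6, 40
f = rng.uniform(0.5, 2.0, m)           # toy signal (real, positive)
F_true = np.outer(f, f)                # lifted rank-one matrix F = f f^T

a = rng.standard_normal((n, m))        # sensing vectors a_k
AF = np.einsum('ki,ij,kj->k', a, F_true, a)   # A(F) = {Tr(a_k a_k^T F)}_k >= 0
y = rng.poisson(AF)                    # Poisson intensity measurements

def objective(F, lam=0.1, eps=1e-12):
    """Negative Poisson log-likelihood of A(F) plus a trace penalty."""
    mu = np.einsum('ki,ij,kj->k', a, F, a)
    return np.sum(mu - y * np.log(mu + eps)) + lam * np.trace(F)
```

For positive semidefinite F, trace(F) equals the nuclear norm, which is why it serves as the convex surrogate for rank here.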

Limitations of our theory. The theorems are not yet completely practical, since most phase retrieval problems involve diffraction, i.e., Fourier transforms, and not unstructured random measurements. 33 / 34

Conclusions. Oracle inequalities provide valuable insight into tradeoffs related to sensing system design and measurement diversity, photon limitations, generalized sparsity, and optimization methods. Nonlinear, nonconvex problems are notoriously difficult to solve (hard to find the global optimum). Convex relaxations may help alleviate this challenge. 34 / 34