Sparse Sensing for Statistical Inference


Model-driven and data-driven paradigms
Geert Leus, Sundeep Chepuri, and Georg Kail
ITA 2016, 04 Feb 2016

Application domains: power networks and grid analytics, health informatics, radio astronomy (e.g., SKA), Internet and social media.
Goal: design sparse sensing functions.

What is sparse sensing?
y = Φ(w) x, with x ∈ R^D, y ∈ R^d, and Φ(w) = diag_r(w) ∈ {0,1}^{d×D}.
Design w ∈ {0,1}^D to select the most informative d (≪ D) samples for
- data acquisition (e.g., offline sensor selection)
- data selection (e.g., outlier rejection, censoring)
- or a combination of the above (hybrid)
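As a concrete illustration of the selection operator, here is a minimal sketch (the dimensions and selected indices are made-up toy values) that builds Φ(w) = diag_r(w) from a Boolean selection vector and applies it to a data vector:

```python
import numpy as np

# Toy dimensions (hypothetical): D candidate samples, d of which are kept.
D, d = 10, 4
x = np.arange(D, dtype=float)        # full data vector x in R^D

w = np.zeros(D, dtype=bool)
w[[1, 3, 6, 8]] = True               # Boolean selection vector with ||w||_0 = d

Phi = np.eye(D)[w]                   # Phi(w) = diag_r(w): the selected rows of I_D
y = Phi @ x                          # compressed observations y in R^d
assert y.shape == (d,)
```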

State of the art
Compressive sensing: sparse signal recovery [Donoho-2006], [Candès-Wakin-2008]
Sensor selection (model-driven):
- convex optimization: design a {0,1}^M selection vector [Joshi-Boyd-2009], [Chepuri-Leus-2015]
- greedy methods and heuristics: submodularity [Krause-Singh-Guestrin-08], [Ranieri-Chebira-Vetterli-14]
Censoring (or data selection) and outlier rejection: data-driven [Rago-Willett-Shalom-96], [Msechu-Giannakis-12]
Model-driven, data-driven, or hybrid?

Linear regression setup (model-driven)
Observations follow x_m = a_m^T θ + n_m, m = 1, 2, ..., D, where
- θ ∈ R^p is the unknown parameter
- n_m is i.i.d. zero-mean unit-variance Gaussian noise
Problem statement: given {a_m} and the noise pdf, i.e., only the data model, design w to select the best subset of d (≪ D) sensors.
The best subset of d (≪ D) sensors is obtained by optimizing a scalar function g (trace, maximum eigenvalue, log det) of the CRB matrix of θ:
f(w) = g( ( Σ_{m=1}^{D} w_m a_m a_m^T )^{-1} )
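The CRB-based cost can be evaluated directly from the rows of the regression matrix. The sketch below is a minimal illustration (the function and argument names are chosen here, not taken from the talk) of the three common scalarizations g:

```python
import numpy as np

def crb_cost(w, A, g="trace"):
    """CRB-based selection cost f(w) = g((sum_m w_m a_m a_m^T)^{-1}).

    w : length-D vector of 0/1 selection weights,
    A : D x p matrix whose rows are a_m^T,
    g : 'trace' (A-optimality), 'maxeig' (E-optimality), or 'logdet' (D-optimality).
    """
    fim = A.T @ (w[:, None] * A)          # Fisher information: sum_m w_m a_m a_m^T
    crb = np.linalg.inv(fim)              # CRB matrix (assumes the FIM is invertible)
    if g == "trace":
        return np.trace(crb)
    if g == "maxeig":
        return np.linalg.eigvalsh(crb)[-1]
    return np.log(np.linalg.det(crb))
```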

Optimization problem
The sparse sensing function can be designed by solving
min_{w ∈ W} f(w), with W = {w ∈ {0,1}^D : ||w||_0 = d}.
The problem becomes convex in w if the set W is relaxed to W_c = {w ∈ [0,1]^D : ||w||_1 = d}. Monotone submodular cost functions can be optimized using near-optimal greedy methods.
+ The sampler needs to be designed offline only once, and samplers are obtained by optimizing the ensemble performance.
− The design might suffer from outliers or model mismatch.
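For the log det scalarization, the relaxed problem over W_c can be handed to an off-the-shelf convex solver. The sketch below is a minimal example assuming cvxpy with an SDP-capable solver (e.g., SCS) is installed; the problem sizes and the final rounding rule are illustrative choices, not the method prescribed in the talk:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
D, p, d = 30, 3, 8
A = rng.standard_normal((D, p))            # rows are the regression vectors a_m^T

w = cp.Variable(D)
fim = sum(w[m] * np.outer(A[m], A[m]) for m in range(D))   # FIM, affine in w
prob = cp.Problem(cp.Maximize(cp.log_det(fim)),            # i.e., minimize log det of the CRB
                  [w >= 0, w <= 1, cp.sum(w) == d])
prob.solve()

selected = np.argsort(w.value)[-d:]        # simple rounding back to d sensors
print(sorted(selected.tolist()))
```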

Model-driven design for other inference tasks
Can be generalized to nonlinear models by optimizing scalar functions of the Cramér-Rao bound matrix.
S.P. Chepuri and G. Leus. Sparsity-Promoting Sensor Selection for Non-linear Measurement Models. IEEE Trans. on Signal Processing, Volume 63, Issue 3, pp. 684-698, February 2015.
Samplers for nonlinear models with correlated errors can be obtained by solving a convex program.
S.P. Chepuri and G. Leus. Sparse Sensing for Estimation with Correlated Observations. In Proc. of Asilomar 2015, Pacific Grove, CA, Nov. 2015.
Samplers can be designed for detection problems by optimizing Kullback-Leibler or Bhattacharyya distance measures.
S.P. Chepuri and G. Leus. Sparse Sensing for Distributed Detection. IEEE Trans. on Signal Processing, October 2015.

Linear regression setup (data-driven)
The output data {x_m}_{m=1}^{D} is possibly contaminated with up to o outliers. We know the (uncontaminated) data model
x_m = a_m^T θ + n_m, m = 1, 2, ..., D, where
- θ ∈ R^p is the unknown parameter
- n_m is i.i.d. zero-mean unit-variance Gaussian noise
Problem statement: given {x_m}, {a_m}, and the noise pdf,
(a) design w to censor less-informative samples and reject outliers
(b) estimate θ using the uncensored data

Data-driven design
Data samples with smaller residuals are informative; more generally, the ones with large likelihood. The sensing function is obtained by solving
min_{w ∈ W, θ} Σ_{m=1}^{D} w_m (x_m − a_m^T θ)²,
or equivalently min_{w ∈ W} r(w), with
r(w) = x_w^T ( I − A_w (A_w^T A_w)^{-1} A_w^T ) x_w,  where A_w = diag_r(w) A and x_w = diag_r(w) x.
This is also known as least trimmed squares. The problem is non-convex in general:
- it can be convexified for the linear Gaussian case
- Markov chain Monte Carlo methods (e.g., Metropolis-Hastings sampling)
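For very small D, the least-trimmed-squares problem can be solved exactly by enumerating all d-subsets, which is what makes the MCMC approach below attractive at scale. The sketch below (helper names chosen here for illustration) evaluates r(w) and performs this brute-force search:

```python
import numpy as np
from itertools import combinations

def residual(x, A, idx):
    """LTS residual r(w) and LS estimate of theta on the samples indexed by idx."""
    Aw, xw = A[list(idx)], x[list(idx)]
    theta = np.linalg.lstsq(Aw, xw, rcond=None)[0]
    return float(np.sum((xw - Aw @ theta) ** 2)), theta

def lts_exhaustive(x, A, d):
    """Best d-subset by brute force: min_{w in W} r(w). Only feasible for small D."""
    best_idx = min(combinations(range(len(x)), d),
                   key=lambda idx: residual(x, A, idx)[0])
    r, theta = residual(x, A, best_idx)
    return np.array(best_idx), theta, r
```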

Generalization of data-driven design
For linear Gaussian models, the cost function amounts to sparsity-based outlier rejection methods [Fuchs-1999], [Rousseeuw-Leroy-2005], [Giannakis et al.-2011].
Can be generalized to nonlinear models by optimizing the likelihood function parameterized by θ and w.
G. Kail, S.P. Chepuri, and G. Leus. Robust Censoring Using Metropolis-Hastings Sampling. IEEE Journ. of Selec. Topics in Signal Processing, Nov. 2015.
+ Designs are robust to outliers.
− The sampler needs to be designed for each data realization, and samplers are obtained by optimizing an instantaneous measure (performance could be poor).

Hybrid model-data-driven design
Data-driven samplers are robust to outliers, but do not take the resulting inference performance (i.e., MSE) into account. Model-driven samplers are MSE optimal, but are not robust to outliers. Hybrid model-data-driven designs combine (and trade off) these two advantages.
+ Designs are robust to outliers and take the inference performance into account.
− The sampler needs to be designed for each data realization.

Optimization problem
The hybrid model-data-driven sensing scheme jointly optimizes the likelihood function (i.e., the residual) and the MSE:
min_{w ∈ W} r(w) + λ f(w), with W = {w ∈ {0,1}^D : ||w||_0 = d}.
λ → 0 (λ → ∞) recovers the corresponding data-driven (model-driven) scheme.
The problem is non-convex in general:
- it can be convexified for the linear Gaussian case
- Markov chain Monte Carlo methods (e.g., Metropolis-Hastings sampling)

Convex optimization based solver
The hybrid model-data-driven sensing design is equivalent to
min_{w ∈ W, t_1, t_2} t_1 + λ t_2  s.t.  r(w) ≤ t_1,  f(w) ≤ t_2.
Using the Schur complement and Φ^T Φ = diag(w), this becomes
min_{w ∈ W, t_1, t_2} t_1 + λ t_2
s.t. [ A^T diag(w) A    A^T diag(w) x ; x^T diag(w) A    t_1 − x^T diag(w) x ] ⪰ 0,
     f(w) ≤ t_2.
The above problem is convex after relaxing W to W_c.

Metropolis-Hastings sampler
A powerful concept for generating samples from complex distributions:
1. Each iteration generates a proposal w′ from some proposal distribution q(w′ | w^(j−1)).
2. The new sample w^(j) is set to w′ with probability α_j and to w^(j−1) with probability 1 − α_j, where
α_j = min{ [ p_t(w′) q(w^(j−1) | w′) ] / [ p_t(w^(j−1)) q(w′ | w^(j−1)) ], 1 }
and the target distribution is p_t(w) ∝ exp( −(r(w) + λ f(w)) / (2σ²) ).
Does not involve convex relaxations (W to W_c).
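A minimal sketch of such a sampler for the hybrid cost is given below. It assumes a symmetric single-swap proposal over fixed-cardinality selections (so the q terms cancel in α_j) and an A-optimality choice for f(w); the function name, proposal, and setup are illustrative choices rather than the exact scheme in the cited paper:

```python
import numpy as np

def mh_hybrid_selection(x, A, d, lam=1.0, sigma2=1.0, n_iter=5000, seed=0):
    """Metropolis-Hastings over w in {0,1}^D with ||w||_0 = d.

    Target: p_t(w) ~ exp(-(r(w) + lam * f(w)) / (2 * sigma2)), with r(w) the LS
    residual on the selected samples and f(w) the trace of the CRB matrix.
    Proposal: swap one selected index with one unselected index (symmetric).
    """
    rng = np.random.default_rng(seed)
    D = A.shape[0]

    def cost(w):
        Aw, xw = A[w], x[w]
        theta = np.linalg.lstsq(Aw, xw, rcond=None)[0]
        r = np.sum((xw - Aw @ theta) ** 2)              # residual r(w)
        f = np.trace(np.linalg.inv(Aw.T @ Aw))          # CRB-based cost f(w)
        return r + lam * f

    w = np.zeros(D, dtype=bool)
    w[rng.choice(D, size=d, replace=False)] = True      # random initial selection
    c = cost(w)
    best_w, best_c = w.copy(), c

    for _ in range(n_iter):
        i = rng.choice(np.flatnonzero(w))                # a selected sample to drop
        j = rng.choice(np.flatnonzero(~w))               # an unselected sample to add
        w_prop = w.copy()
        w_prop[i], w_prop[j] = False, True
        c_prop = cost(w_prop)
        # alpha_j = min{p_t(w') / p_t(w), 1}; the symmetric proposal cancels out
        if c_prop <= c or rng.random() < np.exp((c - c_prop) / (2 * sigma2)):
            w, c = w_prop, c_prop
            if c < best_c:
                best_w, best_c = w.copy(), c
    return best_w, best_c
```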

Numerical experiments
[Figure: empirical cdf of the relative error ||θ̂ − θ||₂ / ||θ||₂ for the data-driven, model-driven, and hybrid schemes.]
The best performance of the hybrid scheme is not necessarily in between the best performances of the data-driven and model-driven schemes.

Numerical experiments
[Figure: empirical cdf of the relative error ||θ̂ − θ||₂ / ||θ||₂ for the convex relaxation, MH, and exhaustive search.]
The error distribution achieved with the MH method almost coincides with that of exhaustive search.

Conclusions and future directions
Model-driven, data-driven, and hybrid sparse sensing
- for basic inference problems
- respective strengths and weaknesses
Future directions
- correlated observations, clustering, and classification
- greedy algorithms (submodularity)
Thank You!!