Compressive Inference

Compressive Inference
Weihong Guo and Dan Yang
Case Western Reserve University and SAMSI
SAMSI transition workshop; project of the Compressive Inference subgroup of the Imaging WG
Active members: Garvesh Raskutti, Jiayang Sun and Grace Yi Wang
May 22, 2013

Outline
1 Compressive Sensing
2 Compressive Inference
3 Method
4 Simulation
5 Conclusion

Compressive Sensing
Why: it takes too long or costs too much to collect the data, too much space to store them, and too much time to analyze them or retrieve information from them; in medical applications, additional scanning also increases the risk of developing cancer.
A volume of human brain scans, 175 slices. (Courtesy: oasis-brains.org.)

Traditional vs. Compressive Sensing (CS)
[Figure: Fourier-domain sampling and the corresponding image-domain results, traditional vs. compressive.]

Formulation
Continuous setup: f : X → R is the intensity (or difference) function of an image, where X ⊂ R^d is the ROI. Example: [0, 1] or [0, 1]².
Discrete setup: f = (f(x_1), f(x_2), ..., f(x_p)). Example: difference of intensities at the grid (1/p, 2/p, ..., 1) for grayscale images.
Compressive sensing: observe y = Af + ε, where A ∈ R^{n×p} is a sampling matrix with n ≪ p satisfying the restricted isometry property (RIP), and ε ∼ N(0, σ²I_n). Goal: recover f or make inference about f.
Comparison with a statistical latent variable model: y = f + ε, z = Ay + γ.
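
To make the observation model concrete, here is a minimal numerical sketch (not from the slides); the grid size p, number of measurements n, noise level σ, and the Gaussian-bump difference image are arbitrary illustrative choices.

```python
# Minimal sketch of the observation model y = A f + eps, with n << p.
# p, n, sigma and the bump-shaped signal are illustrative, not the authors' settings.
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma = 200, 40, 0.5

x = np.arange(1, p + 1) / p                      # grid (1/p, 2/p, ..., 1)
f = np.exp(-100 * (x - 0.5) ** 2)                # a smooth "difference image"

A = rng.normal(0.0, 1.0 / np.sqrt(n), (n, p))    # Gaussian sampling matrix, A_ij ~ N(0, 1/n)
eps = rng.normal(0.0, sigma, n)                  # eps ~ N(0, sigma^2 I_n)
y = A @ f + eps                                  # compressed, noisy measurements
```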

Example
Low-dimensional information retrieval. (Courtesy: sadies-brain-tumor.org.)

Compressive Inference
Recall: given data y = Af + ε, make inference about f from y. Examples:
1 Test H_0: f(x) = 0 for all x ∈ X
- Applicable to comparison of images
- Aside: find the support of f, i.e., {x : f(x) ≠ 0}
- Multiple hypothesis testing
2 Test H_0: f = 0 vs H_1: f = s, where s is known or unknown
- Davenport et al. (2006, 2010), Duarte et al. (2006)
- Single hypothesis testing
- Likelihood or Hotelling T²
3 Test H_{0i}: f_i = 0 vs H_{1i}: f_i ≠ 0
- Bühlmann (2012)
- Multiple hypothesis testing
- Sparsity (no function perspective)
- Combination of LASSO and ridge
- Discrete; conservative
Key: compressive inference takes advantage of the compressibility of a smooth image (continuous f), so that complex information can be obtained from a small amount of information y.

Method
Smoothness assumption on f
Estimation of f by kernel ridge regression
Tube method applied to the compressed data

Smoothness Assumption
Need to impose assumptions on the class of image-difference functions f, e.g., polynomial, Lipschitz, smoothing spline. These are special cases of Reproducing Kernel Hilbert Spaces (RKHS).
Important properties:
- Mercer's theorem: K : X × X → R with K(x, x') = Σ_{k=1}^∞ λ_k φ_k(x) φ_k(x')
- Statistical complexity: governed by the decay of the eigenvalues λ_k

Example: Lipschitz Kernel
Kernel: K(x, x') = min{x, x'}
Function class: {f : f ∈ L², f' ∈ L²}
Corresponds to the Sobolev class with smoothness α = 1
Other examples exist
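
A quick numerical check of this complexity statement (our own illustration, with an arbitrary grid size): build the Gram matrix of the min kernel on a uniform grid and inspect the eigenvalue decay, which is of order k⁻² for this α = 1 Sobolev class.

```python
# Sketch: eigenvalue decay of the Lipschitz (min) kernel on a uniform grid.
# The grid size p is an arbitrary illustrative choice.
import numpy as np

p = 500
x = np.arange(1, p + 1) / p
K = np.minimum.outer(x, x)                  # (K)_ij = min(x_i, x_j)

lam = np.linalg.eigvalsh(K)[::-1] / p       # approximate Mercer eigenvalues, largest first
k = np.arange(1, 11)
print(lam[:10])                             # decays roughly like k^{-2}
print(1.0 / (np.pi ** 2 * (k - 0.5) ** 2))  # known eigenvalues 1/((k - 1/2)^2 pi^2) of min(x, x')
```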

From Assumption to Penalty
Expansion of f in the RKHS:
K(x, x') = Σ_{k=1}^∞ λ_k φ_k(x) φ_k(x'), f(x) = Σ_{k=1}^∞ a_k φ_k(x)
Hilbert ball of radius ρ:
B_H(ρ) = {f : ‖f‖²_H = Σ_k a_k² / λ_k ≤ ρ²}
Kernel ridge regression:
min ‖y − Af‖²_2 + λ‖f‖²_H ⇔ min ‖y − AΦa‖²_2 + λ aᵀΛ⁻¹a,
where (Φ)_{ik} = φ_k(x_i), Λ = diag(λ_1, λ_2, ...), a = (a_1, a_2, ...)ᵀ.

Estimator
Minimizer:
f̂_λ(x) = κ(x)ᵀ Aᵀ (A K Aᵀ + λI)⁻¹ y =: ⟨l(x), y⟩,
where (K)_{ij} = K(x_i, x_j) and κ(x) = (K(x, x_1), ..., K(x, x_p))ᵀ.
The estimator is linear in y.
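
A self-contained sketch of this closed form (our own illustration; the Lipschitz kernel and the values of λ, p, n, σ are arbitrary): since κ(x_i)ᵀ is the i-th row of K, the whole vector of fitted values is K Aᵀ(A K Aᵀ + λI)⁻¹ y, and the rows of L = K Aᵀ(A K Aᵀ + λI)⁻¹ are the weight vectors l(x_i).

```python
# Sketch of the compressed kernel ridge regression estimator
#   f_hat(x) = kappa(x)^T A^T (A K A^T + lam I)^{-1} y = <l(x), y>.
# Kernel, lam, p, n, sigma are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma, lam = 200, 40, 0.5, 0.1

x = np.arange(1, p + 1) / p
f = np.exp(-100 * (x - 0.5) ** 2)
K = np.minimum.outer(x, x)                       # Lipschitz kernel Gram matrix
A = rng.normal(0.0, 1.0 / np.sqrt(n), (n, p))
y = A @ f + rng.normal(0.0, sigma, n)

G = A @ K @ A.T + lam * np.eye(n)                # n x n system, cheap since n << p
L = K @ A.T @ np.linalg.inv(G)                   # i-th row is l(x_i)^T
f_hat = L @ y                                    # f_hat(x_i) = <l(x_i), y>, linear in y
```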

SCR (Simultaneous Confidence Region) via the Tube Method
Confidence bands: f̂(x) ± c σ̂ ‖l(x)‖
Simultaneous: α = P(|f̂(x) − f(x)| > c σ̂ ‖l(x)‖ for some x ∈ X)
Tube method (Sun and Loader, 1994) for d = 1:
α ≈ (κ_0/π)(1 + c²/ν)^{−ν/2} + P(|t_ν| > c),
where κ_0 can be derived from l(x).
Decision: reject H_0 if there exists x such that |f̂(x)| / (σ̂ ‖l(x)‖) > c.
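
A hedged numerical sketch of how the critical value c could be computed for d = 1 (our own reading of the formula): κ_0 is approximated by the arc length of the normalized curve T(x) = l(x)/‖l(x)‖, and the residual degrees of freedom ν = n − tr(AL) is an assumption made here, not necessarily the slides' choice.

```python
# Sketch: tube-method critical value for simultaneous confidence bands, d = 1.
#   alpha ≈ (kappa0 / pi) * (1 + c^2 / nu)^(-nu/2) + P(|t_nu| > c)
# kappa0 is approximated by the arc length of T(x) = l(x)/||l(x)||; the choice
# nu = n - tr(A L) for the degrees of freedom is an assumption, not from the slides.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import t


def tube_critical_value(L, A, alpha=0.05):
    """L: (p, n) matrix whose i-th row is l(x_i)^T; A: (n, p) sampling matrix."""
    T = L / np.linalg.norm(L, axis=1, keepdims=True)              # curve on the unit sphere
    kappa0 = np.sum(np.linalg.norm(np.diff(T, axis=0), axis=1))   # discrete arc length
    nu = max(L.shape[1] - np.trace(A @ L), 1.0)                   # residual df (assumed)

    def size_gap(c):
        return (kappa0 / np.pi) * (1 + c ** 2 / nu) ** (-nu / 2) \
            + 2 * t.sf(c, nu) - alpha

    return brentq(size_gap, 1e-6, 50.0)

# Decision rule: reject H0 if max_i |f_hat(x_i)| / (sigma_hat * ||l(x_i)||) > c.
```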

Simulation Setup
1 Fix p; vary n.
2 Consider the functions f = 0, f_1, f_2 evaluated at x_i = i/p.
3 Generate A ∈ R^{n×p} with A_ij iid N(0, 1/n).
4 Generate ε ∼ N(0, I_n).
5 Set y = Af + ε.
Tests compared:
- Tube: H_0: f(x) = 0 for all x ∈ X
- Bonferroni: H_{0i}: f(x_i) = 0 vs H_{1i}: f(x_i) ≠ 0
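
A rough Monte Carlo sketch of the size computation under f = 0, reusing tube_critical_value from the sketch above (λ, nsim, and the residual-based σ̂ are our own illustrative choices, not the settings behind the table below).

```python
# Sketch of the size simulation under f = 0: fix p, vary n, and record how
# often the tube test rejects.  lam, nsim and the variance estimate are
# illustrative; tube_critical_value is defined in the earlier sketch.
import numpy as np

def simulate_size(p=100, n=20, lam=0.1, alpha=0.05, nsim=200, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(1, p + 1) / p
    K = np.minimum.outer(x, x)
    rejections = 0
    for _ in range(nsim):
        A = rng.normal(0.0, 1.0 / np.sqrt(n), (n, p))    # A_ij ~ N(0, 1/n)
        y = rng.normal(0.0, 1.0, n)                      # f = 0, sigma = 1
        G = A @ K @ A.T + lam * np.eye(n)
        L = K @ A.T @ np.linalg.inv(G)
        f_hat = L @ y
        nu = max(n - np.trace(A @ L), 1.0)
        sigma_hat = np.linalg.norm(y - A @ f_hat) / np.sqrt(nu)
        c = tube_critical_value(L, A, alpha)
        se = sigma_hat * np.linalg.norm(L, axis=1)
        rejections += np.any(np.abs(f_hat) > c * se)
    return rejections / nsim
```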

Table: Test size (T: tube method; B: Bonferroni); columns index n/p.

           n/p =  10%    20%    30%    40%    50%    100%
α = 1%    T      0.90%  1.20%  1.10%  0.70%  1.10%  0.70%
          B      0.20%  0.20%  0.00%  0.20%  0.10%  0.00%
α = 5%    T      4.80%  4.50%  5.30%  4.40%  3.10%  3.70%
          B      0.30%  0.50%  0.30%  0.60%  0.80%  0.00%
α = 10%   T      9.10%  9.40%  8.90%  9.60%  7.70%  7.70%
          B      0.40%  1.00%  1.00%  1.10%  1.10%  0.00%

Figure: the two alternatives, f_1(x) = 2δ|x − 0.5| and f_2(x) = δ exp(−10⁴|x − 0.5|²), for x ∈ [0, 1].

Figure: Test power for f_1(x) = 2δ|x − 0.5|, x ∈ [0, 1].

Figure: Test power for f_2(x) = δ exp(−10⁴|x − 0.5|²), x ∈ [0, 1].

Future Work
- Multidimensional images (d ≥ 2); real images and video
- Automatic selection of λ
- Supervised selection of K
- More asymptotics
- Conditional tube formula (A random)
- Software
- Real applications: medical, hidden messages, security monitoring, etc.

Thank you!