Sparse Proteomics Analysis (SPA)


Sparse Proteomics Analysis (SPA): Toward a Mathematical Theory for Feature Selection from Forward Models
Martin Genzel, Technische Universität Berlin
Winter School on Compressed Sensing, December 5, 2015

Outline
1 Biological Background
2 Sparse Proteomics Analysis (SPA)
3 Theoretical Foundation by High-dimensional Estimation Theory

What is Proteomics?
The pathological mechanisms of many diseases, such as cancer, manifest themselves at the level of protein activities. To improve clinical treatment options and early diagnostics, we need to understand protein structures and their interactions!
Proteins are long chains of amino acids that control many biological and chemical processes in the human body. The entire set of proteins present at a given point in time is called a proteome. Proteomics is the large-scale study of the human proteome.
[Protein structure image: http://www.topsan.org/proteins/jcsg/3qxb]

What is Mass Spectrometry?
How to capture a proteome? Mass spectrometry (MS) is a popular technique to detect the abundance of proteins in samples (blood, urine, etc.).
[Schematic work-flow: a laser ionizes the sample; a detector records the ions, yielding a mass spectrum of intensity (cts) against mass (m/z).]

Real-World MS-Data
[Figure: a measured mass spectrum, intensity (cts) against mass (m/z).]
MS-vector: x = (x_1, ..., x_d) ∈ R^d with d ≈ 10^4 ... 10^6; index ≙ mass/feature, entry ≙ intensity/amplitude.

Feature Selection from MS-Data
Goal: Detect a small set of features (a disease fingerprint) that allows for an appropriate distinction between the diseased and the healthy group.
[Schematic work-flow: blood samples from healthy and diseased individuals are measured by MS; comparing the resulting mass spectra yields the disease fingerprint.]

Mathematical Problem Formulation
Supervised learning: We are given n samples (x_1, y_1), ..., (x_n, y_n), where
x_k ∈ R^d is the mass spectrum of the k-th patient, and
y_k ∈ {−1, +1} is the health status of the k-th patient (healthy = +1, diseased = −1).
Goal: Learn a feature vector ω ∈ R^d which
is sparse, i.e., has few non-zero entries (→ stability, avoids overfitting), and
whose entries correspond to peaks that are highly correlated with the disease (→ interpretability, biological relevance).

How to learn a fingerprint ω?

Part 2: Sparse Proteomics Analysis (SPA)

Sparse Proteomics Analysis (SPA)
Sparse Proteomics Analysis is a generic framework to meet this challenge.
Input: Sample pairs (x_1, y_1), ..., (x_n, y_n) ∈ R^d × {−1, +1}
Compute:
1 Preprocessing (smoothing, standardization)
2 Feature selection (LASSO, l1-SVM, robust 1-bit CS)
3 Postprocessing (sparsification)
Output: Sparse feature vector ω ∈ R^d (→ biomarker extraction, dimension reduction)
[Figure: from a blood sample via its mass spectrum to biomarker identification.]
The rest of this talk focuses on step 2, feature selection. A minimal code sketch of the whole pipeline follows below.
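A minimal sketch of the three SPA stages in Python. All concrete choices here (smoothing width, penalty, and scikit-learn's penalized LASSO standing in for the constrained variant discussed later) are illustrative assumptions, not the talk's exact implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.linear_model import Lasso

def spa(X, y, smooth_sigma=2.0, alpha=0.01, s=10):
    """X: (n, d) array of mass spectra; y: labels in {-1, +1}."""
    # 1) Preprocessing: smooth each spectrum, then standardize every feature
    X = gaussian_filter1d(X, sigma=smooth_sigma, axis=1)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # 2) Feature selection: penalized LASSO as a stand-in for the
    #    constrained program on the next slide
    omega = Lasso(alpha=alpha).fit(X, y).coef_
    # 3) Postprocessing: sparsify by keeping only the s largest entries
    keep = np.argsort(np.abs(omega))[-s:]
    omega_sparse = np.zeros_like(omega)
    omega_sparse[keep] = omega[keep]
    return omega_sparse
```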

Feature Selection (Geometric Intuition)
Linear separation model: Find a feature vector ω ∈ R^d such that y_k = sign(⟨x_k, ω⟩) for many k ∈ {1, ..., n}. Moreover, ω should be sparse and interpretable.

Feature Selection via the LASSO
The LASSO (Tibshirani '96):
    min_{ω ∈ R^d} Σ_{k=1}^n (y_k − ⟨x_k, ω⟩)^2   subject to   ‖ω‖_1 ≤ R
A multivariate approach, originally designed for linear regression models y_k ≈ ⟨x_k, ω⟩, k = 1, ..., n, but also applicable to non-linear models (→ next part).
Later: R ≍ √s to allow for s-sparse solutions (with unit norm).
A solver sketch for this constrained program follows below.
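The slide states the LASSO in its constrained form, whereas most libraries implement the penalized (Lagrangian) form; the two are equivalent for matched parameters. A minimal projected-gradient sketch of the constrained program (the solver choice is my assumption, not part of the talk):

```python
import numpy as np

def project_l1_ball(v, R):
    """Euclidean projection of v onto {w : ||w||_1 <= R} (Duchi et al. '08)."""
    if np.abs(v).sum() <= R:
        return v
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - R)[0][-1]
    theta = (css[rho] - R) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def constrained_lasso(X, y, R, steps=500):
    """min_omega ||X @ omega - y||_2^2  subject to  ||omega||_1 <= R."""
    omega = np.zeros(X.shape[1])
    eta = 0.5 / np.linalg.norm(X, 2) ** 2  # 1/L for the gradient 2 X^T (X w - y)
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ omega - y)
        omega = project_l1_ball(omega - eta * grad, R)
    return omega
```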

Some Numerical Results
5-fold cross-validation on real-world pancreas data (156 samples):
1 Learn a feature vector ω by SPA, using 80% of the samples.
2 Classify the remaining 20% of the samples with an ordinary SVM, after projecting onto supp(ω).
3 Repeat this procedure 12 times for random partitions.
[Figure: classification accuracy for different sparsity levels s = #supp(ω).]
The evaluation protocol is sketched in code below.
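A sketch of this evaluation protocol, reusing the hypothetical spa() helper from above (data loading omitted):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC

def evaluate(X, y, s=10, repeats=12):
    accs = []
    splits = StratifiedShuffleSplit(n_splits=repeats, test_size=0.2)
    for train, test in splits.split(X, y):
        omega = spa(X[train], y[train], s=s)   # learn fingerprint on 80%
        support = np.flatnonzero(omega)        # project onto supp(omega)
        clf = SVC(kernel="linear").fit(X[train][:, support], y[train])
        accs.append(clf.score(X[test][:, support], y[test]))  # score on 20%
    return float(np.mean(accs))
```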

But what about theoretical guarantees?

Part 3: Theoretical Foundation by High-dimensional Estimation Theory

Toward a Theoretical Foundation of SPA
Linear separation model (explains the observations/labels):
    y_k = sign(⟨x_k, ω_0⟩),   k = 1, ..., n
Forward model (explains the random distribution of the data):
    x_k = Σ_{m=1}^M s_{m,k} a_m + n_k,   k = 1, ..., n
where
a_m is a deterministic feature atom, a sampled Gaussian peak (∈ R^d),
s_{m,k} is a random latent factor specifying the peak amplitude (∈ R),
n_k is random baseline noise (∈ R^d).
[Figure: a Gaussian peak atom a_m with center c_m and width β_m, scaled by the random amplitude s_{m,k}.]
Supposing that sufficiently many samples are given, can we learn the sparse fingerprint ω_0?
Problem: The vector ω_0 is not unique, because some features are perfectly correlated. → No hope for support recovery or approximation.
Idea: Separate the fingerprint from its data representation! A simulation of this forward model is sketched below.
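A sketch of data generated from this forward model. All sizes, peak widths, and the choice of a 3-sparse ω_0 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, n, sigma = 2000, 20, 100, 0.05

# Dictionary D with rows a_m: Gaussian peaks at random centers
t = np.linspace(0.0, 1.0, d)
centers = rng.uniform(0.05, 0.95, size=M)
D = np.exp(-(t[None, :] - centers[:, None]) ** 2 / (2 * 0.005 ** 2))  # (M, d)

S = rng.standard_normal((n, M))            # latent amplitudes s_k ~ N(0, I_M)
N = sigma * rng.standard_normal((n, d))    # baseline noise n_k ~ N(0, sigma^2 I_d)
X = S @ D + N                              # row k is x_k = D^T s_k + n_k

omega0 = np.zeros(d)                       # sparse fingerprint
omega0[rng.choice(d, size=3, replace=False)] = 1.0
omega0 /= np.linalg.norm(D @ omega0)       # scale so z_0 = D omega_0 is unit-norm
y = np.sign(X @ omega0)                    # labels from the separation model
```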

Combining the Models
Assumptions:
    x_k = Σ_{m=1}^M s_{m,k} a_m + n_k,   k = 1, ..., n
s_k := (s_{1,k}, ..., s_{M,k}) ~ N(0, I_M)  (peak amplitudes),
n_k ~ N(0, σ^2 I_d)  (noise vector),
a_1, ..., a_M ∈ R^d arbitrary (peak) atoms, stacked row-wise into the dictionary D := [a_1; ...; a_M] ∈ R^{M×d}.
Put this into the classification model:
    y_k = sign(⟨x_k, ω_0⟩) = sign(⟨Σ_{m=1}^M s_{m,k} a_m + n_k, ω_0⟩)
        = sign(⟨D^T s_k + n_k, ω_0⟩)
        = sign(⟨s_k, Dω_0⟩ + ⟨n_k, ω_0⟩)
        = sign(⟨s_k, z_0⟩ + ⟨n_k, ω_0⟩),   where z_0 := Dω_0.

Signal Space vs. Coefficient Space
    x_k = Σ_{m=1}^M s_{m,k} a_m + n_k = D^T s_k + n_k
Let us first assume that n_k = 0 (no baseline noise). Then
    y_k = sign(⟨x_k, ω_0⟩) = sign(⟨s_k, z_0⟩),   where z_0 = Dω_0.
z_0 has a (non-unique) representation in the dictionary D with sparse coefficients ω_0.
z_0 lives in the signal space R^M (independent of the specific data type).
ω_0 lives in the coefficient space R^d (data dependent).
→ Try to show a recovery result for z_0!

What Does This Mean for the LASSO?
    y_k = sign(⟨x_k, ω_0⟩) = sign(⟨s_k, z_0⟩)   with z_0 = Dω_0
SPA via the LASSO: substituting z := Dω and using ⟨x_k, ω⟩ = ⟨s_k, Dω⟩,
    min_{ω ∈ R·B_1^d} Σ_{k=1}^n (y_k − ⟨x_k, ω⟩)^2  =  min_{z ∈ R·D B_1^d} Σ_{k=1}^n (y_k − ⟨s_k, z⟩)^2.
The left-hand program is solvable in practice; the right-hand program is solvable in theory.
Warning: The minimizers live in different spaces!
Warning: We know neither D nor s_k, but just their product.
Idea: Apply results for the K-LASSO with K = R·D B_1^d!

A Simplified Version of Roman Vershynin's Result
Theorem (Plan, Vershynin '15). Suppose that s_k ~ N(0, I_M), z_0 ∈ S^{M−1}, and the observations follow y_k = sign(⟨s_k, z_0⟩), k = 1, ..., n. Put μ = √(2/π) and assume that μz_0 ∈ K, where K is convex, and n ≳ w(K)^2. Then, with high probability, the solution ẑ of the K-LASSO, ẑ = argmin_{z ∈ K} Σ_{k=1}^n (y_k − ⟨s_k, z⟩)^2, satisfies
    ‖ẑ − μz_0‖_2 ≲ w(K)/√n.
Here the (global) mean width of a bounded set K ⊂ R^M is given by w(K) = E[sup_{u ∈ K} ⟨g, u⟩], where g ~ N(0, I_M).
Now assume that K = μR·D B_1^d, so that μz_0 = μDω_0 ∈ K for some ω_0 ∈ R·B_1^d, and that the columns of D are normalized. Then w(K) ≲ R√(log(d)). A Monte Carlo sketch of w(K) follows below.
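For K = μR·D B_1^d, the supremum in w(K) is attained at a signed, scaled column of D (the image of the l1-ball is the convex hull of ±μR·De_j), so the mean width can be estimated by Monte Carlo. A sketch under these assumptions:

```python
import numpy as np

def mean_width(D, R, mu=np.sqrt(2 / np.pi), trials=200, seed=1):
    """Monte Carlo estimate of w(K) for K = mu * R * D B_1^d."""
    rng = np.random.default_rng(seed)
    M, d = D.shape
    G = rng.standard_normal((trials, M))   # draws of g ~ N(0, I_M)
    # sup_{u in K} <g, u> = mu * R * max_j |<g, D e_j>|
    return mu * R * np.mean(np.max(np.abs(G @ D), axis=1))
```

With normalized columns, each ⟨g, De_j⟩ is standard Gaussian, so the maximum over the d columns grows like √(2 log(d)), matching the bound w(K) ≲ R√(log(d)) above.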

A Recovery Guarantee for SPA
Theorem (G. '15). Suppose that s_k ~ N(0, I_M). Let z_0 ∈ S^{M−1} and assume that there exists R > 0 such that z_0 = Dω_0 for some ω_0 ∈ R·B_1^d. The observations follow
    y_k = sign(⟨s_k, z_0⟩) = sign(⟨x_k, ω_0⟩),   k = 1, ..., n,
and the number of samples satisfies n ≳ R^2 log(d). Then, with high probability, the solution of the LASSO
    ẑ = Dω̂ = D · argmin_{ω ∈ R·B_1^d} Σ_{k=1}^n (y_k − ⟨x_k, ω⟩)^2
satisfies
    ‖Dω̂ − √(2/π) Dω_0‖_2 = ‖ẑ − √(2/π) z_0‖_2 ≲ (R^2 log(d)/n)^{1/4}.
A numerical check of this guarantee is sketched below.
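A numerical check of this guarantee, reusing constrained_lasso() and the forward-model simulation above with sigma = 0 (a sketch; the column normalization of D is glossed over here):

```python
import numpy as np

mu = np.sqrt(2.0 / np.pi)
R = np.abs(omega0).sum()                 # choose R = ||omega_0||_1, so omega_0 is feasible
omega_hat = constrained_lasso(X, y, R)

z0, z_hat = D @ omega0, D @ omega_hat    # compare in the signal space R^M
err = np.linalg.norm(z_hat - mu * z0)
print(f"||D omega_hat - mu z_0||_2 = {err:.3f}")
print(f"rate (R^2 log d / n)^(1/4) = {(R**2 * np.log(d) / n) ** 0.25:.3f}")
```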

Practical Relevance for MS-Data?
Extensions:
Baseline noise n_k ~ N(0, σ^2 I_d)
Non-trivial covariance matrix, i.e., s_k ~ N(0, Σ)
Adversarial bit-flips in the model y_k = sign(⟨x_k, ω_0⟩)
How to achieve normalized columns in D, and how to guarantee that R ≍ √s, i.e., that s-sparse vectors are allowed? → Standardize the data (centering + normalizing); see the sketch below.
Given ω̂, how to switch over to the signal space? (D is unknown.) → Identify supp(ω̂) with peaks (manual approach).
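The standardization step mentioned above, as a sketch: centering each feature and scaling it to unit empirical norm, which (in the noiseless forward model, where feature j has standard deviation ‖De_j‖_2) in effect normalizes the columns of the unknown D:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Center each feature and scale it to unit empirical norm."""
    Xc = X - X.mean(axis=0)                          # centering
    return Xc / (np.linalg.norm(Xc, axis=0) + eps)   # normalizing
```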

Message of this talk: An s-sparse disease fingerprint can be accurately recovered from only O(s log(d)) samples!

THANK YOU FOR YOUR ATTENTION!
Further Reading:
M. Genzel, Sparse Proteomics Analysis: Toward a Mathematical Foundation of Feature Selection and Disease Classification, Master's Thesis, 2015.
Y. Plan and R. Vershynin, The generalized Lasso with non-linear observations, arXiv:1502.04071, 2015.

What to Do Next?
Development of an abstract framework: what kind of properties should the dictionary D have?
Extension/generalization of the results: more complicated models and algorithms.
Numerical verification of the theory.
Other examples from real-world applications: bio-informatics, neuro-imaging, astronomy, chemistry, ...
Dictionary learning / factor analysis: what can we learn about D?