Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization


Denoising. Additive noise model: data = signal + noise

Hard thresholding

$$H_\eta(x)_i := \begin{cases} x_i & \text{if } |x_i| > \eta, \\ 0 & \text{otherwise.} \end{cases}$$
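For concreteness, here is a minimal numpy sketch of this operator (written with the magnitude-based rule, so large negative entries are also kept); the function name hard_threshold is just illustrative.

import numpy as np

def hard_threshold(x, eta):
    """Keep entries of x whose magnitude exceeds eta; zero out the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > eta, x, 0.0)

# small entries are treated as noise and set to zero
print(hard_threshold([0.1, -2.3, 0.4, 5.0], eta=1.0))   # [ 0.  -2.3  0.   5. ]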

Denoising via thresholding (figure: data and signal)

Denoising via thresholding (figure: estimate and signal)

Sparsifying transforms

$$x = Dc = \sum_{i=1}^{n} D_i c_i$$

Discrete cosine transform (figure: signal and its DCT coefficients)

Wavelets

Sorted wavelet coefficients (figure: sorted coefficient magnitudes on a log scale, from 10^-3 to 10^3)

Denoising via thresholding in a basis

$$x = Bc, \qquad y = Bc + z, \qquad \hat{c} = H_\eta\left(B^{-1} y\right), \qquad \hat{y} = B\hat{c}$$
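The same recipe takes only a few lines with scipy; a minimal sketch, assuming an orthonormal DCT as the basis B, a synthetic two-cosine signal, and a hand-picked threshold eta (all illustrative choices, not taken from the slides).

import numpy as np
from scipy.fft import dct, idct

def denoise_dct(y, eta):
    """Hard-threshold the coefficients of y in an orthonormal DCT basis."""
    c = dct(y, norm='ortho')                   # analysis: c = B^{-1} y
    c_hat = np.where(np.abs(c) > eta, c, 0.0)  # keep only the large coefficients
    return idct(c_hat, norm='ortho')           # synthesis: y_hat = B c_hat

rng = np.random.default_rng(0)
t = np.arange(256)
signal = np.cos(2 * np.pi * 5 * t / 256) + 0.5 * np.cos(2 * np.pi * 12 * t / 256)
noisy = signal + 0.3 * rng.standard_normal(256)
estimate = denoise_dct(noisy, eta=1.0)
print(np.linalg.norm(estimate - signal) / np.linalg.norm(signal))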

Denoising via thresholding in a DCT basis (figure: DCT coefficients of the data, data and signal)

Denoising via thresholding in a DCT basis (figure: DCT coefficients of the estimate, estimate and signal)

Denoising via thresholding in a wavelet basis (figures: original, noisy, and estimated images for three examples)

Overcomplete dictionary (figure: DCT coefficients)

Overcomplete dictionary

$$x = Dc = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = Aa + Bb$$

Sparsity estimation

First idea: $\min_{c \in \mathbb{R}^m} \|c\|_0$ such that $x = Dc$. Computationally intractable!

Tractable alternative: $\min_{c \in \mathbb{R}^m} \|c\|_1$ such that $x = Dc$

Geometric intuition (figure: the set Dc = x in the (c1, c2) plane, with the minimum l1-norm and minimum l2-norm solutions)

Sparsity estimation (figure: coefficients in the DCT subdictionary and the spike subdictionary)

Minimum l2-norm coefficients (figure: DCT subdictionary and spike subdictionary)

Minimum l1-norm coefficients (figure: DCT subdictionary and spike subdictionary)

Denoising via l1-norm regularized least squares

$$\hat{c} = \arg\min_{c \in \mathbb{R}^m} \|x - Dc\|_2^2 + \lambda \|c\|_1, \qquad \hat{x} = D\hat{c}$$
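A minimal sketch of this estimator using scikit-learn's Lasso solver on the two-subdictionary setup above (DCT atoms concatenated with spikes); the test signal is made up for illustration, and note that sklearn scales the data-fit term by 1/(2n), so its alpha does not coincide with the λ in the formula.

import numpy as np
from scipy.fft import dct
from sklearn.linear_model import Lasso

n = 128
rng = np.random.default_rng(0)

# overcomplete dictionary: DCT atoms concatenated with spikes (identity)
D = np.hstack([dct(np.eye(n), norm='ortho', axis=0), np.eye(n)])

# signal = two DCT atoms + two spikes, observed in noise
x = D[:, 7] + D[:, 20] + D[:, n + 40] + D[:, n + 90]
y = x + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(D, y)                  # minimizes 1/(2n) ||y - Dc||_2^2 + alpha ||c||_1
c_hat = lasso.coef_
x_hat = D @ c_hat
print("nonzeros:", np.count_nonzero(c_hat),
      "relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))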

Denoising via l1-norm regularized least squares (figures: signal and estimate)

Learning the dictionary

$$\min_{D, \; C \in \mathbb{R}^{m \times k}} \|X - DC\|_F^2 + \lambda \|C\|_1 \quad \text{such that} \quad \|D_i\|_2 = 1, \; 1 \leq i \leq m$$
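scikit-learn implements a closely related formulation (atoms constrained to have norm at most one, l1-penalized coefficients); a minimal sketch on 8x8 patches, where img is a random placeholder array standing in for a real training image.

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
img = rng.random((64, 64))            # placeholder image

# collect 8x8 patches and remove each patch's mean
patches = extract_patches_2d(img, (8, 8), max_patches=2000, random_state=0)
X = patches.reshape(len(patches), -1)
X = X - X.mean(axis=1, keepdims=True)

# learn a dictionary with an l1 penalty on the coefficients
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0, random_state=0)
dico.fit(X)
print(dico.components_.shape)         # (100, 64): one atom per row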

Dictionary learning

Denoising via dictionary learning

Denoising via dictionary learning

Denoising via dictionary learning

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization

Linear regression

$$y_i \approx \sum_{j=1}^{p} \theta_j X_{ij}, \quad 1 \leq i \leq n, \qquad y \approx X\theta$$

Sparse regression

Least squares: $\hat{\theta}_{\mathrm{ls}} := \arg\min_{\theta \in \mathbb{R}^n} \|y - X\theta\|_2^2$

Ridge regression (aka Tikhonov regularization): $\hat{\theta}_{\mathrm{ridge}} := \arg\min_{\theta \in \mathbb{R}^n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_2^2$

The lasso: $\hat{\theta}_{\mathrm{lasso}} := \arg\min_{\theta \in \mathbb{R}^n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1$
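A small scikit-learn sketch comparing the three estimators on synthetic data with a handful of relevant features; the alpha values are arbitrary, and sklearn's penalty scaling differs from the λ in the formulas above.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n, p, s = 50, 100, 5                       # more features than examples
X = rng.standard_normal((n, p))
theta_true = np.zeros(p)
theta_true[:s] = rng.standard_normal(s)    # only s relevant features
y = X @ theta_true + 0.1 * rng.standard_normal(n)

for name, model in [("least squares", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=0.1, max_iter=10000))]:
    model.fit(X, y)
    err = np.linalg.norm(model.coef_ - theta_true) / np.linalg.norm(theta_true)
    print(f"{name:13s} coefficient error: {err:.2f}, "
          f"nonzeros: {np.count_nonzero(model.coef_)}")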

Ridge-regression coefficients (figure: coefficient paths vs. regularization parameter, 10^-4 to 10^5; relevant features highlighted)

Lasso coefficients (figure: coefficient paths vs. regularization parameter, 10^-4 to 10^5; relevant features highlighted)

Results (figure: relative l2 error on training and test data for least squares, ridge, and lasso vs. regularization parameter)

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization

Limits of resolution in imaging. "The resolving power of lenses, however perfect, is limited" (Lord Rayleigh). Diffraction imposes a fundamental limit on the resolution of optical systems.

Fluorescence microscopy (figures: data, point sources, low-pass blur; courtesy of V. Morgenshtern)

Sensing model for super-resolution (figure: point sources, point-spread function, data, and spectrum)

Minimum l1-norm estimate

minimize $\|\text{estimate}\|_1$ subject to estimate ∗ psf = data

(figure: point sources and estimate)

Magnetic resonance imaging

Magnetic resonance imaging. Data: samples from the spectrum. Problem: sampling is time-consuming (patients get annoyed, kids move during data acquisition). Images are compressible (approximately sparse). Can we recover compressible signals from less data?

Compressed sensing 1. Undersample the spectrum randomly (figure: signal, spectrum, data)

Compressed sensing 2. Solve the optimization problem

minimize $\|\text{estimate}\|_1$ subject to frequency samples of estimate = data

(figure: signal and estimate)
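A minimal cvxpy sketch of this recovery problem; as a stand-in for random frequency samples it measures a sparse signal through randomly chosen rows of an orthonormal DCT matrix, and the dimensions are illustrative.

import numpy as np
import cvxpy as cp
from scipy.fft import dct

n, m, s = 128, 40, 5                       # signal length, measurements, sparsity
rng = np.random.default_rng(0)

# s-sparse test signal
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)

# measurement operator: m random rows of the orthonormal DCT matrix
F = dct(np.eye(n), norm='ortho', axis=0)
A = F[rng.choice(n, m, replace=False)]
y = A @ x

# l1-norm minimization subject to consistency with the measurements
est = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(est)), [A @ est == y])
prob.solve()
print("relative error:", np.linalg.norm(est.value - x) / np.linalg.norm(x))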

Compressed sensing in MRI (figure: original, minimum l2-norm estimate, minimum l1-norm estimate)

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization


Netflix Prize (figure: ratings matrix with many unknown entries, shown as question marks)

Collaborative filtering

A :=
                        Bob   Molly   Mary   Larry
The Dark Knight          1      1      5      4
Spiderman 3              2      1      4      5
Love Actually            4      5      2      1
Bridget Jones's Diary    5      4      2      1
Pretty Woman             4      5      1      2
Superman 2               1      2      5      5

Centering

$$\mu := \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}, \qquad \bar{A} := \begin{bmatrix} \mu & \cdots & \mu \\ \vdots & & \vdots \\ \mu & \cdots & \mu \end{bmatrix}$$

SVD

$$A - \bar{A} = USV^T = U \begin{bmatrix} 7.79 & 0 & 0 & 0 \\ 0 & 1.62 & 0 & 0 \\ 0 & 0 & 1.55 & 0 \\ 0 & 0 & 0 & 0.62 \end{bmatrix} V^T$$

First left singular vector

U_1 = (0.45, 0.39, 0.39, 0.39, 0.39, 0.45)
(entries: The Dark Knight, Spiderman 3, Love Actually, B.J.'s Diary, Pretty Woman, Superman 2)

First right singular vector

V_1 = (0.48, 0.52, 0.48, 0.52)
(entries: Bob, Molly, Mary, Larry)

Rank 1 model

$\bar{A} + \sigma_1 U_1 V_1^T =$

                        Bob        Molly      Mary       Larry
The Dark Knight         1.34 (1)   1.19 (1)   4.66 (5)   4.81 (4)
Spiderman 3             1.55 (2)   1.42 (1)   4.45 (4)   4.58 (5)
Love Actually           4.45 (4)   4.58 (5)   1.55 (2)   1.42 (1)
B.J.'s Diary            4.43 (5)   4.56 (4)   1.57 (2)   1.44 (1)
Pretty Woman            4.43 (4)   4.56 (5)   1.57 (1)   1.44 (2)
Superman 2              1.34 (1)   1.19 (2)   4.66 (5)   4.81 (5)

(true ratings in parentheses)
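The rank-1 model above can be reproduced directly with numpy; a short sketch using the ratings matrix from the collaborative-filtering slide.

import numpy as np

# rows: The Dark Knight, Spiderman 3, Love Actually, B.J.'s Diary, Pretty Woman, Superman 2
# columns: Bob, Molly, Mary, Larry
A = np.array([[1, 1, 5, 4],
              [2, 1, 4, 5],
              [4, 5, 2, 1],
              [5, 4, 2, 1],
              [4, 5, 1, 2],
              [1, 2, 5, 5]], dtype=float)

mu = A.mean()                              # global average rating
A_bar = np.full_like(A, mu)                # centering matrix with every entry mu

U, sing, Vt = np.linalg.svd(A - A_bar)
rank1 = A_bar + sing[0] * np.outer(U[:, 0], Vt[0])   # rank-1 model
print(np.round(rank1, 2))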

Matrix completion

                        Bob   Molly   Mary   Larry
The Dark Knight          1      ?      5      4
Spiderman 3              ?      1      4      5
Love Actually            4      5      2      ?
Bridget Jones's Diary    5      4      2      1
Pretty Woman             4      5      1      2
Superman 2               1      2      ?      5

Low-rank matrix estimation

First idea: $\min_{X \in \mathbb{R}^{m \times n}} \operatorname{rank}(X)$ such that $X_\Omega = y$. Computationally intractable because of the missing entries.

Tractable alternative: $\min_{X \in \mathbb{R}^{m \times n}} \|X_\Omega - y\|_2^2 + \lambda \|X\|_*$

Matrix completion via nuclear-norm minimization

                        Bob     Molly   Mary    Larry
The Dark Knight          1      2 (1)    5       4
Spiderman 3             2 (2)    1       4       5
Love Actually            4       5       2      2 (1)
Bridget Jones's Diary    5       4       2       1
Pretty Woman             4       5       1       2
Superman 2               1       2      5 (5)    5

(recovered entries shown with the true ratings in parentheses)
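A minimal cvxpy sketch of the nuclear-norm formulation applied to the incomplete ratings matrix above; the regularization weight lam is an arbitrary value chosen for illustration.

import numpy as np
import cvxpy as cp

# observed ratings; np.nan marks the missing entries from the slide
Y = np.array([[1, np.nan, 5, 4],
              [np.nan, 1, 4, 5],
              [4, 5, 2, np.nan],
              [5, 4, 2, 1],
              [4, 5, 1, 2],
              [1, 2, np.nan, 5]], dtype=float)
mask = ~np.isnan(Y)
Y0 = np.where(mask, Y, 0.0)

X = cp.Variable(Y.shape)
lam = 1.0
data_fit = cp.sum_squares(cp.multiply(mask.astype(float), X) - Y0)
prob = cp.Problem(cp.Minimize(data_fit + lam * cp.normNuc(X)))
prob.solve()
print(np.round(X.value, 1))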

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization

Background subtraction (figure: video frame)

Low rank + sparse model

$$\min_{L, \, S \in \mathbb{R}^{m \times n}} \|L\|_* + \lambda \|S\|_1 \quad \text{such that} \quad L + S = Y$$
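A minimal cvxpy sketch of this decomposition on synthetic data (a rank-1 "background" plus a sparse "foreground"); the weight lam = 1/sqrt(max(m, n)) is a common choice in the robust-PCA literature, not something prescribed by the slide.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 40, 20
background = np.outer(rng.random(m), np.ones(n))                   # rank-1 static background
foreground = np.zeros((m, n))
foreground[rng.integers(0, m, 30), rng.integers(0, n, 30)] = 1.0   # sparse outliers
Y = background + foreground

L = cp.Variable((m, n))
S = cp.Variable((m, n))
lam = 1.0 / np.sqrt(max(m, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S))),
                  [L + S == Y])
prob.solve()
print("rank of L:", np.linalg.matrix_rank(L.value, tol=1e-3))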

Frame 17 (figure)

Low-rank component (figure)

Sparse component (figure)

Frame 42 (figure)

Low-rank component (figure)

Sparse component (figure)

Frame 75 (figure)

Low-rank component (figure)

Sparse component (figure)

Sparsity, Denoising, Regression, Inverse problems, Low-rank models, Matrix completion, Low rank + sparse model, Nonnegative matrix factorization

Topic modeling

A :=
        singer   GDP   senate   election   vote   stock   bass   market   band
   a       6      1       1         0        0      1       9       0       8
   b       1      0       9         5        8      1       0       1       0
   c       8      1       0         1        0      0       9       1       7
   d       0      7       1         0        0      9       1       7       0
   e       0      5       6         7        5      6       0       7       2
   f       1      0       8         5        9      2       0       0       1

(rows correspond to articles a-f)

SVD

$$A - \bar{A} = USV^T, \qquad S = \operatorname{diag}(19.32, \, 14.46, \, 4.99, \, 2.77, \, 1.67, \, 0.93)$$

Left singular vectors (entries indexed by articles a, b, c, d, e, f)

U_1 = (0.51, 0.40, 0.54, 0.11, 0.38, 0.38)
U_2 = (0.19, 0.45, 0.19, 0.69, 0.20, 0.46)
U_3 = (0.14, 0.27, 0.09, 0.58, 0.69, 0.29)

Right singular vectors (entries indexed by: singer, GDP, senate, election, vote, stock, bass, market, band)

V_1 = (0.38, 0.05, 0.40, 0.27, 0.40, 0.17, 0.52, 0.14, 0.38)
V_2 = (0.16, 0.46, 0.33, 0.15, 0.38, 0.49, 0.10, 0.47, 0.12)
V_3 = (0.18, 0.18, 0.04, 0.74, 0.05, 0.11, 0.10, 0.43, 0.43)

Nonnegative matrix factorization

$$M \approx WH, \qquad W_{i,j} \geq 0, \; 1 \leq i \leq m, \; 1 \leq j \leq r, \qquad H_{i,j} \geq 0, \; 1 \leq i \leq r, \; 1 \leq j \leq n$$

Right nonnegative factors (entries indexed by: singer, GDP, senate, election, vote, stock, bass, market, band)

H_1 = (0.34, 0, 3.73, 2.54, 3.67, 0.52, 0, 0.35, 0.35)
H_2 = (0, 2.21, 0.21, 0.45, 0, 2.64, 0.21, 2.43, 0.22)
H_3 = (3.22, 0.37, 0.19, 0.20, 0, 0.12, 4.13, 0.13, 3.43)

Left nonnegative factors (entries indexed by articles a, b, c, d, e, f)

W_1 = (0.03, 2.23, 0, 0, 1.59, 2.24)
W_2 = (0.10, 0, 0.08, 3.13, 2.32, 0)
W_3 = (2.13, 0, 2.22, 0, 0, 0.03)
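A minimal scikit-learn sketch of this factorization applied to the word-count matrix A from the topic-modeling slide; the recovered factors resemble, but will not exactly match, the H_i and W_i shown above, since the solution depends on the algorithm, initialization, and scaling.

import numpy as np
from sklearn.decomposition import NMF

# word-count matrix A (rows: articles a-f; columns: singer, GDP, senate,
# election, vote, stock, bass, market, band)
A = np.array([[6, 1, 1, 0, 0, 1, 9, 0, 8],
              [1, 0, 9, 5, 8, 1, 0, 1, 0],
              [8, 1, 0, 1, 0, 0, 9, 1, 7],
              [0, 7, 1, 0, 0, 9, 1, 7, 0],
              [0, 5, 6, 7, 5, 6, 0, 7, 2],
              [1, 0, 8, 5, 9, 2, 0, 0, 1]], dtype=float)

model = NMF(n_components=3, init='nndsvd', random_state=0, max_iter=1000)
W = model.fit_transform(A)     # article-topic weights (6 x 3)
H = model.components_          # topic-word weights   (3 x 9)
print(np.round(H, 2))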