Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda


Overview Optimization-Based Data Analysis http://www.cims.nyu.

1 Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda, 1/25/2016

2 Sparsity: denoising, regression, inverse problems. Low-rank models: matrix completion, low rank + sparse model, nonnegative matrix factorization

3 Sparsity: denoising

4 Denoising. Additive noise model: data = signal + noise

5 Hard thresholding: $H_\eta(x)_i := \begin{cases} x_i & \text{if } |x_i| > \eta, \\ 0 & \text{otherwise} \end{cases}$
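A minimal NumPy sketch of hard thresholding applied under the additive noise model. The sparse test signal, the noise level 0.5 and the threshold η = 2 are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def hard_threshold(x, eta):
    """H_eta(x)_i = x_i if |x_i| > eta, 0 otherwise."""
    return np.where(np.abs(x) > eta, x, 0.0)

# Illustrative sparse signal (assumption): 10 spikes of height +/-5 among 200 entries
rng = np.random.default_rng(0)
n = 200
signal = np.zeros(n)
support = rng.choice(n, size=10, replace=False)
signal[support] = rng.choice([-5.0, 5.0], size=10)

# Additive noise model: data = signal + noise
data = signal + 0.5 * rng.standard_normal(n)

# Denoise by zeroing every entry whose magnitude does not exceed eta
estimate = hard_threshold(data, eta=2.0)
print("error of data:    ", np.linalg.norm(data - signal))
print("error of estimate:", np.linalg.norm(estimate - signal))
```

The threshold has to sit above the typical noise magnitude but below the smallest nonzero entry of the signal; here it is four noise standard deviations.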

6 Denoising via thresholding [Figure: data and signal]

7 Denoising via thresholding [Figure: estimate and signal]

8 Sparsifying transforms: $x = Dc = \sum_{i=1}^{n} D_i c_i$

9 Discrete cosine transform [Figure: signal and its DCT coefficients]

10 Wavelets

11 Sorted wavelet coefficients

12 Denoising via thresholding in a basis: $x = Bc$, $y = Bc + z$, $\hat{c} = H_\eta(B^{-1} y)$, $\hat{y} = B\hat{c}$
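A sketch of the four steps on slide 12 with an orthonormal DCT basis built explicitly in NumPy, so that $B^{-1} = B^T$. The helpers `dct_basis` and `threshold_denoise`, the test signal (exactly 8-sparse in the DCT basis), the unit noise level and η = 3.5 are hypothetical choices for illustration.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis: column k is the k-th cosine, so B^{-1} = B^T."""
    i = np.arange(n)[:, None]          # sample index (rows)
    k = np.arange(n)[None, :]          # frequency index (columns)
    B = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    B[:, 0] *= np.sqrt(1.0 / n)
    B[:, 1:] *= np.sqrt(2.0 / n)
    return B

def threshold_denoise(y, B, eta):
    """c_hat = H_eta(B^{-1} y), then map back to the signal domain with B."""
    c = np.linalg.solve(B, y)          # B^{-1} y (equals B.T @ y for an orthonormal B)
    c_hat = np.where(np.abs(c) > eta, c, 0.0)
    return B @ c_hat

rng = np.random.default_rng(1)
n = 256
B = dct_basis(n)
c = np.zeros(n)
c[rng.choice(n, size=8, replace=False)] = 10.0   # signal that is exactly sparse in the DCT basis (assumption)
x = B @ c
y = x + rng.standard_normal(n)                   # noise with unit standard deviation
x_hat = threshold_denoise(y, B, eta=3.5)
print(np.linalg.norm(y - x), np.linalg.norm(x_hat - x))
```

Because $B$ is orthonormal, white Gaussian noise stays white in the coefficient domain, which is why a single threshold can separate the few large DCT coefficients from the noise.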

13 Denoising via thresholding in a DCT basis [Figure: DCT coefficients of the data and of the signal; the data]

14 Denoising via thresholding in a DCT basis [Figure: DCT coefficients of the estimate and of the signal; the estimate]

15 Denoising via thresholding in a wavelet basis [Figure panels: original, noisy, estimate]

16 Denoising via thresholding in a wavelet basis [Figure panels: original, noisy, estimate]

17 Denoising via thresholding in a wavelet basis [Figure panels: original, noisy, estimate]

18 Overcomplete dictionary [Figure: DCT coefficients]

19 Overcomplete dictionary: $x = Dc = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = Aa + Bb$

20 Sparsity estimation. First idea: $\min_{c \in \mathbb{R}^m} \|c\|_0$ such that $x = Dc$. Computationally intractable! Tractable alternative: $\min_{c \in \mathbb{R}^m} \|c\|_1$ such that $x = Dc$.
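The tractable alternative on slide 20 is a linear program once $c$ is split as $c = u - v$ with $u, v \ge 0$. The sketch below solves it with SciPy's `linprog` (HiGHS backend) and, for contrast, computes the minimum $\ell_2$-norm solution with the pseudoinverse. The random Gaussian dictionary, its size and the 5-sparse coefficient vector are assumptions made only for the example; `basis_pursuit` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    """Solve min ||c||_1 s.t. Dc = x as an LP by splitting c = u - v with u, v >= 0."""
    n, m = D.shape
    cost = np.ones(2 * m)                 # sum(u) + sum(v) equals ||c||_1 at the optimum
    A_eq = np.hstack([D, -D])             # D u - D v = x
    res = linprog(cost, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:m], res.x[m:]
    return u - v

rng = np.random.default_rng(1)
n, m = 40, 100                            # fewer equations than unknowns: overcomplete D
D = rng.standard_normal((n, m)) / np.sqrt(n)
c_true = np.zeros(m)
c_true[rng.choice(m, size=5, replace=False)] = rng.standard_normal(5)
x = D @ c_true

c_l1 = basis_pursuit(D, x)                # minimum l1-norm solution of Dc = x
c_l2 = np.linalg.pinv(D) @ x              # minimum l2-norm solution of Dc = x
print("nonzeros (l1):", np.sum(np.abs(c_l1) > 1e-6))
print("nonzeros (l2):", np.sum(np.abs(c_l2) > 1e-6))
```

The minimum $\ell_2$-norm solution typically spreads energy over all $m$ coefficients, whereas the $\ell_1$ solution tends to be sparse, matching the geometric picture of slide 21.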

21 Geometric intuition [Figure: the line $Dc = x$ in the $(c_1, c_2)$ plane, with the minimum $\ell_1$-norm solution and the minimum $\ell_2$-norm solution]

22 Sparsity estimation [Figure panels: DCT subdictionary, spike subdictionary]

23 Minimum $\ell_2$-norm coefficients [Figure panels: DCT subdictionary, spike subdictionary]

24 Minimum $\ell_1$-norm coefficients [Figure panels: DCT subdictionary, spike subdictionary]

25 Denoising via $\ell_1$-norm regularized least squares: $\hat{c} = \arg\min_{c \in \mathbb{R}^m} \|x - Dc\|_2^2 + \lambda \|c\|_1$, $\hat{x} = D\hat{c}$
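One standard way to compute $\hat{c}$ is proximal gradient descent (ISTA): a gradient step on $\|x - Dc\|_2^2$ followed by soft thresholding. The slides do not say which solver was used, so treat this as a sketch. The dictionary is assumed to be the DCT subdictionary stacked with the spike subdictionary (as on slide 22); the test signal, the noise level 0.5 and λ = 2 are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis vectors stored as columns."""
    i = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    B = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    B[:, 0] /= np.sqrt(2.0)
    return B

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(D, x, c, lam):
    return np.sum((x - D @ c) ** 2) + lam * np.abs(c).sum()

def ista(D, x, lam, n_iter=1000):
    """Proximal gradient (ISTA) for min_c ||x - Dc||_2^2 + lam * ||c||_1."""
    step = 1.0 / (2 * np.linalg.norm(D, 2) ** 2)   # 1/L, L = Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = -2 * D.T @ (x - D @ c)
        c = soft_threshold(c - step * grad, step * lam)
    return c

rng = np.random.default_rng(2)
n = 128
D = np.hstack([dct_basis(n), np.eye(n)])         # DCT subdictionary + spike subdictionary
c_true = np.zeros(2 * n)
c_true[rng.choice(n, size=4, replace=False)] = 8.0        # a few cosines
c_true[n + rng.choice(n, size=3, replace=False)] = 8.0    # a few spikes
signal = D @ c_true
x = signal + 0.5 * rng.standard_normal(n)                 # noisy data
c_hat = ista(D, x, lam=2.0)
x_hat = D @ c_hat
print(np.linalg.norm(x - signal), np.linalg.norm(x_hat - signal))
```

With step size $1/L$ each ISTA iteration does not increase the objective, so the result is at least as good as the all-zero starting point.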

26 Denoising via $\ell_1$-norm regularized least squares

27 Denoising via $\ell_1$-norm regularized least squares [Figure: signal and estimate]

28 Denoising via $\ell_1$-norm regularized least squares [Figure: signal and estimate]

29 Learning the dictionary: $\min_{D,\; C \in \mathbb{R}^{m \times k}} \|X - DC\|_F^2 + \lambda \|C\|_1$ such that $\|D_i\|_2 = 1$, $1 \le i \le m$
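The dictionary-learning problem is nonconvex in $(D, C)$ jointly, so a common heuristic alternates between the two blocks. The sketch below is one such scheme, not necessarily the method behind the slides: ISTA steps for $C$ with $D$ fixed, a least-squares update of $D$ with $C$ fixed, then renormalization of each atom to unit norm. The helper `learn_dictionary`, its iteration counts, λ = 1 and the random stand-in data are all assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def learn_dictionary(X, m, lam, n_outer=30, n_inner=50, seed=0):
    """Alternating-minimization heuristic for
    min_{D, C} ||X - DC||_F^2 + lam * ||C||_1  such that ||D_i||_2 = 1."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    D = rng.standard_normal((n, m))
    D /= np.linalg.norm(D, axis=0)                 # start from unit-norm atoms
    C = np.zeros((m, k))
    for _ in range(n_outer):
        # 1) Sparse coding: ISTA steps on C with D fixed
        step = 1.0 / (2 * np.linalg.norm(D, 2) ** 2)
        for _ in range(n_inner):
            C = soft_threshold(C + 2 * step * D.T @ (X - D @ C), step * lam)
        # 2) Dictionary update: least squares in D with C fixed
        D = X @ np.linalg.pinv(C)
        norms = np.linalg.norm(D, axis=0)
        unused = norms < 1e-12                     # atoms not used by any signal
        D[:, unused] = rng.standard_normal((n, int(unused.sum())))
        norms[unused] = np.linalg.norm(D[:, unused], axis=0)
        # 3) Enforce ||D_i||_2 = 1, rescaling C so that the product DC is unchanged
        D /= norms
        C *= norms[:, None]
    return D, C

rng = np.random.default_rng(3)
X = rng.standard_normal((16, 200))   # stand-in data: 200 signals of length 16 (assumption)
D, C = learn_dictionary(X, m=32, lam=1.0)
print(D.shape, C.shape, np.mean(C != 0))
```

Rescaling the rows of $C$ by the atom norms keeps $DC$ fixed when the constraint is enforced; the next sparse-coding pass then re-optimizes $C$ for the normalized dictionary.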

30 Dictionary learning

31 Denoising via dictionary learning

32 Denoising via dictionary learning

33 Denoising via dictionary learning

34 Sparsity: regression

35 Linear regression: $y_i \approx \sum_{j=1}^{p} \theta_j X_{ij}$, $1 \le i \le n$, i.e. $y \approx X\theta$

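36 Sparse regression. Least squares: $\hat{\theta}_{\text{ls}} := \arg\min_{\theta \in \mathbb{R}^p} \|y - X\theta\|_2^2$. Ridge regression (aka Tikhonov regularization): $\hat{\theta}_{\text{ridge}} := \arg\min_{\theta \in \mathbb{R}^p} \|y - X\theta\|_2^2 + \lambda \|\theta\|_2^2$. The lasso: $\hat{\theta}_{\text{lasso}} := \arg\min_{\theta \in \mathbb{R}^p} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1$.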
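A NumPy sketch of the three estimators. Least squares uses `np.linalg.lstsq`; ridge uses its closed form $(X^T X + \lambda I)^{-1} X^T y$, obtained by setting the gradient $2X^T(X\theta - y) + 2\lambda\theta$ to zero; the lasso has no closed form, so it is solved here with ISTA (coordinate descent is another common choice). The synthetic data with 5 relevant features and the values λ = 10 (ridge) and λ = 20 (lasso) are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    """Closed form minimizer of ||y - X theta||_2^2 + lam * ||theta||_2^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def lasso(X, y, lam, n_iter=5000):
    """ISTA for ||y - X theta||_2^2 + lam * ||theta||_1 (one of several possible solvers)."""
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta = soft_threshold(theta + 2 * step * X.T @ (y - X @ theta), step * lam)
    return theta

rng = np.random.default_rng(4)
n, p = 60, 50
X = rng.standard_normal((n, p))
theta_true = np.zeros(p)
theta_true[:5] = 3.0                      # only the first 5 features are relevant (assumption)
y = X @ theta_true + rng.standard_normal(n)
X_test = rng.standard_normal((1000, p))
y_test = X_test @ theta_true + rng.standard_normal(1000)

fits = [("ls", least_squares(X, y)), ("ridge", ridge(X, y, 10.0)), ("lasso", lasso(X, y, 20.0))]
for name, theta in fits:
    rel_err = np.linalg.norm(y_test - X_test @ theta) / np.linalg.norm(y_test)
    print(name, "test relative error:", round(rel_err, 3), "nonzeros:", int(np.sum(np.abs(theta) > 1e-8)))
```

Ridge shrinks all coefficients but rarely sets any to zero, while the soft-thresholding step in the lasso solver produces exact zeros; sweeping λ reproduces the kind of coefficient paths plotted on slides 37 and 38.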

37 Ridge-regression coefficients [Figure: coefficients versus the regularization parameter, with the relevant features marked]

38 Lasso coefficients [Figure: coefficients versus the regularization parameter, with the relevant features marked]

39 Results [Figure: relative error ($\ell_2$ norm) versus the regularization parameter for least squares, ridge and the lasso, on training and test data]

40 Sparsity: inverse problems


41 Limits of resolution in imaging. "The resolving power of lenses, however perfect, is limited" (Lord Rayleigh). Diffraction imposes a fundamental limit on the resolution of optical systems.

42 Fluorescence microscopy [Figure panels: data, point sources, low-pass blur] (Figures courtesy of V. Morgenshtern)

43 Sensing model for super-resolution [Figure: point sources convolved with the point-spread function give the data; the corresponding spectra are shown below]

44 Minimum $\ell_1$-norm estimate: minimize $\|\text{estimate}\|_1$ subject to $\text{estimate} * \text{psf} = \text{data}$ [Figure panels: point sources, estimate]

45 Magnetic resonance imaging

46 Magnetic resonance imaging. Data: samples from the spectrum. Problem: sampling is time-consuming (patients get annoyed, kids move during data acquisition). Images are compressible (≈ sparse). Can we recover compressible signals from less data?

47 Compressed sensing 1. Undersample the spectrum randomly. [Figure panels: signal, spectrum, data]

48 Compressed sensing 2. Solve the optimization problem: minimize $\|\text{estimate}\|_1$ subject to frequency samples of estimate = data. [Figure panels: signal, estimate]
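A small 1D sketch of the two compressed-sensing steps, assuming a real signal that is sparse in the standard basis. Step 1 keeps a random subset of DFT coefficients; step 2 solves the $\ell_1$ problem as a linear program with SciPy's `linprog`, writing each complex sample as two real equations. The signal length 64, sparsity 4 and 16 sampled frequencies are illustrative assumptions; a minimum $\ell_2$-norm estimate is included for comparison, in the spirit of slide 49.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n = 64
x_true = np.zeros(n)
x_true[rng.choice(n, size=4, replace=False)] = rng.standard_normal(4)  # sparse real signal (assumption)

# Step 1: undersample the spectrum randomly. For a real signal, frequency -k is the
# complex conjugate of frequency k, so we draw from the positive frequencies 1..n/2-1.
F = np.fft.fft(np.eye(n))                      # DFT matrix: F @ x equals np.fft.fft(x)
omega = rng.choice(np.arange(1, n // 2), size=16, replace=False)
A = np.vstack([F[omega].real, F[omega].imag])  # real and imaginary parts of the samples
b = A @ x_true                                 # the data

# Step 2: minimize ||estimate||_1 subject to matching the frequency samples,
# written as an LP over estimate = u - v with u, v >= 0.
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=(0, None), method="highs")
x_l1 = res.x[:n] - res.x[n:]
x_l2 = np.linalg.pinv(A) @ b                   # minimum l2-norm estimate, for comparison
print("max error (l1):", np.max(np.abs(x_l1 - x_true)))
print("max error (l2):", np.max(np.abs(x_l2 - x_true)))
```

Only 32 real equations constrain 64 unknowns, so the data alone cannot determine the signal; the $\ell_1$ objective is what singles out a sparse solution.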

49 Compressed sensing in MRI [Figure panels: original, minimum $\ell_2$-norm estimate, minimum $\ell_1$-norm estimate]

50 Low-rank models

51 Low-rank models: matrix completion

52 Netflix Prize [Figure: a ratings matrix in which most entries are unknown (?)]

53 Collaborative filtering: $A :=$ [Table: ratings matrix with columns Bob, Molly, Mary, Larry and rows The Dark Knight, Spiderman 3, Love Actually, Bridget Jones's Diary, Pretty Woman, Superman 2]

54 Centering: $\mu := \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}$, and $\bar{A}$ is the $m \times n$ matrix whose entries all equal $\mu$

55 SVD: $A - \bar{A} = USV^T$

56 First left singular vector $U_1$, with one entry per movie: D. Knight, Sp. 3, Love Act., B.J.'s Diary, P. Woman, Sup. 2

57 First right singular vector $V_1$, with one entry per user: Bob, Molly, Mary, Larry

58 Rank 1 model: $\bar{A} + \sigma_1 U_1 V_1^T$ (original ratings in parentheses)

                  Bob        Molly      Mary       Larry
The Dark Knight   1.34 (1)   1.19 (1)   4.66 (5)   4.81 (4)
Spiderman 3       1.55 (2)   1.42 (1)   4.45 (4)   4.58 (5)
Love Actually     4.45 (4)   4.58 (5)   1.55 (2)   1.42 (1)
B.J.'s Diary      4.43 (5)   4.56 (4)   1.57 (2)   1.44 (1)
Pretty Woman      4.43 (4)   4.56 (5)   1.57 (1)   1.44 (2)
Superman 2        1.34 (1)   1.19 (2)   4.66 (5)   4.81 (5)
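The centering, SVD and rank-1 steps of slides 54 to 58 fit in a few lines of NumPy. This sketch assumes the parenthesized numbers on the rank-1 slide are the original ratings and uses them as the matrix $A$.

```python
import numpy as np

# Ratings shown in parentheses on the rank-1 slide (assumed to be the original data)
# rows: The Dark Knight, Spiderman 3, Love Actually, B.J.'s Diary, Pretty Woman, Superman 2
# columns: Bob, Molly, Mary, Larry
A = np.array([[1, 1, 5, 4],
              [2, 1, 4, 5],
              [4, 5, 2, 1],
              [5, 4, 2, 1],
              [4, 5, 1, 2],
              [1, 2, 5, 5]], dtype=float)

mu = A.mean()                                    # average rating
A_bar = np.full_like(A, mu)                      # matrix with every entry equal to mu
U, s, Vt = np.linalg.svd(A - A_bar, full_matrices=False)
rank1 = A_bar + s[0] * np.outer(U[:, 0], Vt[0])  # A_bar + sigma_1 U_1 V_1^T
print(np.round(rank1, 2))
```

Rounded to two decimals, the output matches the rank-1 values on slide 58; for example, the first row is `[1.34 1.19 4.66 4.81]`. The sign ambiguity of singular vectors does not matter, because $U_1$ and $V_1$ flip together.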

59 Matrix completion [Table: the ratings matrix with some entries missing (?); e.g. The Dark Knight: 1, ?, 5, 4 and Superman 2: 1, 2, ?, 5]

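60 Low-rank matrix estimation. First idea: $\min_{X \in \mathbb{R}^{m \times n}} \operatorname{rank}(X)$ such that $X_\Omega \approx y$. Computationally intractable because of the missing entries. Tractable alternative: $\min_{X \in \mathbb{R}^{m \times n}} \|X_\Omega - y\|_2^2 + \lambda \|X\|_*$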
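A sketch of the tractable alternative solved by proximal gradient descent (an approach often called soft-impute): the gradient of $\|X_\Omega - y\|_2^2$ is $2P_\Omega(X - M)$, so with step $1/2$ the gradient step simply copies the observed entries into $X$, and the proximal step soft-thresholds the singular values by $\lambda/2$. The ratings matrix is taken from the parenthesized values on slide 58; hiding exactly the two entries legible as "?" on slide 59 and using λ = 1 are assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft thresholding: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(M, mask, lam, n_iter=500):
    """Proximal gradient for min_X ||X_Omega - y||_2^2 + lam * ||X||_*."""
    X = np.zeros_like(M)
    for _ in range(n_iter):
        Z = np.where(mask, M, X)        # X - (1/2) * gradient: observed entries replaced by data
        X = svt(Z, lam / 2)             # prox of (1/2) * lam * ||.||_*
    return X

def objective(X, M, mask, lam):
    return np.sum((X - M)[mask] ** 2) + lam * np.linalg.svd(X, compute_uv=False).sum()

A = np.array([[1, 1, 5, 4],
              [2, 1, 4, 5],
              [4, 5, 2, 1],
              [5, 4, 2, 1],
              [4, 5, 1, 2],
              [1, 2, 5, 5]], dtype=float)
mask = np.ones_like(A, dtype=bool)
mask[0, 1] = False                      # The Dark Knight / Molly treated as missing (assumption)
mask[5, 2] = False                      # Superman 2 / Mary treated as missing (assumption)
X_hat = complete(A, mask, lam=1.0)
print(np.round(X_hat, 2))
```

Entries of `A` outside `mask` are never read by `complete`, so the hidden ratings play no role in the estimate.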

61 Matrix completion via nuclear-norm minimization [Table: the completed ratings matrix, with the true rating of each missing entry in parentheses; e.g. The Dark Knight: 1, 2 (1), 5, 4]

62 Low-rank models: low rank + sparse model

63 Background subtraction

64 Low rank + sparse model: $\min_{L, S \in \mathbb{R}^{m \times n}} \|L\|_* + \lambda \|S\|_1$ such that $L + S = Y$
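One way to solve the low rank + sparse problem is the alternating direction method of multipliers (ADMM) on the augmented Lagrangian: a singular-value-thresholding step for $L$, an entrywise soft-thresholding step for $S$, and a dual update. The slides do not specify the solver; the defaults λ = 1/√max(m, n) and μ = mn/(4‖Y‖₁) are common heuristic choices, and the synthetic rank-2 "background" plus sparse "foreground" stands in for the video frames.

```python
import numpy as np

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def low_rank_plus_sparse(Y, lam=None, n_iter=500):
    """ADMM for min ||L||_* + lam * ||S||_1 such that L + S = Y."""
    m, n = Y.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # common default weight
    mu = m * n / (4.0 * np.abs(Y).sum())        # penalty parameter (heuristic)
    S = np.zeros_like(Y)
    Z = np.zeros_like(Y)                        # Lagrange multipliers for L + S = Y
    for _ in range(n_iter):
        L = svt(Y - S + Z / mu, 1.0 / mu)                 # minimize over L
        S = soft_threshold(Y - L + Z / mu, lam / mu)      # minimize over S
        Z = Z + mu * (Y - L - S)                          # dual ascent step
    return L, S

rng = np.random.default_rng(6)
m, n = 40, 40
L0 = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # rank-2 "background"
S0 = np.where(rng.random((m, n)) < 0.05, 10.0, 0.0)               # sparse "foreground"
Y = L0 + S0
L, S = low_rank_plus_sparse(Y)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0), np.linalg.norm(S - S0) / np.linalg.norm(S0))
```

For background subtraction, each column of $Y$ would be one vectorized frame: the static background is (nearly) the same in every column and so has low rank, while moving objects occupy few pixels and land in $S$.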

65 Frame

66 Low-rank component

67 Sparse component

68 Frame

69 Low-rank component

70 Sparse component

71 Frame

72 Low-rank component

73 Sparse component

74 Low-rank models: nonnegative matrix factorization

75 Topic modeling: $A :=$ [Table: one row per article (a, b, c, d, e, f) and one column per word (singer, GDP, senate, election, vote, stock, bass, market, band)]

76 SVD: $A - \bar{A} = USV^T$

77 Left singular vectors $U_1$, $U_2$, $U_3$, with one entry per article (a to f)

78 Right singular vectors $V_1$, $V_2$, $V_3$, with one entry per word: singer, GDP, senate, election, vote, stock, bass, market, band

79 Nonnegative matrix factorization: $M \approx WH$, with $W_{i,j} \ge 0$ for $1 \le i \le m$, $1 \le j \le r$, and $H_{i,j} \ge 0$ for $1 \le i \le r$, $1 \le j \le n$
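A compact way to compute a nonnegative factorization is the multiplicative-update rule of Lee and Seung for the Frobenius-norm objective $\|M - WH\|_F^2$: each update multiplies the current factor by a nonnegative ratio, so $W$ and $H$ stay nonnegative automatically. This is one standard algorithm, not necessarily the one used for the slides; the random 6 × 9 matrix is a stand-in for the article-by-word table, and r = 3 mirrors the three factors shown.

```python
import numpy as np

def nmf(M, r, n_iter=500, eps=1e-10, seed=0):
    """Multiplicative updates (Lee and Seung) for min ||M - WH||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    W = rng.random((m, r))                       # positive random initialization
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)     # eps avoids division by zero
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(7)
M = rng.random((6, 9))   # stand-in for the article-by-word matrix: 6 articles, 9 words (assumption)
W, H = nmf(M, r=3)
print("relative error:", np.linalg.norm(M - W @ H) / np.linalg.norm(M))
```

Unlike singular vectors, whose entries can have either sign, the rows of $H$ are nonnegative combinations of words, which is why they tend to read as interpretable "topics".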

80 Right nonnegative factors $H_1$, $H_2$, $H_3$, with one entry per word: singer, GDP, senate, election, vote, stock, bass, market, band

81 Left nonnegative factors $W_1$, $W_2$, $W_3$, with one entry per article (a to f)
