Learning with matrix and tensor based models using low-rank penalties


1 Learning with matrix and tensor based models using low-rank penalties
Johan Suykens
KU Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Nonsmooth optimization in machine learning, Liege, March
(joint work with Marco Signoretto, Quoc Tran Dinh, Lieven De Lathauwer)

2 Learning with matrices and tensors
neuroscience: EEG data (time samples × frequency × electrodes)
computer vision: image (/video) compression/completion (pixels × illumination × expression × ...)
web mining: analyzing user behavior (users × queries × webpages)
data vector x, data matrix X, data tensor X
vector model: ŷ = w^T x    matrix model: ŷ = ⟨W, X⟩    tensor model: ŷ = ⟨W, X⟩
[Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., Learning with Tensors: a Framework Based on Convex Optimization and Spectral Regularization, 2011]

3 Overview
Sparsity
Matrix completion and tensor completion
Learning with matrices and a low-rank penalty
Learning with tensors
Optimization algorithms

4 Learning with matrices and tensors
data vector x, data matrix X, data tensor X
vector model: ŷ = w^T x    matrix model: ŷ = ⟨W, X⟩    tensor model: ŷ = ⟨W, X⟩
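As a quick numerical aside (not from the slides), the three model forms are all inner products of matching shapes; a minimal numpy illustration on synthetic data:

```python
# Minimal sketch: the matrix and tensor models are inner products <W, X>,
# i.e. the sum of elementwise products over all modes.
import numpy as np

rng = np.random.default_rng(0)
x, w = rng.standard_normal(10), rng.standard_normal(10)
X2, W2 = rng.standard_normal((10, 5)), rng.standard_normal((10, 5))
X3, W3 = rng.standard_normal((10, 5, 3)), rng.standard_normal((10, 5, 3))

y_vec = w @ x                   # vector model:  y = w^T x
y_mat = np.sum(W2 * X2)         # matrix model:  y = <W, X>
y_ten = np.sum(W3 * X3)         # tensor model:  y = <W, X> over all modes
print(y_vec, y_mat, y_ten)
```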

5 Sparsity in machine learning
through the loss function: model ŷ = Σ_i α_i K(x, x_i) + b, min w^T w + γ Σ_i L(e_i) with an ε-insensitive loss (zero on [−ε, +ε]) → sparse α
through regularization: model ŷ = w^T x + b, min Σ_j |w_j| + γ Σ_i e_i^2 → sparse w

6 Sparsity (1)
Underdetermined linear system: Ax = b, A ∈ R^{n×m}, n < m
Minimum norm solution: min_x ||x||_2^2 s.t. Ax = b, with solution x̂ = A^T (A A^T)^{−1} b
Sparsest solution: (P_0) min_x ||x||_0 s.t. Ax = b (with ||x||_0 = #{i : x_i ≠ 0})
Alternatives: l_p norms ||x||_p = (Σ_i |x_i|^p)^{1/p}, (P_p) min_x ||x||_p s.t. Ax = b
Nonconvex for 0 < p < 1, convex for p = 1.
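Not on the slides: a minimal sketch of solving (P_1) (basis pursuit) as a linear program with scipy, using the standard split x = u − v with u, v ≥ 0.

```python
# min ||x||_1 s.t. Ax = b, written as an LP over z = [u; v] with x = u - v.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 20, 50, 3                              # underdetermined system with a k-sparse solution
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

c = np.ones(2 * m)                               # objective: sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                        # equality constraint: A(u - v) = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * m))
x_hat = res.x[:m] - res.x[m:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```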

7 Sparsity (2)
Mutual coherence: μ(A) = max_{1≤k,j≤m, k≠j} |a_k^T a_j| / (||a_k||_2 ||a_j||_2)
For a full-rank A ∈ R^{n×m}, n < m: if a solution x exists satisfying ||x||_0 < (1/2)(1 + 1/μ(A)), it is both the unique solution of (P_1) and of (P_0).
Restricted Isometry Property (RIP): a matrix A ∈ R^{n×m} has RIP(δ, k) if each submatrix A_I (formed by combining at most k columns of A) has its nonzero singular values bounded between 1 − δ and 1 + δ.
A matrix A with RIP(0.41, 2k) implies that (P_1) and (P_0) have identical solutions on all k-sparse vectors.
[Bruckstein et al., SIAM Review, 2009; Candes & Tao, 2005; Donoho & Elad, 2003; ...]
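A small numpy sketch (an illustration, assuming A has nonzero columns) computing the mutual coherence μ(A) and the corresponding sparsity bound from the slide:

```python
import numpy as np

def mutual_coherence(A):
    # Normalize the columns, then take the largest off-diagonal entry of |G^T G|.
    G = A / np.linalg.norm(A, axis=0)
    C = np.abs(G.T @ G)
    np.fill_diagonal(C, 0.0)
    return C.max()

A = np.random.default_rng(1).standard_normal((20, 50))
mu = mutual_coherence(A)
print("mu(A) =", mu)
print("uniqueness bound on ||x||_0:", 0.5 * (1 + 1 / mu))
```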

8 Learning with matrices and tensors
data vector x, data matrix X, data tensor X
vector model: ŷ = w^T x    matrix model: ŷ = ⟨W, X⟩    tensor model: ŷ = ⟨W, X⟩

9 Matrix completion: example
Given image (80% missing entries)
[experiments by M. Signoretto]

10 Matrix completion: example
Given image (80% missing entries) and completed image
[experiments by M. Signoretto]

11 Matrix completion: example
Given image (40% missing entries)
[experiments by M. Signoretto]

12 Matrix completion: example
Given image (40% missing entries) and completed image
[experiments by M. Signoretto]

13 Matrix completion: example
Original image

14 Matrix completion (1)
Given: matrix X with missing entries
Goal: complete the missing entries
Assumption: X has low rank
min_X ||X||_* subject to X_ij = Y_ij, (i, j) ∈ S
with given values Y_ij, where S is a subset of all entries of the matrix
Nuclear norm ||X||_* = Σ_i σ_i with σ_i the singular values of X (singular value decomposition: X = Σ_i σ_i u_i v_i^T)
||X||_* is the convex envelope of rank(X) on {X : ||X|| ≤ 1} [Fazel, 2002]
||X|| ≤ ||X||_F ≤ ||X||_* ≤ √r ||X||_F ≤ r ||X|| for rank-r X [Recht et al., 2010]
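A hedged illustration (a soft-impute style iteration, not the constrained program above and not the authors' solver): nuclear-norm-driven completion by repeated singular value thresholding of the current fill-in.

```python
import numpy as np

def soft_impute(Y, mask, lam=1.0, n_iter=200):
    """Y: matrix with observed values; mask: boolean array of observed entries."""
    X = np.where(mask, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt      # singular value thresholding
        X = np.where(mask, Y, Z)                     # keep the observed entries fixed
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))   # rank-5 ground truth
mask = rng.random(A.shape) < 0.5                                  # 50% of entries observed
X_hat = soft_impute(A, mask, lam=0.5)
print("relative error:", np.linalg.norm(X_hat - A) / np.linalg.norm(A))
```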

15 Matrix completion (2)
This can be written as an SDP problem (semidefinite program):
min_{X, W_1, W_2} tr(W_1) + tr(W_2)
subject to X_ij = Y_ij, (i, j) ∈ S
[ W_1  X  ]
[ X^T  W_2 ] ⪰ 0
The nuclear norm plays a similar role as the l_1 norm, at the matrix level.
[Fazel et al., 2001; Candes & Recht, 2009]
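For concreteness, a sketch of this SDP in cvxpy (an assumption that cvxpy and an SDP-capable solver are available; the observed entries are enforced through an elementwise 0/1 mask):

```python
import cvxpy as cp

def complete_sdp(Y, mask):
    """Y: matrix with observed values; mask: 0/1 array marking the observed entries."""
    n, m = Y.shape
    X = cp.Variable((n, m))
    W1 = cp.Variable((n, n), symmetric=True)
    W2 = cp.Variable((m, m), symmetric=True)
    block = cp.bmat([[W1, X], [X.T, W2]])
    constraints = [block >> 0,                                     # PSD block matrix
                   cp.multiply(mask, X) == cp.multiply(mask, Y)]   # keep observed entries
    cp.Problem(cp.Minimize(cp.trace(W1) + cp.trace(W2)), constraints).solve()
    return X.value
```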

16 Matrix completion: RIP property
Consider:
(P_0): min rank(X) s.t. A(X) = b
(P_1): min ||X||_* s.t. A(X) = b
r-restricted isometry constant: the smallest number δ_r(A) such that
1 − δ_r(A) ≤ ||A(X)|| / ||X||_F ≤ 1 + δ_r(A)
holds for all X of rank at most r, with A : R^{m×n} → R^p a linear map.
Suppose that δ_2r < 1 for an integer r ≥ 1. Then the solution to (P_0) is the only matrix of rank at most r satisfying A(X) = b.
Suppose that r ≥ 1 is such that δ_5r < 1/10. Then the solution to (P_1) equals the solution to (P_0).
[Recht, Fazel, Parrilo, SIAM Review, 2010]

17 Tensor completion
Given: N-th order tensor X ∈ R^{I_1×...×I_N} with missing entries
Goal: complete the missing entries
Assumption: X has low rank
min_X ||X||_* subject to X_{i_1 i_2 ... i_N} = Y_{i_1 i_2 ... i_N}, (i_1, i_2, ..., i_N) ∈ S
with given entries Y_{i_1 i_2 ... i_N}, where S is a subset of the tensor entries
Nuclear norm ||X||_* = (1/N) Σ_{n ∈ N_N} ||X_⟨n⟩||_* with X_⟨n⟩ the n-th mode matrix unfolding
[Signoretto M., Van De Plas R., De Moor B., Suykens J.A.K., IEEE-SPL, 2011; Gandy et al., 2011; Tomioka et al., 2011]
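A small numpy sketch (one common unfolding convention, not necessarily the papers') of this averaged mode-wise nuclear norm:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: the mode-n fibers become the columns of an I_n x J matrix."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def tensor_nuclear_norm(X):
    N = X.ndim
    return sum(np.linalg.svd(unfold(X, n), compute_uv=False).sum() for n in range(N)) / N

X = np.random.default_rng(0).standard_normal((5, 6, 7))
print("averaged mode-wise nuclear norm:", tensor_nuclear_norm(X))
```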

18 Mass spectral imaging - digital staining
Data tensor: pixels × 6490 variables/spectrum
Given a partial labelling (4 classes), SVM prediction on all pixels
cerebellar cortex - Ammon's horn section of hippocampus - caudate-putamen - lateral ventricle area
[Luts J., Ojeda F., Van de Plas R., De Moor B., Van Huffel S., Suykens J.A.K., ACA 2010]

19 Tensor completion on mass spectral imaging
Mass spectral imaging: sagittal section of a mouse brain [data: E. Waelkens, R. Van de Plas]
Tensor completion using nuclear norm regularization [Signoretto et al., IEEE-SPL, 2011]

20 Multichannel EEG for patient-specific seizure detection
The electroencephalogram (EEG) measures the electrical activity of the brain and is a well-established technique in epilepsy diagnosis and monitoring.
Automatic seizure detection would drastically decrease the workload of clinicians; EEG can provide accurate information about the onset of the seizure.
As the seizure spreads quickly through the brain, early detection of the seizure is essential.
[Hunyadi B., Signoretto M., Van Paesschen W., Suykens J., Van Huffel S., De Vos A., Clinical Neurophysiology, 2012]

21 Extracted features (feature-channel matrix)
Time domain features:
1-3. Number of zero crossings, maxima and minima
4. Skewness (skew)
5. Kurtosis (kurt)
6. Root mean square amplitude (rmsa)
Frequency domain features:
7. Total power (TP)
8. Peak frequency (PF)
Mean and normalized power in frequency bands: delta 1-3 Hz (D, nD), theta 4-8 Hz (T, nT), alpha 9-13 Hz (A, nA), beta 14-20 Hz (B, nB)
[Figure: multichannel EEG trace over the bipolar channels FP1-F7, F7-T7, T7-P7, P7-O1, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F4, F4-C4, C4-P4, P4-O2, FP2-F8, F8-T8, T8-P8, P8-O2, FZ-CZ, CZ-PZ, P7-T7, T7-FT9, FT9-FT10, FT10-T8, T8-P8; amplitude scale 365 μV; time axis in seconds]
EEG data: CHB-MIT database - scalp EEG recordings, 23 pediatric patients, 18 channels

22 Model with nuclear norm regularization
Synchronization between EEG channels is a commonly occurring characteristic. Representing the data in matrix form allows one to exploit the common information among the channels.
Model (per patient): ŷ = ⟨W, X⟩ + b, where ⟨W, X⟩ = Σ_ij W_ij X_ij, with X, W ∈ R^{d×p}, d the number of features and p the number of channels.
Classifier with decision rule sign[ŷ].
Training from given data {(X_k, y_k)}_{k=1}^N:
min_{W,b} Σ_{k=1}^N (y_k − ŷ_k)^2 + μ ||W||_*
with nuclear norm ||W||_* = Σ_i σ_i (singular values σ_i); the labels ±1 correspond to seizure and non-seizure epochs.
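A hedged proximal-gradient sketch of this objective on synthetic data (an illustration only, not the authors' training code); the prox of μ||·||_* is singular value thresholding.

```python
# Proximal gradient for  min_{W,b} sum_k (y_k - <W, X_k> - b)^2 + mu * ||W||_*
import numpy as np

def svt(W, tau):
    # Prox of tau * nuclear norm: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def train(Xs, y, mu=1.0, n_iter=500):
    N, d, p = Xs.shape
    step = 1.0 / (2 * (np.sum(Xs ** 2) + N))            # crude bound on the gradient Lipschitz constant
    W, b = np.zeros((d, p)), 0.0
    for _ in range(n_iter):
        resid = np.einsum('kij,ij->k', Xs, W) + b - y   # <W, X_k> + b - y_k
        W = svt(W - step * 2 * np.einsum('k,kij->ij', resid, Xs), step * mu)
        b = b - step * 2 * resid.sum()
    return W, b

rng = np.random.default_rng(0)
Xs = rng.standard_normal((100, 8, 18))                               # 100 epochs, 8 features, 18 channels
W_true = np.outer(rng.standard_normal(8), rng.standard_normal(18))   # low-rank ground truth
y = np.sign(np.einsum('kij,ij->k', Xs, W_true))
W, b = train(Xs, y, mu=5.0)
print("training accuracy:", np.mean(np.sign(np.einsum('kij,ij->k', Xs, W) + b) == y))
```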

23 Multichannel EEG for patient-specific seizure detection
[Hunyadi B., Signoretto M., Van Paesschen W., Suykens J., Van Huffel S., De Vos A., Clinical Neurophysiology, 2012]

24 Learning with matrices and tensors
data vector x, data matrix X, data tensor X
vector model: ŷ = w^T x    matrix model: ŷ = ⟨W, X⟩    tensor model: ŷ = ⟨W, X⟩

25 Tensors
N-th order tensor A ∈ R^{I_1×I_2×...×I_N}
inner product: ⟨A, B⟩ := Σ_{i_1} Σ_{i_2} ... Σ_{i_N} A_{i_1 i_2 ... i_N} B_{i_1 i_2 ... i_N}
norm: ||A|| := √⟨A, A⟩
n-mode vector: obtained by varying i_n and keeping the other indices fixed
n-rank rank_n(A): dimension of the space spanned by the n-mode vectors
rank-(r_1, r_2, ..., r_N) tensor: tensor for which r_n = rank_n(A) for n ∈ N_N
multilinear rank: the N-tuple (r_1, r_2, ..., r_N)
rank: rank(A) := min { R ∈ N : A = Σ_{r ∈ N_R} u_r^(1) ⊗ u_r^(2) ⊗ ... ⊗ u_r^(N), u_r^(n) ∈ R^{I_n} }
property: rank_n(A) ≤ rank(A) for all n
special case of a matrix: rank_1(A) = rank_2(A) = rank(A)


28 Mode unfoldings of a tensor
n-mode unfolding A_⟨n⟩ ∈ R^{I_n × J} (matricization): the matrix whose columns are the n-mode vectors, with J := Π_{j ∈ N_N \ {n}} I_j
n-mode unfolding: R^{I_1×I_2×...×I_N} → R^{I_n × J}; refolding: R^{I_n × J} → R^{I_1×I_2×...×I_N}
property: rank_n(A) = rank(A_⟨n⟩)
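A compact numpy sketch of unfolding and refolding (one common ordering convention; the exact column ordering differs between papers), together with the rank property above:

```python
import numpy as np

def unfold(A, n):
    """Move mode n to the front and flatten the remaining modes into columns."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def refold(M, n, shape):
    """Inverse of unfold under the same ordering convention."""
    full = (shape[n],) + tuple(s for i, s in enumerate(shape) if i != n)
    return np.moveaxis(M.reshape(full), 0, n)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))
for n in range(A.ndim):
    assert np.allclose(refold(unfold(A, n), n, A.shape), A)          # refolding inverts unfolding
    print(f"rank_{n+1}(A) = rank of the mode-{n+1} unfolding =", np.linalg.matrix_rank(unfold(A, n)))
```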

29 Multilinear SVD (1)
[De Lathauwer L., De Moor B., Vandewalle J., 2000]

30 Multilinear SVD (2)
n-mode product A ×_n U ∈ R^{I_1×I_2×...×I_{n−1}×J_n×I_{n+1}×...×I_N}: product of a tensor A ∈ R^{I_1×I_2×...×I_N} with a matrix U ∈ R^{J_n×I_n}
multilinear SVD: A = S ×_1 U^(1) ×_2 U^(2) ×_3 ... ×_N U^(N)
with core tensor S ∈ R^{I_1×I_2×...×I_N} and U^(n) ∈ R^{I_n×I_n} a matrix of n-mode singular vectors, i.e., the left singular vectors of the n-mode unfolding A_⟨n⟩ with SVD A_⟨n⟩ = U^(n) diag(σ(A_⟨n⟩)) V^(n)T
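A short HOSVD sketch in numpy (an illustration; the factors are left singular vectors of the mode-n unfoldings, and the core is obtained by mode-n products with their transposes):

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_n_product(A, U, n):
    """n-mode product A x_n U for a matrix U with U.shape[1] == A.shape[n]."""
    return np.moveaxis(np.tensordot(U, A, axes=(1, n)), 0, n)

def hosvd(A):
    # U^(n): left singular vectors of the mode-n unfoldings; core: S = A x_1 U1^T ... x_N UN^T.
    Us = [np.linalg.svd(unfold(A, n), full_matrices=False)[0] for n in range(A.ndim)]
    S = A
    for n, U in enumerate(Us):
        S = mode_n_product(S, U.T, n)
    return S, Us

A = np.random.default_rng(0).standard_normal((4, 5, 6))
S, Us = hosvd(A)
A_rec = S
for n, U in enumerate(Us):
    A_rec = mode_n_product(A_rec, U, n)
print("reconstruction error:", np.linalg.norm(A_rec - A))
```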

31 Inductive and transductive learning
Transductive learning with tensors:
soft-completion - data: partially specified input data tensor and matrix of target labels; output: latent features and missing labels
hard-completion - data: input data with missing entries; output: completed input data
Inductive learning with tensors:
data: pairs of fully specified input features and vectors of target labels; output: models for out-of-sample evaluations of multiple tasks
[Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., 2011]

32 Inductive learning with tensors: setting
Training data D_N = {(X^(n), y^(n)) ∈ R^{D_1×D_2×...×D_M} × R^T : n ∈ N_N}
n = 1, ..., N training data; t = 1, ..., T outputs (tasks); M-th order input data tensor
Model: ỹ_t = ⟨W^(t), X̃⟩ + b_t, t = 1, ..., T
Assumptions:
X = X̃ + E with X̃ a rank-(r_1, r_2, ..., r_M) tensor
low multilinear rank in W^(t) = S_{W^(t)} ×_1 U_1 ×_2 U_2 ... ×_M U_M; for the core tensors: ⟨W^(t), X̃⟩ = ⟨S_{W^(t)}, S_{X̃}⟩
target labels y_t generated according to p(y_t | ỹ_t) = 1/(1 + exp(−y_t ỹ_t))

33 Inductive learning with tensors: training
Penalized empirical risk minimization:
min_{W, b} f_{D_N}(W, b) + Σ_{m ∈ N_{M+1}} λ_m ||W_⟨m⟩||_*
with the misclassification error e.g. based on the logistic loss:
f_{D_N} : (W, b) ↦ Σ_{n ∈ N_N} Σ_{t ∈ N_T} log(1 + exp(−y_t^(n) (⟨X^(n), W^(t)⟩ + b_t)))
This gives a predictive model, applicable to input data X beyond the training data.

34 Transductive learning with X and Y completion

35 Transductive learning with tensors: setting
Tensors X ∈ R^{D_1×D_2×...×D_M×N} and Y = [y^(1) y^(2) ... y^(N)] ∈ R^{T×N}
Missing entries both in X and Y:
S_X, S_Y: index sets of observed entries of X and Y
S_{S_X}, S_{S_Y}: sampling operators related to the index sets
Implicit model: ỹ_t^(n) = ⟨W^(t), X̃^(n)⟩ + b_t, t = 1, ..., T
Assumptions:
X = X̃ + E with X̃ a rank-(r_1, r_2, ..., r_M, r_{M+1}) tensor
targets y_t generated according to p(y_tn | ỹ_tn) = 1/(1 + exp(−y_tn ỹ_tn))
rank([X̃_⟨M+1⟩, Ỹ]) ≤ r_{M+1} ≤ min(N, J + T) with J = Π_{j ∈ N_M} D_j

36 Transductive learning with tensors: estimation
Estimation of X̃, Ỹ, b:
min_{(X̃, Ỹ, b) ∈ V} f_{λ_0}(X̃, Ỹ, b) + Σ_{m ∈ N_M} λ_m ||X̃_⟨m⟩||_* + λ_{M+1} ||[X̃_⟨M+1⟩, Ỹ]||_*
objective function: V has module spaces (R^{D_1×D_2×...×D_M×N}) × (R^{T×N}) × R^T and inner product
⟨(X̃_1, Ỹ_1, b_1), (X̃_2, Ỹ_2, b_2)⟩_V = ⟨X̃_1, X̃_2⟩ + ⟨Ỹ_1, Ỹ_2⟩ + ⟨b_1, b_2⟩
objective f_{λ_0}(X̃, Ỹ, b) = f_x(X̃) + λ_0 f_y(Ỹ, b) with
f_x : X̃ ↦ Σ_{p ∈ N_P} l_x((Ω_{S_X} X̃)_p, z_p^x)
f_y : (Ỹ, b) ↦ Σ_{q ∈ N_Q} l_y((Ω_{S_Y}(Ỹ + b 1^T))_q, z_q^y)
losses e.g. l_x : (u, v) ↦ (1/2)(u − v)^2, l_y : (u, v) ↦ log(1 + exp(−uv))
z^x, z^y are vectors of the observed entries

37 Transductive soft completion: Olivetti faces
[Figure: for each example face the panels show the original, the partially observed input data, and the matrix-sc and tensor-sc completions, together with the true label and the labels predicted by matrix-sc and tensor-sc (e.g. true label 5: matrix-sc predicts 3, tensor-sc predicts 5; true label 3: both predict 3).]

38 Inpainting color images by hard completion
[Figure: original image, given image, completed image]
Tensor: modes 1 and 2: pixel space; mode 3: 8-bit RGB color information

39 Inpainting color images by hard completion
[Figure: original image, given image, completed image]

40 Inpainting color images by hard completion
[Figure: original image, given image, completed image]

41 Inpainting color images by hard completion
[Figure: original image, given image, completed image]

42 Inpainting color images by hard completion
[Figure: original image, given image, completed image]

43 Optimization algorithm (1)
The learning problems are instances of the following convex optimization problem on an abstract vector space:
min_{w ∈ W} f̄(w) + ḡ(w) subject to w ∈ C
with
- f̄: a convex and differentiable functional
- ∇f̄ is L_f̄-Lipschitz: ||∇f̄(w) − ∇f̄(v)||_W ≤ L_f̄ ||w − v||_W for all w, v ∈ W
- ḡ: a convex but possibly non-differentiable functional
- C ⊆ W a non-empty, closed and convex set

44 Optimization algorithm (2)
Problem restatement:
min_{w ∈ W} h(w) = f̄(w) + ḡ(w) + δ_C(w), with δ_C : w ↦ 0 if w ∈ C, +∞ otherwise
Proximity operator:
x^(t+1) = prox_{τh}(x^(t)) with prox_{τh} : x ↦ argmin_{w ∈ W} h(w) + (1/(2τ)) ||w − x||^2

45 Optimization algorithm (3)
Operator splitting approach: split h(w) = f̄(w) + ḡ(w) + δ_C(w) into f̄(w) + δ_C(w) and the non-smooth term ḡ(w)
Douglas-Rachford splitting:
y^(k) = argmin_{x ∈ C} f̄(x) + (1/(2τ)) ||x − w^(k)||_W^2   (solved inexactly)
r^(k) = prox_{τḡ}(2y^(k) − w^(k))
w^(k+1) = w^(k) + γ^(k) (r^(k) − y^(k))
Projection onto C; proof of convergence for the sequence {y^(k)}_k; stopping criterion based on h
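To make the iteration concrete, a hedged sketch (not the authors' solver) of Douglas-Rachford splitting applied to matrix completion, min ||X||_* s.t. X_ij = Y_ij on S: the first step reduces to a projection onto the observed entries and the second to singular value thresholding.

```python
import numpy as np

def svt(W, tau):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def dr_complete(Y, mask, tau=1.0, gamma=1.0, n_iter=300):
    w = np.where(mask, Y, 0.0)
    y = w
    for _ in range(n_iter):
        y = np.where(mask, Y, w)        # prox of the constraint term: project onto observed entries
        r = svt(2 * y - w, tau)         # prox of tau * nuclear norm: singular value thresholding
        w = w + gamma * (r - y)
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 40))   # rank-8 ground truth
mask = rng.random(A.shape) < 0.6
X_hat = dr_complete(A, mask)
print("relative error:", np.linalg.norm(X_hat - A) / np.linalg.norm(A))
```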

46 Note: matrix case - Singular Value Thresholding
Given a matrix Y, the solution to
min_X (1/2) ||X − Y||_F^2 + λ ||X||_*
with λ > 0 is given by a shrinkage operation on the singular values of Y:
prox_λ^tr(Y) = U max(S − λI, 0) V^T
[Cai et al., 2008; Tomioka et al., 2011]

47 Tensor case
Learning problems involve the tensor modes: Σ_{m ∈ N_{M+1}} λ_m ||W_⟨m⟩||_*
Consider a space W given by the cartesian product W_1 × W_2 × ... × W_I with inner product ⟨x, y⟩ = Σ_{i ∈ N_I} ⟨x_i, y_i⟩_i.
Assume the function ḡ : W → R is defined by ḡ : (x_1, x_2, ..., x_I) ↦ Σ_{i ∈ N_I} g_i(x_i), where for any i ∈ N_I, g_i : W_i → R is convex. Then we have:
prox_ḡ(x) = (prox_{g_1}(x_1), prox_{g_2}(x_2), ..., prox_{g_I}(x_I))

48 Duplication - transductive learning case
Duplication of the tensors leads to considering the set:
C := {(X̃_[1], X̃_[2], ..., X̃_[M], X̃_[M+1], Ỹ, b) ∈ W : X̃_[1] = X̃_[2] = ... = X̃_[M+1]}
This gives the problem statement:
min_{(X̃_[1], X̃_[2], ..., X̃_[M], X̃_[M+1], Ỹ, b) ∈ W} f̄(X̃_[1], ..., X̃_[M+1], Ỹ, b) + ḡ(X̃_[1], ..., X̃_[M+1], Ỹ)
subject to (X̃_[1], ..., X̃_[M+1], Ỹ, b) ∈ C

49 Prox and tensor modes
We apply
prox_{τḡ}(X̃_[1], ..., X̃_[M+1], Ỹ) = (prox_{τλ_1 ||σ(·_⟨1⟩)||_1}(X̃_[1]), ..., prox_{τλ_M ||σ(·_⟨M⟩)||_1}(X̃_[M]), Z_1, Z_2)
where [Z_1(X̃, Ỹ), Z_2(X̃, Ỹ)] is a partitioning of
Z(X̃, Ỹ) = U diag(prox_{τλ_{M+1} ||·||_1}(σ([X̃_⟨M+1⟩, Ỹ]))) V^T
with prox_{λ ||σ(·_⟨n⟩)||_1}(W) = refold_n(U^(n) diag(d_λ) V^(n)T) and (d_λ)_i := max(σ_i(W_⟨n⟩) − λ, 0).
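A compact numpy sketch (one unfolding convention) of this mode-wise prox: soft-threshold the singular values of the mode-m unfolding and refold:

```python
import numpy as np

def unfold(A, m):
    return np.moveaxis(A, m, 0).reshape(A.shape[m], -1)

def refold(M, m, shape):
    full = (shape[m],) + tuple(s for i, s in enumerate(shape) if i != m)
    return np.moveaxis(M.reshape(full), 0, m)

def prox_mode_nuclear(A, m, lam):
    """prox of lam * ||A_<m>||_*: soft-threshold the singular values of the mode-m unfolding."""
    U, s, Vt = np.linalg.svd(unfold(A, m), full_matrices=False)
    return refold((U * np.maximum(s - lam, 0.0)) @ Vt, m, A.shape)

A = np.random.default_rng(0).standard_normal((4, 5, 6))
print([np.linalg.matrix_rank(unfold(prox_mode_nuclear(A, m, 1.0), m)) for m in range(A.ndim)])
```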

50 Conclusions
Sparsity: from vectors to matrices, from matrices to tensors
Transductive and inductive learning with matrices/tensors: going beyond matrix/tensor completion
Further details: Signoretto M., Tran Dinh Q., De Lathauwer L., Suykens J.A.K., Learning with Tensors: a Framework Based on Convex Optimization and Spectral Regularization, 2011
Software:

51 Acknowledgements

52 Thank you
