University of Innsbruck, Institute of Computer Science, Intelligent and Interactive Systems
Kronecker Decomposition for Image Classification
Sabrina Fontanella (1,2), Antonio Rodríguez-Sánchez (1), Justus Piater (1), and Sandor Szedmak (3)
(1) University of Innsbruck, (2) University of Salerno, (3) Aalto University
Évora, September 2016
Outline
1. Image classification: the problem; decomposing the environment
2. The tensor decomposition: what it is; compression; interpretation of the image components
3. Learning approach: Maximum Margin Regression
4. Experimental evaluation: ImageCLEF 2015
5. Experimental evaluation: Pascal and Flickr
Antonio Rodríguez-Sánchez (CLEF 2016)
Image classification I
Images are classified according to their visual content. Applications:
1. Recognition of specific objects
2. Indoor/outdoor recognition
3. Analysis of medical images
Image classification II
Example of a classification pipeline, Bag of Words:
1. Extract features and store them in feature vectors
2. Approximate the distribution of the features by a histogram
3. Apply a classification algorithm (Support Vector Machine, Neural Network, Markov Random Field, etc.)
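Steps 1 and 2 of the Bag of Words pipeline can be sketched in Python with NumPy. This is a minimal illustration, not the method of this work: it assumes a codebook of "visual words" has already been learned (for example by k-means), and the function name `bow_histogram` and the toy data are hypothetical. Step 3 would pass the resulting histogram to any classifier.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return the
    normalized histogram of codeword counts (one bin per codeword)."""
    # Squared Euclidean distance between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalize to a distribution

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # toy 3-word vocabulary
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
h = bow_histogram(descriptors, codebook)
print(h)  # [0.25 0.5  0.25]
```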
Relations between objects are of interest
Is it possible to recognize relationships between the objects appearing in a scene? This is of interest because such relationships can provide the knowledge needed to identify and classify the image.
E.g. a car is quite likely to appear in an image that also contains buildings and people.
E.g. a zebra is quite likely to be outdoors, surrounded by savanna plants or animals.
Decomposing the environment
Structured decomposition of the environment: learning structured outputs is a popular stream of machine learning. By decomposing the matrix that represents the image, the structure behind the scene could be captured. Let us consider 2D image decomposition: points close to each other within continuous 2D blocks can be strongly related to each other.
Tensor decomposition
A tensor is a multidimensional or N-way array: an N-way or Nth-order tensor is an element of the tensor product of N vector spaces. Tensor decomposition can be considered a higher-order generalization of the matrix singular value decomposition (SVD) and principal component analysis (PCA). The tensor decomposition of a given image is not unique. Given an RGB image of size (256,256,3), it is possible, for example, to perform the following decompositions:
(16,16,3), (16,16,1): tensor + matrix (2 components)
(8,8,3), (8,8,1), (4,4,1): tensor + 2 matrices (3 components)
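The shapes of the two example decompositions can be checked with NumPy's `np.kron`, which also accepts N-dimensional arrays and multiplies their shapes axis by axis. A minimal sketch, assuming random arrays as stand-ins for the components (they are not computed from an image here):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((16, 16, 3))  # tensor component: 16x16 with 3 color channels
B = rng.random((16, 16, 1))  # matrix component: 16x16, single channel

# np.kron multiplies shapes axis by axis: (16*16, 16*16, 3*1)
X = np.kron(A, B)
print(X.shape)  # (256, 256, 3)

# Each color channel of X is the 2D Kronecker product of A's channel with B
per_channel = all(np.allclose(X[:, :, c], np.kron(A[:, :, c], B[:, :, 0]))
                  for c in range(3))

# Three components: (8,8,3) x (8,8,1) x (4,4,1) also yields (256, 256, 3)
X3 = np.kron(np.kron(rng.random((8, 8, 3)), rng.random((8, 8, 1))),
             rng.random((4, 4, 1)))
print(X3.shape)  # (256, 256, 3)
```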
Tensor decomposition in computer vision
In computer vision, tensors can be used to represent:
Color images, where three matrices hold the R, G and B channels, giving a tensor of order three (for example (1024,1024,3)).
Video streams of color images, where the dimensions are the two image axes, the color channel (R, G, B) and time, giving a tensor of order four.
The Kronecker product
Given two matrices $A \in \mathbb{R}^{m_A \times n_A}$ and $B \in \mathbb{R}^{m_B \times n_B}$, their Kronecker product X can be expressed as:
$$X = A \otimes B = \begin{bmatrix} a_{1,1}B & a_{1,2}B & \cdots & a_{1,n_A}B \\ a_{2,1}B & a_{2,2}B & \cdots & a_{2,n_A}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m_A,1}B & a_{m_A,2}B & \cdots & a_{m_A,n_A}B \end{bmatrix}$$
with $m_X = m_A m_B$ and $n_X = n_A n_B$.
If X is given (the image), how can we compute A and B (its components)? B can be considered as a 2D filter of the image represented by the matrix X.
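The block structure of the definition can be verified directly with NumPy's `np.kron`. A small sketch with arbitrary toy matrices: every $m_B \times n_B$ block of X equals the corresponding entry of A times B.

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)   # m_A x n_A = 2 x 3
B = np.array([[1, 0], [2, 1]])      # m_B x n_B = 2 x 2
X = np.kron(A, B)                   # (2*2) x (3*2) = 4 x 6

mB, nB = B.shape
# Block (i, j) of X, of size m_B x n_B, equals a_ij * B
blocks_ok = all(
    np.array_equal(X[i * mB:(i + 1) * mB, j * nB:(j + 1) * nB], A[i, j] * B)
    for i in range(A.shape[0])
    for j in range(A.shape[1])
)
print(X.shape, blocks_ok)  # (4, 6) True
```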
The Kronecker decomposition and SVD
The Kronecker decomposition can be carried out by Singular Value Decomposition (SVD). Given an arbitrary matrix X of size $m \times n$, the SVD is $X = U S V^{\top}$, where
$U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix of left singular vectors, $UU^{\top} = I_m$;
$V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix of right singular vectors, $VV^{\top} = I_n$;
$S \in \mathbb{R}^{m \times n}$ is a diagonal matrix containing the nonnegative singular values on its diagonal.
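These properties can be checked numerically with `np.linalg.svd`, which returns U, the singular values as a 1D array (sorted in decreasing order) and $V^{\top}$. A short sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((5, 3))                # arbitrary m x n matrix
U, s, Vt = np.linalg.svd(X)           # full SVD: U is 5x5, Vt is 3x3

S = np.zeros_like(X)                  # m x n matrix with the singular values on its diagonal
S[:len(s), :len(s)] = np.diag(s)

reconstructed = np.allclose(U @ S @ Vt, X)                       # X = U S V^T
orthogonal = np.allclose(U @ U.T, np.eye(5)) and np.allclose(Vt.T @ Vt, np.eye(3))
nonneg_sorted = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))
```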
Note
The singular values of a matrix do not depend on the order of its rows and columns: any permutation (reordering) of the rows and/or columns leaves the singular values unchanged and only permutes the singular vectors accordingly. Moreover, the Frobenius norm is a sum over all entries, so it is unchanged by any rearrangement of the entries. We can therefore work on a reordered representation of the matrix X.
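Both invariances can be illustrated in a few lines. A sketch with a random matrix and random permutations: the singular values survive row and column permutations, and the Frobenius norm survives even a rearrangement that changes the matrix shape.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((4, 6))
rows, cols = rng.permutation(4), rng.permutation(6)
Xp = X[rows][:, cols]  # reorder rows and columns

s = np.linalg.svd(X, compute_uv=False)
sp = np.linalg.svd(Xp, compute_uv=False)
same_singular_values = np.allclose(s, sp)

# The Frobenius norm only depends on the set of entries, e.g. 4x6 reshaped into 3x8
same_frobenius = np.isclose(np.linalg.norm(X), np.linalg.norm(X.reshape(3, 8)))
```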
Algorithm for solving the Kronecker decomposition
1. Reorder the matrix
2. Compute the SVD
3. Compute the approximation of X
4. Invert the reordering
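A compact sketch of the four steps for a single pair of components, in Python with NumPy. It follows Van Loan's nearest Kronecker product approach cited in this section and uses the row-major reordering of the 6×6 example in Step 1; the function names `rearrange`, `unrearrange` and `nkp` are our own, and this is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def rearrange(X, shape_A, shape_B):
    """Step 1: row (k, l) of the result holds X[i*mB + k, j*nB + l] for all (i, j)."""
    (mA, nA), (mB, nB) = shape_A, shape_B
    return X.reshape(mA, mB, nA, nB).transpose(1, 3, 0, 2).reshape(mB * nB, mA * nA)

def unrearrange(Xt, shape_A, shape_B):
    """Inverse of rearrange."""
    (mA, nA), (mB, nB) = shape_A, shape_B
    return Xt.reshape(mB, nB, mA, nA).transpose(2, 0, 3, 1).reshape(mA * mB, nA * nB)

def nkp(X, shape_A, shape_B):
    """Return A, B and kron(A, B) minimizing ||X - kron(A, B)||_F."""
    Xt = rearrange(X, shape_A, shape_B)                # 1. reorder
    U, s, Vt = np.linalg.svd(Xt, full_matrices=False)  # 2. SVD
    Xt1 = s[0] * np.outer(U[:, 0], Vt[0])              # 3. best rank-1 approximation
    X_approx = unrearrange(Xt1, shape_A, shape_B)      # 4. invert the reordering
    A = (np.sqrt(s[0]) * Vt[0]).reshape(shape_A)       # sigma_1 split evenly
    B = (np.sqrt(s[0]) * U[:, 0]).reshape(shape_B)
    return A, B, X_approx

rng = np.random.default_rng(0)
A0, B0 = rng.random((3, 3)), rng.random((2, 2))
A, B, X_approx = nkp(np.kron(A0, B0), (3, 3), (2, 2))
```

For an exact Kronecker product the input is recovered; A and B themselves are only determined up to a scalar factor, since $(cA) \otimes (B/c) = A \otimes B$.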
Nearest Kronecker Product (NKP)
Given a matrix $X \in \mathbb{R}^{m \times n}$, the NKP problem involves minimizing
$$\phi(A, B) = \lVert X - A \otimes B \rVert_F,$$
where $\lVert \cdot \rVert_F$ is the Frobenius norm. This problem can be solved using the SVD, working on a reordered representation of X.
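Written out, the reordered matrix $\tilde{X}$ contains exactly the same entries as X, and the Frobenius norm is a sum over entries. Taking vec row by row, as in the 6×6 example of Step 1, the objective and its minimum value (Eckart-Young) are:

```latex
\phi(A,B) = \lVert X - A \otimes B \rVert_F
          = \lVert \tilde{X} - \operatorname{vec}(B)\,\operatorname{vec}(A)^{\top} \rVert_F ,
\qquad
\min_{A,B} \phi(A,B) = \Big( \sum_{i \ge 2} \sigma_i^2(\tilde{X}) \Big)^{1/2}
```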
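Step 1: Reorder the matrix X [1]
$$X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16}\\
x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26}\\
x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36}\\
x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46}\\
x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56}\\
x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66}
\end{bmatrix}
= A \otimes B =
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\otimes
\begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix}$$
can be reordered into
$$\tilde{X} = \begin{bmatrix}
x_{11} & x_{13} & x_{15} & x_{31} & x_{33} & x_{35} & x_{51} & x_{53} & x_{55}\\
x_{12} & x_{14} & x_{16} & x_{32} & x_{34} & x_{36} & x_{52} & x_{54} & x_{56}\\
x_{21} & x_{23} & x_{25} & x_{41} & x_{43} & x_{45} & x_{61} & x_{63} & x_{65}\\
x_{22} & x_{24} & x_{26} & x_{42} & x_{44} & x_{46} & x_{62} & x_{64} & x_{66}
\end{bmatrix}
= \begin{bmatrix} b_{11}\\ b_{12}\\ b_{21}\\ b_{22} \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33} \end{bmatrix},$$
i.e. $\tilde{X}$ is a rank-1 matrix.
[1] C. F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123:85-100, 2000.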
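The reordering can be written as a reshape and transpose in NumPy. The sketch below (the helper name `rearrange` is our own) labels each entry $x_{rc}$ with the integer 10r + c, so the output can be compared with the matrix $\tilde{X}$ above, and checks that an exact Kronecker product becomes rank 1.

```python
import numpy as np

def rearrange(X, shape_A, shape_B):
    """Row (k, l) of the result holds X[i*mB + k, j*nB + l] for all (i, j), row-major."""
    (mA, nA), (mB, nB) = shape_A, shape_B
    # Split rows into (i, k) and columns into (j, l), then bring (k, l) to the front
    return X.reshape(mA, mB, nA, nB).transpose(1, 3, 0, 2).reshape(mB * nB, mA * nA)

# Label each entry x_rc with the integer 10*r + c (1-based), as in the 6x6 example
X = np.array([[10 * r + c for c in range(1, 7)] for r in range(1, 7)])
Xt = rearrange(X, (3, 3), (2, 2))
print(Xt[0])  # [11 13 15 31 33 35 51 53 55]

# For an exact Kronecker product, the reordered matrix is vec(B) vec(A)^T
rng = np.random.default_rng(0)
A, B = rng.random((3, 3)), rng.random((2, 2))
R = rearrange(np.kron(A, B), (3, 3), (2, 2))
```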
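Approximation of X̃ and reordering
Since the reordering maps $A \otimes B$ to the rank-1 matrix $\operatorname{vec}(B)\operatorname{vec}(A)^{\top}$, approximating X becomes
$$\min_{A,B} \lVert \tilde{X} - \operatorname{vec}(B)\operatorname{vec}(A)^{\top} \rVert_F,$$
where vec(·) is a vectorization operator that stacks the entries of a matrix into a single column vector (in the example above, the entries are taken row by row). This is the problem of finding the nearest rank-1 matrix to X̃, which has well-known solutions using the SVD.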
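Step 2: Compute the SVD
We solve $\min_{A,B} \lVert \tilde{X} - \operatorname{vec}(B)\operatorname{vec}(A)^{\top} \rVert_F$. Let $\tilde{X} = U S V^{\top}$ be the SVD of $\tilde{X}$. The best B and A are
$$\operatorname{vec}(B) = \sqrt{\sigma_1}\, U(:,1), \qquad \operatorname{vec}(A) = \sqrt{\sigma_1}\, V(:,1),$$
where $\sigma_1$ is the largest singular value and U(:,1), V(:,1) are the corresponding left and right singular vectors, so that $\operatorname{vec}(B)\operatorname{vec}(A)^{\top} = \sigma_1 U(:,1) V(:,1)^{\top}$. Any other split of $\sigma_1$ between the two factors gives the same product.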
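A short numerical check of Step 2, using a random 4×9 matrix as a stand-in for a reordered $\tilde{X}$. It shows that splitting $\sqrt{\sigma_1}$ onto each factor reproduces $\sigma_1 u_1 v_1^{\top}$ (scaling both by $\sigma_1$ would overshoot to $\sigma_1^2$), and that the residual matches the Eckart-Young value.

```python
import numpy as np

rng = np.random.default_rng(3)
Xt = rng.random((4, 9))  # stand-in for a reordered matrix (not a real image)
U, s, Vt = np.linalg.svd(Xt, full_matrices=False)

# Split sigma_1 evenly so the outer product is sigma_1 * u1 v1^T
b = np.sqrt(s[0]) * U[:, 0]  # vec(B)
a = np.sqrt(s[0]) * Vt[0]    # vec(A)
rank1 = np.outer(b, a)

# Scaling both vectors by sigma_1 instead gives sigma_1**2 * u1 v1^T
overscaled = np.outer(s[0] * U[:, 0], s[0] * Vt[0])

# Eckart-Young: best rank-1 residual = sqrt of the sum of discarded squared singular values
err = np.linalg.norm(Xt - rank1)
expected = np.sqrt(np.sum(s[1:] ** 2))
```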
Steps 3 and 4: Approximation and reordering
Once we have Ã and B̃, it is possible to compute the approximation of X̃. Since at the beginning we changed the order of the values in the matrix, the reordering must be inverted to obtain the original A and B and the approximation $A \otimes B$ of X.
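Inverting the reordering is again a reshape and transpose. A sketch, with helper names `rearrange` and `unrearrange` of our own choosing, checking the round trip and that undoing the reordering of $\operatorname{vec}(B)\operatorname{vec}(A)^{\top}$ gives $A \otimes B$:

```python
import numpy as np

def rearrange(X, shape_A, shape_B):
    (mA, nA), (mB, nB) = shape_A, shape_B
    return X.reshape(mA, mB, nA, nB).transpose(1, 3, 0, 2).reshape(mB * nB, mA * nA)

def unrearrange(Xt, shape_A, shape_B):
    """Inverse of rearrange: entry ((k, l), (i, j)) goes back to X[i*mB + k, j*nB + l]."""
    (mA, nA), (mB, nB) = shape_A, shape_B
    return Xt.reshape(mB, nB, mA, nA).transpose(2, 0, 3, 1).reshape(mA * mB, nA * nB)

rng = np.random.default_rng(4)
X = rng.random((6, 6))
round_trip = np.allclose(unrearrange(rearrange(X, (3, 3), (2, 2)), (3, 3), (2, 2)), X)

# Undoing the reordering of the rank-1 matrix vec(B) vec(A)^T yields kron(A, B)
A, B = rng.random((3, 3)), rng.random((2, 2))
back = unrearrange(np.outer(B.ravel(), A.ravel()), (3, 3), (2, 2))
```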
Components and factorization
The number of components and the factorization influence the level of detail. Given, for example, a gray image of size (1024,1024):
If it has many details, it is better to choose many components with a small factorization, e.g. (4,4)(4,4)(4,4)(4,4)(4,4).
If it is less detailed, fewer components with a large factorization, e.g. (32,32)(32,32).
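Both factorizations rebuild an image of the same size; only the number and size of the components change. A sketch chaining `np.kron` with `functools.reduce`, using random arrays as hypothetical stand-ins for components actually computed from an image:

```python
from functools import reduce
import numpy as np

def compose(components):
    """Chain Kronecker products of the components: C1 ⊗ C2 ⊗ ... ⊗ Cn."""
    return reduce(np.kron, components)

rng = np.random.default_rng(5)
fine = [rng.random((4, 4)) for _ in range(5)]      # many components, small factorization
coarse = [rng.random((32, 32)) for _ in range(2)]  # few components, large factorization

print(compose(fine).shape, compose(coarse).shape)  # (1024, 1024) (1024, 1024)
# Values stored per singular-value term
print(sum(c.size for c in fine), sum(c.size for c in coarse))  # 80 2048
```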
Compression I
The tensor decomposition can provide a very high level of image compression, because it takes into consideration only the largest singular values (Eckart-Young theorem). The level of compression is given by the ratio between the total number of elements in the image matrix and the total number of elements of the components in the decomposition.
Compression II
Let $n_{sv}$ be the number of singular values taken into consideration, $n_f$ the number of factors (dimensions) of each component, $v$ the value of each factor, and $n_c$ the number of components used. Then the total number of elements of the components is
$$n_{sv} \cdot n_c \cdot v^{n_f}.$$
To simplify the notation, we assume that all factors are equal for every component. Decompositions with different factors can also be considered, for example (32,28)(16,8)(2,4).
Compression III: Example
Given an image of size (1024,1024), it can be compressed with components (32,32)(32,32) and 10 singular values by a ratio of
$$\frac{1024^2}{10 \cdot 2 \cdot 32^2} = 51.2,$$
and with components (4,4)(4,4)(4,4)(4,4)(4,4) and 10 singular values by a ratio of
$$\frac{1024^2}{10 \cdot 5 \cdot 4^2} = 1310.72.$$
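The ratio can be computed with a small helper (the name `compression_ratio` is our own). It sums the component sizes, so it also covers components with different factors such as (32,28)(16,8)(2,4), and it reduces to $n_{sv} \cdot n_c \cdot v^{n_f}$ when all factors are equal:

```python
import math

def compression_ratio(image_shape, component_shapes, n_sv):
    """Image elements divided by stored elements: n_sv terms, each storing every component."""
    stored = n_sv * sum(math.prod(shape) for shape in component_shapes)
    return math.prod(image_shape) / stored

print(compression_ratio((1024, 1024), [(32, 32)] * 2, 10))  # 51.2
print(compression_ratio((1024, 1024), [(4, 4)] * 5, 10))    # 1310.72
```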
Compression IV: Example
Figure: Example of compression on the toys room image; the two reconstructions shown have compression ratios of 202 and 99.
Interpretation of image components I
In $X = A \otimes B$, B can be interpreted as an image filter: it finds the boundaries of the critical regions where most of the structural information is concentrated. This is a big advantage: image filtering processes generally use a predetermined filter, whereas the Kronecker decomposition automatically tries to predict the optimal filters.
Interpretation of image components II
Figure: Toys room picture and its components: highest components (A) and lowest components (B). The highest and the lowest components correspond to the matrices A1, ... and B1, ... respectively.
Learning
Sample set of pairs of output and input objects: $\{(y_i, x_i) : y_i \in \mathcal{Y},\ x_i \in \mathcal{X},\ i = 1, \dots, m\}$.
Define two functions, φ and ψ, that map the input and output objects respectively into linear vector spaces (the feature space for the input, the label space for the output):
$$\phi : \mathcal{X} \to H_\phi, \qquad \psi : \mathcal{Y} \to H_\psi.$$
Objective
Find a linear function acting on the feature space, $f(\phi(x)) = W\phi(x) + b$, that produces a prediction of every input object in the label space. The output corresponding to x is then
$$y = \psi^{-1}\big(f(\phi(x))\big).$$
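Since ψ is generally not invertible, a common way to realize $\psi^{-1}$ (and the usual choice in MMR) is to pick, among a finite set of candidate outputs, the one whose embedding ψ(y) has the largest inner product with $W\phi(x) + b$. A minimal sketch under that assumption; `predict`, W, b and the one-hot label embedding are toy values, not learned parameters:

```python
import numpy as np

def predict(W, b, phi_x, candidate_labels):
    """Return the index of the candidate y maximizing <psi(y), W phi(x) + b>.
    candidate_labels: one row psi(y) per admissible output y."""
    f = W @ phi_x + b           # linear map from feature space to label space
    scores = candidate_labels @ f
    return int(np.argmax(scores))

# Toy example: 3 classes with one-hot label embeddings, 2-D features
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
psi = np.eye(3)
print(predict(W, b, np.array([0.2, 0.9]), psi))  # 1
```

For multi-label problems the candidate rows could be the embeddings of admissible label sets instead of one-hot vectors.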
MMR (Maximum Margin Regression) vs SVM (Support Vector Machine)
MMR is a framework for multi-label classification based on the Support Vector Machine (SVM). Key idea: a reinterpretation of the normal vector w.
SVM: w is the normal vector of the separating hyperplane; $y_i \in \{-1, +1\}$ are binary outputs; the labels are equal to the binary outputs.
Extended view (MMR): W is a linear operator projecting the feature space into the label space; $y_i \in \mathcal{Y}$ are arbitrary outputs; $\psi(y_i) \in H_\psi$ are the labels, the outputs embedded in a linear vector space.
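For reference, MMR as usually formulated by Szedmak et al. solves the following primal problem, with a regularization constant C > 0 and slack variables $\xi_i$. This is background on the general framework, not a statement of the exact settings used in these experiments. With $\psi(y) = y \in \{-1, +1\}$ and W reduced to a row vector $w^{\top}$, the constraint becomes the familiar SVM constraint $y_i(w^{\top}\phi(x_i) + b) \ge 1 - \xi_i$.

```latex
\min_{W,\,b,\,\xi}\ \tfrac{1}{2}\lVert W \rVert_F^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad
\langle \psi(y_i),\, W\phi(x_i) + b \rangle_{H_\psi} \ge 1 - \xi_i,
\quad \xi_i \ge 0, \quad i = 1, \dots, m
```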
ImageCLEF dataset
Task: multi-label classification.
Figure: The hierarchy of classes in the ImageCLEF multi-label challenge.
Results on ImageCLEF
Figure: F1 score for six filter sizes: 4, 8, 12, 20, 18 and 32, using 3 components and training with two different kernels: (a) polynomial kernel, F1 score vs. degree of the polynomial (from 1 to 10); (b) Gaussian kernel, F1 score vs. standard deviation of the Gaussian.
Pascal and Flickr: Features to compare to
Feature          Dimension  Source   Descriptor
Hsv              4096       color    HSV
Lab              4096       color    LAB
Rgb              4096       color    RGB
HsvV3H1          5184       color    HSV
LabV3H1          5184       color    LAB
RgbV3H1          5184       color    RGB
DenseHue         100        texture  Hue
HarrisHue        100        texture  Hue
DenseHueV3H1     300        texture  Hue
HarrisHueV3H1    300        texture  Hue
DenseSift        1000       texture  SIFT
HarrisSift       1000       texture  SIFT
DenseSiftV3H1    3000       texture  SIFT
HarrisSiftV3H1   3000       texture  SIFT
Figure: Comparing tensor decomposition with other features [1] on the Pascal07 dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
[1] Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, and Cordelia Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009.
Results on Pascal07 dataset (TD = tensor decomposition; P = precision, R = recall; values are fractions in [0, 1])
Gaussian kernel
Feature          P       R       F1
TD               0.4158  0.2877  0.3400
HarrisSiftV3H1   0.4623  0.4491  0.4552
HarrisSift       0.4202  0.4895  0.4522
DenseSiftV3H1    0.4189  0.4886  0.4510
DenseSift        0.3750  0.5044  0.4302
LabV3H1          0.3911  0.3366  0.3618
DenseHueV3H1     0.3884  0.3282  0.3558
HarrisHueV3H1    0.3274  0.3884  0.3552
RgbV3H1          0.3907  0.3224  0.3533
HsvV3H1          0.4080  0.3048  0.3489
Hsv              0.3911  0.3085  0.3449
Lab              0.4135  0.2920  0.3423
Rgb              0.3857  0.2985  0.3350
HarrisHue        0.3930  0.2887  0.3328
DenseHue         0.3962  0.2828  0.3299
Polynomial kernel
Feature          P       R       F1
TD               0.3931  0.2855  0.3308
HarrisSiftV3H1   0.4002  0.5520  0.4640
HarrisSift       0.3728  0.5523  0.4449
DenseSiftV3H1    0.3592  0.5663  0.4396
DenseSift        0.3442  0.5337  0.4184
HsvV3H1          0.3815  0.3295  0.3536
RgbV3H1          0.3479  0.3551  0.3515
LabV3H1          0.3106  0.3868  0.3434
HarrisHueV3H1    0.3110  0.3894  0.3417
DenseHueV3H1     0.3166  0.3607  0.3363
Hsv              0.3390  0.3232  0.3309
HarrisHue        0.3037  0.3597  0.3241
Rgb              0.2906  0.3420  0.3135
Lab              0.2800  0.3389  0.3031
DenseHue         0.2808  0.3329  0.2995
Figure: Comparing tensor decomposition with other features (Guillaumin et al., TagProp, 2009) on the Pascal07 dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
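As a quick consistency check on these tables, F1 can be recomputed as the harmonic mean of P and R. A minimal sketch; since the reported F1 may be computed from unrounded values or averaged over classes rather than derived from the averaged P and R, small mismatches are to be expected (here 0.3401 against the reported 0.3400):

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# TD row of the Pascal07 table, Gaussian kernel
print(round(f1(0.4158, 0.2877), 4))  # 0.3401
```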
Results on Flickr dataset (TD = tensor decomposition; P = precision, R = recall; values are fractions in [0, 1])
Gaussian kernel
Feature          P       R       F1
TD               0.3164  0.3780  0.3118
HarrisSiftV3H1   0.5470  0.3842  0.4512
DenseSift        0.5438  0.3862  0.4515
HarrisSift       0.5368  0.3780  0.4435
DenseSiftV3H1    0.5475  0.3807  0.4491
LabV3H1          0.4693  0.3200  0.3806
HarrisHueV3H1    0.4368  0.3288  0.3752
DenseHueV3H1     0.4221  0.3333  0.3723
HsvV3H1          0.4570  0.3062  0.3667
HarrisHue        0.3753  0.3435  0.3587
RgbV3H1          0.4150  0.3089  0.3542
Lab              0.4153  0.3016  0.3494
DenseHue         0.3854  0.3187  0.3477
Rgb              0.4181  0.2824  0.3371
Hsv              0.4152  0.2762  0.3317
Polynomial kernel
Feature          P       R       F1
TD               0.2311  0.2615  0.2453
HarrisSiftV3H1   0.5289  0.4646  0.4940
DenseSiftV3H1    0.5328  0.4415  0.4828
HarrisSift       0.5260  0.4447  0.4819
DenseSift        0.5132  0.4316  0.4688
LabV3H1          0.4508  0.3533  0.3961
HsvV3H1          0.3961  0.3655  0.3798
HarrisHueV3H1    0.4115  0.3490  0.3777
DenseHueV3H1     0.4086  0.3445  0.3737
RgbV3H1          0.3996  0.3460  0.3704
Lab              0.2717  0.5600  0.3658
DenseHue         0.2698  0.5249  0.3564
HarrisHue        0.3294  0.4159  0.3561
Hsv              0.3603  0.3602  0.3540
Rgb              0.3495  0.3406  0.3443
Figure: Comparing tensor decomposition with other features (Guillaumin et al., TagProp, 2009) on the Flickr dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
Conclusions
We have presented a method for feature extraction based on the decomposition of the environment.
Pros:
1. Compression
2. Automatic prediction of the best filters to use for extracting features
Cons:
1. Different decompositions can strongly influence the final result
2. Lack of a mechanism for automatically choosing the best parameters