Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Size: px

Start display at page:

Download "Correlation Autoencoder Hashing for Supervised Cross-Modal Search"

Ellen Henry
5 years ago
Views:

1 Correlation Autoencoder Hashing for Supervised Cross-Modal Search Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu School of Software Tsinghua University The Annual ACM International Conference on Multimedia Retrieval ICMR 2016 Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

can be described by data of multiple modalities Y Cao et al

2 Motivation Cross-Modal Retrieval Background In the big data era, the amount of multimedia data has exploded An object or topic can be described by data of multiple modalities Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

Motivation Cross-Modal Retrieval Cross-Modal

to search for semantically relevant items from

using textual tags bear, deer Building Blue Sky

Building Building Night Blue Y Cao et al

3 Motivation Cross-Modal Retrieval Cross-Modal Similarity Search Use a query from one modality to search for semantically relevant items from another modality eg search for animal images using textual tags bear, deer Building Blue Sky Downtown Deer Grass Green Bear Grass Green Animal Building Building Night Blue Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

Motivation Cross-Modal Retrieval Challenges Trillions of

modalities are heterogeneous Different dimensions

[03, 05, -02,, 04] [0, 0, 1,, 0, 1,, 0] Y Cao et al

4 Motivation Cross-Modal Retrieval Challenges Trillions of images and texts are generated Features from different modalities are heterogeneous Different dimensions Distinct distributions Image Tags Bear Soil Grass Green [03, 05, -02,, 04] [0, 0, 1,, 0, 1,, 0] Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

Motivation Hashing Methods Cross-Modal Hashing Deer Grass Green generate image descriptors SIFT GIST DeCAF generate image hash codes Semantic Correlations Tag 1 generate tag Occurrence Word2Vec

5 Motivation Hashing Methods Cross-Modal Hashing Deer Grass Green generate image descriptors SIFT GIST DeCAF generate image hash codes Semantic Correlations Tag 1 generate tag Occurrence Word2Vec generate tag descriptors hash codes Approximate Nearest Neighbor Retrieval Memory 128-d float : 512 bytes 16 bytes 1 billion items : 512 GB 16 GB Time Computation: x10 - x100 faster Transmission (disk / web): x30 faster Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

6 Method Model Homogeneous Architecture Key Points Homogeneous Architecture: image and text can use the same deep architecture Ŷ ˆX Decoder Reconstruction Reconstruction Decoder Hash Code h x (x) Hash Code h y (y) Encoder Representation Representation Encoder X Y Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

7 Method Model Feature Correlations Key Points Feature correlations can be maximized to reduce heterogeneity across modalities, using pairwise correlations (solid lines) Ŷ Pairwise ˆX Decoder Reconstruction Reconstruction Decoder Hash Code h x (x) Hash Code h y (y) Encoder Representation X Correlation S Y Representation Encoder Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

8 Method Model Feature Correlation Maximization Key Points Use pairwise correlations for reconstructive embedding Within-modal Reconstructive Embedding n ( min xi V x h x (x i ) 2 V x,v 2 + y i V y h y (y i ) 2 ) 2, (1) y i=1 Cross-modal Reconstructive Embedding min L = n ( xi V x h y (y i ) 2 V + y x,v 2 i V y h x (x i ) 2 ) 2, (2) y i=1 Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

9 Method Model Semantic Correlations Key Points Due to semantic gap, semantic correlations (dashed lines) need to be maximized Decoder Reconstruction Ŷ Semantic ˆX Reconstruction Decoder Hash Code h x (x) Hash Code h y (y) Encoder Representation X Correlation S Y Representation Encoder Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

10 Method Model Semantic Correlation Maximization Key Points Construct a Nearest Neighbor Affinity Matrix A Nearest Neighbor Affinity Matrix d ( { ( ) ) x i N k xj xj N k (x i ) x i, y j, if li = l j ( ) A ij = y i N k yj yj N k (y i ) 0, otherwise, (3) d (x i, y j ) = e x i x j 2 2 /2σ2 x + e y i y j 2 2 /2σ2 y (4) where N k (x) represents the k-nearest neighbors of x Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

11 Method Model Semantic Correlation Maximization Key Points Construct a within-category and a between-category similarity matrix Similarity Matrices { A ij (1/n 1/n c ), S b if l i = l j = c ij = A ij /n, if l i l j, { S w A ij /n c, if l i = l j = c ij = 0, if l i l j, where n c is the number of objects within the c-th category (5) Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

12 Method Model Semantic Correlation Maximization Key Points Maximize the inter-category separation margin Circumvent the large intra-class variance Cross-modal Semantic Correlation min R = n n S ij h x (x i ) h y (y j ) 2, (6) W 2 x,w y S ij = i=1j=1 { A ij (2/n c 1/n), if l i = l j = c A ij /n, if l i l j (7) Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

13 Method Model Correlation Autoencoder Hashing Key Points enhances feature correlation by cross-modal reconstruct embedding maximizes the inter-category separation margin for learning more discriminative hash codes minimizes the intra-category variance by further exploring the cross-modal locality information Hash Code h x (x) Decoder Reconstruction Ŷ Pairwise Semantic ˆX Reconstruction Decoder Hash Code h y (y) Encoder Representation X Correlation S Representation Encoder Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23 Y

14 Method Model Correlation Autoencoder Hashing Unified Optimization Problem min O = L + λr V x,v y,w x,w y h x (x) = sgn ( W T xx ), h y (y) = sgn ( W T y y ), where λ is a penalty parameter for trading off the relative importance of feature correlation and semantic correlation (8) Learning Algorithm By back-propagation (BP) using mini-batch SGD O(x i, y i ) W x pq = L(y i) W x pq + λ R(x i), (9) W x pq Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

15 Method Model Deep Architecture Key Points A three-layer stacked auto-encoder architecture The feature correlations and semantic correlations are distilled in each layer and can be strengthened layer by layer Use hyperbolic tangent function tanh as the activation function to reduce the large binarization loss sign ( ) h x 3 x sign h y 3 (y) tanh tanh h y 2 (y) tanh tanh h 1 x (x) h 1 y (y) Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

16 Experiment Setup Experiment Setup Datasets: Nus-wide, Wiki and MIR-Flickr Protocols: Mean Average s, - Curves Parameter selection: cross-validation Comparison Methods Unsupervised Shallow Hashing: Supervised Shallow Hashing: Unsupervised Deep Hashing: + Sign Supervised Deep Hashing: Our approach Variants only with feature correlation (-F) without using data locality (-L) Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

17 Experiment Results Results and Discussion [Nus-wide] outperforms unsupervised deep hashing (), supervised hashing () and unsupervised shallow hashing () also outperforms -F and -L, verifying the vital importance of every component newly-crafted in this paper Dataset Nuswide Method I T T I 8 bits 16 bits 32 bits 64 bits 8 bits 16 bits 32 bits 64 bits F L (a) I 16 bits (b) I 32 bits (c) T 16 bits (d) T 32 bits Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

18 Experiment Results Results and Discussion [Wiki] The low quality of the image modality leads to that task I T is more difficult than task T I Almost all the methods achieve better results on task T I Dataset Wiki Method I T T I 8 bits 16 bits 32 bits 64 bits 8 bits 16 bits 32 bits 64 bits F L (e) I 16 bits (f) I 32 bits (g) T 16 bits (h) T 32 bits Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

19 Experiment Results Results and Discussion [MIR-Flickr] also achieve the state-of-the-art results on large-scale dataset Dataset Flickr Method I T T I 8 bits 16 bits 32 bits 64 bits 8 bits 16 bits 32 bits 64 bits F L (i) I 16 bits (j) I 32 bits (k) T 16 bits (l) T 32 bits Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

20 Experiment Results Quantization Error Quantization error: search quality loss due to binarization from continuous features to binary codes (black bars) incurs significantly less loss on search quality than other two baselines, due to that sgn (x) tanh (x) is a more accurate surrogate than the widely-adopted spectral relaxation sgn (x) x I T T I 06 I T T I MAP 03 MAP Wiki Nus Wide (m) Quantization Error on Wiki (n) Quantization Error on NUS-WIDE Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

21 Experiment Results Parameter Sensitivity consistently outperforms the strongest baseline on all datasets when λ is varied in a large range [01, 2] MAP 04 MAP NUS WIDE Wiki Flickr1M λ 01 NUS WIDE Wiki Flickr1M λ (o) I 32 bits (p) T 32 bits Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

22 Summary Summary Correlation Autoencoder Hashing () for cross-modal search Three key points Explore the feature correlations by reconstructing feature vectors of one modality from corresponding hash codes of another modality Explore the semantic correlations by maximizing the inter-category separation margin and minimizing the intra-category variance Enhance both cross-modal correlations in a deep architecture, which will make the embedded hash codes generalize better across different modalities Future work Hybrid Deep Architecture: Use Convolutional Neural Net to model images, and use Autoencoder to model texts Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

23 Summary Thanks for your listening! Q & A Y Cao et al (Tsinghua University) Correlation Autoencoder Hashing ICMR / 23

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as