2017 Fall ECE 692/599: Binary Representation Learning for Large Scale Visual Data
1 2017 Fall ECE 692/599: Binary Representation Learning for Large Scale Visual Data
Liu Liu
Instructor: Dr. Hairong Qi
University of Tennessee, Knoxville
September 21, 2017
2 Overview
1 Introduction
2 Learning Binary Representation via Cross Entropy
3 End-to-end Learning Binary Representation
4 Discriminative Cross-View Hashing
5 Cross-Domain Image Hashing with Adversarial Learning
6 Conclusion
3 Massive Datasets
Modern media have brought massive multimedia datasets:
- Facebook has about 300 million photo uploads per day
- Instagram Stories has over 250 million daily active users
- ImageNet has over 13 million images in over 21k categories
4 Resource Constraints
Resource-constrained environments, e.g., smart camera networks (SCNs):
- often deployed in harsh communication environments
- limited on-board computation and storage resources
- target applications such as distributed object/scene recognition
5 Multi-modality
Social media offer massive volumes of multimedia content:
- textual tags and annotations of images
- detailed descriptions (pathology reports) paired with medical images
- thumbnails and titles of videos
6 Cross-domain
Multi-domain visual content:
- images from different contexts
- largely available unlabeled images
- cross-indexing/retrieval w.r.t. different domains
- non-negligible domain shift
7 Binary Representation
Focus on learning efficient representations for visual content:
- project high-dimensional visual data into a low-dimensional embedding space
- binarize the embedding in Hamming space
Why binary?
- binary representations are computationally efficient (see the sketch below)
- much less storage (compared to floating-point numbers)
- versatile across tasks: retrieval, classification, etc.
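To make the efficiency claim concrete: comparing two binary codes reduces to a single XOR followed by a popcount. A minimal sketch (not from the slides; the code values are made up):

```python
import numpy as np

# Two 64-bit binary codes packed into machine words (made-up values).
a = np.uint64(0xDEADBEEFCAFEBABE)
b = np.uint64(0x0123456789ABCDEF)

# Hamming distance = popcount(a XOR b): one XOR plus a bit count per
# 64 bits, versus a full floating-point distance for real-valued vectors.
dist = bin(int(a ^ b)).count("1")
print(dist)
```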
8 End-to-end Learning
- conventional approach: a feature-generation step followed by binary embedding
- end-to-end approach: learning the binary embedding for visual content together with the features
- usually achieved with deep learning
9 Semantic Similarity vs. Visual Similarity
10 Hash Code
Most common use of binary representation: hash codes (retrieval and indexing).
Pairwise/affinity: multiple ways of using pairwise similarity information
1. $\min_H \big\| S - \frac{1}{q} H H^T \big\|_F^2$ s.t. $H \in [-1, 1]^{n \times q}$
2. $\min_H -\log P(S \mid H)$ s.t. $P(s_{ij} \mid h_i, h_j) = \sigma(\langle h_i, h_j \rangle)^{s_{ij}} \big(1 - \sigma(\langle h_i, h_j \rangle)\big)^{1 - s_{ij}}$
3. Both MAP and (weighted) MLE can be considered
Triplet loss
1. Hinge ranking loss: $\min \max\big(0,\ q/2 + d_H(h_i, h_j) - d_H(h_i, h_k)\big)$, where $(h_i, h_j)$ is a similar pair and $(h_i, h_k)$ a dissimilar pair
2. (Normalized) Discounted Cumulative Gain: $\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$ s.t. $rel_i \in \{0, 1\}$
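A minimal sketch of the hinge ranking loss above, assuming the standard margin convention (the similar pair should be at least $q/2$ bits closer than the dissimilar pair; the exact sign placement was lost in transcription):

```python
import numpy as np

def hinge_ranking_loss(h_i, h_j, h_k, q):
    """Triplet hinge ranking loss on {-1,+1}^q codes: the similar pair
    (h_i, h_j) should be at least q/2 bits closer than the dissimilar
    pair (h_i, h_k)."""
    d_ij = np.sum(h_i != h_j)  # Hamming distance to the similar code
    d_ik = np.sum(h_i != h_k)  # Hamming distance to the dissimilar code
    return max(0.0, q / 2 + d_ij - d_ik)

q = 8
h_i = np.array([1, -1, 1, 1, -1, -1, 1, -1])
h_j = np.array([1, -1, 1, -1, -1, -1, 1, -1])  # 1 bit away (similar)
h_k = -h_i                                     # 8 bits away (dissimilar)
print(hinge_ranking_loss(h_i, h_j, h_k, q))    # 0.0: ranking satisfied
```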
11 Learning Discriminative Representation
Traditionally, binary representations are learned as hash codes for retrieval, exploiting pairwise similarity.
Problem: the uniqueness of each class is lost when similarity is the only supervision.
Approach: use labels as supervision directly.
12 Learning Binary Representation via Cross Entropy
13 Learning Binary Descriptors via Classification
Given a dataset of N samples $X = \{x_i\}_{i=1}^{N}$, $x_i \in \mathbb{R}^d$.
Goal: learn $B = \{b_i\}_{i=1}^{N} \in \{-1, +1\}^{L \times N}$ via nonlinear hash functions $F: \mathbb{R}^{d \times N} \to \mathbb{R}^{L \times N}$, $L \ll d$: $B = \mathrm{sgn}(F(X))$.
Learn via classification:
$$\min_{W, F} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}(W^T b_i, y_i) + \Omega(W), \quad b_i = \mathrm{sgn}(F(x_i)) \in \{-1, +1\}^L, \ i = 1, \dots, N \tag{1}$$
where $W \in \mathbb{R}^{L \times C}$ is the linear classifier, $\mathcal{L}$ is a loss function, $Y$ are the ground-truth labels of the training set, and $\Omega$ is the regularizer for the classifier.
14 Cross Entropy
Besides L2 and hinge losses, cross entropy is a common loss function for classification, measuring the probabilistic difference between the ground-truth and predicted distributions (softmax classifier):
$$P(y_i = k \mid b_i; w_k) = \frac{e^{w_k^T b_i}}{\sum_{j=1}^{C} e^{w_j^T b_i}} \tag{2}$$
$$\mathcal{L}_i = -\sum_{k=1}^{C} t_k(y_i) \log P(y_i = k \mid b_i; w_k) \tag{3}$$
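A minimal NumPy rendering of Eqs. (2)-(3), with a max-shift added for numerical stability (the shift is not part of the slide):

```python
import numpy as np

def softmax_cross_entropy(W, b_i, y_i):
    """Eq. (2): softmax over the class scores w_k^T b_i;
    Eq. (3): negative log-likelihood of the true class y_i."""
    scores = W.T @ b_i                          # (C,) class scores
    scores = scores - scores.max()              # stabilize the exponentials
    p = np.exp(scores) / np.exp(scores).sum()   # Eq. (2)
    return -np.log(p[y_i])                      # Eq. (3) with one-hot t_k(y_i)

L, C = 64, 10
rng = np.random.default_rng(0)
W = rng.standard_normal((L, C))
b_i = np.sign(rng.standard_normal(L))           # a {-1,+1}^L binary code
print(softmax_cross_entropy(W, b_i, y_i=3))
```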
15 Continuous Relaxation
Total formulation:
$$\min_{b_i, W, F} \frac{1}{N} \sum_{i=1}^{N} \Big( -\mathbb{1}(y_i)^T W^T b_i + \log \sum_{j=1}^{C} e^{w_j^T b_i} \Big) + \lambda \|W\|_F^2 \quad \text{s.t. } b_i = \mathrm{sgn}(F(x_i)), \ i = 1, \dots, N \tag{4}$$
However, Eq. 4 is difficult to optimize directly, so we relax it to a continuous form:
$$\min_{b_i, W, F} \frac{1}{N} \sum_{i=1}^{N} \Big( -\mathbb{1}(y_i)^T W^T b_i + \log \sum_{j=1}^{C} e^{w_j^T b_i} \Big) + \lambda \|W\|_F^2 + \gamma \sum_{i=1}^{N} \|b_i - F(x_i)\|_2^2 + \rho \|F\|_2^2 \quad \text{s.t. } b_i \in \{-1, +1\}^L, \ i = 1, \dots, N \tag{5}$$
16 Alternating Optimization
Alternately optimize the three sets of parameters.
F step: embedding function optimization
$$F(x) = M^T \phi(x) \tag{6}$$
Fix the binary codes B and the classifier W:
$$\min_M \|B - M^T \phi(X)\|_F^2 + \rho \|M\|_2^2 \quad \text{s.t. } B \in \{-1, +1\}^{L \times N} \tag{7}$$
Regularized least squares:
$$M = \big(\phi(X) \phi(X)^T + \rho I\big)^{-1} \phi(X) B^T \tag{8}$$
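A sketch of the closed-form F step in Eq. (8), assuming $\phi(X)$ is $m \times N$ and $B$ is $L \times N$, so the solution is $m \times L$ (all sizes here are illustrative):

```python
import numpy as np

def solve_M(phi_X, B, rho):
    """Eq. (8): closed-form regularized least squares for M,
    with phi_X of shape (m, N) and B of shape (L, N)."""
    m = phi_X.shape[0]
    A = phi_X @ phi_X.T + rho * np.eye(m)   # rho*I keeps A invertible
    return np.linalg.solve(A, phi_X @ B.T)  # (m, L); solve beats an explicit inverse

m, L, N = 100, 64, 500
rng = np.random.default_rng(0)
phi_X = rng.standard_normal((m, N))
B = np.sign(rng.standard_normal((L, N)))    # fixed binary codes
M = solve_M(phi_X, B, rho=1e-3)
```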
17 Alternating Optimization
W step: classifier optimization
Fix the binary codes B and the embedding function F:
$$\min_W -\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(y_i)^T \log \frac{e^{w_k^T b_i}}{\sum_{j=1}^{C} e^{w_j^T b_i}} + \lambda \|W\|_F^2 \tag{9}$$
Optimized by gradient descent with momentum:
$$v^{(t)} = \theta v^{(t-1)} + \alpha \frac{\partial \mathcal{L}}{\partial w_k^{(t-1)}}, \qquad w_k^{(t+1)} = w_k^{(t)} - v^{(t)}, \quad k = 1, \dots, C \tag{10}$$
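The momentum update of Eq. (10), sketched on a toy quadratic (the gradient here is a stand-in, not the actual classifier gradient):

```python
import numpy as np

def momentum_step(w_k, v, grad, theta=0.9, alpha=5e-3):
    """Eq. (10): heavy-ball update for one classifier column w_k."""
    v = theta * v + alpha * grad  # accumulate velocity from the gradient
    return w_k - v, v             # step against the accumulated direction

w_k = np.zeros(64)
v = np.zeros(64)
for _ in range(100):
    grad = 2.0 * w_k - 1.0        # stand-in gradient of a toy quadratic
    w_k, v = momentum_step(w_k, v, grad)
print(w_k[:4])                    # approaches the toy minimizer 0.5
```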
18 Alternating Optimization
B step: binary code optimization
Fix W and F:
$$\min_{b_i} \frac{1}{N} \sum_{i=1}^{N} \Big( -\big(\mathbb{1}(y_i)^T W^T + 2\gamma F(x_i)^T\big) b_i + \log \sum_{j=1}^{C} e^{w_j^T b_i} \Big) \quad \text{s.t. } b_i \in \{-1, +1\}^L, \ i = 1, \dots, N \tag{11}$$
where $\log \sum_{j=1}^{C} e^{w_j^T b_i}$ in problem (11) is a Log-Sum-Exp (LSE) function, bounded by
$$\max\{x_1, \dots, x_n\} \le \mathrm{LSE}(x_1, \dots, x_n) \le \max\{x_1, \dots, x_n\} + \log(n) \tag{12}$$
19 Alternating Optimization
As a result, Eq. 11 can be approximated as
$$\min_{b_i} \frac{1}{N} \sum_{i=1}^{N} \Big( -\big(\mathbb{1}(y_i)^T W^T + 2\gamma F(x_i)^T\big) b_i + \max_j \{w_j^T b_i\} \Big) \quad \text{s.t. } b_i \in \{-1, +1\}^L, \ i = 1, \dots, N \tag{13}$$
This is an NP-hard problem; we propose a greedy, sub-optimal solution.
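The slide does not spell out the greedy rule, so the sketch below assumes one plausible reading: a coordinate-descent pass over the bits that flips a bit whenever the flip lowers the max-approximated objective of Eq. (13):

```python
import numpy as np

def greedy_b_step(b, W, f_x, y, gamma):
    """One greedy pass for Eq. (13): flip a bit whenever the flip lowers
    the max-approximated objective (exact optimization over {-1,+1}^L
    is NP-hard, so this is a sub-optimal heuristic)."""
    def objective(code):
        linear = -(W[:, y] + 2.0 * gamma * f_x) @ code  # linear term of Eq. (13)
        return linear + np.max(W.T @ code)              # LSE replaced by max, Eq. (12)
    for l in range(len(b)):
        flipped = b.copy()
        flipped[l] = -flipped[l]
        if objective(flipped) < objective(b):
            b = flipped
    return b

L, C = 16, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((L, C))
b = np.sign(rng.standard_normal(L))
b = greedy_b_step(b, W, f_x=rng.standard_normal(L), y=2, gamma=0.5)
```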
20 Experiment
Datasets: CIFAR-10, the BMW dataset, and the Oxford 17-category flower dataset.
Exp. 1: Classification task

Methods              Testing Accuracy   Training Time (sec)
KSH (5,000 tr)       91.5%              1720
FastHash             92.3%              609
SDH                  92.0%              33.4
CCA-ITQ              91.8%              3.2
ResNet Feature       92.4%              -
CE-Bits (5,000 tr)   92.1%              3.1
CE-Bits              92.4%              22.1

Table: The testing accuracy of different methods on the CIFAR-10 dataset (ResNet features); all binary codes are 64 bits.
21 Experiment

Methods       Testing Accuracy   Training Time (sec)
KSH           87.4%              83.1
FastHash      88.5%              38.0
SDH           87.9%              0.71
CCA-ITQ       88.5%              7.67
VGG Feature   88.8%              -
CE-Bits       88.6%              1.12

Table: The testing accuracy of different methods on the Oxford 17-category flower dataset (VGG features); all binary codes are 64 bits.
22 Experiment

Methods    Testing Accuracy   Training Time (sec)
KSH        93.8%              18.4
FastHash   91.1%              14.8
SDH        95.9%              0.15
CCA-ITQ    92.9%              1.17
SURF       94.7%              -
CE-Bits    97.2%              0.31

Table: The testing accuracy of different methods on the BMW dataset (SURF features); all binary codes are 64 bits.
23 Experiment
Exp. 2: Retrieval task (CIFAR-10, ResNet features; methods compared: CE-Bits, SDH, KSH, CCA-ITQ, FastHash, across code widths)
Figure: Comparison of precision achieved by different methods within a Hamming radius of 2.
Figure: Comparison of mAP achieved by different methods within a Hamming radius of 2.
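For reference, the metric behind these plots, precision within a Hamming radius of 2, can be sketched as follows (random codes and labels stand in for real data):

```python
import numpy as np

def precision_within_radius(query, query_label, db, db_labels, r=2):
    """Precision of items retrieved within Hamming radius r of a query:
    codes are {-1,+1}^L rows, so distance = number of mismatched bits."""
    dist = np.sum(db != query, axis=1)   # Hamming distance to every item
    hits = db_labels[dist <= r]          # labels of the retrieved items
    return float(np.mean(hits == query_label)) if hits.size else 0.0

rng = np.random.default_rng(0)
db = np.sign(rng.standard_normal((1000, 64)))
db_labels = rng.integers(0, 10, size=1000)
print(precision_within_radius(db[0], db_labels[0], db, db_labels))
```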
24 Experiment
Convergence
Figure: The convergence (cost vs. iteration, log scale) of CE-Bits on CIFAR-10 during training with learning rate α = 5e-3. The code width is 64 bits.
25 End-to-end Learning Binary Representation with Direct Binary Embedding
26 Learning with Deep Architectures
Problem formulation:
$$\min_{W, F} \frac{1}{N} \sum_{i=1}^{N} \Big( \mathcal{L}(W^T b_i, y_i) + \lambda \|b_i - F(I_i; \Omega)\|_2^2 \Big) \quad \text{s.t. } b_i = \mathrm{threshold}(F(I_i; \Omega), 0.5) \tag{14}$$
$$F(I; \Omega) = f_{DBE}\big(f_n(\cdots f_2(f_1(I; \omega_1); \omega_2) \cdots; \omega_n); \omega_{DBE}\big) \tag{15}$$
Similar continuous relaxation, with a quantization term that pushes the relaxed activations toward binary values:
$$\min_{W, F} \frac{1}{N} \sum_{i=1}^{N} \Big( \mathcal{L}\big(W^T F(I_i; \Omega), y_i\big) + \lambda \big\| \left| 2F(I_i; \Omega) - \mathbf{1} \right| - \mathbf{1} \big\|^2 \Big) \tag{16}$$
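A small sketch of the binarization and relaxation, assuming the Eq. (16) regularizer is read as pushing $|2F - 1|$ toward 1 (i.e., activations toward $\{0, 1\}$); the exact form of that term was garbled in transcription:

```python
import numpy as np

F_out = np.random.default_rng(0).random((8, 64))  # DBE activations in [0, 1]
b = (F_out > 0.5).astype(np.float32)              # Eq. (14): threshold at 0.5

# Assumed reading of the Eq. (16) regularizer: |2F - 1| equals 1 exactly
# when every activation is 0 or 1, so penalizing its deviation from 1
# pushes the relaxed output toward binary values.
penalty = np.square(np.abs(2.0 * F_out - 1.0) - 1.0).mean()
print(penalty)
```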
27 Direct Binary Embedding
$$Z = f_{DBE}(X) = \tanh\big(\mathrm{ReLU}(\mathrm{BN}(X W_{DBE} + b_{DBE}))\big) \tag{17}$$
Pipeline: image I → DCNN → feature X → linear layer (W_DBE, b_DBE) → BN → ReLU → tanh → Z = F(I; Ω)
The benefit of the DBE layer approximating binary code is three-fold:
1. batch normalization mitigates training with the saturating nonlinearity and potentially promotes a more effective binary representation;
2. the ReLU activation is sparse and learns bit 0 inherently;
3. the tanh activation bounds the ramp of the ReLU activation and learns bit 1 effectively without jeopardizing the sparsity of ReLU.
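A minimal PyTorch sketch of the DBE layer in Eq. (17); the feature width and code length below are arbitrary choices, not values from the slides:

```python
import torch
import torch.nn as nn

class DBELayer(nn.Module):
    """Eq. (17): Z = tanh(ReLU(BN(X W_DBE + b_DBE))). A sketch; the
    feature width d and code length L are free choices here."""
    def __init__(self, d: int, L: int):
        super().__init__()
        self.fc = nn.Linear(d, L)    # X W_DBE + b_DBE
        self.bn = nn.BatchNorm1d(L)  # eases training with the saturating tanh
    def forward(self, x):
        # ReLU zeroes negative activations (learns bit 0); tanh bounds
        # the positive ramp (learns bit 1) without hurting sparsity.
        return torch.tanh(torch.relu(self.bn(self.fc(x))))

z = DBELayer(d=2048, L=64)(torch.randn(16, 2048))  # activations in [0, 1)
code = (z > 0.5).float()                           # binarize by thresholding
```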
28 Classification
Multiclass classification:
$$\min_{W, F} -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{C} \mathbb{1}(y_i = k) \log \frac{e^{w_k^T F(I_i; \Omega)}}{\sum_{j=1}^{C} e^{w_j^T F(I_i; \Omega)}} \quad \text{s.t. Eq. (15)} \tag{18}$$
Multilabel classification (joint cross entropy): a softmax term averaged over the $c^+$ positive labels of each image, plus a weighted binary cross entropy term with weight $\rho$ on positive labels and balance $\nu$, where $\mathbb{1}_p(y_i)$ is the p-th entry of the multi-hot label vector:
$$\min_{W, F} -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{c^+} \sum_{j \in y_i} \log \frac{e^{w_j^T F(I_i; \Omega)}}{\sum_{p=1}^{C} e^{w_p^T F(I_i; \Omega)}} - \nu \frac{1}{N} \sum_{i=1}^{N} \sum_{p=1}^{C} \Big[ \rho \, \mathbb{1}_p(y_i) \log \frac{e^{w_p^T F(I_i; \Omega)}}{1 + e^{w_p^T F(I_i; \Omega)}} + \big(1 - \mathbb{1}_p(y_i)\big) \log \frac{1}{1 + e^{w_p^T F(I_i; \Omega)}} \Big] \quad \text{s.t. Eq. (15)} \tag{19}$$
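A sketch of the weighted binary cross entropy term of Eq. (19) as reconstructed above; $\rho$ up-weights the sparse positive labels, and the hyperparameter values here are illustrative:

```python
import torch

def weighted_bce(scores, targets, rho=2.0):
    """Weighted binary cross entropy over C labels, the second term of
    Eq. (19) as reconstructed above: positives are up-weighted by rho
    to counter label sparsity (rho and nu are hyperparameters)."""
    p = torch.sigmoid(scores)                     # e^s / (1 + e^s)
    pos = rho * targets * torch.log(p + 1e-12)    # positive-label term
    neg = (1 - targets) * torch.log(1 - p + 1e-12)
    return -(pos + neg).sum(dim=1).mean()

scores = torch.randn(4, 80)                       # W^T F(I; Omega), C = 80
targets = (torch.rand(4, 80) < 0.1).float()       # sparse multilabel truth
print(weighted_bce(scores, targets))
```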
29 Toy Example
MNIST with LeNet
Figure: The histogram of DBE layer activations.
Figure: The convergence (log loss vs. epochs) of the original LeNet and of DBE-LeNet trained on MNIST.
30 Toy Example

Method            LeNet   DBE-LeNet   SDH   FastHash
testing acc (%)

Table: Comparison of testing accuracy on MNIST. Code length for all hashing algorithms is 64 bits. LeNet features (1000-d continuous vectors) are used for SDH and FastHash.

λ                 0   1e-4   1e-3   1e-2   1e-1
testing acc (%)

Table: The impact of the quantization-error coefficient λ.
31 Experiment
Evaluate the proposed DBE layer with a deep residual network (ResNet).
Datasets: CIFAR-10 (50K training, 10K test) and MS COCO (83K training, 40K test).
Exp. 1: Classification

Methods      Testing Accuracy (%)
CCA-ITQ
FastHash
SDH
DLBHC
ResNet
DBE (ours)

Table: The testing accuracy of different methods on the CIFAR-10 dataset. All binary representations have a code length of 64 bits.
32 Experiment
Performance w.r.t. different code lengths:

Code length (bits)
testing acc (%)

Table: Classification accuracy of DBE on the CIFAR-10 dataset across different code lengths.
33 Experiment
Exp. 2: Natural object retrieval and multilabel image retrieval

Method       mAP per code length (bits)
CCA-ITQ
FastHash
SDH
DSH
DSRH
DLBHC
DBE (ours)

Table: Comparison of mean average precision (mAP) on CIFAR-10.
34 Experiment

Method       mAP per code length (bits)
CCA-ITQ
CMFH
CCA-ACQ
DHN
DBE (ours)

Table: Comparison of mean average precision (mAP) on COCO.
35 Experiment
Exp. 3: Multilabel image annotation

Method                              O-P   O-R   O-F1
WARP
DBE-Softmax
DBE-weighted binary cross entropy
DBE-joint cross entropy

Table: Performance comparison on COCO for K = 3. The code length for all DBE methods is 64 bits.
36 THANK YOU
END-TO-END BINARY REPRESENTATION LEARNING VIA DIRECT BINARY EMBEDDING
Liu Liu, Alireza Rahimpour, Ali Taalimi, Hairong Qi
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville