Composite Quantization for Approximate Nearest Neighbor Search. Jingdong Wang, Lead Researcher, Microsoft Research. http://research.microsoft.com/~jingdw. ICML 2014, joint work with my interns Ting Zhang from USTC and Chao Du from Tsinghua University.
Outline Introduction Problem Product quantization Cartesian k-means Composite quantization Experiments
Nearest neighbor search. Application to similar image search.
Nearest neighbor search. Application to particular object retrieval.
Nearest neighbor search Application to duplicate image search
Nearest neighbor search Similar image search Application to K-NN annotation: Annotate the query image using the similar images
Nearest neighbor search. Definition. Database: $\mathcal{X} = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$. Query: $q \in \mathbb{R}^d$. Nearest neighbor: $\mathrm{NN}(q) = \arg\min_{x \in \mathcal{X}} \|q - x\|_2$.
Nearest neighbor search. Exact nearest neighbor search by linear scan: compute the distance from $q$ to every database vector, $O(Nd)$ time.
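As a reference point, here is a minimal numpy sketch of the linear scan baseline (illustrative only; names and sizes are my choices):

```python
import numpy as np

def linear_scan(X, q):
    """Exact NN by brute force: O(N*d), one distance per database vector."""
    dists = np.sum((X - q) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(dists))

# Example: 10000 database vectors in 128 dimensions.
X = np.random.randn(10000, 128).astype(np.float32)
q = np.random.randn(128).astype(np.float32)
print(linear_scan(X, q))
```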
Nearest neighbor search - speedup K-dimensional tree (Kd tree) Generalized binary search tree Metric tree Ball tree VP tree BD tree Cover tree
Nearest neighbor search. Exact nearest neighbor search by linear scan is costly and impractical for large-scale, high-dimensional cases. Approximate nearest neighbor (ANN) search: efficient, with acceptable accuracy, and widely used in practice.
Two principles for ANN search. Recall the complexity of linear scan: $O(Nd)$. 1. Reduce the number of distance computations (the $N$ factor): tree structures, neighborhood graph search, and inverted indexes.
Our work: TP Tree + NG Search. TP Tree: Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, Xian-Sheng Hua: Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 36(2): 388-403 (2014). You Jia, Jingdong Wang, Gang Zeng, Hongbin Zha, Xian-Sheng Hua: Optimizing kd-trees for scalable visual descriptor indexing. CVPR 2010: 3392-3399. Neighborhood graph search: Jingdong Wang, Shipeng Li: Query-driven iterated neighborhood graph search for large scale indexing. ACM Multimedia 2012: 179-188. Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo: Fast Neighborhood Graph Search Using Cartesian Concatenation. ICCV 2013: 2128-2135. Neighborhood graph construction: Jing Wang, Jingdong Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li: Scalable k-NN graph construction for visual descriptors. CVPR 2012: 1106-1113.
Comparison over SIFT 1M [plot: 1-NN search performance of the ICCV13, ACMMM12, and CVPR10 methods].
Comparison over GIST 1M [plot: 1-NN search performance of the ICCV13, ACMMM12, and CVPR10 methods].
Comparison over HOG 10M [plot: 1-NN search performance of the ICCV13, ACMMM12, and CVPR10 methods].
Neighborhood graph search shipped to Bing. Index building time on 40M documents is only 2 hours and 10 minutes. Search DPS on each NNS machine is stable at 950, without retries or errors. Five times faster than improved FLANN.
Two principles for ANN search. Recall the complexity of linear scan: $O(Nd)$. 1. Reduce the number of distance computations: tree structures, neighborhood graph search, and inverted indexes. High efficiency, but large memory cost. 2. Reduce the cost of each distance computation: hashing (compact codes). Small memory cost, but lower efficiency.
Approximate nearest neighbor search. Binary embedding methods (hashing): produce only a few distinct distances; limited ability and flexibility of distance approximation. Vector quantization (compact codes): plain k-means is impossible to use for medium and large code lengths, since a code of $B$ bits needs $K = 2^B$ centers: impossible to learn the codebook, and impossible even to compute the code of a vector. Practical vector quantization approaches: product quantization, Cartesian k-means.
Combined solution for very large scale search. Retrieve candidates with an index structure using compact codes; load the raw features of the retrieved candidates from disk; rerank using the true distances. Efficient, with small memory consumption; the IO cost is small.
Outline Introduction Problem Product quantization Cartesian k-means Composite quantization Experiments
Product quantization Approximate x by the concatenation of M subvectors
Product quantization. Approximate $x$ by the concatenation of $M$ subvectors: $x = [x^1; x^2; \dots; x^M] \approx \bar{x} = [p_{1i_1}; p_{2i_2}; \dots; p_{Mi_M}]$, where $p_{mi_m}$ is chosen from $\{p_{m1}, p_{m2}, \dots, p_{mK}\}$, the codebook of the $m$-th subspace.
Product quantization. Code representation: $(i_1, i_2, \dots, i_M)$. Distance computation: $d^2(q, \bar{x}) = d^2(q^1, p_{1i_1}) + d^2(q^2, p_{2i_2}) + \dots + d^2(q^M, p_{Mi_M})$, i.e., $M$ additions using pre-computed distance tables $\{d^2(q^m, p_{m1}), d^2(q^m, p_{m2}), \dots, d^2(q^m, p_{mK})\}$, one per subspace.
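A hedged sketch of this table-based (asymmetric) distance computation in numpy; the function names and array shapes are my assumptions, not from the talk:

```python
import numpy as np

def pq_distance_table(q, codebooks):
    """codebooks: (M, K, d/M) sub-codebooks; q: (d,). Returns an (M, K)
    table of squared distances from each query subvector to each codeword."""
    M, K, sub = codebooks.shape
    q_subs = q.reshape(M, sub)              # split q into M subvectors
    diff = codebooks - q_subs[:, None, :]   # (M, K, sub)
    return np.sum(diff ** 2, axis=2)

def pq_asymmetric_distance(codes, table):
    """codes: (N, M) integer codes. d^2(q, xbar) = sum of M table lookups."""
    M = codes.shape[1]
    return sum(table[m, codes[:, m]] for m in range(M))   # shape (N,)
```

The table costs $O(Kd)$ per query; each database vector then costs only $M$ additions.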
Product quantization. M = 2, K = 3. Approximate x by the concatenation of M subvectors. Codebook generation: do k-means in each subspace, yielding the sub-codebooks $\{p_{11}, p_{12}, p_{13}\}$ and $\{p_{21}, p_{22}, p_{23}\}$.
Product quantization. M = 2, K = 3. This results in $K^M = 9$ groups; the center of each group is the concatenation of M subvectors, e.g. $[p_{11}; p_{21}]$.
Product quantization. M = 2, K = 3. A vector falling in a group is approximated by the group's center, e.g. $x \approx \bar{x} = [p_{13}; p_{22}]$.
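A minimal sketch of PQ codebook generation with scipy's per-subspace k-means (pq_train and the shapes are my choices, assuming d is divisible by M):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def pq_train(X, M=8, K=256):
    """Run k-means independently in each of the M subspaces; returns
    sub-codebooks (M, K, d/M) and the codes (N, M) of the training data."""
    N, d = X.shape
    sub = d // M
    codebooks = np.empty((M, K, sub))
    codes = np.empty((N, M), dtype=np.int32)
    for m in range(M):
        Xm = X[:, m * sub:(m + 1) * sub]             # m-th subvectors
        centers, labels = kmeans2(Xm, K, minit='points')
        codebooks[m], codes[:, m] = centers, labels
    return codebooks, codes
```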
Outline Introduction Problem Product quantization Cartesian k-means Composite quantization Experiments
Cartesian k-means. Extends product quantization with an optimal space rotation: perform PQ over the rotated space, $x \approx \bar{x} = R\,[p_{1i_1}; p_{2i_2}; \dots; p_{Mi_M}]$.
Cartesian k-means. M = 2, K = 3 illustration: the sub-codebooks $\{p_{11}, p_{12}, p_{13}\}$ and $\{p_{21}, p_{22}, p_{23}\}$ live in rotated coordinates; a vector is approximated by, e.g., $\bar{x} = R\,[p_{13}; p_{22}]$.
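A sketch of one alternating step under the usual Cartesian k-means formulation: PQ in a rotated space, with the rotation updated by orthogonal Procrustes. pq_encode and pq_decode are assumed helpers (e.g., built from the PQ sketches above); this is an illustration, not the authors' implementation:

```python
import numpy as np

def ckm_iteration(X, R, codebooks, pq_encode, pq_decode):
    """One alternating step: (1) PQ-encode the rotated data;
    (2) update R by solving min_R ||X R - Xhat||_F s.t. R^T R = I."""
    Xr = X @ R                           # rotate the data
    codes = pq_encode(Xr, codebooks)     # PQ assignment in rotated space
    Xhat = pq_decode(codes, codebooks)   # PQ reconstruction in rotated space
    U, _, Vt = np.linalg.svd(X.T @ Xhat) # orthogonal Procrustes solution
    return U @ Vt, codes
```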
Outline Introduction Composite quantization Experiments
Composite quantization. Approximate $x$ by the addition of $M$ vectors: $x \approx \bar{x} = c_{1i_1} + c_{2i_2} + \dots + c_{Mi_M}$, with source codebooks $\{c_{11}, c_{12}, \dots, c_{1K}\}$, $\{c_{21}, c_{22}, \dots, c_{2K}\}$, ..., $\{c_{M1}, c_{M2}, \dots, c_{MK}\}$. Each source codebook is composed of $K$ $d$-dimensional vectors.
Composite quantization. 2D example with source codebooks $\{c_{11}, c_{12}, c_{13}\}$ and $\{c_{21}, c_{22}, c_{23}\}$.
Composite quantization. A composite center is the addition of one element from each source codebook, e.g. $c_{11} + c_{21}$, $c_{11} + c_{22}$, $c_{11} + c_{23}$, and so on.
Composite quantization. The composite codebook contains $K^M = 9$ composite centers, which partition the space into 9 groups.
Composite quantization. A vector x is approximated by the center of its group, e.g. $x \approx \bar{x} = c_{11} + c_{22}$.
Composite quantization. Approximate $x$ by the addition of $M$ vectors: $x \approx \bar{x} = c_{1i_1} + c_{2i_2} + \dots + c_{Mi_M}$. Code representation: $(i_1, i_2, \dots, i_M)$, length $M \log_2 K$ bits.
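A small sketch of CQ decoding and the code length, assuming full-dimensional codebooks stored as an (M, K, d) array (names are mine):

```python
import numpy as np

def cq_decode(codes, codebooks):
    """codebooks: (M, K, d); codes: (N, M). The reconstruction is the
    SUM of the selected codewords, one per source codebook."""
    M = codes.shape[1]
    return sum(codebooks[m, codes[:, m]] for m in range(M))   # (N, d)

# Code length: M * log2(K) bits per vector, e.g. M=8, K=256 -> 64 bits.
```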
Connection to product quantization. Concatenation in product quantization = addition of zero-padded subvectors:
$x \approx \bar{x} = [p_{1i_1}; p_{2i_2}; \dots; p_{Mi_M}] = [p_{1i_1}; 0; \dots; 0] + [0; p_{2i_2}; \dots; 0] + \dots + [0; 0; \dots; p_{Mi_M}]$
So product quantization is a constrained version of composite quantization $x \approx \bar{x} = c_{1i_1} + c_{2i_2} + \dots + c_{Mi_M}$, with each codebook supported on a single subspace.
Connection to Cartesian k-means. Concatenation in Cartesian k-means = addition of rotated zero-padded subvectors:
$x \approx \bar{x} = R[p_{1i_1}; p_{2i_2}; \dots; p_{Mi_M}] = R[p_{1i_1}; 0; \dots; 0] + R[0; p_{2i_2}; \dots; 0] + \dots + R[0; 0; \dots; p_{Mi_M}]$
So Cartesian k-means is also a constrained version of composite quantization $x \approx \bar{x} = c_{1i_1} + c_{2i_2} + \dots + c_{Mi_M}$.
Composite quantization generalizes product quantization and Cartesian k-means. Advantages: more flexible codebooks, more accurate data approximation, higher search accuracy. Search efficiency? It depends on the cost of the distance computation. Product quantization: coordinate-aligned space partition. Cartesian k-means: rotated coordinate-aligned space partition. Composite quantization: flexible space partition.
Approximate distance computation. Recall the data approximation $x \approx \bar{x} = \sum_{m=1}^M c_{mi_m}(x)$. Approximate the distance to the query $q$ by $\|q - x\|_2 \approx \|q - \sum_{m=1}^M c_{mi_m}(x)\|_2$. Computing this directly is time-consuming.
Approximate distance computation. Expand the squared distance into three terms:
$\|q - \sum_{m=1}^M c_{mi_m}(x)\|_2^2 = \sum_{m=1}^M \|q - c_{mi_m}(x)\|_2^2 - (M-1)\|q\|_2^2 + \sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x)$
First term: $O(M)$ additions, implemented with a pre-computed distance lookup table storing the distances from the source codebook elements to $q$.
Second term: constant for a given query.
Third term: $O(M^2)$ additions, using a pre-computed dot-product lookup table storing the dot products between codebook elements. $O(M^2)$ is still expensive.
But if the third term is constant, computing the first term alone is enough for search.
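A hedged numpy sketch of the resulting search-time computation: once the third term is held constant, ranking needs only the first term, computed from a per-query distance table (shapes and names are my assumptions):

```python
import numpy as np

def cq_query_distances(q, codes, codebooks):
    """codebooks: (M, K, d) full-dimensional dictionaries; codes: (N, M).
    Returns sum_m ||q - c_{m,i_m(x)}||^2 for all N vectors: O(M) lookups
    each, after an O(MKd) per-query table computation."""
    table = np.sum((codebooks - q[None, None, :]) ** 2, axis=2)  # (M, K)
    M = codes.shape[1]
    return sum(table[m, codes[:, m]] for m in range(M))          # (N,)
```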
Formulation. Minimize the quantization error, subject to the third term being constant.
Constrained formulation: $\min_{\{C_m\}, \{i_m(x)\}, \varepsilon} \sum_x \|x - \sum_{m=1}^M c_{mi_m}(x)\|_2^2$ s.t. $\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) = \varepsilon$ for all $x$. Minimizing the quantization error gives search accuracy; the constant constraint gives search efficiency.
Connection to PQ and CKM. Product quantization and Cartesian k-means are suboptimal solutions of our formulation: both use non-overlapping space partitions with mutually orthogonal codebooks, so $\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) = 0$, i.e., they satisfy the constraint with $\varepsilon = 0$.
Formulation. Transform to an unconstrained formulation by adding the constraints into the objective function: $\varphi(\{C_m\}, \{i_m(x)\}, \varepsilon) = \sum_x \|x - \sum_{m=1}^M c_{mi_m}(x)\|_2^2 + \mu \sum_x \big(\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) - \varepsilon\big)^2$
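A sketch of evaluating this penalty objective in numpy, using the identity $\sum_{m \neq l} c_m^T c_l = \|\sum_m c_m\|^2 - \sum_m \|c_m\|^2$ (function name and shapes are mine):

```python
import numpy as np

def cq_objective(X, codes, codebooks, eps, mu):
    """phi = sum_x ||x - xbar(x)||^2 + mu * sum_x (cross(x) - eps)^2,
    with xbar(x) = sum_m c_{m,i_m(x)}."""
    M = codes.shape[1]
    sel = np.stack([codebooks[m, codes[:, m]] for m in range(M)])  # (M, N, d)
    xbar = sel.sum(axis=0)                                         # (N, d)
    distortion = np.sum((X - xbar) ** 2)
    # cross(x) = ||sum_m c_m||^2 - sum_m ||c_m||^2
    cross = np.sum(xbar ** 2, axis=1) - np.sum(sel ** 2, axis=(0, 2))
    return distortion + mu * np.sum((cross - eps) ** 2)
```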
Optimization. In the unconstrained formulation, the first term is the distortion error and the second penalizes constraint violations; the penalty weight $\mu$ is selected by validation. Optimize by alternating between $\{C_m\}$, $\{i_m(x)\}$, and $\varepsilon$.
Alternating optimization. Update $\{i_m(x)\}$: iteratively, fixing $i_l(x)$ for $l \neq m$ and updating $i_m(x)$. Update $\varepsilon$: closed-form solution, $\varepsilon = \frac{1}{N}\sum_x \sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x)$, with $N = \#\{x\}$. Update $\{C_m\}$: L-BFGS algorithm.
Complexity. Update $\{i_m(x)\}$, iteratively fixing $i_l(x)$, $l \neq m$, and updating $i_m(x)$: $O(MKdT_d)$. Update $\varepsilon$, closed form: $O(NM^2)$. Update $\{C_m\}$ with L-BFGS: $O(NMdT_lT_c)$. Convergence?
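A simplified sketch of the two cheaper updates; the codebook update is omitted (one could plug in scipy.optimize.minimize with method='L-BFGS-B'). The ICM-style code update and the closed-form epsilon follow the slides; names and shapes are my assumptions:

```python
import numpy as np

def update_codes_icm(X, codes, codebooks, eps, mu):
    """One sweep: for each vector and each m, fix the other indices
    and pick the best of the K candidate codewords."""
    N, M = codes.shape
    for n in range(N):
        sel = codebooks[np.arange(M), codes[n]]      # (M, d) selected words
        sq = np.sum(sel ** 2, axis=1)                # ||c_l||^2 terms
        for m in range(M):
            s = sel.sum(axis=0) - sel[m]             # sum of the other words
            cand = codebooks[m]                      # (K, d) candidates
            xbar = s[None, :] + cand                 # (K, d) candidate xbar
            distort = np.sum((X[n] - xbar) ** 2, axis=1)
            cross = (np.sum(xbar ** 2, axis=1)
                     - (sq.sum() - sq[m] + np.sum(cand ** 2, axis=1)))
            k = int(np.argmin(distort + mu * (cross - eps) ** 2))
            codes[n, m] = k
            sel[m], sq[m] = cand[k], np.sum(cand[k] ** 2)
    return codes

def update_eps(codes, codebooks):
    """Closed form: eps = (1/N) * sum_x cross(x)."""
    M = codes.shape[1]
    sel = np.stack([codebooks[m, codes[:, m]] for m in range(M)])  # (M, N, d)
    xbar = sel.sum(axis=0)
    cross = np.sum(xbar ** 2, axis=1) - np.sum(sel ** 2, axis=(0, 2))
    return float(cross.mean())
```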
Convergence. 1M SIFT, 64 bits: converges in about 10-15 iterations.
Outline Introduction Composite quantization Experiments
Experiments on ANN search. Datasets: 1 million 128D SIFT vectors with 10000 queries; 1 million 960D GIST vectors with 1000 queries; 1 billion 128D SIFT vectors with 1000 queries. Evaluation: Recall@R, the fraction of queries for which the ground-truth Euclidean nearest neighbor is among the R retrieved items.
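A minimal sketch of the Recall@R metric as defined above (names are mine):

```python
import numpy as np

def recall_at_R(retrieved, ground_truth, R):
    """retrieved: (Q, >=R) ranked candidate ids per query;
    ground_truth: (Q,) id of each query's exact Euclidean NN.
    Returns the fraction of queries whose true NN is in the top R."""
    hits = [gt in retrieved[i, :R] for i, gt in enumerate(ground_truth)]
    return float(np.mean(hits))
```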
Search pipeline. Inputs: the source codebooks, the code of each database vector x, and the query q. Compute the distance tables between q and the codebook elements; use them to compute the distance between q and x, repeated for the n database vectors; output the nearest vectors.
Comparison on 1M SIFT and 1M GIST
Comparison on 1M SIFT and 1M GIST. 64 bits, Recall@10: ours 71.59%, CKM 63.83%.
Comparison on 1M SIFT and 1M GIST. The relatively small improvement on 1M GIST may be because CKM has already achieved a large improvement there.
Comparison on 1M SIFT and 1M GIST. Recall@10: ours 71.59% with 64 bits vs. ITQ 53.95% with 128 bits. ITQ without asymmetric distance underperforms ITQ with asymmetric distance; our approach with 64 bits outperforms (A)ITQ with 128 bits, with a slightly smaller search cost.
Comparison on 1B SIFT
Comparison on 1B SIFT. Recall@100: ours 70.1%, CKM 64.57%.
Average query time
Application to object retrieval. Two datasets: INRIA Holidays contains 500 queries and 991 corresponding relevant images; UKBench contains 2550 groups of 4 images each. Results: mAP on the Holidays dataset and scores on the UKBench dataset.
Discussion: the effect of $\varepsilon$. $\min \sum_x \|x - \bar{x}(x)\|_2^2$ s.t. $\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) = \varepsilon$. Fixing $\varepsilon = 0$ instead forces the dictionaries to be mutually orthogonal, which amounts to splitting the space. Search performance with a learnt $\varepsilon$ is better, since learning $\varepsilon$ is more flexible.
Discussion: the effect of translation. $\min \sum_x \|(x - t) - \bar{x}(x)\|_2^2$ s.t. $\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) = \varepsilon$. The search performance does not change much: the contribution of the offset $t$ is relatively small compared with the composite quantization itself.
Extension (CVPR15). Constrained formulation: $\min_{\{C_m\}, \{i_m(x)\}, \varepsilon} \sum_x \|x - \sum_{m=1}^M c_{mi_m}(x)\|_2^2$ s.t. $\sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x) = \varepsilon$ and $\sum_m \|C_m\|_1 \leq T$: the quantization error is minimized for search accuracy, the constant constraint is for search efficiency, and the sparsity constraint is for precomputation efficiency.
Extension (CVPR15). With dimension reduction, the objective becomes $\sum_x \|x - P \sum_{m=1}^M c_{mi_m}(x)\|_2^2$ under the same constraints.
Multi-stage vector quantization. Database X → vector quantization (1) → residuals → vector quantization (2) → residuals → ... → vector quantization (M).
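A hedged sketch of multi-stage (residual) vector quantization: each stage quantizes the residual left by the previous one, so the selected codewords add up, as in composite quantization, but are learned greedily stage by stage (kmeans2 from scipy; names are my choices):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def residual_quantization(X, M=4, K=256):
    """Returns M full-dimensional codebooks (M, K, d) and codes (N, M);
    the reconstruction of x is the sum of its M selected codewords."""
    residual = X.copy()
    codebooks, codes = [], []
    for _ in range(M):
        centers, labels = kmeans2(residual, K, minit='points')
        codebooks.append(centers)
        codes.append(labels)
        residual = residual - centers[labels]   # pass residuals onward
    return np.stack(codebooks), np.stack(codes, axis=1)
```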
Conclusion Composite quantization A compact coding approach for approximate nearest neighbor search Joint optimization of search accuracy and search efficiency State-of-the-art performance
A Survey on Learning to Hash http://research.microsoft.com/~jingdw/pubs/lthsurvey.pdf
Call for papers http://research.microsoft.com/~jingdw/cfp/cfp_tbdsi_bmd.pdf
Call for papers http://research.microsoft.com/~jingdw/cfp/cfp_icdm15workshop_bmd.pdf
Thanks Q&A
A distance preserving view of quantization. Quantization is data approximation, $x \approx \bar{x}$, so $\|q - x\|_2 \approx \|q - \bar{x}\|_2$. Search is better if distances are better preserved, i.e., $|\|q - x\|_2 - \|q - \bar{x}\|_2|$ is small. Triangle inequality: $|\|q - x\|_2 - \|q - \bar{x}\|_2| \leq \|x - \bar{x}\|_2$. So minimize the upper bound $\|x - \bar{x}\|_2$, the quantization error.
A joint minimization view. Generalized triangle inequality: $|\hat{d}(q, x) - d(q, x)| \leq \|x - \bar{x}\|_2 + |\delta|^{1/2}$, with the distortion term $\|x - \bar{x}\|_2$ and the efficiency term $\delta$ (compare the plain triangle inequality $|\|q - x\|_2 - \|q - \bar{x}\|_2| \leq \|x - \bar{x}\|_2$). Here $\hat{d}(q, x) = \big(\sum_{m=1}^M \|q - c_{mi_m}(x)\|_2^2 - (M-1)\|q\|_2^2\big)^{1/2}$ is the distance computed at search time, $\delta = \sum_{m \neq l} c_{mi_m}(x)^T c_{li_l}(x)$, and $\bar{x} = \sum_{m=1}^M c_{mi_m}(x)$, so that $d^2(q, \bar{x}) = \hat{d}^2(q, x) + \delta$.
A joint minimization view. Generalized triangle inequality: $|\hat{d}(q, x) - d(q, x)| \leq \|x - \bar{x}\|_2 + |\delta|^{1/2}$ (distortion and efficiency). Our formulation: $\min_{\{C_m\}, \{i_m(x)\}, \varepsilon} \sum_x \|x - \bar{x}(x)\|_2^2$ s.t. $\delta = \varepsilon$: minimize the distortion for search accuracy; the constant constraint is for search efficiency.