Image retrieval, vector quantization and nearest neighbor search


1 Image retrieval, vector quantization and nearest neighbor search. Yannis Avrithis, National Technical University of Athens. Rennes, October 2014.

2 Part I: Image retrieval. Particular object retrieval: match images under different viewpoint/lighting and occlusion; given local descriptors, investigate match kernels beyond Bag-of-Words. Part II: Vector quantization and nearest neighbor search. Fast nearest neighbor search in high-dimensional spaces; encode vectors based on vector quantization; improve fitting to the underlying distribution.

4 Part I: Image retrieval. To aggregate or not to aggregate: selective match kernels for image search. Joint work with Giorgos Tolias and Hervé Jégou, ICCV 2013.

5 Overview. Problem: particular object retrieval. Build a common model for matching (HE) and aggregation (VLAD) methods; derive new match kernels. Evaluate performance under exact or approximate descriptors.

6 Related work. In our common model: Bag-of-Words (BoW) [Sivic & Zisserman 03]; descriptor approximation (Hamming embedding) [Jégou et al. 08]; aggregated representations (VLAD, Fisher) [Jégou et al. 10][Perronnin et al. 10]. Relevant to Part II: soft (multiple) assignment [Philbin et al. 08][Jégou et al. 10]. Not discussed: spatial matching [Philbin et al. 08][Tolias & Avrithis 11], query expansion [Chum et al. 07][Tolias & Jégou 13], re-ranking [Qin et al. 11][Shen et al. 12].

9 Image representation. Entire image: set of local descriptors X = {x_1, ..., x_n}. Descriptors assigned to cell c: X_c = {x \in X : q(x) = c}.
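A minimal numpy sketch of the quantizer q and the per-cell grouping X_c, assuming a brute-force nearest-centroid assignment; the function name assign_to_cells and the dict output are illustrative, not the implementation used in the paper.

    import numpy as np

    def assign_to_cells(X, centroids):
        """Group the local descriptors of one image by their nearest visual word.

        X: (n, d) local descriptors; centroids: (k, d) vocabulary C.
        Returns a dict mapping cell index c -> X_c.
        """
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
        q = d2.argmin(axis=1)                                        # q(x) for every descriptor
        return {c: X[q == c] for c in np.unique(q)}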

10 Set similarity function
K(X, Y) = \gamma(X) \gamma(Y) \sum_{c \in C} w_c M(X_c, Y_c)
where \gamma(\cdot) is a normalization factor, w_c a cell weighting and M(X_c, Y_c) the cell similarity.
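As an illustration, a small sketch of this kernel, assuming \gamma(X) is the usual self-normalization factor that makes K(X, X) = 1; the helper names kernel_unnorm and set_similarity, and the dict-of-cells input format, are hypothetical.

    import numpy as np

    def kernel_unnorm(Xcells, Ycells, M, w=None):
        """sum_c w_c M(X_c, Y_c) over cells present in both images."""
        score = 0.0
        for c in set(Xcells) & set(Ycells):
            wc = 1.0 if w is None else w.get(c, 1.0)
            score += wc * M(Xcells[c], Ycells[c])
        return score

    def set_similarity(Xcells, Ycells, M, w=None):
        """K(X, Y) = gamma(X) gamma(Y) sum_c w_c M(X_c, Y_c),
        with gamma chosen so that K(X, X) = 1."""
        gx = 1.0 / np.sqrt(kernel_unnorm(Xcells, Xcells, M, w))
        gy = 1.0 / np.sqrt(kernel_unnorm(Ycells, Ycells, M, w))
        return gx * gy * kernel_unnorm(Xcells, Ycells, M, w)

Plugging in M = lambda Xc, Yc: len(Xc) * len(Yc) recovers the BoW kernel of the following slide.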

16 Bag-of-Words (BoW) similarity function (cosine similarity)
M(X_c, Y_c) = |X_c| \cdot |Y_c| = \sum_{x \in X_c} \sum_{y \in Y_c} 1
plugged into the generic set similarity K(X, Y) = \gamma(X) \gamma(Y) \sum_{c \in C} w_c M(X_c, Y_c).

19 Hamming Embedding (HE)
M(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} w(h(b_x, b_y))
where w is a weight function, h the Hamming distance and b_x, b_y binary codes, again plugged into the generic set similarity.
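A rough sketch of the HE cell similarity under assumed ingredients: a fixed projection P with per-dimension thresholds t learned offline, 64-bit codes, and a simple cutoff weight w(h) = (1 - h/B)^3 for h <= ht; the actual projection, thresholds and weight function differ across [Jégou et al. 08] and follow-ups.

    import numpy as np

    def hamming_codes(X, P, t):
        """Binary codes b_x: project descriptors with P (B x d) and threshold at t (B,)."""
        return (X @ P.T > t).astype(np.uint8)

    def he_cell_similarity(Xc, Yc, P, t, ht=24, B=64):
        """M(X_c, Y_c) = sum_x sum_y w(h(b_x, b_y)) with a cutoff weight."""
        bx, by = hamming_codes(Xc, P, t), hamming_codes(Yc, P, t)
        h = (bx[:, None, :] != by[None, :, :]).sum(-1)   # pairwise Hamming distances
        w = np.where(h <= ht, (1.0 - h / B) ** 3, 0.0)   # illustrative weight function
        return w.sum()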

25 VLAD
M(X_c, Y_c) = V(X_c)^\top V(Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} r(x)^\top r(y)
where V(X_c) = \sum_{x \in X_c} r(x) is the aggregated residual and r(x) = x - q(x) the residual, plugged into the generic set similarity.
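A sketch of the aggregated residual V(X_c) and the resulting VLAD vector, assuming the per-cell grouping from the earlier sketch; power-law normalization, often applied in practice, is omitted here.

    import numpy as np

    def vlad(Xcells, centroids):
        """Concatenate aggregated residuals V(X_c) = sum_{x in X_c} (x - q(x)) over all cells."""
        k, d = centroids.shape
        V = np.zeros((k, d))
        for c, Xc in Xcells.items():
            V[c] = (Xc - centroids[c]).sum(axis=0)       # aggregated residual of cell c
        V = V.reshape(-1)
        return V / (np.linalg.norm(V) + 1e-12)           # global l2 normalization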

30 Design choices. Hamming embedding: binary signature and voting per descriptor (not aggregated); selective: discard weak votes. VLAD: one aggregated vector per cell; linear operation. Questions: is aggregation good with large vocabularies (e.g. 65k)? How important is selectivity in this case?

32 Common model
Non-aggregated: M_N(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} \sigma(\phi(x)^\top \phi(y))
where \sigma is a selectivity function and \phi(x) the descriptor representation (residual, binary, scalar).
Aggregated: M_A(X_c, Y_c) = \sigma(\psi(\sum_{x \in X_c} \phi(x))^\top \psi(\sum_{y \in Y_c} \phi(y))) = \sigma(\Phi(X_c)^\top \Phi(Y_c))
where \psi is a normalization (\ell_2, power-law) and \Phi(X_c) the cell representation.

38 BoW, HE and VLAD in the common model
Model   M(X_c, Y_c)   \phi(x)      \sigma(u)             \psi(z)   \Phi(X_c)
BoW     M_N or M_A    1            u                     z         |X_c|
HE      M_N only      \hat{b}_x    w((B/2)(1 - u))       -         -
VLAD    M_N or M_A    r(x)         u                     z         V(X_c)
BoW: M(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} 1 = |X_c| \cdot |Y_c|
HE: M(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} w(h(b_x, b_y))
VLAD: M(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} r(x)^\top r(y) = V(X_c)^\top V(Y_c)
all special cases of M_N(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} \sigma(\phi(x)^\top \phi(y)) and M_A(X_c, Y_c) = \sigma(\Phi(X_c)^\top \Phi(Y_c)).

41 Selective Match Kernel (SMK)
SMK(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} \sigma_\alpha(\hat{r}(x)^\top \hat{r}(y))
Descriptor representation: \ell_2-normalized residual \phi(x) = \hat{r}(x) = r(x) / \|r(x)\|.
Selectivity function: \sigma_\alpha(u) = sign(u) |u|^\alpha if u > \tau, and 0 otherwise.
[Plot: similarity score \sigma_\alpha(u) vs. dot product u for \alpha = 3, \tau = 0]
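A small sketch of the selectivity function and the SMK cell similarity; the hypothetical helper takes the raw descriptors of one cell plus the cell centroid and forms the l2-normalized residuals internally.

    import numpy as np

    def sigma_alpha(u, alpha=3.0, tau=0.0):
        """Selectivity: sign(u) |u|^alpha above the threshold tau, 0 otherwise."""
        return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)

    def smk_cell_similarity(Xc, Yc, centroid, alpha=3.0, tau=0.0):
        """SMK(X_c, Y_c) = sum_x sum_y sigma_alpha(r_hat(x)^T r_hat(y))."""
        rx = Xc - centroid
        ry = Yc - centroid
        rx /= np.linalg.norm(rx, axis=1, keepdims=True) + 1e-12   # r_hat(x)
        ry /= np.linalg.norm(ry, axis=1, keepdims=True) + 1e-12   # r_hat(y)
        return sigma_alpha(rx @ ry.T, alpha, tau).sum()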

43 Matching example, impact of threshold: α = 1, τ = 0.0 vs. α = 1, τ = 0.25. Thresholding removes false correspondences.

44 Matching example, impact of shape parameter: α = 3, τ = 0.0 vs. α = 3, τ = 0.25. Weighs matches based on confidence.

45 Aggregated Selective Match Kernel (ASMK)
ASMK(X_c, Y_c) = \sigma_\alpha(\hat{V}(X_c)^\top \hat{V}(Y_c))
Cell representation: \ell_2-normalized aggregated residual \Phi(X_c) = \hat{V}(X_c) = V(X_c) / \|V(X_c)\|.
Similar to [Arandjelovic & Zisserman 13], but with the selectivity function \sigma_\alpha and used with large vocabularies.
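The aggregated counterpart, sketched under the same assumptions: a single l2-normalized aggregated residual per cell, scored with the same selectivity function.

    import numpy as np

    def asmk_cell_similarity(Xc, Yc, centroid, alpha=3.0, tau=0.0):
        """ASMK(X_c, Y_c) = sigma_alpha(V_hat(X_c)^T V_hat(Y_c))."""
        vx = (Xc - centroid).sum(axis=0)
        vy = (Yc - centroid).sum(axis=0)
        vx = vx / (np.linalg.norm(vx) + 1e-12)   # V_hat(X_c)
        vy = vy / (np.linalg.norm(vy) + 1e-12)   # V_hat(Y_c)
        u = float(vx @ vy)
        return np.sign(u) * abs(u) ** alpha if u > tau else 0.0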

47 Aggregated features: k = 128 as in VLAD

48 Aggregated features: k = 65K as in ASMK

49 Why to aggregate: burstiness. Burstiness: non-iid statistical behaviour of descriptors; matches of bursty features dominate the total similarity score. Previous work: [Jégou et al. 09][Chum & Matas 10][Torii et al. 13]. In this work, aggregation and normalization per cell handle burstiness, keeping a single representative per cell, similar to max-pooling.

51 Binary counterparts SMK* and ASMK*. Full vector representation: high memory cost; approximate vector representation: binary vector.
SMK*(X_c, Y_c) = \sum_{x \in X_c} \sum_{y \in Y_c} \sigma_\alpha(\hat{b}(r(x))^\top \hat{b}(r(y)))
ASMK*(X_c, Y_c) = \sigma_\alpha(\hat{b}(\sum_{x \in X_c} r(x))^\top \hat{b}(\sum_{y \in Y_c} r(y)))
\hat{b} includes centering and rotation as in HE, followed by binarization and \ell_2-normalization.
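A sketch of the binarization map \hat{b} under assumed inputs: a projection R and per-dimension thresholds learned offline (as in HE); mapping to ±1 followed by l2-normalization keeps dot products of codes comparable to those of full residuals.

    import numpy as np

    def b_hat(v, R, thresholds):
        """Rotate/project a residual with R (B x d), threshold, map to +-1 and l2-normalize."""
        z = np.where(R @ v > thresholds, 1.0, -1.0)
        return z / np.sqrt(len(z))               # unit norm, since |z_i| = 1 for all i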

52 Impact of selectivity. [Plots: mAP vs. α for SMK, ASMK, SMK* and ASMK* on Oxford5k, Paris6k and Holidays]

53 Impact of aggregation. Improves performance for different vocabulary sizes; reduces memory requirements of the inverted file.
Memory ratio vs. vocabulary size k (Oxford5k, MA): 8k: 69%, 16k: 78%, 32k: 85%, 65k: 89%.
With k = 8k on Oxford5k (mAP): VLAD 65.5%, SMK 74.2%, ASMK 78.1%.
[Plot: mAP vs. k for SMK, SMK-BURST and ASMK with vocabularies from 8k to 65k]

54 Comparison to state of the art (mAP on Oxford5k, Oxford105k, Paris6k and Holidays, with and without multiple assignment): ASMK and ASMK* compared against HE [Jégou et al. 10], HE-BURST and AHE-BURST [Jain et al. 10], Fine vocab. [Mikulík et al. 10] and Rep. structures [Torii et al. 13].

55 Discussion. Aggregation is also beneficial with large vocabularies (burstiness). Selectivity always helps (with or without aggregation). Descriptor approximation reduces performance only slightly.

56 Part II: Vector quantization and nearest neighbor search. Locally optimized product quantization. Joint work with Yannis Kalantidis, CVPR 2014.

57 Overview. Problem: given a query point q, find its nearest neighbor with respect to Euclidean distance within a data set X in a d-dimensional space. Focus on large scale: encode (compress) vectors, speed up distance computations. Fit the underlying distribution better with little space and time overhead.

58 Applications. Retrieval (image as point) [Jégou et al. 10][Perronnin et al. 10]; retrieval (descriptor as point) [Tolias et al. 13][Qin et al. 13]; localization, pose estimation [Sattler et al. 12][Li et al. 12]; classification [Boiman et al. 08][McCann & Lowe 12]; clustering [Philbin et al. 07][Avrithis 13].

59 Related work. Indexing: inverted index (image retrieval); inverted multi-index [Babenko & Lempitsky 12] (nearest neighbor search). Encoding and ranking: vector quantization (VQ); product quantization (PQ) [Jégou et al. 11]; optimized product quantization (OPQ) [Ge et al. 13]; Cartesian k-means [Norouzi & Fleet 13]; locally optimized product quantization (LOPQ) [Kalantidis and Avrithis 14]. Not discussed: tree-based indexing, e.g., [Muja and Lowe 09]; hashing and binary codes, e.g., [Norouzi et al. 12].

62 Index. [Figure: inverted index mapping a query's visual words to posting lists of database images]

66 Ranking. [Figure: scores from the inverted index produce a ranked shortlist of database images]

67 Inverted index issues. Are items in a postings list equally important? What changes under soft (multiple) assignment? How should vectors be encoded for memory efficiency and fast search?

68 Inverted multi-index [Babenko & Lempitsky 12]. Decompose vectors as x = (x_1, x_2); train codebooks C_1, C_2 from datasets {x_n^1}, {x_n^2}; the induced codebook C_1 × C_2 gives a finer partition. Given query q, visit cells (c_1, c_2) ∈ C_1 × C_2 in ascending order of distance to q by the multi-sequence algorithm. [Figure: inverted index vs. inverted multi-index on 600 2D points; for the same number of codeword comparisons, multi-index candidate lists are shorter and far less skewed around the query]

71 Multi-sequence algorithm. [Figure: traversal of the grid of cells C_1 × C_2 in ascending order of distance to the query]
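A compact sketch of the multi-sequence traversal using a min-heap; it assumes d1 and d2 are the query-to-codeword distances of the two subvectors already sorted in ascending order (keep the argsorts to map positions back to cell ids), and it deduplicates pushes instead of using the exact neighbour rule of [Babenko & Lempitsky 12].

    import heapq

    def multi_sequence(d1, d2, n):
        """Return the first n cells (i, j) of C1 x C2 in ascending order of d1[i] + d2[j]."""
        out = []
        heap = [(d1[0] + d2[0], 0, 0)]
        seen = {(0, 0)}
        while heap and len(out) < n:
            dist, i, j = heapq.heappop(heap)
            out.append((i, j))
            for ni, nj in ((i + 1, j), (i, j + 1)):          # right and down neighbours
                if ni < len(d1) and nj < len(d2) and (ni, nj) not in seen:
                    heapq.heappush(heap, (d1[ni] + d2[nj], ni, nj))
                    seen.add((ni, nj))
        return out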

72 Vector quantization (VQ)
minimize \sum_{x \in X} \min_{c \in C} \|x - c\|^2 = \sum_{x \in X} \|x - q(x)\|^2 = E(C)
where X is the dataset, C the codebook, q the quantizer and E(C) the distortion.
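A sketch of the distortion E(C) and the assignment q(x) for a given codebook; training the codebook itself (k-means / Lloyd iterations) is omitted.

    import numpy as np

    def distortion(X, C):
        """Return E(C) = sum_x min_c ||x - c||^2 and the assignments q(x)."""
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
        assign = d2.argmin(axis=1)                            # q(x)
        return d2[np.arange(len(X)), assign].sum(), assign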

75 Vector quantization (VQ). For small distortion, a large k = |C| is needed: hard to train, too large to store, too slow to search.

76 Product quantization (PQ)
minimize \sum_{x \in X} \min_{c \in C} \|x - c\|^2 subject to C = C^1 \times \cdots \times C^m

77 Product quantization (PQ)
train: q = (q^1, ..., q^m), where q^1, ..., q^m are obtained by VQ
store: |C| = k^m with |C^1| = \cdots = |C^m| = k
search: \|y - q(x)\|^2 = \sum_{j=1}^{m} \|y^j - q^j(x^j)\|^2, where q^j(x^j) \in C^j
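A sketch of PQ encoding and asymmetric distance computation (ADC), assuming the dimension is divisible by m; codebooks is a list of m arrays of shape (k, d/m), and the helper names are illustrative.

    import numpy as np

    def pq_encode(x, codebooks):
        """Encode x with m sub-quantizers: one codeword index per subvector."""
        subs = np.split(x, len(codebooks))                    # assumes d divisible by m
        return np.array([((C - s) ** 2).sum(1).argmin() for C, s in zip(codebooks, subs)])

    def pq_adc(y, codes, codebooks):
        """||y - q(x)||^2 = sum_j ||y^j - q^j(x^j)||^2 for query y against codes of shape (N, m)."""
        subs = np.split(y, len(codebooks))
        luts = [((C - s) ** 2).sum(1) for C, s in zip(codebooks, subs)]   # per-subspace lookup tables
        return sum(luts[j][codes[:, j]] for j in range(len(codebooks)))

The lookup tables reduce the search cost to m table lookups and additions per database vector.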

78 Optimized product quantization (OPQ)
minimize \sum_{x \in X} \min_{\hat{c} \in \hat{C}} \|x - R\hat{c}\|^2 subject to \hat{C} = C^1 \times \cdots \times C^m, R^\top R = I

79 OPQ, parametric solution for x ~ N(0, Σ)
independence: PCA-align by diagonalizing Σ as U Λ U^\top
balanced variance: permute Λ such that \prod_i \lambda_i is constant in each subspace; R ← U P_\pi^\top
find \hat{C} by PQ on rotated data \hat{x} = R^\top x
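A sketch of the parametric solution (eigenvalue allocation), assuming the dimension is divisible by m; the greedy rule assigns each eigen-direction to the subspace with the currently smallest eigenvalue product, which is one common way to balance the variance.

    import numpy as np

    def opq_parametric_rotation(X, m):
        """PCA-align and permute dimensions so eigenvalue products are balanced across m subspaces."""
        Xc = X - X.mean(axis=0)
        lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
        cap = X.shape[1] // m                                   # dimensions per subspace
        buckets, log_prod = [[] for _ in range(m)], np.zeros(m)
        for i in np.argsort(lam)[::-1]:                         # eigenvalues, largest first
            j = min((b for b in range(m) if len(buckets[b]) < cap),
                    key=lambda b: log_prod[b])
            buckets[j].append(i)
            log_prod[j] += np.log(lam[i] + 1e-12)
        perm = [i for b in buckets for i in b]
        return U[:, perm]                                       # R; rotated data is X @ R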

80 Locally optimized product quantization (LOPQ)
compute residuals r(x) = x - q(x) on coarse quantizer q
collect residuals Z_i = {r(x) : q(x) = c_i} per cell
train (R_i, q_i) ← OPQ(Z_i) per cell
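A sketch of the per-cell training loop; train_opq is a placeholder for any OPQ training routine (for example the parametric one above) returning a local rotation and product quantizer.

    import numpy as np

    def train_lopq(X, coarse_centroids, train_opq):
        """For each coarse cell, run OPQ on the residuals of the points assigned to it."""
        d2 = ((X[:, None, :] - coarse_centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)                              # coarse quantizer q(x)
        local = {}
        for i, c in enumerate(coarse_centroids):
            Z = X[assign == i] - c                              # residuals Z_i in cell i
            if len(Z):
                local[i] = train_opq(Z)                         # (R_i, q_i) <- OPQ(Z_i)
        return local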

81 Locally optimized product quantization (LOPQ). Better captures the support of the data distribution, like local PCA [Kambhatla & Leen 97]: handles multimodal (e.g. mixture) distributions and distributions on nonlinear manifolds; residual distributions are closer to the Gaussian assumption.

82 Multi-LOPQ. [Figure: LOPQ on a multi-index; vectors are decomposed as x = (x_1, x_2), coarsely quantized per subspace, and residuals are encoded with locally optimized product quantizers per cell]

83 Comparison to state of the art: SIFT1B, 64-bit codes. Recall@R for R = 1, 10, 100: LOR+PQ (locally optimized rotation with a single product quantizer) and LOPQ compared against Ck-means [Norouzi & Fleet 13], IVFADC [Jégou et al. 11], OPQ and Multi-D-ADC [Babenko & Lempitsky 12]. Most benefit comes from the locally optimized rotation.

85 Comparison to state of the art: SIFT1B, 128-bit codes. Recall@R for R = 1, 10, 100: LOPQ+R compared against IVFADC+R [Jégou et al. 11]; Multi-LOPQ compared against Multi-D-ADC [Babenko & Lempitsky 12] and OMulti-D-OADC [Ge et al. 13] at shortlist lengths T = 10K, 30K and 100K.

86 Residual encoding in related work. PQ (IVFADC) [Jégou et al. 11]: single product quantizer for all cells. [Uchida et al. 12]: multiple product quantizers shared by multiple cells. OPQ [Ge et al. 13]: single product quantizer for all cells, globally optimized for rotation (single/multi-index). LOPQ: with/without one product quantizer per cell, with/without rotation optimization per cell (single/multi-index). [Babenko & Lempitsky 14]: one product quantizer per cell, optimized for rotation per cell (multi-index).

88 Thank you!
