The Power of Asymmetry in Binary Hashing

Slide 1: The Power of Asymmetry in Binary Hashing. Behnam Neyshabur and Yury Makarychev (Toyota Technological Institute at Chicago), Russ Salakhutdinov (University of Toronto), Nati Srebro (Technion / TTIC).

Slide 2: Search by Image. [Illustration: a query image is matched against an image database.]

Slide 3: Search by Image. $f(\cdot)$ maps each image to a binary code; the database stores hashed image patches, and a query is answered by hashing it the same way (example query labeled "Brompton").

Slide 4: Binary Hashing. A hash map $f : \mathcal{X} \to \{\pm 1\}^k$ approximates similarity by thresholded Hamming distance: $\mathbf{1}[d_H(f(x), f(x')) < \theta] \approx \mathrm{Sim}(x, x')$.

Slide 5: Binary Hashing, Asymmetrically. Use two hash maps $f, g : \mathcal{X} \to \{\pm 1\}^k$ with $\mathbf{1}[d_H(f(x), g(x')) < \theta] \approx \mathrm{Sim}(x, x')$. Even if $\mathrm{Sim}(x, x')$ is symmetric and well behaved: use $f(x)$ to hash objects in the database and $g(x)$ to hash queries. There is no additional memory, communication, or computation to perform a query, and no increase in hash table size or lookup complexity. We will show: shorter bit length and higher accuracy.
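
A minimal numpy sketch of that mechanic (the two hash maps here are random placeholder sign functions, not the learned ones from the talk): the database is hashed once with f, each query is hashed with g, and similarity is approximated by thresholding the Hamming distance between the two codes.

import numpy as np

rng = np.random.default_rng(0)
d, k, n = 32, 16, 1000                    # input dim, code length, database size
F = rng.standard_normal((k, d))           # database-side parameters (placeholder)
G = rng.standard_normal((k, d))           # query-side parameters (placeholder)

X = rng.standard_normal((n, d))           # database objects
db_codes = np.sign(X @ F.T)               # hash the database once with f

def approx_sim(x_query, theta=6):
    """Predicted similar set: 1[d_H(g(x), f(x_i)) < theta] for every i."""
    q = np.sign(G @ x_query)              # g(x): hash the query
    d_hamming = np.sum(db_codes != q, axis=1)
    return d_hamming < theta

Note that the asymmetry costs nothing at query time: the stored codes and the lookup are exactly as in the symmetric case; only the (offline) choice of hash functions differs.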

Slide 6: Outline. Theoretical observation: capturing similarity with arbitrary binary hashes. Empirical investigation: database lookup with linear threshold hashes.

Slide 7: Capturing Similarity with an Arbitrary Binary Code. Given a similarity $S(x, x')$ over objects $X = \{x_1, x_2, \ldots, x_n\}$, we want a mapping $f : X \to \{\pm 1\}^k$ such that $S_{ij} = S(x_i, x_j) = \mathbf{1}[d(f(x_i), f(x_j)) < \theta]$, where $f(\cdot)$ is arbitrary, specified by the codes $u_i = f(x_i)$. What is the shortest $k$ possible? $\exists\, u_1, \ldots, u_n \in \{\pm 1\}^k,\ \theta \in \mathbb{R}$ such that $\forall ij:\ S_{ij} = \mathbf{1}[d(u_i, u_j) < \theta]$. What is the shortest $k$ if we allow asymmetry? $\exists\, u_1, \ldots, u_n, v_1, \ldots, v_n \in \{\pm 1\}^k,\ \theta \in \mathbb{R}$ such that $\forall ij:\ S_{ij} = \mathbf{1}[d(u_i, v_j) < \theta]$, with $v_j = g(x_j)$.

Slide 8: The Power of Asymmetry for Arbitrary Binary Hashes. Theorem: For any $r$, there exists a set of points in Euclidean space such that to capture $S(x_i, x_j) = \mathbf{1}[\|x_i - x_j\| < 1]$ using a symmetric binary hash we need $k_{\mathrm{sym}} \ge 2^{r-1}$ bits, but using an asymmetric hash we need only $k_{\mathrm{asym}} \le 2r$ bits. [Figure: the block-structured similarity matrix over $n = 2^r$ points used in the construction.]

Slide 9: Binary Hashing as Matrix Factorization. $\mathbf{1}[d_H(u_i, u_j) < \theta] \approx \mathrm{Sim}(x, x')$.

Slide 10: Binary Hashing as Matrix Factorization. Since $d_H(u_i, v_j) = (k - u_i^\top v_j)/2$, thresholding the Hamming distance at $\theta_H$ is the same as thresholding the inner product $u_i^\top v_j$ at $k - 2\theta_H$, so capturing similarity becomes a factorization problem. Given a similarity matrix $S \in \{\pm 1\}^{n \times n}$: symmetric: $\min k$ s.t. $U \in \{\pm 1\}^{k \times n}$, $\mathrm{sign}(U^\top U - \theta) = S$; asymmetric: $\min k$ s.t. $U, V \in \{\pm 1\}^{k \times n}$, $\mathrm{sign}(U^\top V - \theta) = S$.
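
The factorization view can be checked mechanically: a pair of codebooks captures $S$ exactly iff $\mathrm{sign}(U^\top V - \theta) = S$ entrywise. A small numpy sketch (the one-bit example at the end is illustrative, not from the talk):

import numpy as np

def captures(U, V, S, theta):
    """True iff sign(U^T V - theta) reproduces S entrywise.

    U, V: k x n matrices with +/-1 entries; S: n x n +/-1 similarity.
    """
    return np.array_equal(np.sign(U.T @ V - theta), S)

# one-bit example: two items, each similar only to itself
U = V = np.array([[1, -1]])               # k = 1, n = 2
S = np.array([[1, -1], [-1, 1]])
assert captures(U, V, S, theta=0)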

Slide 11: Bonus: Approximation Algorithm via SDP Relaxation. Given a similarity matrix $S \in \{\pm 1\}^{n \times n}$: symmetric: $\min k$ s.t. $\mathrm{sign}(Y - \theta) = S$, $Y = U^\top U$ with $U \in \{\pm 1\}^{k \times n}$; asymmetric: $\min k$ s.t. $\mathrm{sign}(Y - \theta) = S$, $Y = U^\top V$ with $U, V \in \{\pm 1\}^{k \times n}$.

Slide 12: Bonus: Approximation Algorithm via SDP Relaxation. Replace the exact sign constraint with a margin: symmetric: $\min k$ s.t. $S \circ (Y - \theta \mathbf{1}) \ge 1$, $Y = U^\top U$ with $U \in \{\pm 1\}^{k \times n}$; asymmetric: $\min k$ s.t. $S \circ (Y - \theta \mathbf{1}) \ge 1$, $Y = U^\top V$ with $U, V \in \{\pm 1\}^{k \times n}$.

Slide 13: Bonus: Approximation Algorithm via SDP Relaxation. Relax the binary factorization to a norm constraint: symmetric: $\min_Y \|Y\|_{\max}$ s.t. $S \circ (Y - \theta \mathbf{1}) \ge 1$, $Y \succeq 0$; asymmetric: $\min_Y \|Y\|_{\max}$ s.t. $S \circ (Y - \theta \mathbf{1}) \ge 1$; where the max-norm is $\|Y\|_{\max} = \min_{Y = L R^\top} \|L\|_{2,\infty}\, \|R\|_{2,\infty}$.
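
A sketch of the asymmetric relaxation in cvxpy, using the standard SDP characterization of the max-norm: $\|Y\|_{\max} \le t$ exactly when $Y$ is the off-diagonal block of a PSD matrix whose diagonal is at most $t$. Treating $\theta$ as a free variable is an assumption on my part; the slide does not say whether it is optimized or fixed.

import cvxpy as cp
import numpy as np

def maxnorm_relaxation(S):
    """min ||Y||_max  s.t.  S_ij (Y_ij - theta) >= 1   (asymmetric case)."""
    n = S.shape[0]
    M = cp.Variable((2 * n, 2 * n), PSD=True)   # [[W1, Y], [Y^T, W2]]
    Y = M[:n, n:]                               # off-diagonal block
    theta = cp.Variable()
    t = cp.Variable()
    constraints = [cp.diag(M) <= t,             # caps ||L||_{2,inf} and ||R||_{2,inf}
                   cp.multiply(S, Y - theta) >= 1]
    cp.Problem(cp.Minimize(t), constraints).solve(solver=cp.SCS)
    return np.asarray(M.value), theta.value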

Slide 14: SDP Relaxation Rounding. Factor the solution as $Y = L R^\top$ realizing $\|Y\|_{\max} = \min_{Y = LR^\top} \|L\|_{2,\infty} \|R\|_{2,\infty}$, then round using random Gaussian vectors $z_1, \ldots, z_k$: bit $t$ of $u_i$ is $\mathrm{sign}(l_i^\top z_t)$ and bit $t$ of $v_i$ is $\mathrm{sign}(r_i^\top z_t)$. This yields codes of length $k = O(k_{\mathrm{opt}}^2 \log n)$.
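
A sketch of the rounding step, assuming M is the PSD solution returned by the relaxation above: factor $M = CC^\top$, split the rows of $C$ into the $L$ and $R$ parts, and make each bit the sign of a projection onto a shared random Gaussian direction.

import numpy as np

def round_codes(M, n, k, seed=0):
    """Round the SDP solution to n pairs of k-bit codes (u_i, v_i)."""
    w, Q = np.linalg.eigh(M)
    C = Q * np.sqrt(np.clip(w, 0.0, None))   # M = C C^T (clip tiny negatives)
    L, R = C[:n], C[n:]                      # rows l_1..l_n and r_1..r_n
    Z = np.random.default_rng(seed).standard_normal((C.shape[1], k))
    U = np.sign(L @ Z)                       # bit t of u_i = sign(l_i . z_t)
    V = np.sign(R @ Z)                       # bit t of v_i = sign(r_i . z_t)
    return U, V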

Slide 15: The Power of Asymmetry for Arbitrary Binary Hashes (recap of Slide 8's theorem and construction).

Slide 16: Arbitrary Binary Hashes: Empirical Evaluation. [Figure: bits required vs. average precision on 10D Uniform and LabelMe, comparing symmetric and asymmetric hashes; the asymmetric hashes reach the same average precision with fewer bits.]

Slide 17: Parametric Mappings. $f(x) = \phi_F(x)$ has some parametric form, e.g. $\phi_F(x) = \mathrm{sign}(Fx)$ with $F \in \mathbb{R}^{k \times d}$; it could be more complex, e.g. a multi-layer network or kernel-based. Why restrict $f(\cdot)$? Generalization: we learn $F$ using objects $x_1, \ldots, x_n$, then receive new objects $x$ as queries; also compactness of representation. Asymmetric extension: $S(x, x') \approx \mathbf{1}[d(\phi_F(x), \phi_G(x')) < \theta]$. Learn the parameters $F, G$ from training data (pairs of database objects); hash database objects using $\phi_G$ and hash queries using $\phi_F$.

Slide 18: A Bit on Optimization. Loss function: $L(\phi_F(x), \phi_G(x'), S(x, x'))$ with parameters $F, G$. For $\phi_F(x) = \mathrm{sign}(Fx)$: updating a single row of $F$ (responsible for a single bit of the hash) entails solving a single weighted binary classification problem, as sketched below.
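
A sketch of that reduction for $\phi_F(x) = \mathrm{sign}(Fx)$, with a stand-in loss oracle (`loss_if_bit` is hypothetical, standing for whatever pairwise loss is being minimized): holding every other bit fixed, each training point prefers its bit to be +1 or -1 by some loss gap, and refitting the row is a weighted linear classification toward those preferred signs.

import numpy as np
from sklearn.svm import LinearSVC

def update_row(F, t, X, loss_if_bit):
    """Refit row t of F as one weighted binary classification.

    loss_if_bit(i, b): total training loss if bit t of point i's code is
    forced to b in {+1, -1}, all other bits held fixed (stand-in oracle).
    """
    n = X.shape[0]
    target = np.empty(n)
    weight = np.empty(n)
    for i in range(n):
        lp, lm = loss_if_bit(i, +1), loss_if_bit(i, -1)
        target[i] = 1.0 if lp <= lm else -1.0   # cheaper bit value
        weight[i] = abs(lp - lm)                # how much the choice matters
    clf = LinearSVC(fit_intercept=False)
    clf.fit(X, target, sample_weight=weight)
    F[t] = clf.coef_.ravel()
    return F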

Slide 19: Empirical Results Using Asymmetric Linear Threshold Hashes. [Figure: bits required vs. average precision on LabelMe, MNIST, and Peekaboom.] Comparison with learning symmetric linear threshold hashes using Minimum Loss Hashing [Norouzi & Fleet, ICML 2011]. Training is on all pairs of database objects; average precision is measured on held-out query objects.

Slide 20: Even More Power: Searching a Static Database. $S(x_{\mathrm{query}}, x_i) \approx \mathbf{1}[d(\phi_F(x_{\mathrm{query}}), \phi_G(x_i)) < \theta]$. [Illustration: one hashed query $f(x_{\mathrm{query}})$ compared against stored codes $g(x_1), \ldots, g(x_n)$.]

Slide 21: Even More Power: Searching a Static Database. $S(x_{\mathrm{query}}, x_i) \approx \mathbf{1}[d(\phi_F(x_{\mathrm{query}}), v_i) < \theta]$. The database-side map has no need to generalize (it is only used on database objects) and its output is stored in the database hash (so the mapping itself needs no compact representation). We can therefore use an arbitrary mapping $g(x_i) = v_i$: learn a parametric $\phi_F(\cdot)$ and arbitrary vectors $v_1, \ldots, v_n$ such that on $x_1, \ldots, x_n$: $S(x_i, x_j) \approx \mathbf{1}[d(\phi_F(x_i), v_j) < \theta]$. For a query $x$, use $d(\phi_F(x), v_j)$ to approximate $S(x, x_j)$, e.g. find the $v_j$ in the database with small $d(\phi_F(x), v_j)$; a query-time sketch follows.
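
A sketch of query time for this linear-to-arbitrary scheme, assuming the free database codes $v_1, \ldots, v_n$ have already been learned: pack them into bytes once, hash the query with $\phi_F$, and rank database items by Hamming distance computed via XOR (the "super-fast Hamming distance evaluation on short words" the summary slide refers to).

import numpy as np

def build_index(V):
    """Pack n learned codes v_j in {+1,-1}^k into rows of bytes."""
    return np.packbits(V > 0, axis=1)

def query(x, F, index, top=10):
    """Indices of the stored v_j closest to phi_F(x) in Hamming distance."""
    q = np.packbits(F @ x > 0)                     # phi_F(x), packed
    xor = np.bitwise_xor(index, q)                 # differing bits, bytewise
    dist = np.unpackbits(xor, axis=1).sum(axis=1)  # Hamming distances
    return np.argsort(dist)[:top]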

Slide 22: Empirical Results Using Linear-Threshold-to-Arbitrary Hashes. [Figure: bits required vs. average precision on LabelMe, MNIST, and Peekaboom, plus a precision-recall curve on LabelMe comparing Lin:Arb at 64 bits against MLH at 64 and 256 bits.]

Slide 23: Summary. Neyshabur et al., "The Power of Asymmetry in Binary Hashing," NIPS 2013. Binary hashing gives fast similarity approximation, approximate retrieval, nearest neighbor search, and locality sensitive hashing (LSH). In many applications we want short bit lengths: smaller hash tables, super-fast Hamming distance evaluation on short words, and fewer bits to transmit. Asymmetric hashes can enable better approximation with shorter bit length, even if the similarity function is symmetric and well behaved, and in most applications at no additional computational or memory cost.
