Approximate Nearest Neighbor (ANN) Search - II


Sanjiv Kumar, Google Research, NY. EECS-6898, Columbia University, Fall 2010.

Two popular ANN approaches

Tree-based approaches
- Recursively partition the data: divide and conquer
- Expected query time: O(log n), with constants exponential in the dimension
- Performance degrades with high-dimensional data
- Large storage needs: the original data is required at run-time

Hashing approaches
- Each item in the database is represented as a code
- Significant reduction in storage: for 64-bit codes, just 8 GB instead of 40 TB
- Expected query time: O(1) or sublinear in n
- Search in 64-bit Hamming space: ~13 sec instead of ~15 hrs per query
- Compact codes preferred

Example: Binary Codes

Linear projection (hyperplane) based partitioning.
[Figure: hyperplanes h_1, h_2 split the space; each point x_1, ..., x_5 receives an m-bit code y_i.]
No recursive partitioning, unlike trees!

Hashing: Main Steps

1. Training
- Define a model to convert an input item into a code
- Learn the parameters of the model, possibly using a subset of randomly sampled database items

Given an input x, learn

    h(x) = {h_1(x), h_2(x), ..., h_m(x)},   h_i(x) ∈ Z.

Example: binary codes using linear projections,

    h_i(x) = sgn(w_i^T x + b_i) ∈ {-1, 1},

or equivalently

    y_i(x) = (1 + h_i(x)) / 2 ∈ {0, 1}.

Training goal: learn the parameters of the m hash functions, {w_i, b_i}, i = 1, ..., m. A minimal sketch of such a model appears below.
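As an illustration only (not from the slides), here is a minimal numpy sketch of a random-hyperplane model of this form; the scaling of the offsets b_i by the data standard deviation is an assumption:

```python
import numpy as np

def train_linear_hash(X, m, seed=0):
    """Sample m random hyperplanes (w_i, b_i) for binary hashing.
    X: (n, d) data matrix, used here only to scale the offsets."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((m, d))        # projection directions w_i
    b = rng.standard_normal(m) * X.std()   # offsets b_i (scaling is an assumption)
    return W, b

def hash_codes(X, W, b):
    """m-bit codes y_i(x) = (1 + sgn(w_i^T x + b_i)) / 2 in {0, 1}."""
    return (X @ W.T + b > 0).astype(np.uint8)  # (n, m) binary matrix
```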

2. Indexing
- Represent each item in the database as a code
- In some cases, organize all the codes in a hash table (inverse lookup): for a given code, return all the points with the same code

[Figure: the n database items x_1, ..., x_n are mapped to codes; a hash table (inverse lookup) maps each code back to the items that share it, e.g., one bucket holding {x_1, x_2}.] A toy inverse-lookup table is sketched below.
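A minimal sketch of the inverse lookup, assuming the codes are given as rows of a 0/1 matrix (as produced by hash_codes above):

```python
from collections import defaultdict

def build_hash_table(codes):
    """Inverse lookup: map each m-bit code (stored as a tuple) to the
    ids of all database items that share it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[tuple(code)].append(item_id)
    return table
```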

3. Querying
- Convert the query to a code
- Find all items with the same code in the database using the hash table
- Return all points within a small radius of the query in code space
- Use multiple codes (and tables) to increase recall
- Rank all the database items according to their distance from the query in code space

Hash-table lookup: O(nm) space, O(1) lookup time.
[Figure: the query q is hashed to h(q); buckets at a small Hamming radius around h(q) are probed in the inverse-lookup table, returning items such as x_1, x_2.]
The number of codes to search at radius r is O(m^r), and for large m the buckets for many of those codes are empty with high probability! A sketch of radius-r probing follows.
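A minimal sketch of probing every bucket within Hamming radius r of the query code (assuming the table from build_hash_table above); note the O(m^r) blow-up in the number of probed codes:

```python
from itertools import combinations

def probe_radius(table, qcode, r):
    """Return item ids from all buckets within Hamming radius r of qcode."""
    qcode = tuple(qcode)
    m, hits = len(qcode), []
    for k in range(r + 1):
        for flips in combinations(range(m), k):  # O(m^r) codes overall
            probe = list(qcode)
            for i in flips:
                probe[i] ^= 1                    # flip k bits of the query code
            hits.extend(table.get(tuple(probe), []))
    return hits
```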

Multiple tables: build L hash tables, each with its own m-bit code h_1(x), ..., h_L(x), and probe all of them: O(nLm) space, O(L) lookup time.

Exhaustive ranking: compute the distance in code space between h(q) and every database code and return the closest items. This is an O(n) linear scan, but there are at most m distance levels. A Hamming-ranking sketch follows.
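A minimal sketch of exhaustive Hamming ranking, packing the bits into integers so each distance is an XOR plus a popcount (int.bit_count assumes Python 3.10+):

```python
import numpy as np

def pack(codes):
    """Pack each row of 0/1 bits into one Python int."""
    return [int("".join(map(str, row)), 2) for row in codes]

def hamming_rank(qcode, packed, topk):
    """Rank database items by Hamming distance to the query code."""
    q = int("".join(map(str, qcode)), 2)
    dists = [(q ^ c).bit_count() for c in packed]  # XOR + popcount per item
    return np.argsort(dists)[:topk]
```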

Hashing Techniques

1. Unsupervised - use unlabeled data to learn the hash functions: Locality Sensitive Hashing (LSH), PCA Hashing, Spectral Hashing, Min-Hashing, Kernel-LSH, ...
2. Supervised - use labeled pairs to learn the hash functions: Boosted Hashing, Binary Reconstructive Embedding, ...
3. Semi-supervised - use both labeled pairs and unlabeled data: Sequential Learning, ...
4. Type of hash function - linear/quasi-linear: LSH, Min-Hash, SH, PCA-Hash, ...; nonlinear: KLSH, RBM, BRE, ...

Locality Sensitive Hashing (LSH)

A family of hash functions H = {h : X → Z} is called (r_1, r_2, p_1, p_2)-sensitive if for any x_1, x_2 ∈ X:

    if d(x_1, x_2) ≤ r_1 then Pr[h(x_1) = h(x_2)] ≥ p_1,
    if d(x_1, x_2) > r_2 then Pr[h(x_1) = h(x_2)] ≤ p_2,

where r_1 < r_2 and p_1 > p_2.

A simple LSH family:

    h_i(x) = floor((w_i^T x + b_i) / t),   w_i ~ P_s(w) (an s-stable distribution),   b_i ~ U[0, t].

Special case: binary hashing, where the projection w_i^T x + b_i is thresholded instead of binned with width t. A sketch of this family follows.
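A minimal sketch of the s-stable family for s = 2 (Gaussian projections, Euclidean distance); the bucket width t is a free parameter, discussed below:

```python
import numpy as np

def make_lsh(d, m, t, seed=0):
    """m hash functions h_i(x) = floor((w_i^T x + b_i) / t) with
    Gaussian (2-stable) projections, for Euclidean distance."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d))  # w_i ~ N(0, I), 2-stable
    b = rng.uniform(0, t, size=m)    # b_i ~ U[0, t]
    return lambda X: np.floor((X @ W.T + b) / t).astype(int)
```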

s-stable Distributions

A distribution P_s(w) is called s-stable if there exists an s ≥ 0 such that for any x ∈ R^d and any w with i.i.d. entries w_i ~ P_s,

    x^T w ~ ||x||_s w,   and hence   (x_1 - x_2)^T w ~ ||x_1 - x_2||_s w.

Neighboring points tend to have similar projections, so binning the projections has the LSH property!

Special case: s = 2 (Euclidean distance). Take P_s = N(0, 1), i.e., w ~ N(0, I). Then

    E[x^T w] = 0,
    Var[x^T w] = E[x^T w w^T x] = x^T E[w w^T] x = ||x||_2^2,

so x^T w ~ ||x||_2 N(0, 1): the Gaussian distribution is 2-stable! Which distribution is 1-stable? The Cauchy! One can find an s-stable distribution for every s ∈ (0, 2].
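A quick numerical check of 2-stability (illustration only): the projections x^T w with w ~ N(0, I) should have standard deviation ||x||_2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
proj = rng.standard_normal((100_000, 50)) @ x  # 100k projections x^T w
print(proj.std(), np.linalg.norm(x))           # the two values nearly match
```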

Collision Probability

Let u = ||x_1 - x_2||_s, and let f_s(a) be the pdf of the absolute value a = |w| of an s-stable random variable. Then the probability of collision is

    p(u) = Pr[h(x_1) = h(x_2)] = ∫_0^t (1/u) f_s(a/u) (1 - a/t) da.

p(u) increases as u decreases!

How to choose t?

    t̂ = argmin_t ρ = argmin_t [ log(1/p_1) / log(1/p_2) ],

which can be computed analytically for s = 1, 2. ρ is not too sensitive to t as long as t is sufficiently far from 0! [Plot of ρ vs. t; Datar et al. [5]]

Parameter Selection in LSH

How many tables (L) and how many bits per table (m)?

Given a query q and its near-neighbor x, suppose each hash function satisfies

    Pr[h_i(q) = h_i(x)] ≥ p_1.

For an m-bit code,

    Pr[h(q) = h(x)] ≥ p_1^m,

so the probability of collision falls exponentially with m!

The probability that q and x fail to collide in all L tables is (1 - p_1^m)^L. To bound the probability that q and x collide in at least one of the L tables,

    1 - (1 - p_1^m)^L ≥ 1 - δ   ⟹   L ≥ log(1/δ) / log(1 / (1 - p_1^m)).

A worked example follows.
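A worked example of this bound (illustration only; the numbers p_1 = 0.9, m = 16, δ = 0.05 are made up):

```python
import math

def tables_needed(p1, m, delta):
    """Smallest L with 1 - (1 - p1**m)**L >= 1 - delta."""
    return math.ceil(math.log(1 / delta) / -math.log(1 - p1 ** m))

# p1 = 0.9, 16-bit codes, 95% chance of finding a true near-neighbor
print(tables_needed(0.9, 16, 0.05))  # -> 15 tables
```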

Precision-Recall Tradeoff

For high precision, longer codes (i.e., large m) are preferred. But a large m reduces the probability of collision exponentially, giving low recall, so many tables (large L) become necessary to get good recall, which means large storage.

Design L and m so that the run time is minimized for a given application:

    T_total = T_h + T_r,

where T_h is the time to compute the L m-bit hash functions and retrieve points from the tables via lookups, and T_r is the time to compute exact distances to the retrieved points and return the top ones (sublinear in n). A larger m increases T_h but decreases T_r. Empirically estimate the total time averaged over many queries. In some cases T_r is simply a vote over how many tables returned an item.

How can a large number of tables be avoided?

Multi-Probe LSH

A strategy to increase recall without using a large number of tables.

Basic idea: if two neighbors do not fall in the same bucket, they should fall in a nearby one, e.g., within a Hamming distance of 1.

How to choose the probing sequence? Buckets with a smaller average gap between the query projections w_l^T q + b_l and the bucket boundaries are given priority. [Figure: query projections against bucket boundaries; Lv et al. [7]] A binary-case sketch follows.
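A minimal binary-case sketch of this idea (a simplification, not the full multi-probe algorithm of Lv et al., which scores multi-bit perturbations): probe the query's own bucket first, then buckets at Hamming distance 1, flipping the least confident bits (smallest |w_i^T q + b_i|) first:

```python
import numpy as np

def probe_sequence(q, W, b, n_probes):
    """Yield bucket codes: h(q) first, then Hamming-distance-1 buckets
    ordered by how close the query is to each bit's decision boundary."""
    proj = W @ q + b
    code = (proj > 0).astype(np.uint8)
    yield tuple(code)
    for i in np.argsort(np.abs(proj))[:n_probes - 1]:
        flipped = code.copy()
        flipped[i] ^= 1          # neighboring bucket at Hamming distance 1
        yield tuple(flipped)
```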

Data-Dependent Projections

PCA-Hash (let us focus on binary codes):

    h_i(x) = sgn(w_i^T x + b_i),   w_i ~ eigenvectors of Cov(X).

Projection onto the maximum-variance directions, followed by a median threshold (each direction is split 0/1 at its median).

Performance degrades with a larger number of bits: the variance decreases rapidly for most real-world data. Can one reuse the high-variance directions? A sketch follows.
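A minimal sketch of PCA-Hash via an SVD of the centered data, with median thresholds so each bit is balanced:

```python
import numpy as np

def pca_hash(X, m):
    """Project on the top-m eigenvectors of Cov(X) and threshold each
    projection at its median."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m]                    # top-m principal directions
    proj = Xc @ W.T
    b = -np.median(proj, axis=0)  # median thresholds -> balanced bits
    codes = (proj + b > 0).astype(np.uint8)
    return codes, W, b
```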

Spectral Hashing

Data-dependent learning of binary codes h(x):

    min Σ_{ij} W_ij ||h(x_i) - h(x_j)||^2          (W_ij: similarity between x_i, x_j)
    subject to  Σ_i h(x_i) = 0                     (balanced partitioning)
                Σ_i h_k(x_i) h_l(x_i) = 0, k ≠ l   (uncorrelated bits)
                h_k(x_i) ∈ {-1, 1}

Issues:
- Building the graph Laplacian is O(n^2): computationally extremely expensive (needs a complete NN search)
- The balanced graph partitioning problem is NP-hard, even with a single bit

Approximation: assume a uniform data distribution and solve the 1D Laplacian eigenfunctions analytically.

Three main steps:
1. Extract the max-variance directions using PCA
2. Select which direction to pick next based on the modes of the 1D Laplacian; high-variance PCA directions may be picked again
3. Create bits by thresholding sinusoidal eigenfunctions at zero:

    h_i(x) = sgn(cos(α_i w_i^T x)),   w_i ~ eigenvectors of Cov(X)

(slower than simple thresholding). In practice, PCA-Hash with a median threshold may do better, but both suffer from low-variance directions. A simplified sketch of the bit construction follows.
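A simplified sketch of the bit construction under the uniform-distribution approximation; assumptions: W holds PCA directions as rows, directions are reused round-robin with increasing mode k (the real method instead ranks all (direction, mode) pairs by their analytic 1D-Laplacian eigenvalue):

```python
import numpy as np

def spectral_hash_bits(X, W, n_bits):
    """Bits sgn(cos(alpha * w^T x)) from sinusoidal eigenfunctions of the
    1D Laplacian on each PCA projection's range [lo, hi]."""
    proj = (X - X.mean(axis=0)) @ W.T            # PCA projections
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    bits, k = [], 1
    while len(bits) < n_bits:
        for j in range(proj.shape[1]):
            alpha = k * np.pi / (hi[j] - lo[j])  # k-th 1D mode on [lo, hi]
            bits.append(np.cos(alpha * (proj[:, j] - lo[j])) > 0)
            if len(bits) == n_bits:
                break
        k += 1                                   # reuse directions, higher mode
    return np.stack(bits, axis=1).astype(np.uint8)
```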

Spectral Hashing: Experiments

Testing using a Hamming radius around the query, on dense 384-dim vectors. PCA-Hash gives similar or better performance. For Hamming-radius testing, it is better to use LSH with a median threshold. [Plots from Weiss et al. [9]]

Shift-Invariant Kernel Embedding

Use random projections along with sinusoidal thresholding.

Key idea: suppose the similarity between a pair of points is given by a shift-invariant kernel, i.e.,

    s(x, y) = K(x, y) = K(x - y),   with K(x - x) = K(0) = 1.

Examples:

    s(x, y) = exp(-γ ||x - y||_2^2 / 2)   (L_2 distance)
    s(x, y) = exp(-γ ||x - y||_1)         (L_1 distance)

Want to learn an m-bit code h(x) such that

    f_1(K(x - y)) ≤ (1/m) d_H(h(x), h(y)) ≤ f_2(K(x - y)),

where d_H is the normalized Hamming distance and f_1, f_2 are decreasing functions with f_1(1) = f_2(1) = 0 and f_1(0) = f_2(0) = c > 0, i.e., the Hamming distance is small for similar points.

Main steps:
1. Approximate shift-invariant kernels as dot products of random Fourier features
2. Pick directions from the distribution induced by the kernel (similar to s-stable directions)
3. Create bits by thresholding sinusoids:

    h_i(x) = sgn(cos(w_i^T x + b_i) + t_i),   w_i ~ N(0, γI),  b_i ~ U[0, 2π],  t_i ~ U[-1, 1],

followed by the standard conversion into 0/1. The random threshold t_i is critical for performance. Performance (with Hamming ranking) is better if a large number of bits are used! A sketch follows.
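A minimal sketch of these bits for the Gaussian kernel exp(-γ ||x - y||^2 / 2):

```python
import numpy as np

def sike_hash(X, m, gamma, seed=0):
    """Shift-invariant kernel bits h_i(x) = sgn(cos(w_i^T x + b_i) + t_i),
    converted to 0/1."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(gamma), size=(m, d))  # w_i ~ N(0, gamma*I)
    b = rng.uniform(0, 2 * np.pi, size=m)             # b_i ~ U[0, 2*pi]
    t = rng.uniform(-1, 1, size=m)                    # random thresholds t_i
    return (np.cos(X @ W.T + b) + t > 0).astype(np.uint8)
```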

Experiments: SIKE vs. Spectral Hashing

Test using exhaustive Hamming ranking with all database items, on dense 384-dim vectors. After 256 bits, the performance of Spectral Hashing falls. Even regular LSH is quite powerful if a large number of bits are used. [Plots from Raginsky et al. [11]]

Min-Hash

A method to estimate the Jaccard similarity between sets (or vectors). The Jaccard similarity between two sets (A, B), or between two vectors (x, y) with x_i, y_i ≥ 0, is

    J(A, B) = |A ∩ B| / |A ∪ B|,
    J(x, y) = Σ_i min(x_i, y_i) / Σ_i max(x_i, y_i)    (relation with L_1?)

The two definitions are equivalent: represent each set as a binary vector of length |A ∪ B|.

Suppose h(.) is a random function that maps each item to a real number, with h(x_i) ≠ h(x_j) and Pr[h(x_i) < h(x_j)] = 0.5 for i ≠ j; a simple choice is h(x_i) ~ U[0, 1]. The min-hash is

    m(A, h) = argmin_{x_i ∈ A} h(x_i),

and its collision probability is exactly the Jaccard similarity:

    Pr[m(A, h) = m(B, h)] = J(A, B).

A sketch follows.
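A minimal sketch using the h(x_i) ~ U[0, 1] choice, plus a Monte Carlo check of the collision probability (the sets and sizes are made up):

```python
import numpy as np

def minhash(A, H):
    """m(A, h_k) for each row h_k of H: the item of A with the smallest
    random score. H[k, i] = h_k(x_i) ~ U[0, 1], shared across all sets."""
    idx = np.fromiter(A, dtype=int)
    return idx[H[:, idx].argmin(axis=1)]

rng = np.random.default_rng(0)
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}               # J(A, B) = 2/6
H = rng.random((10_000, 6))                      # 10k random hash functions
print((minhash(A, H) == minhash(B, H)).mean())   # ~0.333
```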

Why does this work? Let x_u = m(A ∪ B, h), the minimizer over the union. If x_u ∈ A ∩ B, then x_u = m(A, h) and x_u = m(B, h); otherwise the two min-hashes differ. Since h is random, x_u is a uniformly random element of A ∪ B, so

    Pr[m(A, h) = m(B, h)] = |A ∩ B| / |A ∪ B| = J(A, B).

Sketches: for retrieval efficiency, min-hashes are grouped into s-tuples. For s random functions (h_1, ..., h_s),

    sketch(A) = (m(A, h_1), ..., m(A, h_s)),   Pr[sketch(A) = sketch(B)] = J(A, B)^s.

In practice, many sketches are created, and the sets (i.e., vectors) that have at least a given number of sketches in common are retrieved for further testing. Generalizations to non-binary and continuous-valued vectors are possible. Good performance for high-dimensional (but mostly sparse) vectors. A sketch-grouping example follows.
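A minimal sketch of the grouping step (illustration only; items are assumed to be integer ids into a fixed universe):

```python
import numpy as np

def minhash_sketches(sets, n_sketches, s, universe_size, seed=0):
    """Group min-hashes into n_sketches s-tuples per set; two sets share
    any given sketch with probability J(A, B)**s."""
    rng = np.random.default_rng(seed)
    # Shared random scores: H[k, i] = h_k(x_i) ~ U[0, 1] for every set.
    H = rng.random((n_sketches * s, universe_size))
    out = []
    for A in sets:
        idx = np.fromiter(A, dtype=int)
        winners = idx[H[:, idx].argmin(axis=1)]  # m(A, h_k) for each k
        out.append([tuple(winners[k * s:(k + 1) * s])
                    for k in range(n_sketches)])
    return out
```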

Kernel Locality Sensitive Hashing (KLSH)

Learn LSH-type codes when only a kernel similarity κ(x, y) is known; the data may not be given in an explicit vector space.

A different view of LSH: choose the hash family so that

    Pr[h(x) = h(y)] = sim(x, y),   sim(x, y) ∈ [0, 1].

The query time to find a (1+ε)-neighbor is then O(n^{1/(1+ε)}).

Example: sim(x, y) = x^T y, with

    h_r(x) = 1 if r^T x > 0, 0 otherwise,   r ~ N(0, I),
    Pr[sgn(r^T x) = sgn(r^T y)] = 1 - (1/π) cos^{-1}( x^T y / (||x|| ||y||) ).

A numerical check of this collision probability follows.
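A quick Monte Carlo check of the sign-random-projection collision probability (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(20), rng.standard_normal(20)
R = rng.standard_normal((100_000, 20))      # random hyperplanes r ~ N(0, I)
collide = ((R @ x > 0) == (R @ y > 0)).mean()
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(collide, 1 - theta / np.pi)           # the two values nearly match
```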

Now suppose we are given only the similarity, with an implicit (unknown) feature vector:

    sim(x, y) = κ(x, y) = Φ(x)^T Φ(y).

How can the hash function be computed when the feature vector is not known?

Goal: find an appropriate random projection r^T Φ(x) in the implicit feature space. From an RKHS argument,

    r = Σ_{i=1}^{p} w_i Φ(x_i) = Φ w,

a combination of p randomly chosen points, p << n. Find w such that E[r] = 0 and E[r r^T] = I:

    w = K^{-1/2} e_s,   where K = Φ^T Φ and e_s = [0, 0, 1, 0, 1, ...]^T

(the number of 1's in e_s is a parameter). Then

    h(Φ(x)) = sgn(r^T Φ(x)) = sgn( Σ_{i=1}^{p} w_i κ(x, x_i) ).

Nonlinear hashing - usually slower at run-time! A sketch follows.
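A minimal sketch of this construction (the small ridge term on K for numerical stability is an assumption):

```python
import numpy as np

def klsh_functions(K, n_bits, n_ones, seed=0):
    """KLSH-style weights w = K^{-1/2} e_s, with e_s a random 0/1 vector.
    K is the p x p kernel matrix over randomly sampled anchor points."""
    rng = np.random.default_rng(seed)
    p = K.shape[0]
    vals, vecs = np.linalg.eigh(K)  # K is symmetric PSD
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-8))) @ vecs.T
    E = np.zeros((n_bits, p))
    for row in E:
        row[rng.choice(p, size=n_ones, replace=False)] = 1.0
    return E @ inv_sqrt             # each row is one weight vector w

def klsh_hash(K_query_anchors, W):
    """h(x) = sgn(sum_i w_i * kappa(x, x_i)); K_query_anchors is (n, p)."""
    return (K_query_anchors @ W.T > 0).astype(np.uint8)
```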

KLSH vs. LSH: experiments on n = 100K image patches. [Plots from Kulis et al. [12]]

Semantic Hashing (RBM)

A nonlinear method to create binary codes using Restricted Boltzmann Machines (RBMs), a special type of Markov Random Field:

    p(v, h) = exp{-E(v, h)} / Z,   p(v) = Σ_h p(v, h),

with binary hidden units h:

    p(h_j = 1 | v) = σ( b_j + Σ_i w_ij v_i ),   σ(x) = 1/(1 + e^{-x}),
    p(v | h) = N( f(h), τI ).

Learn the parameters (W and b) using (approximate) maximum likelihood on p(v). How to learn codes? Stack multiple RBMs (Deep Belief Networks). A one-layer encoding sketch follows.
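A minimal sketch of extracting binary codes from one trained RBM layer (the weights W and biases b are assumed to come from pre-training; thresholding the activations at 0.5 is one common choice):

```python
import numpy as np

def rbm_codes(V, W, b):
    """Binary codes from one RBM layer: threshold the hidden activations
    p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij * v_i)."""
    probs = 1.0 / (1.0 + np.exp(-(V @ W + b)))  # (n, n_hidden) activations
    return (probs > 0.5).astype(np.uint8)
```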

Deep Belief Networks: similar functionality to multi-layer neural nets; the output of each stage is used as the input to the next. Back-propagation (coordinate descent) with an auto-encoding objective. Many parameters and architecture choices; usually slow to train and to apply! [Salakhutdinov et al. [8]]

Binary Reconstructive Embedding

Construct binary codes by minimizing the difference between the original (metric) distance and the Hamming distance:

    h_i(x) = sgn(w_i^T v(x)),   where v(x) = [1, K(x_1, x), ..., K(x_s, x)]^T

(the kernel anchors are chosen randomly or learned).

    Ŵ = argmin_W Σ_{(i,j)} [ d_M(x_i, x_j) - d_H(x_i, x_j) ]^2,

where

    d_M(x_i, x_j) = (1/2) ||x_i - x_j||^2,
    d_H(x_i, x_j) = (1/4m) Σ_{k=1}^{m} [ h_k(x_i) - h_k(x_j) ]^2.

- Data is scaled to unit norm to remove the effect of scale (but this changes the distances between points); uniform scaling is better
- Coordinate descent is used for optimization and deals with the discontinuities
- Expensive to compute codes; learning appropriate anchors for the kernel representation is hard

A loss-evaluation sketch follows.
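A minimal sketch of evaluating the BRE objective on a set of training pairs (the optimization itself is omitted):

```python
import numpy as np

def bre_loss(X, H, pairs):
    """Sum over pairs (i, j) of (d_M - d_H)^2, with
    d_M = 0.5 * ||x_i - x_j||^2 and d_H = (1/4m) * ||h(x_i) - h(x_j)||^2.
    H is the n x m code matrix with entries in {-1, +1}."""
    m = H.shape[1]
    loss = 0.0
    for i, j in pairs:
        d_m = 0.5 * np.sum((X[i] - X[j]) ** 2)
        d_h = np.sum((H[i] - H[j]) ** 2) / (4.0 * m)
        loss += (d_m - d_h) ** 2
    return loss
```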

Binary Reconstructive Embedding: experiments. Testing based on the points retrieved within Hamming radius 3. LSH is quite close even for a moderate number of bits! [Plots from Kulis et al. [13]]

Semi-supervised Hashing

Suppose a few neighbor pairs and a few non-neighbor pairs are given. The semi-supervised formulation combines an empirical fitness term over the labeled data (neighbor pairs should receive the same bits, non-neighbor pairs different bits) with a regularizer over all the data derived from the maximum-entropy principle. [Wang et al. [14]]

Experiments on Flickr-270K. [Plots from Wang et al. [15]]

References

1. A. Broder, On the resemblance and containment of documents, in Compression and Complexity of Sequences, 1997.
2. P. Indyk and R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proc. of the 30th ACM Symposium on Theory of Computing, 1998.
3. A. Gionis, P. Indyk, and R. Motwani, Similarity search in high dimensions via hashing, in VLDB, 1999.
4. G. Shakhnarovich, P. Viola, and T. Darrell, Fast pose estimation with parameter-sensitive hashing, in Proc. of the IEEE International Conference on Computer Vision, 2003.
5. M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Annual Symposium on Computational Geometry, 2004.
6. G. Shakhnarovich, T. Darrell, and P. Indyk, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing), The MIT Press, 2006.
7. Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007.
8. R. Salakhutdinov and G. Hinton, Semantic hashing, ACM SIGIR, 2007.
9. Y. Weiss, A. Torralba, and R. Fergus, Spectral hashing, in NIPS, 2008.
10. O. Chum, J. Philbin, and A. Zisserman, Near duplicate image detection: min-hash and tf-idf weighting, BMVC, 2008.
11. M. Raginsky and S. Lazebnik, Locality-sensitive binary codes from shift-invariant kernels, in Proc. of Advances in Neural Information Processing Systems, 2009.
12. B. Kulis and K. Grauman, Kernelized locality-sensitive hashing for scalable image search, in Proc. of the IEEE International Conference on Computer Vision, 2009.
13. B. Kulis and T. Darrell, Learning to hash with binary reconstructive embeddings, in Proc. of Advances in Neural Information Processing Systems, 2009.
14. J. Wang, S. Kumar, and S.-F. Chang, Semi-supervised hashing for scalable image retrieval, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
15. J. Wang, S. Kumar, and S.-F. Chang, Sequential projection learning for hashing with compact codes, in Proceedings of the 27th International Conference on Machine Learning, 2010.
