Approximate Nearest Neighbor (ANN) Search - II


Sanjiv Kumar, Google Research, NY. EECS-6898, Columbia University, Fall 2010.

Two popular ANN approaches

Tree-based approaches
- Recursively partition the data: divide and conquer
- Expected query time: O(log n), with constants exponential in the dimension
- Performance degrades with high-dimensional data
- Large storage needs: the original data is required at run-time

Hashing approaches
- Each item in the database is represented as a code
- Significant reduction in storage: for 64-bit codes, just 8 GB instead of 40 TB
- Expected query time: O(1) or sublinear in n
- Search in 64-bit Hamming space: ~13 sec instead of ~15 hrs per query
- Compact codes preferred

Example: Binary Codes

Linear projection (hyperplane) based partitioning.
[Figure: hyperplanes h_1, h_2 split the space; each point x_1, ..., x_5 receives an m-bit code y_i.]
No recursive partitioning, unlike trees!

Hashing: Main Steps

1. Training
- Define a model to convert an input item into a code
- Learn the parameters of the model, possibly using a subset of randomly sampled database items

Given an input x, learn

    h(x) = {h_1(x), h_2(x), ..., h_m(x)},   h_i(x) ∈ Z.

Example: binary codes using linear projections,

    h_i(x) = sgn(w_i^T x + b_i) ∈ {-1, 1},

or equivalently

    y_i(x) = (1 + h_i(x)) / 2 ∈ {0, 1}.

Training goal: learn the parameters of the m hash functions, {w_i, b_i}, i = 1, ..., m. A minimal sketch of such a model appears below.
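As an illustration only (not from the slides), here is a minimal numpy sketch of a random-hyperplane model of this form; the scaling of the offsets b_i by the data standard deviation is an assumption:

```python
import numpy as np

def train_linear_hash(X, m, seed=0):
    """Sample m random hyperplanes (w_i, b_i) for binary hashing.
    X: (n, d) data matrix, used here only to scale the offsets."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((m, d))        # projection directions w_i
    b = rng.standard_normal(m) * X.std()   # offsets b_i (scaling is an assumption)
    return W, b

def hash_codes(X, W, b):
    """m-bit codes y_i(x) = (1 + sgn(w_i^T x + b_i)) / 2 in {0, 1}."""
    return (X @ W.T + b > 0).astype(np.uint8)  # (n, m) binary matrix
```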

2. Indexing
- Represent each item in the database as a code
- In some cases, organize all the codes in a hash table (inverse lookup): for a given code, return all the points with the same code

[Figure: the n database items x_1, ..., x_n are mapped to codes; a hash table (inverse lookup) maps each code back to the items that share it, e.g., one bucket holding {x_1, x_2}.] A toy inverse-lookup table is sketched below.
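A minimal sketch of the inverse lookup, assuming the codes are given as rows of a 0/1 matrix (as produced by hash_codes above):

```python
from collections import defaultdict

def build_hash_table(codes):
    """Inverse lookup: map each m-bit code (stored as a tuple) to the
    ids of all database items that share it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[tuple(code)].append(item_id)
    return table
```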

3. Querying
- Convert the query to a code
- Find all items with the same code in the database using the hash table
- Return all points within a small radius of the query in code space
- Use multiple codes (and tables) to increase recall
- Rank all the database items according to their distance from the query in code space

Hash-table lookup: O(nm) space, O(1) lookup time.
[Figure: the query q is hashed to h(q); buckets at a small Hamming radius around h(q) are probed in the inverse-lookup table, returning items such as x_1, x_2.]
The number of codes to search at radius r is O(m^r), and for large m the buckets for many of those codes are empty with high probability! A sketch of radius-r probing follows.
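A minimal sketch of probing every bucket within Hamming radius r of the query code (assuming the table from build_hash_table above); note the O(m^r) blow-up in the number of probed codes:

```python
from itertools import combinations

def probe_radius(table, qcode, r):
    """Return item ids from all buckets within Hamming radius r of qcode."""
    qcode = tuple(qcode)
    m, hits = len(qcode), []
    for k in range(r + 1):
        for flips in combinations(range(m), k):  # O(m^r) codes overall
            probe = list(qcode)
            for i in flips:
                probe[i] ^= 1                    # flip k bits of the query code
            hits.extend(table.get(tuple(probe), []))
    return hits
```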

Multiple tables: build L hash tables, each with its own m-bit code h_1(x), ..., h_L(x), and probe all of them: O(nLm) space, O(L) lookup time.

Exhaustive ranking: compute the distance in code space between h(q) and every database code and return the closest items. This is an O(n) linear scan, but there are at most m distance levels. A Hamming-ranking sketch follows.
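A minimal sketch of exhaustive Hamming ranking, packing the bits into integers so each distance is an XOR plus a popcount (int.bit_count assumes Python 3.10+):

```python
import numpy as np

def pack(codes):
    """Pack each row of 0/1 bits into one Python int."""
    return [int("".join(map(str, row)), 2) for row in codes]

def hamming_rank(qcode, packed, topk):
    """Rank database items by Hamming distance to the query code."""
    q = int("".join(map(str, qcode)), 2)
    dists = [(q ^ c).bit_count() for c in packed]  # XOR + popcount per item
    return np.argsort(dists)[:topk]
```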

Hashing Techniques

1. Unsupervised - use unlabeled data to learn the hash functions: Locality Sensitive Hashing (LSH), PCA Hashing, Spectral Hashing, Min-Hashing, Kernel-LSH, ...
2. Supervised - use labeled pairs to learn the hash functions: Boosted Hashing, Binary Reconstructive Embedding, ...
3. Semi-supervised - use both labeled pairs and unlabeled data: Sequential Learning, ...
4. Type of hash function - linear/quasi-linear: LSH, Min-Hash, SH, PCA-Hash, ...; nonlinear: KLSH, RBM, BRE, ...

Locality Sensitive Hashing (LSH)

A family of hash functions H = {h : X → Z} is called (r_1, r_2, p_1, p_2)-sensitive if for any x_1, x_2 ∈ X:

    if d(x_1, x_2) ≤ r_1 then Pr[h(x_1) = h(x_2)] ≥ p_1,
    if d(x_1, x_2) > r_2 then Pr[h(x_1) = h(x_2)] ≤ p_2,

where r_1 < r_2 and p_1 > p_2.

A simple LSH family:

    h_i(x) = floor((w_i^T x + b_i) / t),   w_i ~ P_s(w) (an s-stable distribution),   b_i ~ U[0, t].

Special case: binary hashing, where the projection w_i^T x + b_i is thresholded instead of binned with width t. A sketch of this family follows.
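A minimal sketch of the s-stable family for s = 2 (Gaussian projections, Euclidean distance); the bucket width t is a free parameter, discussed below:

```python
import numpy as np

def make_lsh(d, m, t, seed=0):
    """m hash functions h_i(x) = floor((w_i^T x + b_i) / t) with
    Gaussian (2-stable) projections, for Euclidean distance."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d))  # w_i ~ N(0, I), 2-stable
    b = rng.uniform(0, t, size=m)    # b_i ~ U[0, t]
    return lambda X: np.floor((X @ W.T + b) / t).astype(int)
```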

s-stable Distributions

A distribution P_s(w) is called s-stable if there exists an s ≥ 0 such that for any x ∈ R^d and any w with i.i.d. entries w_i ~ P_s,

    x^T w ~ ||x||_s w,   and hence   (x_1 - x_2)^T w ~ ||x_1 - x_2||_s w.

Neighboring points tend to have similar projections, so binning the projections has the LSH property!

Special case: s = 2 (Euclidean distance). Take P_s = N(0, 1), i.e., w ~ N(0, I). Then

    E[x^T w] = 0,
    Var[x^T w] = E[x^T w w^T x] = x^T E[w w^T] x = ||x||_2^2,

so x^T w ~ ||x||_2 N(0, 1): the Gaussian distribution is 2-stable! Which distribution is 1-stable? The Cauchy! One can find an s-stable distribution for every s ∈ (0, 2].
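A quick numerical check of 2-stability (illustration only): the projections x^T w with w ~ N(0, I) should have standard deviation ||x||_2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
proj = rng.standard_normal((100_000, 50)) @ x  # 100k projections x^T w
print(proj.std(), np.linalg.norm(x))           # the two values nearly match
```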

Collision Probability

Let u = ||x_1 - x_2||_s, and let f_s(a) be the pdf of the absolute value a = |w| of an s-stable random variable. Then the probability of collision is

    p(u) = Pr[h(x_1) = h(x_2)] = ∫_0^t (1/u) f_s(a/u) (1 - a/t) da.

p(u) increases as u decreases!

How to choose t?

    t̂ = argmin_t ρ = argmin_t [ log(1/p_1) / log(1/p_2) ],

which can be computed analytically for s = 1, 2. ρ is not too sensitive to t as long as t is sufficiently far from 0! [Plot of ρ vs. t; Datar et al. [5]]

Parameter Selection in LSH

How many tables (L) and how many bits per table (m)?

Given a query q and its near-neighbor x, suppose each hash function satisfies

    Pr[h_i(q) = h_i(x)] ≥ p_1.

For an m-bit code,

    Pr[h(q) = h(x)] ≥ p_1^m,

so the probability of collision falls exponentially with m!

The probability that q and x fail to collide in all L tables is (1 - p_1^m)^L. To bound the probability that q and x collide in at least one of the L tables,

    1 - (1 - p_1^m)^L ≥ 1 - δ   ⟹   L ≥ log(1/δ) / log(1 / (1 - p_1^m)).

A worked example follows.
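A worked example of this bound (illustration only; the numbers p_1 = 0.9, m = 16, δ = 0.05 are made up):

```python
import math

def tables_needed(p1, m, delta):
    """Smallest L with 1 - (1 - p1**m)**L >= 1 - delta."""
    return math.ceil(math.log(1 / delta) / -math.log(1 - p1 ** m))

# p1 = 0.9, 16-bit codes, 95% chance of finding a true near-neighbor
print(tables_needed(0.9, 16, 0.05))  # -> 15 tables
```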

Precision-Recall Tradeoff

For high precision, longer codes (i.e., large m) are preferred. But a large m reduces the probability of collision exponentially, giving low recall, so many tables (large L) become necessary to get good recall, which means large storage.

Design L and m so that the run time is minimized for a given application:

    T_total = T_h + T_r,

where T_h is the time to compute the L m-bit hash functions and retrieve points from the tables via lookups, and T_r is the time to compute exact distances to the retrieved points and return the top ones (sublinear in n). A larger m increases T_h but decreases T_r. Empirically estimate the total time averaged over many queries. In some cases T_r is simply a vote over how many tables returned an item.

How can a large number of tables be avoided?

Multi-Probe LSH

A strategy to increase recall without using a large number of tables.

Basic idea: if two neighbors do not fall in the same bucket, they should fall in a nearby one, e.g., within a Hamming distance of 1.

How to choose the probing sequence? Buckets with a smaller average gap between the query projections w_l^T q + b_l and the bucket boundaries are given priority. [Figure: query projections against bucket boundaries; Lv et al. [7]] A binary-case sketch follows.
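A minimal binary-case sketch of this idea (a simplification, not the full multi-probe algorithm of Lv et al., which scores multi-bit perturbations): probe the query's own bucket first, then buckets at Hamming distance 1, flipping the least confident bits (smallest |w_i^T q + b_i|) first:

```python
import numpy as np

def probe_sequence(q, W, b, n_probes):
    """Yield bucket codes: h(q) first, then Hamming-distance-1 buckets
    ordered by how close the query is to each bit's decision boundary."""
    proj = W @ q + b
    code = (proj > 0).astype(np.uint8)
    yield tuple(code)
    for i in np.argsort(np.abs(proj))[:n_probes - 1]:
        flipped = code.copy()
        flipped[i] ^= 1          # neighboring bucket at Hamming distance 1
        yield tuple(flipped)
```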

Data-Dependent Projections

PCA-Hash (let us focus on binary codes):

    h_i(x) = sgn(w_i^T x + b_i),   w_i ~ eigenvectors of Cov(X).

Projection onto the maximum-variance directions, followed by a median threshold (each direction is split 0/1 at its median).

Performance degrades with a larger number of bits: the variance decreases rapidly for most real-world data. Can one reuse the high-variance directions? A sketch follows.
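A minimal sketch of PCA-Hash via an SVD of the centered data, with median thresholds so each bit is balanced:

```python
import numpy as np

def pca_hash(X, m):
    """Project on the top-m eigenvectors of Cov(X) and threshold each
    projection at its median."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m]                    # top-m principal directions
    proj = Xc @ W.T
    b = -np.median(proj, axis=0)  # median thresholds -> balanced bits
    codes = (proj + b > 0).astype(np.uint8)
    return codes, W, b
```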

Spectral Hashing

Data-dependent learning of binary codes h(x):

    min Σ_{ij} W_ij ||h(x_i) - h(x_j)||^2          (W_ij: similarity between x_i, x_j)
    subject to  Σ_i h(x_i) = 0                     (balanced partitioning)
                Σ_i h_k(x_i) h_l(x_i) = 0, k ≠ l   (uncorrelated bits)
                h_k(x_i) ∈ {-1, 1}

Issues:
- Building the graph Laplacian is O(n^2): computationally extremely expensive (needs a complete NN search)
- The balanced graph partitioning problem is NP-hard, even with a single bit

Approximation: assume a uniform data distribution and solve the 1D Laplacian eigenfunctions analytically.

Three main steps:
1. Extract the max-variance directions using PCA
2. Select which direction to pick next based on the modes of the 1D Laplacian; high-variance PCA directions may be picked again
3. Create bits by thresholding sinusoidal eigenfunctions at zero:

    h_i(x) = sgn(cos(α_i w_i^T x)),   w_i ~ eigenvectors of Cov(X)

(slower than simple thresholding). In practice, PCA-Hash with a median threshold may do better, but both suffer from low-variance directions. A simplified sketch of the bit construction follows.
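A simplified sketch of the bit construction under the uniform-distribution approximation; assumptions: W holds PCA directions as rows, directions are reused round-robin with increasing mode k (the real method instead ranks all (direction, mode) pairs by their analytic 1D-Laplacian eigenvalue):

```python
import numpy as np

def spectral_hash_bits(X, W, n_bits):
    """Bits sgn(cos(alpha * w^T x)) from sinusoidal eigenfunctions of the
    1D Laplacian on each PCA projection's range [lo, hi]."""
    proj = (X - X.mean(axis=0)) @ W.T            # PCA projections
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    bits, k = [], 1
    while len(bits) < n_bits:
        for j in range(proj.shape[1]):
            alpha = k * np.pi / (hi[j] - lo[j])  # k-th 1D mode on [lo, hi]
            bits.append(np.cos(alpha * (proj[:, j] - lo[j])) > 0)
            if len(bits) == n_bits:
                break
        k += 1                                   # reuse directions, higher mode
    return np.stack(bits, axis=1).astype(np.uint8)
```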

Spectral Hashing: Experiments

Testing using a Hamming radius around the query, on dense 384-dim vectors. PCA-Hash gives similar or better performance. For Hamming-radius testing, it is better to use LSH with a median threshold. [Plots from Weiss et al. [9]]

Shift-Invariant Kernel Embedding

Use random projections along with sinusoidal thresholding.

Key idea: suppose the similarity between a pair of points is given by a shift-invariant kernel, i.e.,

    s(x, y) = K(x, y) = K(x - y),   with K(x - x) = K(0) = 1.

Examples:

    s(x, y) = exp(-γ ||x - y||_2^2 / 2)   (L_2 distance)
    s(x, y) = exp(-γ ||x - y||_1)         (L_1 distance)

Want to learn an m-bit code h(x) such that

    f_1(K(x - y)) ≤ (1/m) d_H(h(x), h(y)) ≤ f_2(K(x - y)),

where d_H is the normalized Hamming distance and f_1, f_2 are decreasing functions with f_1(1) = f_2(1) = 0 and f_1(0) = f_2(0) = c > 0, i.e., the Hamming distance is small for similar points.

Main steps:
1. Approximate shift-invariant kernels as dot products of random Fourier features
2. Pick directions from the distribution induced by the kernel (similar to s-stable directions)
3. Create bits by thresholding sinusoids:

    h_i(x) = sgn(cos(w_i^T x + b_i) + t_i),   w_i ~ N(0, γI),  b_i ~ U[0, 2π],  t_i ~ U[-1, 1],

followed by the standard conversion into 0/1. The random threshold t_i is critical for performance. Performance (with Hamming ranking) is better if a large number of bits are used! A sketch follows.
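A minimal sketch of these bits for the Gaussian kernel exp(-γ ||x - y||^2 / 2):

```python
import numpy as np

def sike_hash(X, m, gamma, seed=0):
    """Shift-invariant kernel bits h_i(x) = sgn(cos(w_i^T x + b_i) + t_i),
    converted to 0/1."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(gamma), size=(m, d))  # w_i ~ N(0, gamma*I)
    b = rng.uniform(0, 2 * np.pi, size=m)             # b_i ~ U[0, 2*pi]
    t = rng.uniform(-1, 1, size=m)                    # random thresholds t_i
    return (np.cos(X @ W.T + b) + t > 0).astype(np.uint8)
```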

Experiments: SIKE vs. Spectral Hashing

Test using exhaustive Hamming ranking with all database items, on dense 384-dim vectors. After 256 bits, the performance of Spectral Hashing falls. Even regular LSH is quite powerful if a large number of bits are used. [Plots from Raginsky et al. [11]]

Min-Hash

A method to estimate the Jaccard similarity between sets (or vectors). The Jaccard similarity between two sets (A, B), or between two vectors (x, y) with x_i, y_i ≥ 0, is

    J(A, B) = |A ∩ B| / |A ∪ B|,
    J(x, y) = Σ_i min(x_i, y_i) / Σ_i max(x_i, y_i)    (relation with L_1?)

The two definitions are equivalent: represent each set as a binary vector of length |A ∪ B|.

Suppose h(.) is a random function that maps each item to a real number, with h(x_i) ≠ h(x_j) and Pr[h(x_i) < h(x_j)] = 0.5 for i ≠ j; a simple choice is h(x_i) ~ U[0, 1]. The min-hash is

    m(A, h) = argmin_{x_i ∈ A} h(x_i),

and its collision probability is exactly the Jaccard similarity:

    Pr[m(A, h) = m(B, h)] = J(A, B).

A sketch follows.
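A minimal sketch using the h(x_i) ~ U[0, 1] choice, plus a Monte Carlo check of the collision probability (the sets and sizes are made up):

```python
import numpy as np

def minhash(A, H):
    """m(A, h_k) for each row h_k of H: the item of A with the smallest
    random score. H[k, i] = h_k(x_i) ~ U[0, 1], shared across all sets."""
    idx = np.fromiter(A, dtype=int)
    return idx[H[:, idx].argmin(axis=1)]

rng = np.random.default_rng(0)
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}               # J(A, B) = 2/6
H = rng.random((10_000, 6))                      # 10k random hash functions
print((minhash(A, H) == minhash(B, H)).mean())   # ~0.333
```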

Why does this work? Let x_u = m(A ∪ B, h), the minimizer over the union. If x_u ∈ A ∩ B, then x_u = m(A, h) and x_u = m(B, h); otherwise the two min-hashes differ. Since h is random, x_u is a uniformly random element of A ∪ B, so

    Pr[m(A, h) = m(B, h)] = |A ∩ B| / |A ∪ B| = J(A, B).

Sketches: for retrieval efficiency, min-hashes are grouped into s-tuples. For s random functions (h_1, ..., h_s),

    sketch(A) = (m(A, h_1), ..., m(A, h_s)),   Pr[sketch(A) = sketch(B)] = J(A, B)^s.

In practice, many sketches are created, and the sets (i.e., vectors) that have at least a given number of sketches in common are retrieved for further testing. Generalizations to non-binary and continuous-valued vectors are possible. Good performance for high-dimensional (but mostly sparse) vectors. A sketch-grouping example follows.
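A minimal sketch of the grouping step (illustration only; items are assumed to be integer ids into a fixed universe):

```python
import numpy as np

def minhash_sketches(sets, n_sketches, s, universe_size, seed=0):
    """Group min-hashes into n_sketches s-tuples per set; two sets share
    any given sketch with probability J(A, B)**s."""
    rng = np.random.default_rng(seed)
    # Shared random scores: H[k, i] = h_k(x_i) ~ U[0, 1] for every set.
    H = rng.random((n_sketches * s, universe_size))
    out = []
    for A in sets:
        idx = np.fromiter(A, dtype=int)
        winners = idx[H[:, idx].argmin(axis=1)]  # m(A, h_k) for each k
        out.append([tuple(winners[k * s:(k + 1) * s])
                    for k in range(n_sketches)])
    return out
```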

Kernel Locality Sensitive Hashing (KLSH)

Learn LSH-type codes when only a kernel similarity κ(x, y) is known; the data may not be given in an explicit vector space.

A different view of LSH: choose the hash family so that

    Pr[h(x) = h(y)] = sim(x, y),   sim(x, y) ∈ [0, 1].

The query time to find a (1+ε)-neighbor is then O(n^{1/(1+ε)}).

Example: sim(x, y) = x^T y, with

    h_r(x) = 1 if r^T x > 0, 0 otherwise,   r ~ N(0, I),
    Pr[sgn(r^T x) = sgn(r^T y)] = 1 - (1/π) cos^{-1}( x^T y / (||x|| ||y||) ).

A numerical check of this collision probability follows.
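A quick Monte Carlo check of the sign-random-projection collision probability (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(20), rng.standard_normal(20)
R = rng.standard_normal((100_000, 20))      # random hyperplanes r ~ N(0, I)
collide = ((R @ x > 0) == (R @ y > 0)).mean()
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(collide, 1 - theta / np.pi)           # the two values nearly match
```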

Now suppose we are given only the similarity, with an implicit (unknown) feature vector:

    sim(x, y) = κ(x, y) = Φ(x)^T Φ(y).

How can the hash function be computed when the feature vector is not known?

Goal: find an appropriate random projection r^T Φ(x) in the implicit feature space. From an RKHS argument,

    r = Σ_{i=1}^{p} w_i Φ(x_i) = Φ w,

a combination of p randomly chosen points, p << n. Find w such that E[r] = 0 and E[r r^T] = I:

    w = K^{-1/2} e_s,   where K = Φ^T Φ and e_s = [0, 0, 1, 0, 1, ...]^T

(the number of 1's in e_s is a parameter). Then

    h(Φ(x)) = sgn(r^T Φ(x)) = sgn( Σ_{i=1}^{p} w_i κ(x, x_i) ).

Nonlinear hashing - usually slower at run-time! A sketch follows.
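A minimal sketch of this construction (the small ridge term on K for numerical stability is an assumption):

```python
import numpy as np

def klsh_functions(K, n_bits, n_ones, seed=0):
    """KLSH-style weights w = K^{-1/2} e_s, with e_s a random 0/1 vector.
    K is the p x p kernel matrix over randomly sampled anchor points."""
    rng = np.random.default_rng(seed)
    p = K.shape[0]
    vals, vecs = np.linalg.eigh(K)  # K is symmetric PSD
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-8))) @ vecs.T
    E = np.zeros((n_bits, p))
    for row in E:
        row[rng.choice(p, size=n_ones, replace=False)] = 1.0
    return E @ inv_sqrt             # each row is one weight vector w

def klsh_hash(K_query_anchors, W):
    """h(x) = sgn(sum_i w_i * kappa(x, x_i)); K_query_anchors is (n, p)."""
    return (K_query_anchors @ W.T > 0).astype(np.uint8)
```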

KLSH vs. LSH: experiments on n = 100K image patches. [Plots from Kulis et al. [12]]

Semantic Hashing (RBM)

A nonlinear method to create binary codes using Restricted Boltzmann Machines (RBMs), a special type of Markov Random Field:

    p(v, h) = exp{-E(v, h)} / Z,   p(v) = Σ_h p(v, h),

with binary hidden units h:

    p(h_j = 1 | v) = σ( b_j + Σ_i w_ij v_i ),   σ(x) = 1/(1 + e^{-x}),
    p(v | h) = N( f(h), τI ).

Learn the parameters (W and b) using (approximate) maximum likelihood on p(v). How to learn codes? Stack multiple RBMs (Deep Belief Networks). A one-layer encoding sketch follows.
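A minimal sketch of extracting binary codes from one trained RBM layer (the weights W and biases b are assumed to come from pre-training; thresholding the activations at 0.5 is one common choice):

```python
import numpy as np

def rbm_codes(V, W, b):
    """Binary codes from one RBM layer: threshold the hidden activations
    p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij * v_i)."""
    probs = 1.0 / (1.0 + np.exp(-(V @ W + b)))  # (n, n_hidden) activations
    return (probs > 0.5).astype(np.uint8)
```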

Deep Belief Networks: similar functionality to multi-layer neural nets; the output of each stage is used as the input to the next. Back-propagation (coordinate descent) with an auto-encoding objective. Many parameters and architecture choices; usually slow to train and to apply! [Salakhutdinov et al. [8]]

Binary Reconstructive Embedding

Construct binary codes by minimizing the difference between the original (metric) distance and the Hamming distance:

    h_i(x) = sgn(w_i^T v(x)),   where v(x) = [1, K(x_1, x), ..., K(x_s, x)]^T

(the kernel anchors are chosen randomly or learned).

    Ŵ = argmin_W Σ_{(i,j)} [ d_M(x_i, x_j) - d_H(x_i, x_j) ]^2,

where

    d_M(x_i, x_j) = (1/2) ||x_i - x_j||^2,
    d_H(x_i, x_j) = (1/4m) Σ_{k=1}^{m} [ h_k(x_i) - h_k(x_j) ]^2.

- Data is scaled to unit norm to remove the effect of scale (but this changes the distances between points); uniform scaling is better
- Coordinate descent is used for optimization and deals with the discontinuities
- Expensive to compute codes; learning appropriate anchors for the kernel representation is hard

A loss-evaluation sketch follows.
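A minimal sketch of evaluating the BRE objective on a set of training pairs (the optimization itself is omitted):

```python
import numpy as np

def bre_loss(X, H, pairs):
    """Sum over pairs (i, j) of (d_M - d_H)^2, with
    d_M = 0.5 * ||x_i - x_j||^2 and d_H = (1/4m) * ||h(x_i) - h(x_j)||^2.
    H is the n x m code matrix with entries in {-1, +1}."""
    m = H.shape[1]
    loss = 0.0
    for i, j in pairs:
        d_m = 0.5 * np.sum((X[i] - X[j]) ** 2)
        d_h = np.sum((H[i] - H[j]) ** 2) / (4.0 * m)
        loss += (d_m - d_h) ** 2
    return loss
```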

Binary Reconstructive Embedding: experiments. Testing based on the points retrieved within Hamming radius 3. LSH is quite close even for a moderate number of bits! [Plots from Kulis et al. [13]]

Semi-supervised Hashing

Suppose a few neighbor pairs and a few non-neighbor pairs are given. The semi-supervised formulation combines an empirical fitness term over the labeled data (neighbor pairs should receive the same bits, non-neighbor pairs different bits) with a regularizer over all the data derived from the maximum-entropy principle. [Wang et al. [14]]

Experiments on Flickr-270K. [Plots from Wang et al. [15]]

References

1. A. Broder, On the resemblance and containment of documents, in Compression and Complexity of Sequences, 1997.
2. P. Indyk and R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proc. of the 30th ACM Symposium on Theory of Computing, 1998.
3. A. Gionis, P. Indyk, and R. Motwani, Similarity search in high dimensions via hashing, in VLDB, 1999.
4. G. Shakhnarovich, P. Viola, and T. Darrell, Fast pose estimation with parameter-sensitive hashing, in Proc. of the IEEE International Conference on Computer Vision, 2003.
5. M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Annual Symposium on Computational Geometry, 2004.
6. G. Shakhnarovich, T. Darrell, and P. Indyk, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing), The MIT Press, 2006.
7. Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007.
8. R. Salakhutdinov and G. Hinton, Semantic hashing, ACM SIGIR, 2007.
9. Y. Weiss, A. Torralba, and R. Fergus, Spectral hashing, in NIPS, 2008.
10. O. Chum, J. Philbin, and A. Zisserman, Near duplicate image detection: min-hash and tf-idf weighting, BMVC, 2008.
11. M. Raginsky and S. Lazebnik, Locality-sensitive binary codes from shift-invariant kernels, in Proc. of Advances in Neural Information Processing Systems, 2009.
12. B. Kulis and K. Grauman, Kernelized locality-sensitive hashing for scalable image search, in Proc. of the IEEE International Conference on Computer Vision, 2009.
13. B. Kulis and T. Darrell, Learning to hash with binary reconstructive embeddings, in Proc. of Advances in Neural Information Processing Systems, 2009.
14. J. Wang, S. Kumar, and S.-F. Chang, Semi-supervised hashing for scalable image retrieval, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
15. J. Wang, S. Kumar, and S.-F. Chang, Sequential projection learning for hashing with compact codes, in Proceedings of the 27th International Conference on Machine Learning, 2010.
