Scalable Multi-Class Gaussian Process Classification using Expectation Propagation


Carlos Villacampa-Calvo and Daniel Hernández-Lobato
Computer Science Department, Universidad Autónoma de Madrid

Introduction to Multi-class Classification with GPs

Given $\mathbf{x}_\star$ we want to make predictions about $y_\star \in \{1, \ldots, C\}$, with $C > 2$. One can assume that (Kim & Ghahramani, 2006):

$y_\star = \arg\max_k f_k(\mathbf{x}_\star)$ for $k \in \{1, \ldots, C\}$

[Figure: the C latent functions $f_k(x)$ and the class labels they induce as a function of $x$.]

Find $p(\mathbf{f} | \mathbf{y}) = p(\mathbf{y} | \mathbf{f}) p(\mathbf{f}) / p(\mathbf{y})$ under $p(f_k) \sim \mathcal{GP}(0, k(\cdot, \cdot))$.
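To make the labelling rule concrete, here is a minimal sketch (my own illustration, not code from the talk) that draws C latent functions from a GP prior on a grid and assigns each input the class of the largest latent function.

```python
# Minimal sketch: C latent functions sampled from a GP prior on a 1-D grid,
# each input labelled with y = argmax_k f_k(x). Kernel settings are arbitrary.
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
C, N = 3, 200                                  # number of classes, grid points
x = np.linspace(-3, 3, N)
K = rbf_kernel(x, x) + 1e-8 * np.eye(N)        # jitter for numerical stability
L = np.linalg.cholesky(K)

# One independent GP sample f_k per class, stacked as a (C, N) array.
f = (L @ rng.standard_normal((N, C))).T

# The labelling rule from the slides: y(x) = argmax_k f_k(x).
y = np.argmax(f, axis=0)
print(y[:20])
```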

Challenges in Multi-class Classification with GPs

Binary classification has received more attention than multi-class! Challenges in the multi-class case:

1. Approximate inference is more difficult.
2. C > 2 latent functions instead of just one.
3. More complicated likelihood factors to deal with.
4. Computationally more expensive algorithms.

Most techniques do not scale to large datasets (Williams & Barber, 1998; Kim & Ghahramani, 2006; Girolami & Rogers, 2006; Chai, 2012; Riihimäki et al., 2013).

The best cost is $O(CNM^2)$, if sparse priors are used.

Stochastic Variational Inference for Multi-class GPs

Hensman et al., 2015, use a robust likelihood function:

$p(y_i | \mathbf{f}_i) = (1 - \epsilon)\, p_i + \frac{\epsilon}{C - 1} (1 - p_i)$, with $p_i = \begin{cases} 1 & \text{if } y_i = \arg\max_k f_k(\mathbf{x}_i) \\ 0 & \text{otherwise} \end{cases}$

The posterior approximation is $q(\mathbf{f}) = \int p(\mathbf{f} | \overline{\mathbf{f}})\, q(\overline{\mathbf{f}})\, d\overline{\mathbf{f}}$, with

$q(\overline{\mathbf{f}}) = \prod_{k=1}^C \mathcal{N}(\overline{\mathbf{f}}^k | \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, $\quad \overline{\mathbf{f}}^k = (f_k(\overline{\mathbf{x}}^k_1), \ldots, f_k(\overline{\mathbf{x}}^k_M))^T$, $\quad \overline{X}^k = (\overline{\mathbf{x}}^k_1, \ldots, \overline{\mathbf{x}}^k_M)^T$

The number of latent variables goes from CN to CM, with $M \ll N$.

$\mathcal{L}(q) = \sum_{i=1}^N \mathbb{E}_q[\log p(y_i | \mathbf{f}_i)] - \mathrm{KL}(q \| p)$

The cost is $O(CM^3)$ (uses quadratures)! Can we do that with EP?
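As a quick illustration of the robust likelihood above, the following sketch (assuming the standard form from Hensman et al., 2015; names are illustrative) evaluates $p(y_i | \mathbf{f}_i)$ for a single point.

```python
# Sketch of the robust-max likelihood:
# p(y_i | f_i) = (1 - eps) p_i + eps / (C - 1) (1 - p_i),
# with p_i = 1 iff y_i = argmax_k f_k(x_i).
import numpy as np

def robust_max_likelihood(f_i, y_i, eps=1e-3):
    """f_i: array of C latent values at x_i; y_i: observed class label."""
    p_i = 1.0 if y_i == int(np.argmax(f_i)) else 0.0
    C = len(f_i)
    return (1.0 - eps) * p_i + eps / (C - 1.0) * (1.0 - p_i)

print(robust_max_likelihood(np.array([0.2, 1.5, -0.3]), y_i=1))  # 0.999
print(robust_max_likelihood(np.array([0.2, 1.5, -0.3]), y_i=0))  # 0.0005
```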

Expectation Propagation (EP)

Let $\boldsymbol{\theta}$ summarize the latent variables of the model. EP approximates $p(\boldsymbol{\theta}) \propto p_0(\boldsymbol{\theta}) \prod_{n=1}^N f_n(\boldsymbol{\theta})$ with $q(\boldsymbol{\theta}) \propto p_0(\boldsymbol{\theta}) \prod_{n=1}^N \tilde{f}_n(\boldsymbol{\theta})$.

The $\tilde{f}_n$ are tuned by minimizing the KL divergence $D_{\mathrm{KL}}[p_n \| q]$ for $n = 1, \ldots, N$, where

$p_n(\boldsymbol{\theta}) \propto f_n(\boldsymbol{\theta}) \prod_{j \neq n} \tilde{f}_j(\boldsymbol{\theta})$, $\quad q(\boldsymbol{\theta}) \propto \tilde{f}_n(\boldsymbol{\theta}) \prod_{j \neq n} \tilde{f}_j(\boldsymbol{\theta})$
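The generic EP recipe above (form the cavity, moment-match the tilted distribution, update the site) can be illustrated on a one-dimensional toy problem. The sketch below uses probit factors, for which the tilted moments are available in closed form; it illustrates the EP scheme, not the multi-class algorithm of the talk.

```python
# Toy EP sketch: posterior over a scalar theta with a standard Gaussian prior
# and probit likelihood factors Phi(y_n * theta). Natural parameters are
# tau = 1/v and nu = m/v; each site stores its own (tau, nu) contribution.
import numpy as np
from scipy.stats import norm

y = np.array([1.0, 1.0, -1.0, 1.0])            # toy binary observations
N = len(y)
tau_site = np.zeros(N); nu_site = np.zeros(N)  # site (approx. factor) params
tau_post, nu_post = 1.0, 0.0                   # posterior = prior initially

for _ in range(50):                            # EP sweeps
    for n in range(N):
        # 1. Remove site n to form the cavity q^{\n}.
        tau_cav = tau_post - tau_site[n]
        nu_cav = nu_post - nu_site[n]
        m_cav, v_cav = nu_cav / tau_cav, 1.0 / tau_cav
        # 2. Moment-match the tilted distribution f_n(theta) q^{\n}(theta).
        z = y[n] * m_cav / np.sqrt(1.0 + v_cav)
        ratio = norm.pdf(z) / norm.cdf(z)
        m_hat = m_cav + y[n] * v_cav * ratio / np.sqrt(1.0 + v_cav)
        v_hat = v_cav - v_cav**2 * ratio * (z + ratio) / (1.0 + v_cav)
        # 3. Update the site so that q has the matched moments.
        tau_site[n] = 1.0 / v_hat - tau_cav
        nu_site[n] = m_hat / v_hat - nu_cav
        tau_post, nu_post = 1.0 / v_hat, m_hat / v_hat

print("posterior mean:", nu_post / tau_post, "variance:", 1.0 / tau_post)
```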

Model Specification

We consider that $y_i = \arg\max_k f_k(\mathbf{x}_i)$, which gives the likelihood:

$p(\mathbf{y} | \mathbf{f}) = \prod_{i=1}^N p(y_i | \mathbf{f}_i) = \prod_{i=1}^N \prod_{k \neq y_i} \Theta\!\left(f_{y_i}(\mathbf{x}_i) - f_k(\mathbf{x}_i)\right)$

where $\Theta(\cdot)$ is the Heaviside step function. The posterior approximation is also set to be $q(\mathbf{f}) = \int p(\mathbf{f} | \overline{\mathbf{f}})\, q(\overline{\mathbf{f}})\, d\overline{\mathbf{f}}$.

The posterior over $\overline{\mathbf{f}}$ is:

$p(\overline{\mathbf{f}} | \mathbf{y}) = \frac{\int p(\mathbf{y} | \mathbf{f})\, p(\mathbf{f} | \overline{\mathbf{f}})\, d\mathbf{f}\; p(\overline{\mathbf{f}})}{p(\mathbf{y})} \approx \frac{\left[ \prod_{i=1}^N \int p(y_i | \mathbf{f}_i)\, p(\mathbf{f}_i | \overline{\mathbf{f}})\, d\mathbf{f}_i \right] p(\overline{\mathbf{f}})}{p(\mathbf{y})}$

where we have used the FITC approximation $p(\mathbf{f} | \overline{\mathbf{f}}) \approx \prod_{i=1}^N p(\mathbf{f}_i | \overline{\mathbf{f}})$ (Snelson & Ghahramani, 2006).

The corresponding likelihood factors are:

$\phi_i(\overline{\mathbf{f}}) = \int \prod_{k \neq y_i} \Theta\!\left(f_{y_i} - f_k\right) \prod_{k=1}^C p(f_k | \overline{\mathbf{f}}^k)\, d\mathbf{f}_i$

The integral is intractable and we cannot evaluate $\phi_i(\overline{\mathbf{f}})$ in closed form!
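The FITC factorization makes each marginal $p(f_k(\mathbf{x}_i) | \overline{\mathbf{f}}^k)$ a univariate Gaussian whose mean is linear in $\overline{\mathbf{f}}^k$. A short sketch of these per-point quantities (my own illustration with a toy RBF kernel, not the authors' code):

```python
# Sketch of the FITC factorisation p(f | fbar) ~= prod_i p(f_i | fbar):
# each marginal is Gaussian with the standard GP conditional mean/variance,
# computed independently per data point.
import numpy as np

def fitc_marginals(Xtrain, Xind, kernel):
    """Per-point means/variances of p(f(x_i) | fbar) as linear maps of fbar."""
    Kmm = kernel(Xind, Xind) + 1e-8 * np.eye(len(Xind))
    Knm = kernel(Xtrain, Xind)
    A = np.linalg.solve(Kmm, Knm.T).T          # A = Knm Kmm^{-1}; mean = A @ fbar
    Knn_diag = np.diag(kernel(Xtrain, Xtrain))
    var = Knn_diag - np.sum(A * Knm, axis=1)   # diag(Knn - Knm Kmm^{-1} Kmn)
    return A, var

X = np.random.default_rng(1).uniform(-3, 3, size=20)
Z = np.linspace(-3, 3, 5)                      # M inducing point locations
k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
A, var = fitc_marginals(X, Z, k)
print(A.shape, var.min())                      # (20, 5), variances >= 0
```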

Approximate Likelihood Factors

It is possible to show that:

$\phi_i(\overline{\mathbf{f}}) = p\!\left(f_{y_i} > f_1, \ldots, f_{y_i} > f_{y_i - 1},\; f_{y_i} > f_{y_i + 1}, \ldots, f_{y_i} > f_C\right)$

which factorizes by the chain rule into a product of conditional probabilities. Assuming independence among the events $\{f_{y_i} > f_k\}$:

$\phi_i(\overline{\mathbf{f}}) \approx \prod_{k \neq y_i} p(f_{y_i} > f_k) = \prod_{k \neq y_i} \Phi(\alpha_{i,k})$

where $\Phi(\cdot)$ is the cdf of a standard Gaussian and we have defined $\alpha_{i,k} = (m_{y_i} - m_k) / \sqrt{v_{y_i} + v_k}$, with $m_{y_i}$, $m_k$, $v_{y_i}$ and $v_k$ the means and variances of $f_{y_i}$ and $f_k$.
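Given the marginal means and variances from the FITC conditionals, the factorized approximation above is cheap to evaluate. A minimal sketch (illustrative names, not the authors' code):

```python
# Evaluate the factorised approximation prod_{k != y_i} Phi(alpha_{i,k}),
# with alpha_{i,k} = (m_{y_i} - m_k) / sqrt(v_{y_i} + v_k).
import numpy as np
from scipy.stats import norm

def approx_likelihood_factor(m, v, y_i):
    """m, v: length-C arrays of marginal means/variances at x_i; y_i: label."""
    others = [k for k in range(len(m)) if k != y_i]
    alpha = (m[y_i] - m[others]) / np.sqrt(v[y_i] + v[others])
    return np.prod(norm.cdf(alpha))

m = np.array([0.5, 2.0, -1.0]); v = np.array([0.3, 0.2, 0.4])
print(approx_likelihood_factor(m, v, y_i=1))   # close to 1: class 1 dominates
```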

EP Approximation of the Likelihood Factors

EP approximates each likelihood factor $\Phi(\alpha_{i,k})$ with a Gaussian factor:

$\tilde{\phi}_{i,k}(\overline{\mathbf{f}}) = \tilde{s}_{i,k} \exp\!\left\{ -\tfrac{1}{2} (\overline{\mathbf{f}}^{y_i})^T \tilde{V}^{i,k}_{y_i} \overline{\mathbf{f}}^{y_i} + (\overline{\mathbf{f}}^{y_i})^T \tilde{\mathbf{m}}^{i,k}_{y_i} \right\} \exp\!\left\{ -\tfrac{1}{2} (\overline{\mathbf{f}}^{k})^T \tilde{V}^{i,k}_{k} \overline{\mathbf{f}}^{k} + (\overline{\mathbf{f}}^{k})^T \tilde{\mathbf{m}}^{i,k}_{k} \right\}$

$\tilde{V}^{i,k}_{y_i}$ and $\tilde{V}^{i,k}_k$ are rank-1 matrices. Each $\tilde{\phi}_{i,k}$ only has $O(M)$ parameters.

The posterior approximation is:

$q(\overline{\mathbf{f}}) = \frac{1}{Z_q} \prod_{i=1}^N \prod_{k \neq y_i} \tilde{\phi}_{i,k}(\overline{\mathbf{f}})\; p(\overline{\mathbf{f}})$

and $Z_q$ approximates the marginal likelihood of the model.
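One way to see the $O(M)$ storage claim: a rank-1 precision contribution $c\,\mathbf{u}\mathbf{u}^T$ is fully described by the M-vector $\mathbf{u}$ and the scalar $c$. The sketch below is purely illustrative bookkeeping, not the authors' data structures.

```python
# Assemble a block posterior precision from rank-1 site contributions.
# Each site stores only (u, c) with V = c * u u^T, i.e. O(M) numbers.
import numpy as np

M, C = 5, 3
rng = np.random.default_rng(2)
prior_prec = np.kron(np.eye(C), np.eye(M))     # stand-in for the prior precision

def add_rank1_site(prec, u, c, block):
    """Add the rank-1 site precision c * u u^T to one M x M class block."""
    s = slice(block * M, (block + 1) * M)
    prec[s, s] += c * np.outer(u, u)
    return prec

u = rng.standard_normal(M)
posterior_prec = add_rank1_site(prior_prec.copy(), u, c=0.7, block=1)
print(np.linalg.matrix_rank(posterior_prec - prior_prec))   # 1
```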

Approximate Maximization of the Marginal Likelihood

$Z_q$ is maximized w.r.t. the kernel hyper-parameters $\xi^k_j$ and the inducing points $\overline{X}^k$ to find good hyper-parameters.

If EP converges, the gradient of $\log Z_q$ is given by a term that depends only on the prior plus a sum over the data:

$\frac{\partial \log Z_q}{\partial \xi^k_j} \;=\; \text{(prior term)} \;+\; \sum_{i=1}^N \sum_{k \neq y_i} \frac{\partial \log Z_{i,k}}{\partial \xi^k_j}$

where $Z_{i,k}$ is the normalization constant of $\phi_{i,k}\, q^{\backslash i,k}$, with the cavity $q^{\backslash i,k} \propto q / \tilde{\phi}_{i,k}$.

Hernández-Lobato and Hernández-Lobato, 2016, show that EP convergence is not needed.

[Figure: $\log Z_q$ as a function of training time in seconds for EP with inner updates (approximate and exact gradients) and EP with outer updates.]
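Because $\log Z_q$ decomposes into a prior term plus a sum over data points, its gradient can be estimated without touching all N points: sample a minibatch and rescale the minibatch sum by $N / |\mathcal{M}_b|$. A sketch with hypothetical gradient callables:

```python
# Unbiased minibatch estimate of d log Z_q / d xi.
# grad_prior and grad_logZ_ik are hypothetical callables returning arrays.
import numpy as np

def noisy_grad_logZq(grad_prior, grad_logZ_ik, N, minibatch):
    """Estimate the gradient from a minibatch of data indices."""
    g = grad_prior()
    for i in minibatch:
        g += (N / len(minibatch)) * grad_logZ_ik(i)
    return g
```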

Expectation Propagation using Mini-batches

Consider a minibatch of data $\mathcal{M}_b$ (a skeleton of the loop is sketched after this list):

1. Refine in parallel all approximate factors $\tilde{\phi}_{i,k}$ corresponding to $\mathcal{M}_b$.
2. Reconstruct the posterior approximation $q$.
3. Get a noisy estimate of the gradient of $\log Z_q$ w.r.t. each $\xi^k_j$ and $\overline{x}^k_{j,d}$.
4. Update all model hyper-parameters.
5. Reconstruct the posterior approximation $q$.

If $|\mathcal{M}_b| < M$ the cost is $O(CM^3)$. Memory cost is $O(NCM)$.
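A skeleton of one epoch of the procedure above; all helper callables are hypothetical placeholders for the operations described in the slides.

```python
# One epoch of EP with minibatches, paraphrasing the five steps above.
def ep_minibatch_epoch(minibatches, sites, hyper, lr,
                       refine_site, reconstruct_posterior, noisy_grad):
    q = reconstruct_posterior(sites, hyper)
    for batch in minibatches:
        for i in batch:                          # 1. refine factors in parallel
            sites[i] = refine_site(i, q, sites[i])
        q = reconstruct_posterior(sites, hyper)              # 2.
        grad = noisy_grad(batch, q, sites, hyper)            # 3.
        for name, g in grad.items():             # 4. gradient ascent on log Z_q
            hyper[name] += lr * g
        q = reconstruct_posterior(sites, hyper)              # 5.
    return sites, q, hyper
```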

Stochastic Expectation Propagation

Li et al., 2015, suggest storing in memory only the product of the $\tilde{\phi}_{i,k}$:

$\tilde{\phi} = \prod_{i=1}^N \prod_{k \neq y_i} \tilde{\phi}_{i,k}$

The cavity distribution is computed as $q^{\backslash i,k} \propto q / \tilde{\phi}^{1/N}$, i.e., by removing an $N$-th fraction of the stored product instead of an individual factor.

[Figure: factor graphs contrasting EP, which keeps one approximate factor per likelihood term, with SEP, which keeps only their product.]

The memory cost is reduced to $O(CM^2)$.
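In natural parameters, the SEP cavity is a simple subtraction: remove a 1/N fraction of the stored product from q. A minimal sketch, assuming the $q / \tilde{\phi}^{1/N}$ rule above (names are illustrative):

```python
# SEP cavity in natural parameters: q_nat minus a 1/N fraction of the
# natural parameters of the stored product phi_tilde.
import numpy as np

def sep_cavity(q_nat, phi_nat, N):
    """Natural parameters of the cavity under SEP."""
    return {k: q_nat[k] - phi_nat[k] / N for k in q_nat}

q_nat = {"tau": np.array([2.0, 3.0]), "nu": np.array([0.5, -0.2])}
phi_nat = {"tau": np.array([1.0, 2.0]), "nu": np.array([0.3, -0.4])}
print(sep_cavity(q_nat, phi_nat, N=100))
```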

Baseline Method: Generalized FITC Approximation

The same likelihood as the proposed method (Kim & Ghahramani, 2006). Original GFITC formulation (Naish-Guzman & Holden, 2008).

Key difference: the latent variables corresponding to the inducing points, $\overline{\mathbf{f}}$, are marginalized out to obtain an approximate prior:

$p(\mathbf{f}) = \int p(\mathbf{f} | \overline{\mathbf{f}})\, p(\overline{\mathbf{f}})\, d\overline{\mathbf{f}} \approx \prod_{k=1}^C \mathcal{N}\!\left(\mathbf{f}^k \,\middle|\, \mathbf{0},\; \mathbf{Q}^k_{NN} + \mathrm{diag}\!\left(\mathbf{K}^k_{NN} - \mathbf{Q}^k_{NN}\right)\right)$

Training costs $O(CNM^2)$. Does not allow for scalable training!
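The GFITC prior covariance keeps the exact diagonal of $\mathbf{K}_{NN}$ and a low-rank approximation off the diagonal. A short sketch with a toy RBF kernel (my own illustration):

```python
# GFITC approximate prior covariance for one class:
# Q_NN + diag(K_NN - Q_NN), with Q_NN = K_NM K_MM^{-1} K_MN.
import numpy as np

def gfitc_prior_cov(X, Z, kernel, jitter=1e-8):
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))
    Knm = kernel(X, Z)
    Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)
    Knn = kernel(X, X)
    return Qnn + np.diag(np.diag(Knn - Qnn))   # exact diagonal, low-rank off-diagonal

k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
S = gfitc_prior_cov(np.linspace(-2, 2, 10), np.linspace(-2, 2, 4), k)
print(S.shape)   # (10, 10)
```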

UCI Repository Datasets

Initial comparison on small datasets and batch training.

[Table: number of instances, attributes and classes for the Glass, New-thyroid, Satellite, Svmguide2, Vehicle, Vowel, Waveform and Wine datasets.]

UCI Repository (Test Error)

[Table: average test error for GFITC, EP, SEP and VI on each UCI dataset, for M equal to 5%, 10% and 20% of the training points, together with average ranks and average training times in seconds.]

UCI Repository (Negative Test Log-Likelihood)

[Table: average negative test log-likelihood for GFITC, EP, SEP and VI on each UCI dataset, for M equal to 5%, 10% and 20% of the training points, together with average ranks and average training times in seconds.]

Inducing Point Placement Analysis

[Figure: inducing point locations learned by GFITC, EP, SEP and VI for increasing numbers of inducing points M.]

EP-based methods perform inducing point pruning (Bauer et al., 2016)!

Performance in Terms of Time (Satellite Dataset)

[Figure: average test error and average negative test log-likelihood as a function of training time (log10 seconds) for GFITC, SEP and VI with M = 4, 20 and 100.]

Minibatch Training: MNIST Dataset, M = 200

[Figure: test error and negative test log-likelihood as a function of training time (log10 seconds) for EP, SEP and VI.]

[Table: final test error in % and negative test log-likelihood for EP, SEP and VI.]

Minibatch Training: Airline-Delays Dataset, M = 200

[Figure: test error and negative test log-likelihood as a function of training time (log10 seconds) for EP, SEP, VI and a linear model.]

Conclusions

- EP method for multi-class classification using GPs.
- Efficient training and memory usage with cost $O(CM^3)$.
- Extensive experimental comparison with related methods.
- SEP is slightly faster than VI and is quadrature free.
- EP methods carry out inducing point pruning.
- VI sometimes gives bad test log-likelihoods.

Thank you for your attention!

References

- Bauer, M., van der Wilk, M., and Rasmussen, C. E. Understanding probabilistic sparse Gaussian process approximations. NIPS 29, 2016.
- Chai, K. M. A. Variational multinomial logit Gaussian process. JMLR, 13, 2012.
- Girolami, M. and Rogers, S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18, 2006.
- Hensman, J., Matthews, A. G., Filippone, M., and Ghahramani, Z. MCMC for variationally sparse Gaussian processes. NIPS 28, 2015.
- Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. AISTATS, 2016.
- Kim, H.-C. and Ghahramani, Z. Bayesian Gaussian process classification with the EM-EP algorithm. IEEE PAMI, 28, 2006.
- Li, Y., Hernández-Lobato, J. M., and Turner, R. E. Stochastic expectation propagation. NIPS 28, 2015.
- Naish-Guzman, A. and Holden, S. The generalized FITC approximation. NIPS 20, 2008.
- Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. JMLR, 14, 2013.
- Snelson, E. and Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. NIPS 18, 2006.
- Williams, C. K. I. and Barber, D. Bayesian classification with Gaussian processes. IEEE PAMI, 20, 1998.
