Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
Carlos Villacampa-Calvo and Daniel Hernández-Lobato
Computer Science Department, Universidad Autónoma de Madrid
Introduction to Multi-class Classification with GPs

Given x we want to make predictions about y ∈ {1, ..., C}, with C > 2. One can assume that (Kim & Ghahramani, 2006):

    y = argmax_k f_k(x)   for k ∈ {1, ..., C}

[Figure: C latent functions f_k(x) and the class labels induced by their argmax.]

Find p(f | y) = p(y | f) p(f) / p(y) under the priors p(f_k) ~ GP(0, k(·, ·)).
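As a quick illustration of this labelling rule, the sketch below (assuming a squared-exponential kernel and hypothetical hyper-parameter values, neither taken from the slides) samples C independent latent functions from their GP priors and assigns each input the class whose latent function is largest:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
C, N = 3, 40
X = rng.uniform(-3.0, 3.0, size=(N, 1))
K = rbf_kernel(X, X) + 1e-6 * np.eye(N)   # jitter for a stable Cholesky

# One independent latent function f_k ~ GP(0, k) per class; each column of F
# is one draw of (f_k(x_1), ..., f_k(x_N)).
L = np.linalg.cholesky(K)
F = L @ rng.standard_normal((N, C))

# The label of each input is the argmax over the C latent functions.
y = np.argmax(F, axis=1)
assert y.shape == (N,)
```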
Challenges in Multi-class Classification with GPs

Binary classification has received more attention than multi-class! Challenges in the multi-class case:

1. Approximate inference is more difficult.
2. C > 2 latent functions instead of just one.
3. Deal with more complicated likelihood factors.
4. More expensive algorithms, computationally.

Most techniques do not scale to large datasets (Williams & Barber, 1998; Kim & Ghahramani, 2006; Girolami & Rogers, 2006; Chai, 2012; Riihimäki et al., 2013).

The best cost is O(CNM²), if sparse priors are used.
Stochastic Variational Inference for Multi-class GPs

Hensman et al., 2015, use a robust likelihood function:

    p(y_i | f_i) = (1 - ε) p_i + ε / (C - 1) (1 - p_i),   with p_i = 1 if y_i = argmax_k f_k(x_i) and p_i = 0 otherwise.

The posterior approximation is q(f) = ∫ p(f | f̄) q(f̄) df̄, with

    q(f̄) = ∏_{k=1}^C N(f̄^k | μ_k, Σ_k),   f̄^k = (f_k(x̄^k_1), ..., f_k(x̄^k_M))^T,   X̄^k = (x̄^k_1, ..., x̄^k_M)^T.

The number of latent variables goes from CN to CM, with M ≪ N. The evidence lower bound is

    L(q) = Σ_{i=1}^N E_q[log p(y_i | f_i)] - KL(q ∥ p)

The cost is O(CM³) (uses quadratures)! Can we do that with EP?
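The robust likelihood itself is easy to evaluate once the latent values are known. A minimal sketch (the function name and the value of ε are illustrative, not taken from the slides):

```python
import numpy as np

def robust_likelihood(y, F, eps=1e-3):
    """p(y_i | f_i) = (1 - eps) * p_i + eps / (C - 1) * (1 - p_i),
    with p_i = 1 iff y_i = argmax_k f_k(x_i)."""
    C = F.shape[1]
    p = (np.argmax(F, axis=1) == y).astype(float)
    return (1.0 - eps) * p + eps / (C - 1) * (1.0 - p)

# Two points: the first label agrees with the argmax, the second does not.
F = np.array([[2.0, 0.1, -1.0],
              [0.3, 0.2, 0.9]])
y = np.array([0, 1])
lik = robust_likelihood(y, F, eps=0.01)
# agreeing point: 1 - eps = 0.99; disagreeing point: eps / (C - 1) = 0.005
assert np.allclose(lik, [0.99, 0.005])
```

The ε term makes the model robust to mislabelled points: even a label that contradicts the argmax keeps a small nonzero probability.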
Expectation Propagation (EP)

Let θ summarize the latent variables of the model. EP approximates

    p(θ) ∝ p_0(θ) ∏_{n=1}^N f_n(θ)   with   q(θ) ∝ p_0(θ) ∏_{n=1}^N f̃_n(θ).

The f̃_n are tuned by minimizing the KL divergence D_KL[p_n ∥ q_n] for n = 1, ..., N, where

    p_n(θ) ∝ f_n(θ) ∏_{j≠n} f̃_j(θ),   q_n(θ) ∝ f̃_n(θ) ∏_{j≠n} f̃_j(θ).
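EP's refine-one-factor-at-a-time scheme can be seen on a toy 1-D model with a Gaussian prior and probit factors, p(θ) ∝ N(θ | 0, 1) ∏_n Φ(y_n θ), where the tilted moments are available in closed form via the standard probit moment-matching formulas. This is a sketch of generic EP, not the multi-class algorithm of these slides:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard Gaussian cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pdf(z):  # standard Gaussian density
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def ep_1d_probit(y, n_sweeps=50):
    """EP for p(theta) ∝ N(theta | 0, 1) * prod_n Phi(y_n * theta).
    Each probit factor is replaced by a Gaussian site in natural parameters
    (tau_n, nu_n), refined by matching the moments of the tilted distribution."""
    N = len(y)
    tau = np.zeros(N)                  # site precisions
    nu = np.zeros(N)                   # site precision-times-mean terms
    q_tau, q_nu = 1.0, 0.0             # start from the prior N(0, 1)
    for _ in range(n_sweeps):
        for n in range(N):
            # 1) cavity: remove site n from the current approximation q
            cav_tau, cav_nu = q_tau - tau[n], q_nu - nu[n]
            cv, cm = 1.0 / cav_tau, cav_nu / cav_tau
            # 2) moments of the tilted distribution Phi(y_n theta) * cavity
            z = y[n] * cm / sqrt(1.0 + cv)
            r = pdf(z) / Phi(z)
            new_m = cm + y[n] * cv * r / sqrt(1.0 + cv)
            new_v = cv - cv**2 * r * (z + r) / (1.0 + cv)
            # 3) update site n so that cavity * site has the tilted moments
            tau[n] = 1.0 / new_v - cav_tau
            nu[n] = new_m / new_v - cav_nu
            q_tau, q_nu = 1.0 / new_v, new_m / new_v
    return q_nu / q_tau, 1.0 / q_tau   # posterior mean and variance

m, v = ep_1d_probit(np.array([1.0, 1.0, 1.0, -1.0]))
assert v > 0.0 and np.isfinite(m)
```

Each pass removes one site to form the cavity, matches the moments of the tilted distribution, and writes the difference back into that site.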
Model Specification

We consider that y_i = argmax_k f_k(x_i), which gives the likelihood:

    p(y | f) = ∏_{i=1}^N p(y_i | f_i) = ∏_{i=1}^N ∏_{k≠y_i} Θ(f_{y_i}(x_i) - f_k(x_i))

where Θ(·) is the Heaviside step function. The posterior approximation is also set to be q(f) = ∫ p(f | f̄) q(f̄) df̄.

The posterior over f̄ is:

    p(f̄ | y) = [∫ p(y | f) p(f | f̄) df] p(f̄) / p(y) ≈ [∏_{i=1}^N ∫ p(y_i | f_i) p(f_i | f̄) df_i] p(f̄) / p(y)

where we have used the FITC approximation p(f | f̄) ≈ ∏_{i=1}^N p(f_i | f̄).

The corresponding likelihood factors are:

    φ_i(f̄) = ∫ ∏_{k≠y_i} Θ(f_i^{y_i} - f_i^k) ∏_{k=1}^C p(f_i^k | f̄^k) df_i

The integral is intractable and we cannot evaluate φ_i(f̄) in closed form!
Approximate Likelihood Factors

It is possible to show that:

    φ_i(f̄) = p(f_i^{y_i} > f_i^1, ..., f_i^{y_i} > f_i^{y_i-1}, f_i^{y_i} > f_i^{y_i+1}, ..., f_i^{y_i} > f_i^C)
            = p(f_i^{y_i} > f_i^1 | f_i^{y_i} > f_i^2, ..., f_i^{y_i} > f_i^C) × ... × p(f_i^{y_i} > f_i^{C-1} | f_i^{y_i} > f_i^C) × p(f_i^{y_i} > f_i^C)
            ≈ ∏_{k≠y_i} p(f_i^{y_i} > f_i^k) = ∏_{k≠y_i} Φ(α_{i,k})

where Φ(·) is the cdf of a standard Gaussian and we have defined α_{i,k} = (m_i^{y_i} - m_i^k) / √(v_i^{y_i} + v_i^k), with m_i^{y_i}, m_i^k, v_i^{y_i} and v_i^k the means and variances of f_i^{y_i} and f_i^k.
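Under this factorization, evaluating the approximate factor only requires the marginal means and variances of the latent functions at x_i. A small sketch (the marginal values below are made up for illustration):

```python
import numpy as np
from math import erf

def Phi(z):  # standard Gaussian cdf
    return 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def approx_factor(y_i, m, v):
    """prod_{k != y_i} Phi(alpha_{i,k}), with
    alpha_{i,k} = (m[y_i] - m[k]) / sqrt(v[y_i] + v[k])."""
    p = 1.0
    for k in range(len(m)):
        if k == y_i:
            continue
        alpha = (m[y_i] - m[k]) / np.sqrt(v[y_i] + v[k])
        p *= Phi(alpha)
    return p

m = np.array([1.5, 0.0, -0.5])   # hypothetical marginal means of f_i^k
v = np.array([0.2, 0.3, 0.1])    # hypothetical marginal variances
p = approx_factor(0, m, v)       # chance that class 0 beats both competitors
assert 0.0 < p < 1.0
```

Each Φ(α_{i,k}) is the probability that the observed class's latent value exceeds class k's, under independent Gaussian marginals.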
EP Approximation of the Likelihood Factors

EP approximates each likelihood factor Φ(α_{i,k}) with a Gaussian factor:

    φ̃_{i,k}(f̄) = s̃_{i,k} exp{ -½ (f̄^{y_i})^T Ṽ_i^{y_i,k} f̄^{y_i} + (f̄^{y_i})^T m̃_i^{y_i,k} } exp{ -½ (f̄^k)^T Ṽ_i^{k,k} f̄^k + (f̄^k)^T m̃_i^{k,k} }

Ṽ_i^{y_i,k} and Ṽ_i^{k,k} are rank-1 matrices, so each φ̃_{i,k} only has O(M) parameters.

The posterior approximation is:

    q(f̄) = (1 / Z_q) ∏_{i=1}^N ∏_{k≠y_i} φ̃_{i,k}(f̄) p(f̄)

and Z_q approximates the marginal likelihood of the model.
Approx. Maximization of the Marginal Likelihood

Z_q is maximized w.r.t. the kernel hyper-parameters ξ^k_j and the inducing points X̄^k to find good hyper-parameters.

If EP converges, the gradient of log Z_q is given by

    ∂ log Z_q / ∂ ξ^k_j = ∂ log p(f̄) / ∂ ξ^k_j (prior term) + Σ_{i=1}^N Σ_{k≠y_i} ∂ log Z_{i,k} / ∂ ξ^k_j

where Z_{i,k} is the normalization constant of φ_{i,k} q^{\(i,k)}, with the cavity q^{\(i,k)} ∝ q / φ̃_{i,k}.

Hernández-Lobato and Hernández-Lobato, 2016 show that convergence is not needed.

[Figure: log Z_q vs. training time in seconds for EP with inner updates (approximate gradient), inner updates (exact gradient) and outer updates.]
Expectation Propagation using Mini-batches

Consider a minibatch of data M_b:

1. Refine in parallel all approximate factors φ̃_{i,k} corresponding to M_b.
2. Reconstruct the posterior approximation q.
3. Get a noisy estimate of the gradient of log Z_q w.r.t. each ξ^k_j and x̄_{k,d}.
4. Update all model hyper-parameters.
5. Reconstruct the posterior approximation q.

If |M_b| < M the cost is O(CM³). Memory cost is O(NCM).
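The five steps above can be sketched as a training loop. Everything below is schematic: the three helper functions are stand-in stubs, since a real implementation would refine the Gaussian site factors by moment matching and rebuild q from the prior and all the sites:

```python
import numpy as np

# Stand-in stubs for the real EP computations.
def refine_site(site, x, y, hypers):
    return site

def reconstruct_posterior(sites, hypers):
    return {"mean": 0.0, "var": 1.0}

def noisy_grad_log_Zq(q, sites, batch, hypers):
    return {name: 0.0 for name in hypers}

def training_epoch(sites, hypers, X, y, batch_size, lr=0.01, rng=None):
    rng = rng or np.random.default_rng(0)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        for i in batch:                                    # step 1: refine batch sites
            sites[i] = refine_site(sites[i], X[i], y[i], hypers)
        q = reconstruct_posterior(sites, hypers)           # step 2: rebuild q
        grad = noisy_grad_log_Zq(q, sites, batch, hypers)  # step 3: noisy gradient
        hypers = {k: val + lr * grad[k] for k, val in hypers.items()}  # step 4
        q = reconstruct_posterior(sites, hypers)           # step 5: rebuild q again
    return sites, hypers

X, y = np.zeros((8, 1)), np.zeros(8, dtype=int)
sites = [None] * 8
sites, hypers = training_epoch(sites, {"lengthscale": 1.0}, X, y, batch_size=4)
assert hypers["lengthscale"] == 1.0   # the stub gradient is zero, so nothing moves
```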
Stochastic Expectation Propagation

Li et al., 2015 suggest to store in memory only the product of the φ̃_{i,k}:

    φ̃ = ∏_{i=1}^N ∏_{k≠y_i} φ̃_{i,k}

The cavity distribution is computed as q^{\(i,k)} ∝ q / φ̃^{1/N}, i.e., by removing an average share of the N factors instead of the exact one.

[Figure: factor graphs for EP (one stored factor per data point) vs. SEP (a single shared factor).]

The memory cost is reduced to O(CM²).
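In natural parameters, dividing by φ̃^{1/N} is just a subtraction, which is why SEP only needs the product of the sites. A 1-D Gaussian illustration with made-up numbers:

```python
import numpy as np

# A 1-D Gaussian in natural parameters: precision tau and nu = tau * mean.
prior_tau, prior_nu = 1.0, 0.0
N = 100

# SEP stores only the product of the N sites, not the sites themselves
# (the numbers are made up for illustration).
sites_tau_total, sites_nu_total = 50.0, 20.0

q_tau = prior_tau + sites_tau_total
q_nu = prior_nu + sites_nu_total

# EP would subtract the exact site i; SEP subtracts an average 1/N share of
# the stored product, i.e. the cavity is q / (site product)^(1/N):
cav_tau = q_tau - sites_tau_total / N
cav_nu = q_nu - sites_nu_total / N
assert cav_tau > 0.0
assert np.isclose(cav_tau, 50.5) and np.isclose(cav_nu, 19.8)
```

The memory saving comes from replacing N stored site parameters with one shared total; the price is that every cavity uses the same averaged factor.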
Baseline Method: Generalized FITC Approximation

The same likelihood as the proposed method (Kim & Ghahramani, 2006). Original GFITC formulation (Naish-Guzman & Holden, 2008).

Key difference: The latent variables corresponding to the inducing points f̄ are marginalized out to obtain an approximate prior:

    p(f) = ∫ p(f | f̄) p(f̄) df̄ ≈ ∏_{k=1}^C N(f^k | 0, Q^k_NN + diag(K^k_NN - Q^k_NN))

Training costs O(CNM²). Does not allow for scalable training!
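The FITC prior covariance Q_NN + diag(K_NN - Q_NN) keeps the exact marginal variances while using a rank-M matrix everywhere else. A sketch with an assumed squared-exponential kernel and arbitrary random inputs:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
N, M = 200, 10
X = rng.normal(size=(N, 1))
Xbar = rng.normal(size=(M, 1))              # inducing point locations

Knn = rbf_kernel(X, X)
Kmm = rbf_kernel(Xbar, Xbar) + 1e-8 * np.eye(M)
Knm = rbf_kernel(X, Xbar)

# Nystrom / low-rank part: rank at most M instead of N.
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)

# FITC: keep Qnn off the diagonal, but correct the diagonal to be exact.
K_fitc = Qnn + np.diag(np.diag(Knn - Qnn))
assert np.allclose(np.diag(K_fitc), np.diag(Knn))
```

Only the diagonal correction distinguishes FITC from the plain Nystrom approximation, which is why its marginal variances match the full GP prior exactly.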
UCI Repository Datasets

Initial comparison on small datasets with batch training.

[Table: number of instances, attributes and classes for each dataset: Glass, New-thyroid, Satellite, Svmguide2, Vehicle, Vowel, Waveform and Wine.]
UCI Repository (test error)

[Table: average test error (± standard error) of GFITC, EP, SEP and VI on Glass, New-thyroid, Satellite, Svmguide2, Vehicle, Vowel, Waveform and Wine, for M equal to 5%, 10% and 20% of the training points, together with the average rank and average training time of each method; the numerical values are garbled in the transcription.]
UCI Repository (negative test log-likelihood)

[Table: average negative test log-likelihood (± standard error) of GFITC, EP, SEP and VI on the same eight datasets, for M equal to 5%, 10% and 20% of the training points, together with the average rank and average training time of each method; the numerical values are garbled in the transcription.]
Inducing Point Placement Analysis

[Figure: learned inducing point locations for GFITC, EP, SEP and VI as M grows (M = 1, 2, 4, 8, ..., 256).]

EP based methods perform inducing point pruning (Bauer et al., 2016)!
Performance in Terms of Time (Satellite Dataset)

[Figure: average test error and average negative test log-likelihood vs. training time (log10 seconds) for GFITC, SEP and VI with M = 4, 20 and 100.]
Minibatch Training: MNIST Dataset, M = 200

[Figure: test error and negative test log-likelihood vs. training time (log10 seconds) for EP, SEP and VI.]
[Table: final test error in % and negative test log-likelihood on MNIST (M = 200) for EP, SEP and VI; the numerical values are garbled in the transcription.]
Minibatch Training: Airline-delays Dataset, M = 200

[Figure: test error and negative test log-likelihood vs. training time (log10 seconds) for EP, SEP, VI and a linear model.]
[Figure: additional Airline-delays results vs. training time (log10 scale).]
Conclusions

- EP method for multi-class classification using GPs.
- Efficient training and memory usage with cost O(CM³).
- Extensive experimental comparison with related methods.
- SEP is slightly faster than VI and is quadrature free.
- EP methods carry out inducing point pruning.
- VI sometimes gives bad test log-likelihoods.

Thank you for your attention!
References

Bauer, M., van der Wilk, M., and Rasmussen, C. E. Understanding probabilistic sparse Gaussian process approximations. NIPS 29, 2016.
Chai, K. M. A. Variational multinomial logit Gaussian process. JMLR, 13, 2012.
Girolami, M. and Rogers, S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18, 2006.
Hensman, J., Matthews, A. G., Filippone, M., and Ghahramani, Z. MCMC for variationally sparse Gaussian processes. NIPS 28, 2015.
Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. AISTATS, 2016.
Kim, H.-C. and Ghahramani, Z. Bayesian Gaussian process classification with the EM-EP algorithm. IEEE PAMI, 28, 2006.
Li, Y., Hernández-Lobato, J. M., and Turner, R. E. Stochastic expectation propagation. NIPS 28, 2015.
Naish-Guzman, A. and Holden, S. The generalized FITC approximation. NIPS 20, 2008.
Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. JMLR, 14, 2013.
Snelson, E. and Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. NIPS 18, 2006.
Williams, C. K. I. and Barber, D. Bayesian classification with Gaussian processes. IEEE PAMI, 20, 1998.
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More informationSystem Identification of Nonlinear State-Space Models with Linearly Dependent Unknown Parameters Based on Variational Bayes
SICE Journal of Control Measurement and System Integraton Vol. 11 No. 6 pp. 456 462 November 2018 System Identfcaton of Nonlnear State-Space Models wth Lnearly Dependent Unknown Parameters Based on Varatonal
More informationLogistic Regression Maximum Likelihood Estimation
Harvard-MIT Dvson of Health Scences and Technology HST.951J: Medcal Decson Support, Fall 2005 Instructors: Professor Lucla Ohno-Machado and Professor Staal Vnterbo 6.873/HST.951 Medcal Decson Support Fall
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experments- MODULE LECTURE - 6 EXPERMENTAL DESGN MODELS Dr. Shalabh Department of Mathematcs and Statstcs ndan nsttute of Technology Kanpur Two-way classfcaton wth nteractons
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationCS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015
CS 3710: Vsual Recognton Classfcaton and Detecton Adrana Kovashka Department of Computer Scence January 13, 2015 Plan for Today Vsual recognton bascs part 2: Classfcaton and detecton Adrana s research
More informationGlobal Gaussian approximations in latent Gaussian models
Global Gaussan approxmatons n latent Gaussan models Botond Cseke Aprl 9, 2010 Abstract A revew of global approxmaton methods n latent Gaussan models. 1 Latent Gaussan models In ths secton we ntroduce notaton
More informationCollaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series
Collaboratve Mult-Output Gaussan Processes for Collectons of Sparse Multvarate Tme Seres Steven Cheng-Xan L Benamn Marln College of Informaton & Computer Scences Uversty of Massachusetts Amherst {cxl,marln}@cs.umass.edu
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More informationLogistic Classifier CISC 5800 Professor Daniel Leeds
lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationAn R implementation of bootstrap procedures for mixed models
The R User Conference 2009 July 8-10, Agrocampus-Ouest, Rennes, France An R mplementaton of bootstrap procedures for mxed models José A. Sánchez-Espgares Unverstat Poltècnca de Catalunya Jord Ocaña Unverstat
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationCSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing
CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann
More informationThe EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X
The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][
More informationClustering gene expression data & the EM algorithm
CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng robablstc Classfer Gven an nstance, hat does a probablstc classfer do dfferentl compared to, sa, perceptron? It does not drectl predct Instead,
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationStructured Perceptrons & Structural SVMs
Structured Perceptrons Structural SVMs 4/6/27 CS 59: Advanced Topcs n Machne Learnng Recall: Sequence Predcton Input: x = (x,,x M ) Predct: y = (y,,y M ) Each y one of L labels. x = Fsh Sleep y = (N, V)
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING
1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More information13 Principal Components Analysis
Prncpal Components Analyss 13 Prncpal Components Analyss We now dscuss an unsupervsed learnng algorthm, called Prncpal Components Analyss, or PCA. The method s unsupervsed because we are learnng a mappng
More informationMatrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD
Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo
More informationArtificial Intelligence Bayesian Networks
Artfcal Intellgence Bayesan Networks Adapted from sldes by Tm Fnn and Mare desjardns. Some materal borrowed from Lse Getoor. 1 Outlne Bayesan networks Network structure Condtonal probablty tables Condtonal
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationKernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan
Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng 2 Logstc Regresson Gven tranng set D stc regresson learns the condtonal dstrbuton We ll assume onl to classes and a parametrc form for here s
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationEngineering Risk Benefit Analysis
Engneerng Rsk Beneft Analyss.55, 2.943, 3.577, 6.938, 0.86, 3.62, 6.862, 22.82, ESD.72, ESD.72 RPRA 2. Elements of Probablty Theory George E. Apostolaks Massachusetts Insttute of Technology Sprng 2007
More informationExpectation propagation
Expectaton propagaton Lloyd Ellott May 17, 2011 Suppose p(x) s a pdf and we have a factorzaton p(x) = 1 Z n f (x). (1) =1 Expectaton propagaton s an nference algorthm desgned to approxmate the factors
More informationMean Field / Variational Approximations
Mean Feld / Varatonal Appromatons resented by Jose Nuñez 0/24/05 Outlne Introducton Mean Feld Appromaton Structured Mean Feld Weghted Mean Feld Varatonal Methods Introducton roblem: We have dstrbuton but
More informationSupplementary Material: Learning Structured Weight Uncertainty in Bayesian Neural Networks
Shengyang Sun, Changyou Chen, Lawrence Carn Suppementary Matera: Learnng Structured Weght Uncertanty n Bayesan Neura Networks Shengyang Sun Changyou Chen Lawrence Carn Tsnghua Unversty Duke Unversty Duke
More informationSpace of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics
/7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationLearning with Maximum Likelihood
Learnng wth Mamum Lelhood Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm,
More informationLatent Autoregressive Gaussian Processes Models for Robust System Identification
Preprnt, 11th IFAC Symposum on Dynamcs and Control of Process Systems, ncludng Bosystems Latent Autoregressve Gaussan Processes Models for Robust System Identfcaton César Lncoln C. Mattos Andreas Damanou
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationExcess Error, Approximation Error, and Estimation Error
E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple
More informationMeshless Surfaces. presented by Niloy J. Mitra. An Nguyen
Meshless Surfaces presented by Nloy J. Mtra An Nguyen Outlne Mesh-Independent Surface Interpolaton D. Levn Outlne Mesh-Independent Surface Interpolaton D. Levn Pont Set Surfaces M. Alexa, J. Behr, D. Cohen-Or,
More informationBAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup
BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS Darusz Bskup 1. Introducton The paper presents a nonparaetrc procedure for estaton of an unknown functon f n the regresson odel y = f x + ε = N. (1) (
More informationLimited Dependent Variables and Panel Data. Tibor Hanappi
Lmted Dependent Varables and Panel Data Tbor Hanapp 30.06.2010 Lmted Dependent Varables Dscrete: Varables that can take onl a countable number of values Censored/Truncated: Data ponts n some specfc range
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More information