Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
Scalable Multi-Class Gaussian Process Classification using Expectation Propagation
Carlos Villacampa-Calvo and Daniel Hernández-Lobato
Computer Science Department, Universidad Autónoma de Madrid
Introduction to Multi-class Classification with GPs

Given x we want to make predictions about y ∈ {1, ..., C}, with C > 2. One can assume that (Kim & Ghahramani, 2006):

    y = argmax_k f_k(x)   for k ∈ {1, ..., C}

[Figure: C latent functions f_k(x) and the class labels induced by their argmax.]

Find p(f | y) = p(y | f) p(f) / p(y) under the priors p(f_k) ~ GP(0, k(·, ·)).
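As a quick illustration of this labelling rule, the sketch below (assuming a squared-exponential kernel and hypothetical hyper-parameter values, neither taken from the slides) samples C independent latent functions from their GP priors and assigns each input the class whose latent function is largest:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
C, N = 3, 40
X = rng.uniform(-3.0, 3.0, size=(N, 1))
K = rbf_kernel(X, X) + 1e-6 * np.eye(N)   # jitter for a stable Cholesky

# One independent latent function f_k ~ GP(0, k) per class; each column of F
# is one draw of (f_k(x_1), ..., f_k(x_N)).
L = np.linalg.cholesky(K)
F = L @ rng.standard_normal((N, C))

# The label of each input is the argmax over the C latent functions.
y = np.argmax(F, axis=1)
assert y.shape == (N,)
```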
Challenges in Multi-class Classification with GPs

Binary classification has received more attention than multi-class! Challenges in the multi-class case:

1. Approximate inference is more difficult.
2. C > 2 latent functions instead of just one.
3. Deal with more complicated likelihood factors.
4. More expensive algorithms, computationally.

Most techniques do not scale to large datasets (Williams & Barber, 1998; Kim & Ghahramani, 2006; Girolami & Rogers, 2006; Chai, 2012; Riihimäki et al., 2013).

The best cost is O(CNM²), if sparse priors are used.
Stochastic Variational Inference for Multi-class GPs

Hensman et al., 2015, use a robust likelihood function:

    p(y_i | f_i) = (1 - ε) p_i + ε / (C - 1) (1 - p_i),   with p_i = 1 if y_i = argmax_k f_k(x_i) and p_i = 0 otherwise.

The posterior approximation is q(f) = ∫ p(f | f̄) q(f̄) df̄, with

    q(f̄) = ∏_{k=1}^C N(f̄^k | μ_k, Σ_k),   f̄^k = (f_k(x̄^k_1), ..., f_k(x̄^k_M))^T,   X̄^k = (x̄^k_1, ..., x̄^k_M)^T.

The number of latent variables goes from CN to CM, with M ≪ N. The evidence lower bound is

    L(q) = Σ_{i=1}^N E_q[log p(y_i | f_i)] - KL(q ∥ p)

The cost is O(CM³) (uses quadratures)! Can we do that with EP?
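The robust likelihood itself is easy to evaluate once the latent values are known. A minimal sketch (the function name and the value of ε are illustrative, not taken from the slides):

```python
import numpy as np

def robust_likelihood(y, F, eps=1e-3):
    """p(y_i | f_i) = (1 - eps) * p_i + eps / (C - 1) * (1 - p_i),
    with p_i = 1 iff y_i = argmax_k f_k(x_i)."""
    C = F.shape[1]
    p = (np.argmax(F, axis=1) == y).astype(float)
    return (1.0 - eps) * p + eps / (C - 1) * (1.0 - p)

# Two points: the first label agrees with the argmax, the second does not.
F = np.array([[2.0, 0.1, -1.0],
              [0.3, 0.2, 0.9]])
y = np.array([0, 1])
lik = robust_likelihood(y, F, eps=0.01)
# agreeing point: 1 - eps = 0.99; disagreeing point: eps / (C - 1) = 0.005
assert np.allclose(lik, [0.99, 0.005])
```

The ε term makes the model robust to mislabelled points: even a label that contradicts the argmax keeps a small nonzero probability.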
Expectation Propagation (EP)

Let θ summarize the latent variables of the model. EP approximates

    p(θ) ∝ p_0(θ) ∏_{n=1}^N f_n(θ)   with   q(θ) ∝ p_0(θ) ∏_{n=1}^N f̃_n(θ).

The f̃_n are tuned by minimizing the KL divergence D_KL[p_n ∥ q_n] for n = 1, ..., N, where

    p_n(θ) ∝ f_n(θ) ∏_{j≠n} f̃_j(θ),   q_n(θ) ∝ f̃_n(θ) ∏_{j≠n} f̃_j(θ).
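EP's refine-one-factor-at-a-time scheme can be seen on a toy 1-D model with a Gaussian prior and probit factors, p(θ) ∝ N(θ | 0, 1) ∏_n Φ(y_n θ), where the tilted moments are available in closed form via the standard probit moment-matching formulas. This is a sketch of generic EP, not the multi-class algorithm of these slides:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard Gaussian cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pdf(z):  # standard Gaussian density
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def ep_1d_probit(y, n_sweeps=50):
    """EP for p(theta) ∝ N(theta | 0, 1) * prod_n Phi(y_n * theta).
    Each probit factor is replaced by a Gaussian site in natural parameters
    (tau_n, nu_n), refined by matching the moments of the tilted distribution."""
    N = len(y)
    tau = np.zeros(N)                  # site precisions
    nu = np.zeros(N)                   # site precision-times-mean terms
    q_tau, q_nu = 1.0, 0.0             # start from the prior N(0, 1)
    for _ in range(n_sweeps):
        for n in range(N):
            # 1) cavity: remove site n from the current approximation q
            cav_tau, cav_nu = q_tau - tau[n], q_nu - nu[n]
            cv, cm = 1.0 / cav_tau, cav_nu / cav_tau
            # 2) moments of the tilted distribution Phi(y_n theta) * cavity
            z = y[n] * cm / sqrt(1.0 + cv)
            r = pdf(z) / Phi(z)
            new_m = cm + y[n] * cv * r / sqrt(1.0 + cv)
            new_v = cv - cv**2 * r * (z + r) / (1.0 + cv)
            # 3) update site n so that cavity * site has the tilted moments
            tau[n] = 1.0 / new_v - cav_tau
            nu[n] = new_m / new_v - cav_nu
            q_tau, q_nu = 1.0 / new_v, new_m / new_v
    return q_nu / q_tau, 1.0 / q_tau   # posterior mean and variance

m, v = ep_1d_probit(np.array([1.0, 1.0, 1.0, -1.0]))
assert v > 0.0 and np.isfinite(m)
```

Each pass removes one site to form the cavity, matches the moments of the tilted distribution, and writes the difference back into that site.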
Model Specification

We consider that y_i = argmax_k f_k(x_i), which gives the likelihood:

    p(y | f) = ∏_{i=1}^N p(y_i | f_i) = ∏_{i=1}^N ∏_{k≠y_i} Θ(f_{y_i}(x_i) - f_k(x_i))

where Θ(·) is the Heaviside step function. The posterior approximation is also set to be q(f) = ∫ p(f | f̄) q(f̄) df̄.

The posterior over f̄ is:

    p(f̄ | y) = [∫ p(y | f) p(f | f̄) df] p(f̄) / p(y) ≈ [∏_{i=1}^N ∫ p(y_i | f_i) p(f_i | f̄) df_i] p(f̄) / p(y)

where we have used the FITC approximation p(f | f̄) ≈ ∏_{i=1}^N p(f_i | f̄).

The corresponding likelihood factors are:

    φ_i(f̄) = ∫ ∏_{k≠y_i} Θ(f_i^{y_i} - f_i^k) ∏_{k=1}^C p(f_i^k | f̄^k) df_i

The integral is intractable and we cannot evaluate φ_i(f̄) in closed form!
Approximate Likelihood Factors

It is possible to show that:

    φ_i(f̄) = p(f_i^{y_i} > f_i^1, ..., f_i^{y_i} > f_i^{y_i-1}, f_i^{y_i} > f_i^{y_i+1}, ..., f_i^{y_i} > f_i^C)
            = p(f_i^{y_i} > f_i^1 | f_i^{y_i} > f_i^2, ..., f_i^{y_i} > f_i^C) × ... × p(f_i^{y_i} > f_i^{C-1} | f_i^{y_i} > f_i^C) × p(f_i^{y_i} > f_i^C)
            ≈ ∏_{k≠y_i} p(f_i^{y_i} > f_i^k) = ∏_{k≠y_i} Φ(α_{i,k})

where Φ(·) is the cdf of a standard Gaussian and we have defined α_{i,k} = (m_i^{y_i} - m_i^k) / √(v_i^{y_i} + v_i^k), with m_i^{y_i}, m_i^k, v_i^{y_i} and v_i^k the means and variances of f_i^{y_i} and f_i^k.
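Under this factorization, evaluating the approximate factor only requires the marginal means and variances of the latent functions at x_i. A small sketch (the marginal values below are made up for illustration):

```python
import numpy as np
from math import erf

def Phi(z):  # standard Gaussian cdf
    return 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def approx_factor(y_i, m, v):
    """prod_{k != y_i} Phi(alpha_{i,k}), with
    alpha_{i,k} = (m[y_i] - m[k]) / sqrt(v[y_i] + v[k])."""
    p = 1.0
    for k in range(len(m)):
        if k == y_i:
            continue
        alpha = (m[y_i] - m[k]) / np.sqrt(v[y_i] + v[k])
        p *= Phi(alpha)
    return p

m = np.array([1.5, 0.0, -0.5])   # hypothetical marginal means of f_i^k
v = np.array([0.2, 0.3, 0.1])    # hypothetical marginal variances
p = approx_factor(0, m, v)       # chance that class 0 beats both competitors
assert 0.0 < p < 1.0
```

Each Φ(α_{i,k}) is the probability that the observed class's latent value exceeds class k's, under independent Gaussian marginals.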
EP Approximation of the Likelihood Factors

EP approximates each likelihood factor Φ(α_{i,k}) with a Gaussian factor:

    φ̃_{i,k}(f̄) = s̃_{i,k} exp{ -½ (f̄^{y_i})^T Ṽ_i^{y_i,k} f̄^{y_i} + (f̄^{y_i})^T m̃_i^{y_i,k} } exp{ -½ (f̄^k)^T Ṽ_i^{k,k} f̄^k + (f̄^k)^T m̃_i^{k,k} }

Ṽ_i^{y_i,k} and Ṽ_i^{k,k} are rank-1 matrices, so each φ̃_{i,k} only has O(M) parameters.

The posterior approximation is:

    q(f̄) = (1 / Z_q) ∏_{i=1}^N ∏_{k≠y_i} φ̃_{i,k}(f̄) p(f̄)

and Z_q approximates the marginal likelihood of the model.
Approx. Maximization of the Marginal Likelihood

Z_q is maximized w.r.t. the kernel hyper-parameters ξ^k_j and the inducing points X̄^k to find good hyper-parameters.

If EP converges, the gradient of log Z_q is given by

    ∂ log Z_q / ∂ ξ^k_j = ∂ log p(f̄) / ∂ ξ^k_j (prior term) + Σ_{i=1}^N Σ_{k≠y_i} ∂ log Z_{i,k} / ∂ ξ^k_j

where Z_{i,k} is the normalization constant of φ_{i,k} q^{\(i,k)}, with the cavity q^{\(i,k)} ∝ q / φ̃_{i,k}.

Hernández-Lobato and Hernández-Lobato, 2016 show that convergence is not needed.

[Figure: log Z_q vs. training time in seconds for EP with inner updates (approximate gradient), inner updates (exact gradient) and outer updates.]
Expectation Propagation using Mini-batches

Consider a minibatch of data M_b:

1. Refine in parallel all approximate factors φ̃_{i,k} corresponding to M_b.
2. Reconstruct the posterior approximation q.
3. Get a noisy estimate of the gradient of log Z_q w.r.t. each ξ^k_j and x̄_{k,d}.
4. Update all model hyper-parameters.
5. Reconstruct the posterior approximation q.

If |M_b| < M the cost is O(CM³). Memory cost is O(NCM).
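The five steps above can be sketched as a training loop. Everything below is schematic: the three helper functions are stand-in stubs, since a real implementation would refine the Gaussian site factors by moment matching and rebuild q from the prior and all the sites:

```python
import numpy as np

# Stand-in stubs for the real EP computations.
def refine_site(site, x, y, hypers):
    return site

def reconstruct_posterior(sites, hypers):
    return {"mean": 0.0, "var": 1.0}

def noisy_grad_log_Zq(q, sites, batch, hypers):
    return {name: 0.0 for name in hypers}

def training_epoch(sites, hypers, X, y, batch_size, lr=0.01, rng=None):
    rng = rng or np.random.default_rng(0)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        for i in batch:                                    # step 1: refine batch sites
            sites[i] = refine_site(sites[i], X[i], y[i], hypers)
        q = reconstruct_posterior(sites, hypers)           # step 2: rebuild q
        grad = noisy_grad_log_Zq(q, sites, batch, hypers)  # step 3: noisy gradient
        hypers = {k: val + lr * grad[k] for k, val in hypers.items()}  # step 4
        q = reconstruct_posterior(sites, hypers)           # step 5: rebuild q again
    return sites, hypers

X, y = np.zeros((8, 1)), np.zeros(8, dtype=int)
sites = [None] * 8
sites, hypers = training_epoch(sites, {"lengthscale": 1.0}, X, y, batch_size=4)
assert hypers["lengthscale"] == 1.0   # the stub gradient is zero, so nothing moves
```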
Stochastic Expectation Propagation

Li et al., 2015 suggest to store in memory only the product of the φ̃_{i,k}:

    φ̃ = ∏_{i=1}^N ∏_{k≠y_i} φ̃_{i,k}

The cavity distribution is computed as q^{\(i,k)} ∝ q / φ̃^{1/N}, i.e., by removing an average share of the N factors instead of the exact one.

[Figure: factor graphs for EP (one stored factor per data point) vs. SEP (a single shared factor).]

The memory cost is reduced to O(CM²).
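In natural parameters, dividing by φ̃^{1/N} is just a subtraction, which is why SEP only needs the product of the sites. A 1-D Gaussian illustration with made-up numbers:

```python
import numpy as np

# A 1-D Gaussian in natural parameters: precision tau and nu = tau * mean.
prior_tau, prior_nu = 1.0, 0.0
N = 100

# SEP stores only the product of the N sites, not the sites themselves
# (the numbers are made up for illustration).
sites_tau_total, sites_nu_total = 50.0, 20.0

q_tau = prior_tau + sites_tau_total
q_nu = prior_nu + sites_nu_total

# EP would subtract the exact site i; SEP subtracts an average 1/N share of
# the stored product, i.e. the cavity is q / (site product)^(1/N):
cav_tau = q_tau - sites_tau_total / N
cav_nu = q_nu - sites_nu_total / N
assert cav_tau > 0.0
assert np.isclose(cav_tau, 50.5) and np.isclose(cav_nu, 19.8)
```

The memory saving comes from replacing N stored site parameters with one shared total; the price is that every cavity uses the same averaged factor.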
Baseline Method: Generalized FITC Approximation

The same likelihood as the proposed method (Kim & Ghahramani, 2006). Original GFITC formulation (Naish-Guzman & Holden, 2008).

Key difference: The latent variables corresponding to the inducing points f̄ are marginalized out to obtain an approximate prior:

    p(f) = ∫ p(f | f̄) p(f̄) df̄ ≈ ∏_{k=1}^C N(f^k | 0, Q^k_NN + diag(K^k_NN - Q^k_NN))

Training costs O(CNM²). Does not allow for scalable training!
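The FITC prior covariance Q_NN + diag(K_NN - Q_NN) keeps the exact marginal variances while using a rank-M matrix everywhere else. A sketch with an assumed squared-exponential kernel and arbitrary random inputs:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
N, M = 200, 10
X = rng.normal(size=(N, 1))
Xbar = rng.normal(size=(M, 1))              # inducing point locations

Knn = rbf_kernel(X, X)
Kmm = rbf_kernel(Xbar, Xbar) + 1e-8 * np.eye(M)
Knm = rbf_kernel(X, Xbar)

# Nystrom / low-rank part: rank at most M instead of N.
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)

# FITC: keep Qnn off the diagonal, but correct the diagonal to be exact.
K_fitc = Qnn + np.diag(np.diag(Knn - Qnn))
assert np.allclose(np.diag(K_fitc), np.diag(Knn))
```

Only the diagonal correction distinguishes FITC from the plain Nystrom approximation, which is why its marginal variances match the full GP prior exactly.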
UCI Repository Datasets

Initial comparison on small datasets with batch training.

[Table: number of instances, attributes and classes for each dataset: Glass, New-thyroid, Satellite, Svmguide2, Vehicle, Vowel, Waveform and Wine.]
UCI Repository (test error)

[Table: average test error (± standard error) of GFITC, EP, SEP and VI on Glass, New-thyroid, Satellite, Svmguide2, Vehicle, Vowel, Waveform and Wine, for M equal to 5%, 10% and 20% of the training points, together with the average rank and average training time of each method; the numerical values are garbled in the transcription.]
UCI Repository (negative test log-likelihood)

[Table: average negative test log-likelihood (± standard error) of GFITC, EP, SEP and VI on the same eight datasets, for M equal to 5%, 10% and 20% of the training points, together with the average rank and average training time of each method; the numerical values are garbled in the transcription.]
Inducing Point Placement Analysis

[Figure: learned inducing point locations for GFITC, EP, SEP and VI as M grows (M = 1, 2, 4, 8, ..., 256).]

EP based methods perform inducing point pruning (Bauer et al., 2016)!
Performance in Terms of Time (Satellite Dataset)

[Figure: average test error and average negative test log-likelihood vs. training time (log10 seconds) for GFITC, SEP and VI with M = 4, 20 and 100.]
Minibatch Training: MNIST Dataset, M = 200

[Figure: test error and negative test log-likelihood vs. training time (log10 seconds) for EP, SEP and VI.]
[Table: final test error in % and negative test log-likelihood on MNIST (M = 200) for EP, SEP and VI; the numerical values are garbled in the transcription.]
Minibatch Training: Airline-delays Dataset, M = 200

[Figure: test error and negative test log-likelihood vs. training time (log10 seconds) for EP, SEP, VI and a linear model.]
[Figure: additional Airline-delays results vs. training time (log10 scale).]
Conclusions

- EP method for multi-class classification using GPs.
- Efficient training and memory usage with cost O(CM³).
- Extensive experimental comparison with related methods.
- SEP is slightly faster than VI and is quadrature free.
- EP methods carry out inducing point pruning.
- VI sometimes gives bad test log-likelihoods.

Thank you for your attention!
References

Bauer, M., van der Wilk, M., and Rasmussen, C. E. Understanding probabilistic sparse Gaussian process approximations. NIPS 29, 2016.
Chai, K. M. A. Variational multinomial logit Gaussian process. JMLR, 13, 2012.
Girolami, M. and Rogers, S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18, 2006.
Hensman, J., Matthews, A. G., Filippone, M., and Ghahramani, Z. MCMC for variationally sparse Gaussian processes. NIPS 28, 2015.
Hernández-Lobato, D. and Hernández-Lobato, J. M. Scalable Gaussian process classification via expectation propagation. AISTATS, 2016.
Kim, H.-C. and Ghahramani, Z. Bayesian Gaussian process classification with the EM-EP algorithm. IEEE PAMI, 28, 2006.
Li, Y., Hernández-Lobato, J. M., and Turner, R. E. Stochastic expectation propagation. NIPS 28, 2015.
Naish-Guzman, A. and Holden, S. The generalized FITC approximation. NIPS 20, 2008.
Riihimäki, J., Jylänki, P., and Vehtari, A. Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. JMLR, 14, 2013.
Snelson, E. and Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. NIPS 18, 2006.
Williams, C. K. I. and Barber, D. Bayesian classification with Gaussian processes. IEEE PAMI, 20, 1998.
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More informationSystem Identification of Nonlinear State-Space Models with Linearly Dependent Unknown Parameters Based on Variational Bayes
SICE Journal of Control Measurement and System Integraton Vol. 11 No. 6 pp. 456 462 November 2018 System Identfcaton of Nonlnear State-Space Models wth Lnearly Dependent Unknown Parameters Based on Varatonal
More informationLogistic Regression Maximum Likelihood Estimation
Harvard-MIT Dvson of Health Scences and Technology HST.951J: Medcal Decson Support, Fall 2005 Instructors: Professor Lucla Ohno-Machado and Professor Staal Vnterbo 6.873/HST.951 Medcal Decson Support Fall
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experments- MODULE LECTURE - 6 EXPERMENTAL DESGN MODELS Dr. Shalabh Department of Mathematcs and Statstcs ndan nsttute of Technology Kanpur Two-way classfcaton wth nteractons
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationCS 3710: Visual Recognition Classification and Detection. Adriana Kovashka Department of Computer Science January 13, 2015
CS 3710: Vsual Recognton Classfcaton and Detecton Adrana Kovashka Department of Computer Scence January 13, 2015 Plan for Today Vsual recognton bascs part 2: Classfcaton and detecton Adrana s research
More informationGlobal Gaussian approximations in latent Gaussian models
Global Gaussan approxmatons n latent Gaussan models Botond Cseke Aprl 9, 2010 Abstract A revew of global approxmaton methods n latent Gaussan models. 1 Latent Gaussan models In ths secton we ntroduce notaton
More informationCollaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series
Collaboratve Mult-Output Gaussan Processes for Collectons of Sparse Multvarate Tme Seres Steven Cheng-Xan L Benamn Marln College of Informaton & Computer Scences Uversty of Massachusetts Amherst {cxl,marln}@cs.umass.edu
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More informationLogistic Classifier CISC 5800 Professor Daniel Leeds
lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationAn R implementation of bootstrap procedures for mixed models
The R User Conference 2009 July 8-10, Agrocampus-Ouest, Rennes, France An R mplementaton of bootstrap procedures for mxed models José A. Sánchez-Espgares Unverstat Poltècnca de Catalunya Jord Ocaña Unverstat
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationCSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing
CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann
More informationThe EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X
The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][
More informationClustering gene expression data & the EM algorithm
CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng robablstc Classfer Gven an nstance, hat does a probablstc classfer do dfferentl compared to, sa, perceptron? It does not drectl predct Instead,
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationStructured Perceptrons & Structural SVMs
Structured Perceptrons Structural SVMs 4/6/27 CS 59: Advanced Topcs n Machne Learnng Recall: Sequence Predcton Input: x = (x,,x M ) Predct: y = (y,,y M ) Each y one of L labels. x = Fsh Sleep y = (N, V)
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING
1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationMultilayer Perceptron (MLP)
Multlayer Perceptron (MLP) Seungjn Cho Department of Computer Scence and Engneerng Pohang Unversty of Scence and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjn@postech.ac.kr 1 / 20 Outlne
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More information13 Principal Components Analysis
Prncpal Components Analyss 13 Prncpal Components Analyss We now dscuss an unsupervsed learnng algorthm, called Prncpal Components Analyss, or PCA. The method s unsupervsed because we are learnng a mappng
More informationMatrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD
Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo
More informationArtificial Intelligence Bayesian Networks
Artfcal Intellgence Bayesan Networks Adapted from sldes by Tm Fnn and Mare desjardns. Some materal borrowed from Lse Getoor. 1 Outlne Bayesan networks Network structure Condtonal probablty tables Condtonal
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationKernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan
Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng 2 Logstc Regresson Gven tranng set D stc regresson learns the condtonal dstrbuton We ll assume onl to classes and a parametrc form for here s
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationEngineering Risk Benefit Analysis
Engneerng Rsk Beneft Analyss.55, 2.943, 3.577, 6.938, 0.86, 3.62, 6.862, 22.82, ESD.72, ESD.72 RPRA 2. Elements of Probablty Theory George E. Apostolaks Massachusetts Insttute of Technology Sprng 2007
More informationExpectation propagation
Expectaton propagaton Lloyd Ellott May 17, 2011 Suppose p(x) s a pdf and we have a factorzaton p(x) = 1 Z n f (x). (1) =1 Expectaton propagaton s an nference algorthm desgned to approxmate the factors
More informationMean Field / Variational Approximations
Mean Feld / Varatonal Appromatons resented by Jose Nuñez 0/24/05 Outlne Introducton Mean Feld Appromaton Structured Mean Feld Weghted Mean Feld Varatonal Methods Introducton roblem: We have dstrbuton but
More informationSupplementary Material: Learning Structured Weight Uncertainty in Bayesian Neural Networks
Shengyang Sun, Changyou Chen, Lawrence Carn Suppementary Matera: Learnng Structured Weght Uncertanty n Bayesan Neura Networks Shengyang Sun Changyou Chen Lawrence Carn Tsnghua Unversty Duke Unversty Duke
More informationSpace of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics
/7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationLearning with Maximum Likelihood
Learnng wth Mamum Lelhood Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm,
More informationLatent Autoregressive Gaussian Processes Models for Robust System Identification
Preprnt, 11th IFAC Symposum on Dynamcs and Control of Process Systems, ncludng Bosystems Latent Autoregressve Gaussan Processes Models for Robust System Identfcaton César Lncoln C. Mattos Andreas Damanou
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationExcess Error, Approximation Error, and Estimation Error
E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple
More informationMeshless Surfaces. presented by Niloy J. Mitra. An Nguyen
Meshless Surfaces presented by Nloy J. Mtra An Nguyen Outlne Mesh-Independent Surface Interpolaton D. Levn Outlne Mesh-Independent Surface Interpolaton D. Levn Pont Set Surfaces M. Alexa, J. Behr, D. Cohen-Or,
More informationBAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup
BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS Darusz Bskup 1. Introducton The paper presents a nonparaetrc procedure for estaton of an unknown functon f n the regresson odel y = f x + ε = N. (1) (
More informationLimited Dependent Variables and Panel Data. Tibor Hanappi
Lmted Dependent Varables and Panel Data Tbor Hanapp 30.06.2010 Lmted Dependent Varables Dscrete: Varables that can take onl a countable number of values Censored/Truncated: Data ponts n some specfc range
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More information