Bayesian Deep Learning
1 Bayesian Deep Learning Mohammad Emtiyaz Khan RIKEN Center for AI Project, Tokyo
2 Keywords Bayesian Statistics: Gaussian distribution, Bayes rule. Continuous Optimization: gradient descent, least-squares. Deep Learning: stochastic gradient descent, RMSprop. Information Geometry: Riemannian manifold, natural gradients. 2
3 The Goal of My Research To understand the fundamental principles of learning from data and use them to develop algorithms that can learn like living beings. 3
4 Learning by exploring at the age of 6 months
5 Converged at the age of 12 months
6 Transfer Learning at 14 months 6
7 The Goal of My Research To understand the fundamental principles of learning from data and use them to develop algorithms that can learn like living beings. 7
8 Bayesian Inference Compute the posterior distribution instead of just a point estimate (e.g., the MLE). A natural representation of all past information, which can then be sequentially updated with new information. Useful for active learning, sequential experiment design, continual learning, and RL, but also for global optimization, causality, etc. Eventually, for ML methods that can learn like humans (data-efficient, robust, causal). 8
9 Uncertainty in Deep Learning To estimate the confidence in the predictions of a deep-learning system 9
10 Example: Which is a Better Fit? (Plot: earthquake frequency vs. magnitude; 57% red, 43% blue.) Real data from Tohoku (Japan). Example taken from Nate Silver's book The Signal and the Noise. 10
11 Example: Which is a Better Fit? (Plot: earthquake frequency vs. magnitude.) The question matters when data is scarce and noisy, e.g., in medicine and robotics.
12 Uncertainty for Image Segmentation Figure (taken from Kendall et al. 2017): illustrating the difference between aleatoric and epistemic uncertainty for semantic segmentation on the CamVid dataset. Panels: (a) input image, (b) ground truth, (c) semantic segmentation, (d) aleatoric uncertainty, (e) epistemic uncertainty. Aleatoric uncertainty captures noise inherent in the observations; in (d) the model exhibits increased aleatoric uncertainty on object boundaries and for objects far from the camera. 12
13 Variational Inference (VI) Approximate the posterior using optimization. Popular in reinforcement learning, unsupervised learning, online learning, active learning, etc. We need accurate VI algorithms that are general (apply to many models), scalable (for large data and models), fast (converge quickly), and simple (easy to implement). This talk: new algorithms with such features. 13
14 Gradient vs Natural-Gradient Gradient Descent (GD): relies on stochastic and automatic gradients. Simple, general, and scalable, but can have suboptimal convergence. Practical VI (2011), Black-box VI (2014), Bayes by Backprop (2015), ADVI (2015), and many more. Natural-Gradient Descent (NGD): fast convergence, but computationally difficult, therefore not simple, general, and scalable (Sato (2001), Riemannian CG (2010), Stochastic VI (2013), etc.). This talk: fast and simple NGD for complex models, such as those containing deep networks. 14
15 Talk Outline Variational Inference with gradient descent and natural-gradient descent. NGD with Conjugate-Computation VI: a generalization of forward-backward algorithms, SVI, and message passing (AISTATS 2017). Deep nets (ICML 2018, NeurIPS 2018). Generalizations and extensions: structured VAEs (ICLR 2018), mixtures of exponential-family approximations, evolution strategies (arXiv 2017), etc. 15
16 Variational Inference: Gradient Descent (GD) vs. Natural-Gradient Descent (NGD) 16
17 A Naïve Method (Diagram: data consist of inputs and outputs; a neural network with parameters generates outputs from inputs, and a prior distribution is placed over the parameters.) 17
18 Bayesian Inference Bayes' rule. (Slide annotations: the posterior distribution is narrow, the prior is wide, and the normalizing constant is an intractable integral.) 18
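In standard notation, Bayes' rule for parameters θ given data D reads:

```latex
p(\theta \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}
         {\int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta}
```

The denominator is the intractable integral the slide refers to: the prior p(θ) is wide, and multiplying by the likelihood yields a narrower posterior p(θ | D).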
19 Variational Inference (Diagram: data, parameters, and the intractable integral.) Replace the posterior by a variational approximation with natural parameters, and maximize the Evidence Lower Bound (ELBO). 19
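In standard notation, with a variational approximation q(θ | λ) parameterized by natural parameters λ, the ELBO is:

```latex
\mathcal{L}(\lambda)
  = \mathbb{E}_{q(\theta \mid \lambda)}
      \big[ \log p(\mathcal{D}, \theta) - \log q(\theta \mid \lambda) \big]
```

Maximizing L(λ) over λ is equivalent to minimizing the KL divergence from q(θ | λ) to the true posterior, which sidesteps the intractable integral.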
20 Relationship to Other Fields Perturbation to avoid local minima: Gaussian homotopy and continuation methods, smoothed/graduated optimization. Online learning: exponentiated weighted averaging. Reinforcement learning: structured distributions, where q is the policy times the environment dynamics (Levine 2018). 20
21 Gradient Descent Maximize the Evidence Lower Bound (ELBO) with gradient descent (GD). 21
22 VI with Natural-Gradient Descent Sato (2001), Honkela et al. (2010), Hoffman et al. (2013). NGD preconditions the gradient with the inverse Fisher Information Matrix (FIM). Fast convergence due to optimization in the Riemannian manifold (not Euclidean space), but it requires additional computation. Can we simplify/reduce this computation? 22
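The two updates on these slides can be written, in standard notation, as:

```latex
\text{GD:}\quad
  \lambda_{t+1} = \lambda_t + \rho_t\, \nabla_{\lambda} \mathcal{L}(\lambda_t),
\qquad
\text{NGD:}\quad
  \lambda_{t+1} = \lambda_t + \rho_t\, F(\lambda_t)^{-1} \nabla_{\lambda} \mathcal{L}(\lambda_t)
```

where F(λ) = E_q[∇_λ log q(θ | λ) ∇_λ log q(θ | λ)ᵀ] is the Fisher Information Matrix; computing and inverting F is the additional cost NGD incurs over GD.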
23 Can we simplify NGD computation? Yes, by using algorithms such as message passing/backprop: Conjugate-Computation VI (Khan and Lin, AISTATS 2017). 23
24 The key idea: Expectation Parameters Expectation/moment/mean parameters are the expectations of the sufficient statistics; for Gaussians, these are the mean and the correlation matrix. A key relationship: the natural gradient with respect to the natural parameters equals the ordinary gradient with respect to the expectation parameters, which simplifies the NGD update. 24
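For an exponential-family q, the key relationship can be written as:

```latex
F(\lambda)^{-1}\, \nabla_{\lambda} \mathcal{L} \;=\; \nabla_{\mu} \mathcal{L},
\qquad \mu = \mathbb{E}_{q}\!\left[\phi(\theta)\right],
```

where φ denotes the sufficient statistics. An NGD step in λ therefore only needs the gradient with respect to the expectation parameters μ, avoiding an explicit Fisher-matrix inverse.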
25 Conjugate-Computation VI (CVI) In a conjugate model, this is equivalent to simply adding the natural parameters of the factors of a model. This is a type of conjugate computation, and enables simple updates for complex models. 25
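For a fully conjugate Gaussian model, this natural-parameter addition can be sketched in a few lines. The helper names below are illustrative, not from the talk; only the 1-D Gaussian-Gaussian case is shown.

```python
import numpy as np

# Conjugate computation for a 1-D Gaussian model (illustrative sketch):
# prior N(m0, s0^2) and one Gaussian likelihood factor N(y | theta, s^2).
# In natural parameters (eta1, eta2) = (mu / sigma^2, -1 / (2 sigma^2)),
# the posterior is obtained by simply adding the factors' natural parameters.

def to_natural(mu, sigma2):
    return np.array([mu / sigma2, -0.5 / sigma2])

def from_natural(eta):
    sigma2 = -0.5 / eta[1]
    return eta[0] * sigma2, sigma2

prior = to_natural(mu=0.0, sigma2=4.0)        # wide prior
likelihood = to_natural(mu=2.0, sigma2=1.0)   # factor N(y = 2 | theta, 1)

posterior_mu, posterior_var = from_natural(prior + likelihood)
# Matches the standard Gaussian-Gaussian update:
# var = 1 / (1/4 + 1/1) = 0.8, mean = var * (0/4 + 2/1) = 1.6
```

The same additive structure is what CVI exploits: in conjugate models, an NGD step reduces to adding (damped) natural parameters of the model's factors.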
26 CVI on Bayesian Linear Regression The likelihood and prior give the quadratic loss (y - Xw)^T (y - Xw) + delta w^T w; with a Gaussian approximation, the expectation parameters yield the natural gradient in closed form. 26
27 NGD == Newton's Method The update recovers the least-squares solution: for rho = 1, it converges in one step (Newton's method), whereas gradient descent is suboptimal. This property generalizes to all conjugate models, where the forward-backward algorithm returns the natural gradients of the ELBO. 27
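A minimal sketch of the one-step property on Bayesian linear regression, assuming a Gaussian prior with precision delta and unit observation noise (data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

delta = 1.0  # prior precision (assumed)

# Natural parameters of Gaussian q(w): (Sigma^{-1} m, precision Sigma^{-1}).
# CVI update: lam_new = (1 - rho) * lam_old + rho * (prior + likelihood terms).
prec_lik, shift_lik = X.T @ X, X.T @ y         # likelihood natural-param terms
prec_old, shift_old = np.eye(D), np.zeros(D)   # arbitrary initialization

rho = 1.0
prec_new = (1 - rho) * prec_old + rho * (delta * np.eye(D) + prec_lik)
shift_new = (1 - rho) * shift_old + rho * shift_lik

mean = np.linalg.solve(prec_new, shift_new)
ridge = np.linalg.solve(X.T @ X + delta * np.eye(D), X.T @ y)
# One step with rho = 1 lands exactly on the regularized least-squares solution,
# regardless of the initialization -- the Newton-like behavior of NGD.
```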
28 Conditionally-Conjugate Models VMP: sequential updates with rho = 1. SVI: update the local variable with rho = 1 and the global variable with rho in (0, 1). For CVI, rho can follow any schedule, and updates can be sequential or parallel. (Diagram: local and global variables, and data; images taken from Hoffman et al. (2013) and others.) 28
29 Convergence Rates for CVI E[ ||(lambda_k - lambda_{k+1}) / rho_k||^2 ] <= 2 L C_0^2 / t + c sigma^2 / M, where L is the Lipschitz constant of the (nonconvex) ELBO, sigma^2 is the gradient-noise variance, M is the minibatch size, and the constants depend on the strong convexity of the Fisher Information Matrix. See Khan et al., UAI 2016; the proof is based on Ghadimi, Lan, and Zhang (2014). 29
30 NGD for Deep Learning Using CVI for Bayesian deep learning with a Gaussian approximation reduces the update to a Newton-like step. 30
31 CVI for Bayesian Neural Network likelihood prior approx neural network Back-propagated gradient & Hessian 31
32 CVI for Bayesian Neural Network Back-propagated gradient & Hessian 32
33 Variational Adam for Mean-Field Vadam (ICML 2018) is a variational version of adaptive learning-rate methods (e.g., Adam), obtained for gamma = 1 by approximating the Hessian with the square of gradients. The update maintains a mean and a variance: 1. Select a minibatch. 2. Compute the gradient using backpropagation. 3. Compute a scale vector to adapt the learning rate. 4. Take a gradient step. 33
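The four steps might be sketched as follows on a toy quadratic loss. This is a loose Vadam-style loop, not the paper's exact algorithm: momentum is omitted, and the toy loss, constants, and names are assumptions.

```python
import numpy as np

# Vadam-style loop on a toy quadratic loss 0.5 * ||w - w_star||^2 (a sketch).
rng = np.random.default_rng(1)
w_star = np.array([3.0, -1.0])
N = 100                    # nominal dataset size (assumed)
lam = 1.0                  # Gaussian prior precision (assumed)
alpha, beta = 0.1, 0.99    # learning rate and scale-vector decay (assumed)

w = np.zeros(2)            # mean of the Gaussian approximation
s = np.zeros(2)            # scale vector (running average of squared gradients)
for t in range(500):
    # 1. Select a minibatch: here, sample weights with variance 1/(N*(s+lam)),
    #    i.e., weight-space noise derived from the variational variance.
    eps = rng.normal(size=2)
    w_noisy = w + eps / np.sqrt(N * (s + lam))
    # 2. Stochastic gradient at the perturbed weights (noise mimics minibatching).
    g = (w_noisy - w_star) + 0.1 * rng.normal(size=2)
    # 3. Update the scale vector with squared gradients (Hessian approximation).
    s = beta * s + (1 - beta) * g**2
    # 4. Take a preconditioned gradient step, including the prior term.
    w = w - alpha * (g + lam * w / N) / (np.sqrt(s) + lam / N)

# w ends up close to the (slightly regularized) optimum near w_star.
```

The only changes relative to a plain Adam-style loop are the weight perturbation in step 1 and the prior terms in step 4, which is what makes the iterates track a mean-field Gaussian posterior rather than a point estimate.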
34 Illustration: Classification Logistic regression (30 data points, 2-dimensional inputs), sampled from a Gaussian mixture with 2 components. 34
35 Adam vs Vadam (on Logistic-Reg) Adam Vadam (mean) Vadam (samples) M = 5, Rho = 0.01, Gamma =
36 Adam vs Vadam (on Neural Nets) Adam Vadam (mean) Vadam (samples) (By Runa E.) 36
37 LeNet-5 on CIFAR10 VOGN Adam Log Loss Error (By Anirudh Jain) 37
38 Faster, Simpler, and More Robust Regression on the Australian-Scale dataset using deep neural nets for various minibatch sizes. Existing method (BBVI) vs. our methods (Vadam and VOGN). 38
39 Faster, Simpler, and More Robust Results on MNIST digit classification (for various values of the Gaussian prior precision parameter). Existing method (BBVI) vs. our method (Vadam). 39
40 Parameter-Space Noise for Deep RL On the OpenAI Gym Cheetah task with DDPG and a DNN with [400, 300] ReLU units. Vadam (noise using natural gradients): reward 5264. SGD (noise using standard gradients), SGD (no noise): reward 2038. Rückstieß et al. (2010), Fortunato et al. (2017), Plappert et al.
41 Avoiding Local Minima An example taken from Casella and Robert's book. Vadam reaches the flat minimum, but GD gets stuck at a local minimum. Related: optimization by smoothing, Gaussian homotopy/blurring, Entropy-SGD, SGLD. 41
42 Stochastic, Low-Rank, Approximate, Natural-Gradient (SLANG), NeurIPS 2018. Low-rank + diagonal covariance matrix; SLANG is linear in D! (Diagram: a D x L low-rank factor plus a diagonal of size D, computed from back-propagated gradients with a fast eigendecomposition.) 42
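The "low-rank + diagonal" structure can be illustrated as follows. The shapes follow the slide's diagram (a D x L factor plus a diagonal); everything else is an assumption for illustration.

```python
import numpy as np

# Sketch of SLANG's "low-rank + diagonal" structure: the precision is
# approximated as U @ U.T + diag(d) with U of shape (D, L), L << D, so
# storage and matrix-vector products cost O(D * L) instead of O(D^2).
rng = np.random.default_rng(2)
D, L = 1000, 5
U = rng.normal(size=(D, L))
d = 1.0 + rng.random(D)   # positive diagonal keeps the precision valid

def precision_matvec(v):
    # (U U^T + diag(d)) v without ever forming the D x D matrix
    return U @ (U.T @ v) + d * v

v = rng.normal(size=D)
dense = (U @ U.T + np.diag(d)) @ v   # O(D^2) reference, for checking only
assert np.allclose(precision_matvec(v), dense)
```

Sampling and solves with this structure can likewise be kept linear in D (e.g., via a Woodbury-style identity on the L x L core), which is what makes the approach practical for networks with many parameters.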
43 SLANG is Faster than GD Classification on USPS with BNNs 43
44 Generalization and Extensions 44
45 Deep Nets + Graphical Models (ICLR 2018) Neural nets + GMM: latent mixture model. Neural nets + linear dynamical system: latent state-space model. (Figure 1: examples of generative models that combine deep networks with PGMs, shown alongside the corresponding structured inference network (SIN) models.) 45
46 Amortized Inference on VAE + Probabilistic Graphical Models (PGM) (ICLR 2018) Graphical model + deep model: a structured inference network serves as the encoder, with a DNN decoder. Backprop on the DNN, and forward-backward on the PGM. 46
47 Going Beyond Exponential Family Fast and simple NGD for approximations outside the exponential family: scale mixtures of Gaussians (e.g., the t-distribution), finite mixtures of Gaussians, matrix-variate Gaussians, and skew-Gaussians. The updates can be implemented using message passing and back-propagation. 47
48 Summary of the Talk Fast yet simple NGD for VI using Conjugate-Computation VI (AISTATS 2017): a generalization of the forward-backward algorithm, stochastic VI, variational message passing, etc. Beyond conjugacy: extends fast and simple NGD to deep nets (ICML 2018, NeurIPS 2018). Generalizations and extensions: VAEs (ICLR 2018), mixtures of exponential families, evolution strategies (arXiv 2017), etc. 48
49 Related Works Sorry if I miss some important work! Please email me. 49
50 EM, Forward-Backward, and VI Sato (1998), Fast Learning of On-line EM Algorithm. Sato (2001), Online Model Selection Based on the Variational Bayes. Jordan et al. (1999), An Introduction to Variational Methods for Graphical Models. Winn and Bishop (2005), Variational Message Passing. Knowles and Minka (2011), Non-conjugate Variational Message Passing for Multinomial and Binary Regression. 50
51 NGD: Author Name Starting with an H Honkela et al. (2007), Natural Conjugate Gradient in Variational Inference. Honkela et al. (2010), Approximate Riemannian Conjugate Gradient Learning for Fixed-Form Variational Bayes. Hensman et al. (2012), Fast Variational Inference in the Conjugate Exponential Family. Hoffman et al. (2013), Stochastic Variational Inference. 51
52 NGD: Author Name Starting with an S Salimans and Knowles (2013), Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Approximate natural-gradient steps. Seth and Khardon (2016), Monte Carlo Structured SVI for Two-Level Non-Conjugate Models. Applies to models with two levels of hierarchy. Salimbeni et al. (2018), Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models. Fast convergence on GP models. 52
53 NGD for Bayesian Deep Learning Zhang et al. (2018), Noisy Natural Gradient as Variational Inference. For Bayesian deep learning (similar to Variational Adam). 53
54 Issues and Open Problems Automatic natural-gradient computation. Good implementation of message passing. Gradient with respect to covariance matrices. Structured approximation for covariance. Comparisons on really large problems. Applications. Flexible posterior approximations. 54
55 References Available at 55
57 A 5 page review 57
58 Acknowledgement RIKEN AIP: Wu Lin (now at UBC), Didrik Nielsen (now at DTU), Voot Tangkaratt, Nicolas Hubacher, Masashi Sugiyama, Shun-ichi Amari. Interns at RIKEN AIP: Zuozhu Liu (SUTD, Singapore), Aaron Mishkin (UBC), Frederik Kunstner (EPFL). Collaborators: Mark Schmidt (UBC), Yarin Gal (University of Oxford), Akash Srivastava (University of Edinburgh), Reza Babanezhad (UBC). 58
59 Thanks! Slides, papers, and code available at 59
More informationBayesian Semi-supervised Learning with Deep Generative Models
Bayesian Semi-supervised Learning with Deep Generative Models Jonathan Gordon Department of Engineering Cambridge University jg801@cam.ac.uk José Miguel Hernández-Lobato Department of Engineering Cambridge
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationarxiv: v1 [stat.ml] 10 Dec 2017
Sensitivity Analysis for Predictive Uncertainty in Bayesian Neural Networks Stefan Depeweg 1,2, José Miguel Hernández-Lobato 3, Steffen Udluft 2, Thomas Runkler 1,2 arxiv:1712.03605v1 [stat.ml] 10 Dec
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationGaussian with mean ( µ ) and standard deviation ( σ)
Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Summary of Class Advanced Topics Dhruv Batra Virginia Tech HW1 Grades Mean: 28.5/38 ~= 74.9%
More informationDeep Generative Models
Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationPROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY. Arto Klami
PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami 1 PROBABILISTIC PROGRAMMING Probabilistic programming is to probabilistic modelling as deep learning is to neural networks (Antti Honkela,
More informationVariational Dropout and the Local Reparameterization Trick
Variational ropout and the Local Reparameterization Trick iederik P. Kingma, Tim Salimans and Max Welling Machine Learning Group, University of Amsterdam Algoritmica University of California, Irvine, and
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationPROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY
PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY Arto Klami Adapted from my talk in AIHelsinki seminar Dec 15, 2016 1 MOTIVATING INTRODUCTION Most of the artificial intelligence success stories
More informationVariational Autoencoders (VAEs)
September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p
More informationStochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints
Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints Thang D. Bui Richard E. Turner tdb40@cam.ac.uk ret26@cam.ac.uk Computational and Biological Learning
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationIndex. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,
Index A Activation functions, neuron/perceptron binary threshold activation function, 102 103 linear activation function, 102 rectified linear unit, 106 sigmoid activation function, 103 104 SoftMax activation
More informationFundamentals of Machine Learning
Fundamentals of Machine Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://icapeople.epfl.ch/mekhan/ emtiyaz@gmail.com Jan 2, 27 Mohammad Emtiyaz Khan 27 Goals Understand (some) fundamentals of Machine
More informationAuto-Encoding Variational Bayes. Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Auto-Encoding Variational Bayes Diederik Kingma and Max Welling Stochastic Backpropagation and Approximate Inference in Deep Generative Models Danilo J. Rezende, Shakir Mohamed, Daan Wierstra Neural Variational
More informationLogistic Regression. Seungjin Choi
Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationThe Variational Gaussian Approximation Revisited
The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationFundamentals of Machine Learning (Part I)
Fundamentals of Machine Learning (Part I) Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp April 12, 2018 Mohammad Emtiyaz Khan 2018 1 Goals Understand (some) fundamentals
More informationStochastic Variance Reduction for Nonconvex Optimization. Barnabás Póczos
1 Stochastic Variance Reduction for Nonconvex Optimization Barnabás Póczos Contents 2 Stochastic Variance Reduction for Nonconvex Optimization Joint work with Sashank Reddi, Ahmed Hefny, Suvrit Sra, and
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More information