Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama
Conference on Uncertainty in Artificial Intelligence (UAI) 2016

Discussion led by Yan Kaganovsky, Duke University
Outline
1. Intro: Variational Inference and Proximal Methods
2. Proximal-Gradient Stochastic Variational Inference (PG-SVI)
3. Convergence of PG-SVI for Fixed Step Size
4. Experimental Results
1. Intro: Variational Inference and Proximal Methods
Variational Inference

Bayesian inference with a general latent variable model:
- Data vector y of length N
- Latent vector z of length D

Approximate the evidence p(y) with the ELBO. By Jensen's inequality,

$$\log p(y) = \log \int q(z|\lambda)\,\frac{p(y,z)}{q(z|\lambda)}\,dz \quad (1)$$

$$\ge \max_{\lambda \in S}\; \mathcal{L}(\lambda) := \mathbb{E}_{q(z|\lambda)}\left[\log \frac{p(y,z)}{q(z|\lambda)}\right] \quad (2)$$

and the problem reduces to finding the parameters $\lambda$ of q:

$$\lambda^* = \arg\max_{\lambda \in S}\; \mathcal{L}(\lambda) \quad (3)$$
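As a concrete illustration (not from the slides), the following minimal sketch estimates $\mathcal{L}(\lambda)$ in (2) by Monte Carlo for a factorized Gaussian $q(z|\lambda) = \mathcal{N}(m, \operatorname{diag}(v))$; the toy `log_joint` (an unnormalized $\log p(y,z)$) is an assumption made purely for the demo.

```python
import numpy as np

def elbo_mc(log_joint, m, v, num_samples=1000, seed=0):
    """Monte Carlo estimate of L(lambda) = E_q[log p(y,z) - log q(z|lambda)]
    for a factorized Gaussian q(z|lambda) = N(m, diag(v))."""
    rng = np.random.default_rng(seed)
    m = np.atleast_1d(np.asarray(m, dtype=float))
    v = np.atleast_1d(np.asarray(v, dtype=float))
    z = m + np.sqrt(v) * rng.standard_normal((num_samples, m.size))  # z^(s) ~ q(z|lambda)
    log_q = -0.5 * np.sum(np.log(2 * np.pi * v) + (z - m) ** 2 / v, axis=1)
    log_p = np.array([log_joint(zs) for zs in z])
    return float(np.mean(log_p - log_q))

# Toy model (up to constants): standard-normal prior on z, Gaussian likelihood for y = 1.
log_joint = lambda z: -0.5 * float(z @ z) - 0.5 * float((1.0 - z) @ (1.0 - z))
print(elbo_mc(log_joint, m=[0.5], v=[0.5]))
```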
Proximal-Gradient Methods: Gradient Descent

Form a linear approximation of the objective, $\mathcal{L}(\lambda_k) + (\lambda - \lambda_k)^T \nabla \mathcal{L}(\lambda_k)$, and minimize it within some distance of the previous solution. The simplest case,

$$\lambda_{k+1} = \arg\min_{\lambda \in S} \left[ \lambda^T \left[-\nabla \mathcal{L}(\lambda_k)\right] + \frac{1}{2\beta_k} \|\lambda - \lambda_k\|_2^2 \right] \quad (4)$$

reduces to gradient descent (an ascent step on $\mathcal{L}$):

$$\lambda_{k+1} = \lambda_k + \beta_k \nabla \mathcal{L}(\lambda_k) \quad (5)$$
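To see why (4) reduces to (5) (a one-line check, not on the slides): set the gradient of the unconstrained objective in (4) to zero,

$$-\nabla \mathcal{L}(\lambda_k) + \frac{1}{\beta_k}(\lambda - \lambda_k) = 0 \;\Longrightarrow\; \lambda_{k+1} = \lambda_k + \beta_k \nabla \mathcal{L}(\lambda_k).$$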
Two problems with gradient descent:
- It is impractical when we have a large dataset and some terms in the ELBO are intractable.
- It uses the Euclidean distance and thus ignores the geometry of the variational-parameter space, leading to slow convergence.
The Euclidean distance is a poor measure of dissimilarity between distributions:
- $\mathcal{N}(0, 10000)$ and $\mathcal{N}(10, 10000)$ yield $\|\lambda_1 - \lambda_2\|_2 = 10$, although the two densities are nearly identical.
- $\mathcal{N}(0, 0.01)$ and $\mathcal{N}(0.1, 0.01)$ yield $\|\lambda_1 - \lambda_2\|_2 = 0.1$, although the two densities barely overlap.
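A quick numerical check (not on the slides) makes the mismatch explicit. For Gaussians with equal variance, $D_{KL}[\mathcal{N}(\mu_1,\sigma^2)\,\|\,\mathcal{N}(\mu_2,\sigma^2)] = (\mu_1-\mu_2)^2/(2\sigma^2)$, so

$$D_{KL}\big[\mathcal{N}(0,10^4)\,\|\,\mathcal{N}(10,10^4)\big] = \frac{10^2}{2\cdot 10^4} = 0.005, \qquad D_{KL}\big[\mathcal{N}(0,0.01)\,\|\,\mathcal{N}(0.1,0.01)\big] = \frac{0.1^2}{2\cdot 0.01} = 0.5.$$

The pair that is 100 times farther apart in Euclidean distance is 100 times closer in KL divergence.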
Proximal-Gradient Methods: Natural Gradient

The problem is addressed by replacing the Euclidean distance with another divergence. The natural-gradient method (Hoffman et al., 2013) uses the symmetric KL divergence:

$$\lambda_{k+1} = \arg\min_{\lambda \in S} \left[ \lambda^T \left[-\nabla \mathcal{L}(\lambda_k)\right] + \frac{1}{\beta_k} D^{\text{sym}}_{KL}\big[q(z|\lambda)\,\|\,q(z|\lambda_k)\big] \right] \quad (6)$$

This leads to the update

$$\lambda_{k+1} = \lambda_k + \beta_k \left[2\,G(\lambda_k)\right]^{-1} \nabla \mathcal{L}(\lambda_k) \quad (7)$$

where G is the Fisher information matrix

$$G(\lambda) := \mathbb{E}_{q(z|\lambda)}\Big[ \big[\nabla_\lambda \log q(z|\lambda)\big]\big[\nabla_\lambda \log q(z|\lambda)\big]^T \Big] \quad (8)$$
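For intuition, here is a hedged sketch (not from the slides) of update (7) for a factorized Gaussian $q(z|\lambda) = \mathcal{N}(m, \operatorname{diag}(v))$ with $\lambda = (m, v)$: in this parameterization the Fisher matrix is blockwise diagonal, $G = \operatorname{diag}(1/v,\, 1/(2v^2))$, so the natural-gradient step only rescales the Euclidean gradients; `grad_m` and `grad_v` are placeholders for $\nabla_m \mathcal{L}$ and $\nabla_v \mathcal{L}$.

```python
import numpy as np

def natural_gradient_step(m, v, grad_m, grad_v, beta):
    """One natural-gradient update (7) for q = N(m, diag(v)), lambda = (m, v).
    The Fisher matrix (8) is diagonal here: G = diag(1/v, 1/(2 v^2)),
    so [2 G]^{-1} multiplies grad_m by v/2 and grad_v by v^2."""
    m_new = m + beta * (v / 2.0) * grad_m    # [2 / v]^{-1} = v / 2
    v_new = v + beta * (v ** 2) * grad_v     # [2 / (2 v^2)]^{-1} = v^2
    return m_new, v_new
```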
Limitations of the natural gradient:
- It requires conditionally conjugate exponential-family models:
  - the distribution factorizes as $\prod_i p(z_i \mid \text{pa}_i)$, where the $z_i$ are disjoint sets and $\text{pa}_i$ are the parents of $z_i$ in a directed acyclic graph;
  - each conditional distribution $p(z_i \mid \text{pa}_i)$ is in the exponential family.
- In the general case, computing the Fisher matrix is very costly.
Proximal-Gradient Methods: KL-Based Methods

KL proximal variational inference (Khan et al., 2015):
- Uses the KL divergence, as in Theis & Hoffman (2015).
- Splits the objective into a difficult differentiable term f and a simple term h, leading to the iterations

$$\lambda_{k+1} = \arg\min_{\lambda \in S} \left[ \lambda^T \left[-\nabla f(\lambda_k)\right] + h(\lambda) + \frac{1}{\beta_k} D_{KL}\big[q(z|\lambda)\,\|\,q(z|\lambda_k)\big] \right] \quad (9)$$

Limitations:
- The exact gradient is not feasible for large datasets.
- The deterministic closed-form updates are limited to simple models and Gaussian q.
2. Proximal-Gradient Stochastic Variational Inference (PG-SVI)
The Proposed Method

The authors propose a proximal-gradient stochastic variational inference (PG-SVI) method:

$$\lambda_{k+1} = \arg\min_{\lambda \in S} \left[ \lambda^T \big[-\hat{\nabla} f(\lambda_k)\big] + h(\lambda) + \frac{1}{\beta_k} D\big[\lambda\,\|\,\lambda_k\big] \right] \quad (10)$$

Contributions:
- Splitting $\mathcal{L}$ into a simple and a difficult term, similar to Khan et al. (2015)
- A stochastic approximation $\hat{\nabla} f$ of the gradient of the difficult term
- Divergence functions D that incorporate the geometry of the parameter space
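The subproblem in (10) has closed-form solutions for the Gaussian examples discussed below; as a generic, hedged stand-in (not the authors' implementation), the sketch below solves it by inner gradient descent. Here `grad_h` and `grad_div` are assumed callables returning $\nabla h(\lambda)$ and $\nabla_\lambda D[\lambda \,\|\, \lambda_k]$.

```python
import numpy as np

def pgsvi_step(lam_k, grad_f_hat, grad_h, grad_div, beta, inner_steps=200, lr=1e-2):
    """One PG-SVI iteration (10): approximately minimize
        -lam^T grad_f_hat + h(lam) + (1/beta) D[lam || lam_k]
    over lam by inner gradient descent (a stand-in for the closed-form update)."""
    lam = np.array(lam_k, dtype=float)
    for _ in range(inner_steps):
        g = -grad_f_hat + grad_h(lam) + grad_div(lam, lam_k) / beta
        lam -= lr * g
    return lam
```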
Splitting

The split is

$$\frac{p(y,z)}{q(z|\lambda)} = c\; p_d(z|\lambda)\; p_e(z|\lambda) \quad (11)$$

Substituting into the ELBO gives

$$\mathcal{L}(\lambda) = \underbrace{\mathbb{E}_q[\log p_d(z|\lambda)]}_{f(\lambda)} \;+\; \underbrace{\mathbb{E}_q[\log p_e(z|\lambda)]}_{-h(\lambda)} \;+\; \log c \quad (12)$$

so maximizing $\mathcal{L}$ amounts to minimizing $-f(\lambda) + h(\lambda)$, as in (10). The following assumptions are made:
- The function f is differentiable with an L-Lipschitz-continuous gradient, i.e., for all $\lambda, \lambda' \in S$,

$$\|\nabla f(\lambda) - \nabla f(\lambda')\| \le L\, \|\lambda - \lambda'\| \quad (13)$$

- The function h is a general convex function.
Example 1: Gaussian Process Models

Non-Gaussian likelihood $p(y_n|z_n)$; let $z_n = f(x_n)$ be a latent function drawn from a GP with mean zero and covariance K. Then

$$\frac{p(y,z)}{q(z|\lambda)} = \underbrace{\prod_{n=1}^N p(y_n|z_n)}_{p_d(z|\lambda)}\; \underbrace{\frac{\mathcal{N}(z\,|\,0, K)}{\mathcal{N}(z\,|\,m, V)}}_{p_e(z|\lambda)} \quad (14)$$

The ELBO is split into

$$f(\lambda) = \sum_n \mathbb{E}_q\big[\log p(y_n|z_n)\big], \qquad h(\lambda) = D_{KL}\big[\mathcal{N}(z\,|\,m, V)\,\|\,\mathcal{N}(z\,|\,0, K)\big] \quad (15)$$

(h is convex, and this leads to a closed-form update).
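The h term in (15) is available in closed form via the standard Gaussian KL formula; a minimal sketch (my own, assuming dense NumPy arrays for $K$, $V$ and the mean vector $m$):

```python
import numpy as np

def h_gp(m, V, K):
    """Closed-form h(lambda) = KL[ N(m, V) || N(0, K) ] from (15):
    0.5 * ( tr(K^{-1} V) + m^T K^{-1} m - N + log|K| - log|V| )."""
    N = m.size
    Kinv_V = np.linalg.solve(K, V)
    Kinv_m = np.linalg.solve(K, m)
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_V = np.linalg.slogdet(V)
    return 0.5 * (np.trace(Kinv_V) + m @ Kinv_m - N + logdet_K - logdet_V)
```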
Example 2: Generalized Linear Models

$$\frac{p(y,z)}{q(z|\lambda)} = \underbrace{\prod_{n=1}^N p(y_n|x_n^T z)}_{p_d(z|\lambda)}\; \underbrace{\frac{\mathcal{N}(z\,|\,0, I)}{\mathcal{N}(z\,|\,m, V)}}_{p_e(z|\lambda)} \quad (16)$$

The ELBO is split into

$$f(\lambda) = \sum_n \mathbb{E}_q\big[\log p(y_n|x_n^T z)\big], \qquad h(\lambda) = D_{KL}\big[\mathcal{N}(z\,|\,m, V)\,\|\,\mathcal{N}(z\,|\,0, I)\big] \quad (17)$$
Example 3: Correlated Topic Model

Multinomial model with a latent Gaussian variable. The split uses

$$p(z\,|\,\mu, \Sigma) = \mathcal{N}(z\,|\,\mu, \Sigma) \quad (18)$$

$$p(t_n = k\,|\,z) = \frac{\exp(z_k)}{\sum_{j=1}^K \exp(z_j)} \quad (19)$$

$$p(\text{observing a word } v\,|\,t_n, \beta) = \beta_{v, t_n} \quad (20)$$

$$\frac{p(y,z)}{q(z|\lambda)} = \underbrace{\prod_{n=1}^N \left[ \sum_{k=1}^K \frac{\exp(z_k)}{\sum_j \exp(z_j)}\, \beta_{n,k} \right]^{y_n}}_{p_d(z|\lambda)}\; \underbrace{\frac{\mathcal{N}(z\,|\,\mu, \Sigma)}{\mathcal{N}(z\,|\,m, V)}}_{p_e(z|\lambda)} \quad (21)$$
Stochastic Approximations

Computing the gradient of the expectation term $\mathbb{E}_q[f_n(z)]$:

$$\hat{g}(\lambda, \xi_n) := \frac{1}{S} \sum_{s=1}^S f_n\big(z^{(s)}\big)\, \nabla_\lambda \log q\big(z^{(s)}|\lambda\big) \quad (22)$$

where $\xi_n$ is the noise in the stochastic approximation $\hat{g}$. The identity $\nabla_\lambda q(z|\lambda) = q(z|\lambda)\, \nabla_\lambda \log q(z|\lambda)$ was used.

The stochastic gradient is formed by randomly selecting a mini-batch of size M:

$$\hat{\nabla} f(\lambda) = \frac{N}{M} \sum_{i=1}^M \hat{g}(\lambda, \xi_{n_i}) \quad (23)$$
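A minimal sketch of estimators (22)–(23) (not the authors' code), specialized to a factorized Gaussian $q(z|\lambda) = \mathcal{N}(m, \operatorname{diag}(v))$ with NumPy arrays m and v; each element of `f_list` plays the role of one integrand $f_n(z)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_hat(f_n, m, v, S=10):
    """Score-function estimate (22) of grad E_q[f_n(z)] for q = N(m, diag(v)),
    using grad q = q * grad log q. Returns gradients w.r.t. (m, v)."""
    z = m + np.sqrt(v) * rng.standard_normal((S, m.size))   # z^(s) ~ q(z|lambda)
    fz = np.array([f_n(zs) for zs in z])                    # f_n(z^(s))
    score_m = (z - m) / v                                   # d log q / d m
    score_v = ((z - m) ** 2 - v) / (2 * v ** 2)             # d log q / d v
    return (fz[:, None] * score_m).mean(0), (fz[:, None] * score_v).mean(0)

def grad_f_hat(f_list, m, v, M=5):
    """Mini-batch stochastic gradient (23), rescaled by N / M."""
    N = len(f_list)
    gm, gv = np.zeros_like(m), np.zeros_like(v)
    for n in rng.choice(N, size=M, replace=False):
        g1, g2 = g_hat(f_list[n], m, v)
        gm, gv = gm + g1, gv + g2
    return (N / M) * gm, (N / M) * gv
```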
3. Convergence of PG-SVI for Fixed Step Size
Additional Assumptions

- $D(\lambda \,\|\, \lambda') > 0$ for all $\lambda \ne \lambda'$.
- There exists an $\alpha > 0$ such that, for all $\lambda, \lambda'$ generated by the algorithm,

$$(\lambda - \lambda')^T\, \nabla_\lambda D(\lambda \,\|\, \lambda') \ge \alpha\, \|\lambda - \lambda'\|^2 \quad (24)$$

- The gradient estimate is unbiased: $\mathbb{E}[\hat{g}(\lambda, \xi)] = \nabla f(\lambda)$.
- The variance of the gradient estimate is bounded: $\text{Var}[\hat{g}(\lambda, \xi_n)] \le \sigma^2$.
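As a sanity check (not on the slides), the squared Euclidean distance $D(\lambda\|\lambda') = \tfrac{1}{2}\|\lambda - \lambda'\|_2^2$ from (4) satisfies (24) with $\alpha = 1$:

$$\nabla_\lambda D(\lambda\|\lambda') = \lambda - \lambda' \;\Longrightarrow\; (\lambda - \lambda')^T \nabla_\lambda D(\lambda\|\lambda') = \|\lambda - \lambda'\|_2^2,$$

so plain gradient descent is recovered as a special case of the analysis.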
Convergence Proof for the Deterministic Case

Proposition 1. Let the assumptions made above be satisfied. If we run t iterations with a fixed step size $\beta_k = \alpha / L$ for all k and an exact gradient $\nabla f(\lambda)$, then

$$\min_{k \in \{0, 1, \ldots, t-1\}} \|\lambda_{k+1} - \lambda_k\|^2 \le \frac{2 C_0}{\alpha\, t} \quad (25)$$

where $C_0 = \mathcal{L}(\lambda^*) - \mathcal{L}(\lambda_0)$ is the initial sub-optimality.
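Read as a usage note (simple algebra on (25), not on the slides): to guarantee $\min_k \|\lambda_{k+1} - \lambda_k\|^2 \le \epsilon$, it suffices to run

$$t \ge \frac{2 C_0}{\alpha\, \epsilon}$$

iterations; that is, the squared iterate movement vanishes at an $O(1/t)$ rate.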
Convergence Proof for the Stochastic Case

Proposition 3. If we run t iterations with a fixed step size $\beta_k = \gamma \alpha' / L$ (where $0 < \gamma < 2$ is a scalar) and a fixed batch size $M_k = M$ for all k with a stochastic gradient $\hat{\nabla} f(\lambda)$, then

$$\mathbb{E}_{R, \xi}\big( \|\lambda_{R+1} - \lambda_R\|^2 \big) \le \frac{1}{2 - \gamma} \left[ \frac{2 C_0}{\alpha'\, t} + \frac{\gamma\, c\, \sigma^2}{M L} \right] \quad (26)$$

where $c > 1/(2\alpha)$ and $\alpha' := \alpha - 1/(2c)$, and the expectation is with respect to the noise $\xi$ due to the mini-batch selection and a random variable R drawn from $\text{Prob}(R = k) = 1/t$, $k \in \{0, 1, \ldots, t-1\}$.
4. Experimental Results
Gaussian Process Classification Experiment

- Zero-mean GP prior with a squared-exponential covariance function (hyperparameters set by cross-validation)
- Stochastic gradient estimates with mini-batch sizes of 5 (Sonar, Ionosphere) and 20 (USPS)
- Number of MC samples for the expectation in the ELBO: 2000 (Sonar, USPS 3vs5) and 500 (Ionosphere)
- Parameters: mean m and Cholesky factor L of V
- Fixed step size for the proposed PG-SVI method
- Compared against adaptive stochastic methods (SGD, AdaGrad, RMSprop, etc.)
Gaussian Process Classification Results

[Figure: results plots; not captured in this transcription.]
Correlated Topic Model Experiment

- NIPS and Associated Press (AP) datasets:
  - NIPS: 1,500 documents (vocabulary size 12,419; 1.9M total words)
  - AP: 2,246 documents (vocabulary size 10,473; 436K total words)
- 50%/50% split into training and test sets
- Compared to the Delta and Laplace methods from Wang & Blei (2013)
- Compared to the mean-field method from Blei & Lafferty (2007)
Correlated Topic Model Results

[Figure: results plots; not captured in this transcription.]