Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs

Size: px
Start display at page:

Download "Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs"

Transcription

1 Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs Yining Wang Jun Zhu Tsinghua University July, 2014 Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

2 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

3 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

4 Infinite SVM (Zhu et al., ICML 2011) Solving clustering and classification simultaneously Motivation: Improve classification by finding underlying clusters. Improve clustering by incorporating supervised information Unknown # of clusters a nonparametric treatment. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

5 Infinite SVM CRP(α) z x µ ρ 2 K y N η K ν 2 Nonparametric prior: p 0 (z i = k α, z i ) Max-margin classification model: { n i,k, if n i,k > 0 α, otherwise φ(y i x i, η, z i = k) = exp( 2c max(0, 1 y i η k x i)) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

6 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

7 Gibbs sampler for isvm Iteratively sample model parameters: 1 Mean parameter µ k : Gaussian distribution. 2 Linear classifier η k : data augmentation. 3 Cluster assignments z i : categorical distribution. Problems: slow! q(z i = z new ) α N (x i ; µ, σ 2 I)dp 0 (µ) φ(y i x i, η)dp 0 (η) } {{ } difficult to evaluate Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

8 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

9 Small-variance Asymptotic (SVA) Analysis 1 Assumptions: model variance 0. 2 Consequences: Connections between Bayesian posterior and deterministic loss. Novel inference algorithms. 3 Examples: Probabilistic PCA vs. PCA. (Tipping et al., 1999) DP-means. (Kulis et al., 2012) Efficient feature learning. (Broderick et al., 2013) Asymp-iHMM. (Roychowdhury et al., 2013) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

10 Small-variance Asymptotic (SVA) Analysis 1 Assumptions: model variance 0. 2 Consequences: Connections between Bayesian posterior and deterministic loss. Novel inference algorithms. 3 Examples: Probabilistic PCA vs. PCA. (Tipping et al., 1999) DP-means. (Kulis et al., 2012) Efficient feature learning. (Broderick et al., 2013) Asymp-iHMM. (Roychowdhury et al., 2013) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

11 Small-variance Asymptotic (SVA) Analysis 1 Assumptions: model variance 0. 2 Consequences: Connections between Bayesian posterior and deterministic loss. Novel inference algorithms. 3 Examples: Probabilistic PCA vs. PCA. (Tipping et al., 1999) DP-means. (Kulis et al., 2012) Efficient feature learning. (Broderick et al., 2013) Asymp-iHMM. (Roychowdhury et al., 2013) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

12 SVA on Gibbs-iSVM SVA Assumptions: σ 2, ν 2 0, c, α. Existing clusters: q(z i = k) n i,k exp ( x i µ k 2 ) σ 2 2c max(0, 1 y i ηk x i) Q i (k) = s x i µ k 2 + 2c(1 y }{{} i ηk x i) + ; }{{} Clustering Classification Creating new clusters: q(z i = z new ) α N (x i ; µ, σ 2 I)dp 0 (µ) φ(y i x i, η)dp 0 (η) Q i (new) = }{{} λ + 2c(1 y i η x i ) + + η 2 /ν 2. }{{}}{{} AIC Classification Regularization Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

13 SVA on Gibbs-iSVM SVA Assumptions: σ 2, ν 2 0, c, α. Existing clusters: q(z i = k) n i,k exp ( x i µ k 2 ) σ 2 2c max(0, 1 y i ηk x i) Q i (k) = s x i µ k 2 + 2c(1 y }{{} i ηk x i) + ; }{{} Clustering Classification Creating new clusters: q(z i = z new ) α N (x i ; µ, σ 2 I)dp 0 (µ) φ(y i x i, η)dp 0 (η) Q i (new) = }{{} λ + 2c(1 y i η x i ) + + η 2 /ν 2. }{{}}{{} AIC Classification Regularization Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

14 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

15 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation. 2 Deterministic loss function (via SVA on posterior): L(z, µ, η) K η k 2 = 2ν 2 k=1 }{{} Regularization + 2c n (ζ z i i=1 i ) + } {{ } Classification + s n x i µ zi 2 + λ }{{ K}. i=1 }{{} AIC Clustering 3 Extension to exponential family distributions and multi-class classification. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

16 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation. 2 Deterministic loss function (via SVA on posterior): L(z, µ, η) K η k 2 = 2ν 2 k=1 }{{} Regularization + 2c n (ζ z i i=1 i ) + } {{ } Classification + s n x i µ zi 2 + λ }{{ K}. i=1 }{{} AIC Clustering 3 Extension to exponential family distributions and multi-class classification. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

17 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation. 2 Deterministic loss function (via SVA on posterior): L(z, µ, η) K η k 2 = 2ν 2 k=1 }{{} Regularization + 2c n (ζ z i i=1 i ) + } {{ } Classification + s n x i µ zi 2 + λ }{{ K}. i=1 }{{} AIC Clustering 3 Extension to exponential family distributions and multi-class classification. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

18 Outline 1 Infinite SVM (Zhu et al., 2011) The Infinite SVM (isvm) Model Gibbs-iSVM 2 The Max-Margin DP-means (M 2 DPM) Algorithm Small-variance Asymptotic (SVA) Analysis The M 2 DPM Algorithm 3 Experiments 4 Summary Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

19 Datasets and Algorithms 1 Protein fold classification 27 classes, 696 instances, 21 features 2 Pakinson s disease detection 2 classes, 195 instances, 22 features 3 Algorithms Classification models: MNL, Linear-SVM, RBF-SVM Hybrid models: dpmnl (Shahbaba et al., 2009), DP+SVM, Gibbs-iSVM, M 2 DPM. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

20 Classification performance Protein Parkinson Algo F1 (%) Times (s) Acc. (%) Times (s) MNL L-SVM RBF-SVM dpmnl DP+SVM Gibbs-iSVM M 2 DPM Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

21 Clustering performance 1 Nonparametric clustering on synthetic datasets: K 0 vs. K. n K K Clustering on potential Parkinson s disease patients. Group I II III IV V Avg. age Avg. stage (0-4) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

22 Clustering performance 1 Nonparametric clustering on synthetic datasets: K 0 vs. K. n K K Clustering on potential Parkinson s disease patients. Group I II III IV V Avg. age Avg. stage (0-4) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

23 Summary Infinite SVM: clustering + classification Small variance analysis M 2 DPM: fast and accurate. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

24 Summary Infinite SVM: clustering + classification Small variance analysis M 2 DPM: fast and accurate. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

25 Summary Infinite SVM: clustering + classification Small variance analysis M 2 DPM: fast and accurate. Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

26 Questions Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

27 The data augmentation trick Data augmentation for SVM classifiers (Polson and Scott, 2011) φ(y i z i = k, η) = exp( 2c max(0, 1 y i ηk x i)) ( 1 = exp (ω i + cζi k ) 2 2πωi 2ω i = 0 0 φ(y i, ω i z i = k, η)dω i ) dω i Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

28 Extension to multi-class classification Multi-class hinge loss: (Crammer and Singer, 2001) φ m (y i z i = k, η) = exp( 2c max(δ y,yi + η y k,y x i ηk,y i x i )) Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

29 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

30 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

31 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

32 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

33 The M 2 DPM Algorithm 1 Repeat until convergence: For each instance i: z i argmin k Q i (k). For each cluster k: µ k x k. Update {η k } K k=1 using data augmentation Y. Wang and J. Zhu (Tsinghua University) Max-Margin DP-means July, / 25

Fast Approximate MAP Inference for Bayesian Nonparametrics

Fast Approximate MAP Inference for Bayesian Nonparametrics Fast Approximate MAP Inference for Bayesian Nonparametrics Y. Raykov A. Boukouvalas M.A. Little Department of Mathematics Aston University 10th Conference on Bayesian Nonparametrics, 2015 1 Iterated Conditional

More information

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes MAD-Bayes: MAP-based Asymptotic Derivations from Bayes Tamara Broderick Brian Kulis Michael I. Jordan Cat Clusters Mouse clusters Dog 1 Cat Clusters Dog Mouse Lizard Sheep Picture 1 Picture 2 Picture 3

More information

Online Bayesian Passive-Agressive Learning

Online Bayesian Passive-Agressive Learning Online Bayesian Passive-Agressive Learning International Conference on Machine Learning, 2014 Tianlin Shi Jun Zhu Tsinghua University, China 21 August 2015 Presented by: Kyle Ulrich Introduction Online

More information

Online Bayesian Passive-Aggressive Learning

Online Bayesian Passive-Aggressive Learning Online Bayesian Passive-Aggressive Learning Full Journal Version: http://qr.net/b1rd Tianlin Shi Jun Zhu ICML 2014 T. Shi, J. Zhu (Tsinghua) BayesPA ICML 2014 1 / 35 Outline Introduction Motivation Framework

More information

Spatial Normalized Gamma Process

Spatial Normalized Gamma Process Spatial Normalized Gamma Process Vinayak Rao Yee Whye Teh Presented at NIPS 2009 Discussion and Slides by Eric Wang June 23, 2010 Outline Introduction Motivation The Gamma Process Spatial Normalized Gamma

More information

Construction of Dependent Dirichlet Processes based on Poisson Processes

Construction of Dependent Dirichlet Processes based on Poisson Processes 1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Simple approximate MAP inference for Dirichlet processes mixtures

Simple approximate MAP inference for Dirichlet processes mixtures Vol. 0 (2015) 1 8 Simple approximate MAP inference for Dirichlet processes mixtures Yordan P. Raykov Aston University e-mail: yordan.raykov@gmail.com Alexis Boukouvalas University of Manchester e-mail:

More information

DP-space: Bayesian Nonparametric Subspace Clustering with Small-variance Asymptotics

DP-space: Bayesian Nonparametric Subspace Clustering with Small-variance Asymptotics with Small-variance Asymptotics Yining Wang YININGWA@CS.CMU.EDU Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA Jun Zhu DCSZJ@TSINGHUA.EDU.CN Dept. of Comp. Sci. & Tech.,

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory

More information

Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory

More information

CMPS 242: Project Report

CMPS 242: Project Report CMPS 242: Project Report RadhaKrishna Vuppala Univ. of California, Santa Cruz vrk@soe.ucsc.edu Abstract The classification procedures impose certain models on the data and when the assumption match the

More information

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan

More information

Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines

Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines Jun Zhu junzhu@cs.cmu.edu Ning Chen chenn07@mails.thu.edu.cn Eric P. Xing epxing@cs.cmu.edu School of Computer Science, Carnegie

More information

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

Online Bayesian Passive-Aggressive Learning"

Online Bayesian Passive-Aggressive Learning Online Bayesian Passive-Aggressive Learning" Tianlin Shi! stl501@gmail.com! Jun Zhu! dcszj@mail.tsinghua.edu.cn! The BIG DATA challenge" Large amounts of data.! Big data:!! Big Science: 25 PB annual data.!

More information

Kernel methods, kernel SVM and ridge regression

Kernel methods, kernel SVM and ridge regression Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;

More information

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu Lecture 16-17: Bayesian Nonparametrics I STAT 6474 Instructor: Hongxiao Zhu Plan for today Why Bayesian Nonparametrics? Dirichlet Distribution and Dirichlet Processes. 2 Parameter and Patterns Reference:

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Image segmentation combining Markov Random Fields and Dirichlet Processes

Image segmentation combining Markov Random Fields and Dirichlet Processes Image segmentation combining Markov Random Fields and Dirichlet Processes Jessica SODJO IMS, Groupe Signal Image, Talence Encadrants : A. Giremus, J.-F. Giovannelli, F. Caron, N. Dobigeon Jessica SODJO

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

Lightweight Probabilistic Deep Networks Supplemental Material

Lightweight Probabilistic Deep Networks Supplemental Material Lightweight Probabilistic Deep Networks Supplemental Material Jochen Gast Stefan Roth Department of Computer Science, TU Darmstadt In this supplemental material we derive the recipe to create uncertainty

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa)

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

STAT Advanced Bayesian Inference

STAT Advanced Bayesian Inference 1 / 32 STAT 625 - Advanced Bayesian Inference Meng Li Department of Statistics Jan 23, 218 The Dirichlet distribution 2 / 32 θ Dirichlet(a 1,...,a k ) with density p(θ 1,θ 2,...,θ k ) = k j=1 Γ(a j) Γ(

More information

Probabilistic Machine Learning. Industrial AI Lab.

Probabilistic Machine Learning. Industrial AI Lab. Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear

More information

Nonparametric Bayes Density Estimation and Regression with High Dimensional Data

Nonparametric Bayes Density Estimation and Regression with High Dimensional Data Nonparametric Bayes Density Estimation and Regression with High Dimensional Data Abhishek Bhattacharya, Garritt Page Department of Statistics, Duke University Joint work with Prof. D.Dunson September 2010

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Part IV: Monte Carlo and nonparametric Bayes

Part IV: Monte Carlo and nonparametric Bayes Part IV: Monte Carlo and nonparametric Bayes Outline Monte Carlo methods Nonparametric Bayesian models Outline Monte Carlo methods Nonparametric Bayesian models The Monte Carlo principle The expectation

More information

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2 STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition

Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition Bayesian Data Fusion with Gaussian Process Priors : An Application to Protein Fold Recognition Mar Girolami 1 Department of Computing Science University of Glasgow girolami@dcs.gla.ac.u 1 Introduction

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s

More information

Relevance Vector Machines

Relevance Vector Machines LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

More information

CSC411 Fall 2018 Homework 5

CSC411 Fall 2018 Homework 5 Homework 5 Deadline: Wednesday, Nov. 4, at :59pm. Submission: You need to submit two files:. Your solutions to Questions and 2 as a PDF file, hw5_writeup.pdf, through MarkUs. (If you submit answers to

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Collapsed Variational Dirichlet Process Mixture Models

Collapsed Variational Dirichlet Process Mixture Models Collapsed Variational Dirichlet Process Mixture Models Kenichi Kurihara Dept. of Computer Science Tokyo Institute of Technology, Japan kurihara@mi.cs.titech.ac.jp Max Welling Dept. of Computer Science

More information

Advanced Introduction to Machine Learning

Advanced Introduction to Machine Learning 10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Variational Autoencoders

Variational Autoencoders Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly

More information

Two Useful Bounds for Variational Inference

Two Useful Bounds for Variational Inference Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Expectation Propagation for Approximate Bayesian Inference

Expectation Propagation for Approximate Bayesian Inference Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given

More information

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles Jeremy Gaskins Department of Bioinformatics & Biostatistics University of Louisville Joint work with Claudio Fuentes

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Bayesian model selection in graphs by using BDgraph package

Bayesian model selection in graphs by using BDgraph package Bayesian model selection in graphs by using BDgraph package A. Mohammadi and E. Wit March 26, 2013 MOTIVATION Flow cytometry data with 11 proteins from Sachs et al. (2005) RESULT FOR CELL SIGNALING DATA

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26 Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

A Process over all Stationary Covariance Kernels

A Process over all Stationary Covariance Kernels A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Classifier Complexity and Support Vector Classifiers

Classifier Complexity and Support Vector Classifiers Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Active and Semi-supervised Kernel Classification

Active and Semi-supervised Kernel Classification Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),

More information

Introduction to SVM and RVM

Introduction to SVM and RVM Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures

More information

Basis Expansion and Nonlinear SVM. Kai Yu

Basis Expansion and Nonlinear SVM. Kai Yu Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion

More information

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes

More information

Partial factor modeling: predictor-dependent shrinkage for linear regression

Partial factor modeling: predictor-dependent shrinkage for linear regression modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

More information

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :

More information

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation Dahua Lin Toyota Technological Institute at Chicago dhlin@ttic.edu Abstract Reliance on computationally expensive

More information

The supervised hierarchical Dirichlet process

The supervised hierarchical Dirichlet process 1 The supervised hierarchical Dirichlet process Andrew M. Dai and Amos J. Storkey Abstract We propose the supervised hierarchical Dirichlet process (shdp), a nonparametric generative model for the joint

More information

Sparse Proteomics Analysis (SPA)

Sparse Proteomics Analysis (SPA) Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for Feature Selection from Forward Models Martin Genzel Technische Universität Berlin Winter School on Compressed Sensing December 5, 2015

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms

VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms 03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Deep Generative Models

Deep Generative Models Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation

More information

Parsimonious Adaptive Rejection Sampling

Parsimonious Adaptive Rejection Sampling Parsimonious Adaptive Rejection Sampling Luca Martino Image Processing Laboratory, Universitat de València (Spain). Abstract Monte Carlo (MC) methods have become very popular in signal processing during

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24 Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 24 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Prediction III Goal

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Supervised Learning Coursework

Supervised Learning Coursework Supervised Learning Coursework John Shawe-Taylor Tom Diethe Dorota Glowacka November 30, 2009; submission date: noon December 18, 2009 Abstract Using a series of synthetic examples, in this exercise session

More information