Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs
1 Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs. Yining Wang, Jun Zhu. Tsinghua University, July 2014. (Slide footer: Y. Wang and J. Zhu (Tsinghua University), Max-Margin DP-means, July 2014, 25 slides.)
2 Outline
1. Infinite SVM (Zhu et al., 2011): the Infinite SVM (iSVM) model; Gibbs-iSVM
2. The Max-Margin DP-means (M^2DPM) algorithm: small-variance asymptotic (SVA) analysis; the M^2DPM algorithm
3. Experiments
4. Summary
4 Infinite SVM (Zhu et al., ICML 2011)
Solving clustering and classification simultaneously.
Motivation: improve classification by finding underlying clusters; improve clustering by incorporating supervised information.
Unknown number of clusters, hence a nonparametric treatment.
5 Infinite SVM
Graphical model: CRP(\alpha) generates cluster assignments z; each cluster k has a Gaussian component (\mu_k, \rho^2) over inputs x and a linear classifier \eta_k with prior variance \nu^2 over labels y.
Nonparametric prior (Chinese restaurant process):
p_0(z_i = k \mid \alpha, z_{-i}) \propto \begin{cases} n_{-i,k}, & \text{if } n_{-i,k} > 0 \\ \alpha, & \text{otherwise} \end{cases}
Max-margin classification model:
\phi(y_i \mid x_i, \eta, z_i = k) = \exp\big(-2c \max(0, 1 - y_i \eta_k^\top x_i)\big)
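The CRP prior above says an instance joins an existing cluster in proportion to that cluster's size and opens a new cluster in proportion to the concentration \alpha. A minimal sketch of this predictive rule (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def crp_assign_probs(counts, alpha):
    """Chinese restaurant process predictive probabilities for one instance:
    existing cluster k is chosen with probability proportional to its size
    n_{-i,k}; a new cluster is opened with probability proportional to alpha."""
    probs = np.append(np.asarray(counts, dtype=float), alpha)
    return probs / probs.sum()

rng = np.random.default_rng(0)
counts = [5, 3]                      # sizes of the two existing clusters
p = crp_assign_probs(counts, alpha=1.0)
z_new = rng.choice(len(p), p=p)      # index 2 would mean "open a new cluster"
```

Note the "rich get richer" behavior: the cluster of size 5 here is chosen with probability 5/9.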
7 Gibbs sampler for iSVM
Iteratively sample model parameters:
1. Mean parameters \mu_k: Gaussian distribution.
2. Linear classifiers \eta_k: data augmentation.
3. Cluster assignments z_i: categorical distribution.
Problem: slow! The new-cluster probability
q(z_i = z_{\text{new}}) \propto \alpha \int \mathcal{N}(x_i; \mu, \sigma^2 I)\, dp_0(\mu) \int \phi(y_i \mid x_i, \eta)\, dp_0(\eta)
involves integrals that are difficult to evaluate.
9 Small-variance Asymptotic (SVA) Analysis
1. Assumption: the model variance tends to 0.
2. Consequences: connections between the Bayesian posterior and a deterministic loss; novel inference algorithms.
3. Examples: probabilistic PCA vs. PCA (Tipping et al., 1999); DP-means (Kulis et al., 2012); efficient feature learning (Broderick et al., 2013); Asymp-iHMM (Roychowdhury et al., 2013).
12 SVA on Gibbs-iSVM
SVA assumptions: \sigma^2, \nu^2 \to 0, with c and \alpha scaled accordingly.
Existing clusters:
q(z_i = k) \propto n_{-i,k} \exp\!\big(-\|x_i - \mu_k\|^2/\sigma^2 - 2c \max(0, 1 - y_i \eta_k^\top x_i)\big)
\Rightarrow Q_i(k) = \underbrace{s \|x_i - \mu_k\|^2}_{\text{clustering}} + \underbrace{2c (1 - y_i \eta_k^\top x_i)_+}_{\text{classification}}
Creating new clusters:
q(z_i = z_{\text{new}}) \propto \alpha \int \mathcal{N}(x_i; \mu, \sigma^2 I)\, dp_0(\mu) \int \phi(y_i \mid x_i, \eta)\, dp_0(\eta)
\Rightarrow Q_i(\text{new}) = \underbrace{\lambda}_{\text{AIC}} + \underbrace{2c (1 - y_i \eta^\top x_i)_+}_{\text{classification}} + \underbrace{\|\eta\|^2/\nu^2}_{\text{regularization}}
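The existing-cluster objective Q_i(k) is just a weighted sum of a squared distance (clustering) and a hinge loss (classification). A small sketch of that computation, with all names hypothetical:

```python
import numpy as np

def q_existing(x_i, y_i, mu_k, eta_k, s, c):
    """SVA objective for assigning (x_i, y_i) to an existing cluster k:
    a clustering term s * ||x_i - mu_k||^2 plus a classification term
    2c * hinge under the cluster's linear classifier eta_k."""
    clustering = s * np.sum((x_i - mu_k) ** 2)
    hinge = max(0.0, 1.0 - y_i * (eta_k @ x_i))
    return clustering + 2.0 * c * hinge
```

A point sitting on its cluster mean and classified with margin at least 1 incurs cost 0; either a distant mean or a margin violation raises Q_i(k).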
15 The M^2DPM Algorithm
1. Repeat until convergence: for each instance i, z_i \leftarrow \arg\min_k Q_i(k); for each cluster k, \mu_k \leftarrow \bar{x}_k; update \{\eta_k\}_{k=1}^K using data augmentation.
2. Deterministic loss function (via SVA on the posterior):
L(z, \mu, \eta) = \underbrace{\sum_{k=1}^K \frac{\|\eta_k\|^2}{2\nu^2}}_{\text{regularization}} + \underbrace{2c \sum_{i=1}^n (\zeta_i^{z_i})_+}_{\text{classification}} + \underbrace{s \sum_{i=1}^n \|x_i - \mu_{z_i}\|^2}_{\text{clustering}} + \underbrace{\lambda K}_{\text{AIC}}
3. Extension to exponential-family distributions and multi-class classification.
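One coordinate-descent sweep of the loop above can be sketched as follows. This is a simplified illustration under assumptions, not the paper's implementation: in particular, the classifier update here uses plain subgradient steps on the regularized hinge loss in place of the data-augmentation update, and all names are hypothetical.

```python
import numpy as np

def m2dpm_sweep(X, y, mus, etas, s, c, lam, nu2):
    """One simplified Max-Margin DP-means sweep: hard assignments that
    minimize Q_i(k), opening a new cluster whenever the best existing
    cost exceeds the penalty lam; then mean and classifier updates."""
    z = np.zeros(len(X), dtype=int)
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        # cost of joining each existing cluster: clustering + hinge terms
        costs = [s * np.sum((x_i - mu) ** 2)
                 + 2.0 * c * max(0.0, 1.0 - y_i * (eta @ x_i))
                 for mu, eta in zip(mus, etas)]
        if min(costs) > lam:              # opening a cluster pays the AIC penalty
            mus.append(x_i.copy())
            etas.append(np.zeros_like(x_i))
            z[i] = len(mus) - 1
        else:
            z[i] = int(np.argmin(costs))
    for k in range(len(mus)):
        members = z == k
        if members.any():
            mus[k] = X[members].mean(axis=0)   # mean update
            for _ in range(50):                 # crude subgradient steps for eta_k
                g = etas[k] / nu2
                for x_j, y_j in zip(X[members], y[members]):
                    if y_j * (etas[k] @ x_j) < 1.0:
                        g = g - 2.0 * c * y_j * x_j
                etas[k] = etas[k] - 0.01 * g
    return z, mus, etas
```

Usage: initialize with a single cluster, e.g. `m2dpm_sweep(X, y, [X[0].copy()], [np.zeros(X.shape[1])], s=1.0, c=1.0, lam=4.0, nu2=100.0)`, and repeat sweeps until the assignments stop changing.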
19 Datasets and Algorithms
1. Protein fold classification: 27 classes, 696 instances, 21 features.
2. Parkinson's disease detection: 2 classes, 195 instances, 22 features.
3. Algorithms. Classification models: MNL, Linear-SVM, RBF-SVM. Hybrid models: dpMNL (Shahbaba et al., 2009), DP+SVM, Gibbs-iSVM, M^2DPM.
20 Classification performance
[Table: F1 (%) and time (s) on Protein, accuracy (%) and time (s) on Parkinson, for MNL, L-SVM, RBF-SVM, dpMNL, DP+SVM, Gibbs-iSVM, and M^2DPM; the numeric entries did not survive transcription.]
21 Clustering performance
1. Nonparametric clustering on synthetic datasets: true number of clusters K_0 vs. inferred K for varying sample size n. [Table entries did not survive transcription.]
2. Clustering of potential Parkinson's disease patients into groups I-V, compared by average age and average disease stage (0-4). [Table entries did not survive transcription.]
23 Summary
Infinite SVM: clustering + classification.
Small-variance asymptotic analysis.
M^2DPM: fast and accurate.
26 Questions
27 The data augmentation trick
Data augmentation for SVM classifiers (Polson and Scott, 2011):
\phi(y_i \mid z_i = k, \eta) = \exp\big(-2c \max(0, 1 - y_i \eta_k^\top x_i)\big) = \int_0^\infty \frac{1}{\sqrt{2\pi\omega_i}} \exp\!\left(-\frac{(\omega_i + c\zeta_i^k)^2}{2\omega_i}\right) d\omega_i = \int_0^\infty \phi(y_i, \omega_i \mid z_i = k, \eta)\, d\omega_i
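In the augmented model, Gibbs sampling draws the latent scale \omega_i from its conditional given the margin \zeta_i^k. A commonly stated form of the Polson-Scott result is that 1/\omega_i is conditionally inverse Gaussian; the sketch below assumes the mean parameter is 1/(c|\zeta_i|), which is an assumption of this sketch and should be checked against the paper before use.

```python
import numpy as np

def sample_omega(zeta, c, rng):
    """Sample the augmented variable omega_i given the margin zeta_i.
    Assumption: 1/omega_i | zeta_i is inverse Gaussian with mean
    1/(c*|zeta_i|) and shape 1 (NumPy's wald() is the inverse Gaussian)."""
    omega_inv = rng.wald(1.0 / (c * abs(zeta)), 1.0)
    return 1.0 / omega_inv
```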
28 Extension to multi-class classification
Multi-class hinge loss (Crammer and Singer, 2001):
\phi_m(y_i \mid z_i = k, \eta) = \exp\!\big(-2c \max_y (\Delta_{y, y_i} + \eta_{k,y}^\top x_i - \eta_{k,y_i}^\top x_i)\big)
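The max inside the exponent is the standard Crammer-Singer multi-class hinge with label cost \Delta. A direct computation of that loss, assuming a 0/1 label cost (the per-cluster classifier eta_k is represented here as a matrix with one weight row per class; names are illustrative):

```python
import numpy as np

def multiclass_hinge(eta_k, x_i, y_i):
    """Crammer-Singer multi-class hinge: max over labels y of
    Delta(y, y_i) + eta_{k,y}^T x_i - eta_{k,y_i}^T x_i,
    with Delta the 0/1 label cost."""
    scores = eta_k @ x_i                     # one score per class
    delta = np.ones(len(scores))
    delta[y_i] = 0.0                         # no cost for the true label
    return float(np.max(delta + scores - scores[y_i]))
```

The loss is 0 exactly when the true class outscores every other class by at least 1.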
29 The M^2DPM Algorithm
Repeat until convergence: for each instance i, z_i \leftarrow \arg\min_k Q_i(k); for each cluster k, \mu_k \leftarrow \bar{x}_k; update \{\eta_k\}_{k=1}^K using data augmentation.
More informationTime-Sensitive Dirichlet Process Mixture Models
Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationDeep Generative Models
Deep Generative Models Durk Kingma Max Welling Deep Probabilistic Models Worksop Wednesday, 1st of Oct, 2014 D.P. Kingma Deep generative models Transformations between Bayes nets and Neural nets Transformation
More informationParsimonious Adaptive Rejection Sampling
Parsimonious Adaptive Rejection Sampling Luca Martino Image Processing Laboratory, Universitat de València (Spain). Abstract Monte Carlo (MC) methods have become very popular in signal processing during
More informationBayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang
Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January
More informationBig Data Analytics. Special Topics for Computer Science CSE CSE Feb 24
Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 24 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Prediction III Goal
More informationBayesian Sparse Linear Regression with Unknown Symmetric Error
Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationSupervised Learning Coursework
Supervised Learning Coursework John Shawe-Taylor Tom Diethe Dorota Glowacka November 30, 2009; submission date: noon December 18, 2009 Abstract Using a series of synthetic examples, in this exercise session
More information