Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines. Eric W. Tramel. itwist 2016 Aalborg, DK 24 August 2016
|
|
- Phyllis Allen
- 6 years ago
- Views:
Transcription
1 Inferring Sparsity: Compressed Sensing Using Generalized Restricted Boltzmann Machines Eric W. Tramel itwist 2016 Aalborg, DK 24 August 2016 Andre MANOEL, Francesco CALTAGIRONE, Marylou GABRIE, Florent KRZAKALA
2 Inverse Problems General Linear Problem: y = g (F x) (M N) (N 1) (M 1) F x g y Projection Matrix Signal? Channel Measurements Compressed Sensing, Regression, Deconvolution/Debluring, Localization, Super-Resolution, Medical Image Reconstruction (CT/MRI), In-Painting, Denoising, Inference, etc.
3 Example Compressed Sensing y = F x + w w µ N (0, ) How do we obtain x from y knowing. OLS is under-determined, in general we can t! g is AWGN, x is K-Sparse, F is iid random, and M << N? With Sparsity, we can! Null(F ) x y F x + s x y F x + s (EC & TT, 2005) (EC, JR, & TT, 2006) (EC & MW, 2008)
4 Sparsity & Recovery For M>=K, we can recover with OLS, up to noise, if we are given the support locations by an oracle. = = However: Without an oracle, finding S brute-force is a combinatorial problem! arg min S2S y F S x S 2 2
5 Optimization Approaches y = F x + w w µ N (0, ) Greedy Approach ˆx = arg min x x 0 s.t. y F x 2 2 apple ˆx =argmin x y F x 2 2 s.t. x 0 apple K Greedily searching for support, solving OLS support. Convex Approach ˆx =argmin x 1 s.t. y F x 2 2 apple x ˆx = arg min y F x x 1 x Relax L0 penalty to convex L1 penalty ( pointiest convex Lp)
6 Phase Diagram for CS StOMP L1 NP-Hard? Un-Recoverable
7 Bayesian Approaches y = F x + w w µ N (0, ) Maximum a posteriori (MAP) ˆx = arg max x P (x y,f) Find signal to maximize probability. Can use unnormalized posterior minimize negative log prob. For some settings maps to convex optimization. Minimum Mean Square Error (MMSE) ˆx = E[x] = Z dx x P (x y,f) Average over posterior distribution.
8 Defining the Posterior Bayes' Rule P (x y,f)= 1 Z P (y x,f)p 0(x) Likelihood defined by stochastic description of g. Posterior Factorized Prior, AWGN Channel P (x y,f)= 1 Z Y µ 1 p 2 exp 8 < : 1 2 y µ X i! 9 2 = Y F µi x i ; i P 0 (x i ) For exact posterior, we must calculate an intractable Z! Inference: We can approximate it with Belief Propagation.
9 Graphical Model of Posterior x 1 P 0 (x i ) x 2 x 3 y 1,F 1 y 2,F 2 y M,F M P (y F x) x N Factors Prior Variables Coefficients Factors Measurements Loopy Belief Propagation The presence of many loops makes exact inference impossible, but approximate inference may be tractable and accurate [Weiss 2000].
10 Inference via BP Messages Messages Variable Factor Variable Factor P 0 x 1 PDFs m i!µ (x i ) P 0 x 1 PDFs m µ!i (x i ) y 1,F 1 y 1,F 1 x 2 x 2 y 2,F 2 y 2,F 2 x 3 x 3 y M,F M y M,F M x N x N Factors Prior Variables Coefficients Factors Measurements Factors Prior Variables Coefficients Factors Measurements Goal: Produce P (x i y,f) / P 0 (x i ) Y µ m µ!i (x i )
11 r-bp to AMP via TAP A Corrected Mean-Field (Donoho, Maleki, Montanari 2009) If F is dense and if its entries are uncorrelated, then message means and variances are nearly independent of any single edge message in the limit N. x 1 x 2 Onsager Correction y 1,F 1 x 3 y µ,f µ P 0 (x i ) x i y 2,F 2 y M,F M x N Big Savings: Compute Burden O( N 2 )! O((1 + )N)
12 r-bp/amp with GB Prior StOMP L1 BP/AMP NP-Hard? Un-Recoverable
13 r-bp/tap Convergence Small iteration count far from transition provides efficient estimation. (Krzakala et al., 2012) However, the approach encounters critical slowing at or near the transition, as shown via state evolution analysis.
14 Structured Signal Priors But what if we have more information about our signal? Correlations? Exact Inference Approximate Inference x 1 x 2 x 2 x 4 x 1 x 2 x 1 x 2 x 1 x 3 x 4 x 3 x 4 x 3 x 3 Pairwise Interactions High Order Int. For completely general visible models, see (Rangan et al, Hybrid-GAMP, 2012).
15 Regression & Latent Variables x 1 y 1,F 1 x 1 y 1,F 1 y 2,F 2 y 2,F 2 x 2 x 2 W 11 x 3 x 3 W 21 W 31 h 1 h 1 W 12 W 22 h 2 h 2 W 32 Binary Restricted Boltzmann Machine (RBM) P 0 (x, h) = 1 Z ep il x iw il h l + P i b ix i + P l c lh l Latent Model: Model data via a nonlinear composition of features. Unsupervised: Extract features from unlabelled training data. Tuning: Scale memorization/generalization via number of latent variables. Training: Sampling (Contrastive Divergence [Hinton 2002]) Mean-field (NMF [Welling, Hinton 2002], EMF [Gabrié, Tramel, Krzakala 2015])
16 AMP with RBM Support Prior E.W.T, Drémeau, Krzakala, AMP with Boltzmann machine priors, JSTAT Gabrié, E.W.T, Krzakala, Training RBMs with the TAP FE, NIPS For structured sparse signals, we can train a binary RBM to model the signal support, subsequently use it in AMP. Demonstrates: The RBM and AMP interact only via local biases inference on each essentially agnostic to the other.
17 AMP with General RBM Prior A General RBM P 0 (x, h ) = 1 Z ext Wh Y i P 0 (x i x ) Y l P 0 (h l h ) Using this generalized Boltzmann prior, we can model signals directly, without invoking sparsity.
18 AMP with General RBM Prior A General RBM P 0 (x, h ) = 1 Z ext Wh Y i P 0 (x i x ) Y l P 0 (h l h ) Exact knowledge of P(x) is intractable and sampling impractical for AMP. TAP Approximation of GRBM ln Z F(a, c ;,W), X X ln Z i (Bi,A i ; i ) Bi a i i i + X W ij a i a j + 1 X W 2 ijc 2 i c j (i,j) (i,j) X i A i ((a i ) 2 + c i ) Inference: Moments at each visible variable can be approximated by fixed-point iteration.
19 AMP with General RBM Prior What does this algorithm look like?
20 AMP with General RBM Prior 1. Update information from observation factors given current state.
21 AMP with General RBM Prior 2. Calculate local fields from AMP to use as a bias during GRBM inference.
22 AMP with General RBM Prior 3. Run GRBM inference in order to obtain marginal estimate of moments at each signal coefficient.
23 AMP with General RBM Prior 4. Apply light damping in order to avoid thrashing between GRBM modes and repeat. Extra Cost: Proportional to the number of interior steps you take, though can be set < 10.
24 Experimental Framework Offline Training 60k real-valued MNIST training samples 784 (28x28) Trunc. Gauss-Bernoulli Visible 500 Binary Hidden 100 sample mini-batches 150 Epochs (90k parameter updates) Reconstruction 1k real-valued MNIST test samples (h.o.) Noise Variance: 10-8 IID Random Projection Matrix F Methods non-.i.i.d. Use empirical support probability with GB prior AMP. BRBM Binary RBM to model support location. GRBM Generalized RBM to model entire signal. =0.125 =0.100 =0.075 =0.050 =0.025 non-i.i.d. BRBM GRBM (Tramel et al., 2016) Measure Reconstruction quality as a function of number of measurements.
25 AMP with General RBM Prior 0.35 > > > 10 MSE = M/N < < < Correlation = M/N = K/N < = K/N < = K/N < 0.8 non-i.i.d. AMP Support RBM-AMP (TDK, 2016) General RBM-AMP (TMCGK, 2016)
26 Learning GRBMs Open Work 1. What can be gained with deep architectures? (a) Can we push the boundaries even further with DBMs? 2. Is there an upper limit? (a) What can we learn from density evolution in random case? 3. Further experiments on non-sparse data. (a) Can an RBM be trained to a sufficient level to allow for CS-like reconstruction of non-sparse structured signals? 4. Time Indexing. (a) Parallel time indexing for BP on pairwise graphical models not set in stone (b) Need to re-derive time indices via pairwise r-bp. 5. Comparison with r-bp learning. 6. The Influence of Hidden Unit Distribution (a) E.g. Gauss-Bernoulli hidden units to allow for modeling of visible covariance structure (ala SS-RBM).
27 Statistical PHysics of INformation extraction «ou» Statistical PHysics of INverse complex sysems Questions? Merci! Collaborators Andre MANOEL Francesco CALTAGIRONE Marylou GABRIE Florent KRZAKALA LPS-ENS INRIA Paris LPS-ENS LPS-ENS
28 Supplements
29 Learning GRBMs How do we train the necessary GRBM models? Iterating the same equations allows us to evaluate the TAP f.e. approximation of the GRBM. => Stochastic Gradient Ascent on dataset likelihood. Biggest Issue Inescapable negative variances! Signed definition required to make messages Gaussian, but now it bites us. What to do? Current sol n: Truncated Distributions, allow neg. vars!
30 Learning GRBMs For a Gaussian distributed unit The `ERF - ERF` in the denominators make for terrible numerical issues for sufficiently likely arguments. - We solved via high-order Taylor approximation. Not themes elegant solution
31 Relaxed BP (r-bp) Problems Analytically intractable messages. Messages are continuous objects (PDFs), not fit for a computable algorithm. Assumption All values of F scale as O(1/N). Remedy (Rangan, 2010), (Krzakala et al., 2012) Given the above, we can perform a small-weight expansion on the messages, allowing for all messages to be written as Normal Distributions via the CLT, and are parameterized by a i!µ, Z dx i x i m i!µ (x i ), v i!µ, (a i!µ ) 2 + Z dx i x 2 i m i!µ (x i )
32 AMP with RBM Support Prior (Tramel, Drémeau, Krzakala, 2016) MNIST Experiments Goal: CS reconstruction of 300 test set digits given training set of 60,000 samples.
33 AMP with RBM Support Prior a b c d a b c d (Tramel, Drémeau, Krzakala, 2016) decreasing measurements decreasing measurements (a) i.i.d. GB-AMP, (b) non-i.i.d. GB-AMP, (c) naive mean-field RBM-AMP (d) TAP mean-field RBM-AMP
On convergence of Approximate Message Passing
On convergence of Approximate Message Passing Francesco Caltagirone (1), Florent Krzakala (2) and Lenka Zdeborova (1) (1) Institut de Physique Théorique, CEA Saclay (2) LPS, Ecole Normale Supérieure, Paris
More informationRigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems
1 Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Philip Schniter, and Sundeep Rangan Abstract arxiv:1706.06054v1 cs.it
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationSparse Superposition Codes for the Gaussian Channel
Sparse Superposition Codes for the Gaussian Channel Florent Krzakala (LPS, Ecole Normale Supérieure, France) J. Barbier (ENS) arxiv:1403.8024 presented at ISIT 14 Long version in preparation Communication
More informationarxiv: v2 [cs.ne] 22 Feb 2013
Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationReading Group on Deep Learning Session 4 Unsupervised Neural Networks
Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann
More informationContrastive Divergence
Contrastive Divergence Training Products of Experts by Minimizing CD Hinton, 2002 Helmut Puhr Institute for Theoretical Computer Science TU Graz June 9, 2010 Contents 1 Theory 2 Argument 3 Contrastive
More informationApproximate Message Passing Algorithms
November 4, 2017 Outline AMP (Donoho et al., 2009, 2010a) Motivations Derivations from a message-passing perspective Limitations Extensions Generalized Approximate Message Passing (GAMP) (Rangan, 2011)
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines Boltzmann Machine(BM) A Boltzmann machine extends a stochastic Hopfield network to include hidden units. It has binary (0 or 1) visible vector unit x and hidden (latent) vector
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationDeep unsupervised learning
Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationApproximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns
More informationDeep Boltzmann Machines
Deep Boltzmann Machines Ruslan Salakutdinov and Geoffrey E. Hinton Amish Goel University of Illinois Urbana Champaign agoel10@illinois.edu December 2, 2016 Ruslan Salakutdinov and Geoffrey E. Hinton Amish
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More informationCompressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach
Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach Asilomar 2011 Jason T. Parker (AFRL/RYAP) Philip Schniter (OSU) Volkan Cevher (EPFL) Problem Statement Traditional
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationVector Approximate Message Passing. Phil Schniter
Vector Approximate Message Passing Phil Schniter Collaborators: Sundeep Rangan (NYU), Alyson Fletcher (UCLA) Supported in part by NSF grant CCF-1527162. itwist @ Aalborg University Aug 24, 2016 Standard
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More informationRisk and Noise Estimation in High Dimensional Statistics via State Evolution
Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari Statistical
More informationGreedy Layer-Wise Training of Deep Networks
Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn
More informationAu-delà de la Machine de Boltzmann Restreinte. Hugo Larochelle University of Toronto
Au-delà de la Machine de Boltzmann Restreinte Hugo Larochelle University of Toronto Introduction Restricted Boltzmann Machines (RBMs) are useful feature extractors They are mostly used to initialize deep
More informationApproximate Message Passing for Bilinear Models
Approximate Message Passing for Bilinear Models Volkan Cevher Laboratory for Informa4on and Inference Systems LIONS / EPFL h,p://lions.epfl.ch & Idiap Research Ins=tute joint work with Mitra Fatemi @Idiap
More informationRepresentational Power of Restricted Boltzmann Machines and Deep Belief Networks. Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber Introduction Representational abilities of functions with some
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationMessage passing and approximate message passing
Message passing and approximate message passing Arian Maleki Columbia University 1 / 47 What is the problem? Given pdf µ(x 1, x 2,..., x n ) we are interested in arg maxx1,x 2,...,x n µ(x 1, x 2,..., x
More informationLearning in Markov Random Fields An Empirical Study
Learning in Markov Random Fields An Empirical Study Sridevi Parise, Max Welling University of California, Irvine sparise,welling@ics.uci.edu Abstract Learning the parameters of an undirected graphical
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationDensity estimation. Computing, and avoiding, partition functions. Iain Murray
Density estimation Computing, and avoiding, partition functions Roadmap: Motivation: density estimation Understanding annealing/tempering NADE Iain Murray School of Informatics, University of Edinburgh
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More informationBias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions
- Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of
More informationDeep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 8: Autoencoder & DBM Princeton University COS 495 Instructor: Yingyu Liang Autoencoder Autoencoder Neural networks trained to attempt to copy its input to its output Contain
More informationKnowledge Extraction from DBNs for Images
Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationStatistical Image Recovery: A Message-Passing Perspective. Phil Schniter
Statistical Image Recovery: A Message-Passing Perspective Phil Schniter Collaborators: Sundeep Rangan (NYU) and Alyson Fletcher (UC Santa Cruz) Supported in part by NSF grants CCF-1018368 and NSF grant
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal Microsoft Cambridge, U.K. July 7th, 2009, Montreal Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,
More informationAutoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas
On for Energy Based Models Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas Toronto Machine Learning Group Meeting, 2011 Motivation Models Learning Goal: Unsupervised
More informationAnnealing Between Distributions by Averaging Moments
Annealing Between Distributions by Averaging Moments Chris J. Maddison Dept. of Comp. Sci. University of Toronto Roger Grosse CSAIL MIT Ruslan Salakhutdinov University of Toronto Partition Functions We
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationDeep Learning Architecture for Univariate Time Series Forecasting
CS229,Technical Report, 2014 Deep Learning Architecture for Univariate Time Series Forecasting Dmitry Vengertsev 1 Abstract This paper studies the problem of applying machine learning with deep architecture
More informationUNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS
UNDERSTANDING BELIEF PROPOGATION AND ITS GENERALIZATIONS JONATHAN YEDIDIA, WILLIAM FREEMAN, YAIR WEISS 2001 MERL TECH REPORT Kristin Branson and Ian Fasel June 11, 2003 1. Inference Inference problems
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationScale Mixture Modeling of Priors for Sparse Signal Recovery
Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationStochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence
ESANN 0 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 7-9 April 0, idoc.com publ., ISBN 97-7707-. Stochastic Gradient
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationGaussian Cardinality Restricted Boltzmann Machines
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Gaussian Cardinality Restricted Boltzmann Machines Cheng Wan, Xiaoming Jin, Guiguang Ding and Dou Shen School of Software, Tsinghua
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II
1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationMotivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble
Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationHow to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto
1 How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto What is wrong with back-propagation? It requires labeled training data. (fixed) Almost
More informationLearning MN Parameters with Approximation. Sargur Srihari
Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief
More informationMeasuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information
Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information Mathias Berglund, Tapani Raiko, and KyungHyun Cho Department of Information and Computer Science Aalto University
More informationMismatched Estimation in Large Linear Systems
Mismatched Estimation in Large Linear Systems Yanting Ma, Dror Baron, Ahmad Beirami Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 7695, USA Department
More informationBlack-box α-divergence Minimization
Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationAlgorithms other than SGD. CS6787 Lecture 10 Fall 2017
Algorithms other than SGD CS6787 Lecture 10 Fall 2017 Machine learning is not just SGD Once a model is trained, we need to use it to classify new examples This inference task is not computed with SGD There
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More information