DIFFUSION NETWORKS, PRODUCT OF EXPERTS, AND FACTOR ANALYSIS. Tim K. Marks & Javier R. Movellan
DIFFUSION NETWORKS, PRODUCT OF EXPERTS, AND FACTOR ANALYSIS

Tim K. Marks & Javier R. Movellan
University of California San Diego, 9500 Gilman Drive, La Jolla, CA

ABSTRACT

Hinton (in press) recently proposed a learning algorithm called contrastive divergence learning for a class of probabilistic models called products of experts (PoE). Whereas in standard mixture models the beliefs of individual experts are averaged, in PoEs the beliefs are multiplied together and then renormalized. One advantage of this approach is that the combined beliefs can be much sharper than the individual beliefs of each expert. It has been shown that a restricted version of the Boltzmann machine, in which there are no lateral connections between hidden units or between observation units, is a PoE. In this paper we generalize these results to diffusion networks, a continuous-time, continuous-state version of the Boltzmann machine. We show that when the unit activation functions are linear, this PoE architecture is equivalent to a factor analyzer. This result suggests novel non-linear generalizations of factor analysis and independent component analysis that could be implemented using interactive neural circuitry.

1 INTRODUCTION

Hinton (in press) recently proposed a fast learning algorithm known as contrastive divergence learning for a class of models called products of experts (PoE). In standard mixture models the beliefs of individual experts are averaged, but in PoEs the beliefs are multiplied together and then renormalized, i.e.,

    p(x) = ∏_k p_k(x | θ_k) / ∫ ∏_k p_k(x′ | θ_k) dx′,    (1)

where p_k(x | θ_k) is the probability of the observed vector x under expert k, and p(x) is the probability assigned to x when the opinions of all the experts are combined. One advantage of this approach is that the combined beliefs are sharper than the individual beliefs of each expert, potentially avoiding the problems of standard mixture models when applied to high-dimensional problems. It can be shown that a restricted version of the Boltzmann machine, in which hidden units and observable units form a bipartite graph, is in fact a PoE (Hinton, in
press). Indeed, Hinton (in press) recently trained these restricted Boltzmann machines (RBMs) using contrastive divergence learning, with very good results. One problem with Boltzmann machines is that they use binary-state units, a representation which may not be optimal for continuous data such as images and sounds. This paper concerns the diffusion network, a continuous-time, continuous-state version of the Boltzmann machine. Like Boltzmann machines, diffusion networks have bidirectional connections, which in this paper we assume to be symmetric. If a diffusion network is given the same connectivity structure as an RBM, the result will also be a PoE model. The main result presented here is that when the activation functions are linear, these restricted diffusion networks model the same class of observable distributions as factor analysis. This is somewhat surprising given the fact that diffusion networks are feedback models, while factor analyzers are feedforward models. Most importantly, the result suggests novel non-linear generalizations of factor analysis and independent component analysis that could be implemented using interactive circuitry. We show results of simulations in which we trained restricted diffusion networks using contrastive divergence.

2 DIFFUSION NETWORKS

A diffusion network is a neural network specified by a stochastic differential equation

    dX(t) = μ(X(t)) dt + σ dB(t),    (2)

where X(t) is a random vector describing the state of the system at time t, and B is a standard Brownian motion process. The term σ is a constant known as the dispersion, which controls the amount of noise in the system. The function μ, known as the drift, represents the deterministic kernel of the system. If the drift is the negative gradient of an energy function H, i.e. μ(x) = −∇H(x), and the energy function satisfies certain conditions (Gidas, 1986), then it can be shown that X(t) has a limit probability density

    p(x) ∝ exp{−2H(x)/σ²},

which is independent of the initial conditions.
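To make the limit-distribution claim concrete, here is a small numerical sketch (ours, not from the paper; the matrix A, step size, and chain count are invented for illustration). It integrates the SDE with the Euler–Maruyama scheme for a quadratic energy H(x) = ½xᵀAx and σ² = 2, in which case the limit density exp(−H(x)) is Gaussian with covariance A⁻¹, the fact derived in the next section.

```python
import numpy as np

# Hypothetical illustration: Euler-Maruyama simulation of the diffusion
# dX = mu(X) dt + sigma dB with drift mu(x) = -grad H(x) = -A x for the
# quadratic energy H(x) = 0.5 x^T A x. With sigma^2 = 2, the limit density
# is proportional to exp(-H(x)), i.e. Gaussian with covariance A^{-1}.
rng = np.random.default_rng(0)

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # symmetric positive definite (assumed)
sigma = np.sqrt(2.0)                       # dispersion, so that 2/sigma^2 = 1
dt, n_steps, n_chains = 1e-3, 6000, 2000   # integration settings (assumed)

x = np.zeros((n_chains, 2))                # many independent chains, started at 0
for _ in range(n_steps):                   # 6000 steps of 1e-3 = 6 time units
    x += -(x @ A) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

emp_cov = np.cov(x.T)                      # empirical covariance of final states
print(np.round(emp_cov, 2))
print(np.round(np.linalg.inv(A), 2))       # should be close to emp_cov
```

The empirical covariance of the settled chains matches A⁻¹ up to sampling and discretization error, illustrating that the stochastic activation dynamics sample from the model distribution rather than merely descending the energy.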
2.1 Linear diffusions

If we let σ² = 2 and the energy is a quadratic function,

    H(x) = ½ xᵀ A x,

where A is a symmetric positive definite matrix, then the limit distribution is

    p(x) ∝ exp{−½ xᵀ A x}.    (3)

This distribution is Gaussian with zero mean and covariance matrix A⁻¹. Taking the gradient of the energy function, we get

    μ(x) = −∇H(x) = −A x,    (4)

and thus the activation dynamics are linear:

    dX(t) = −A X(t) dt + σ dB(t).    (5)

These dynamics have the following neural network interpretation. Let A = L − W, where W represents a matrix of synaptic conductances and L is a diagonal matrix of transmembrane conductances λ_i (current leakage terms). We then get the componentwise form

    dX_i(t) = [S_i(t) − λ_i X_i(t)] dt + σ dB_i(t),    (6)

    S_i(t) = ∑_j w_ij X_j(t),    (7)

where S_i(t) is interpreted as the net input current to neuron i and X_i(t) as the activation of that neuron.

3 FACTOR ANALYSIS

Factor analysis is a probabilistic model of the form

    O = Λ H + η,    (8)

where O is an n_o-dimensional random vector representing observable data; H is an n_h-dimensional Gaussian vector representing hidden independent sources, with E(H) = 0 and Cov(H) = I, the identity matrix; and η is a Gaussian vector, independent of H, representing noise in the observation generation process, with E(η) = 0 and Cov(η) = Ψ, a diagonal matrix. If H is logistic instead of Gaussian, then in the limit as Ψ → 0, Equation (8) is the standard model underlying independent component analysis (ICA) (Bell & Sejnowski, 1995; Attias, 1999). Note that in factor analysis, as well as in ICA, the observations are generated in a feedforward manner from the hidden sources and the noise process.

It follows from Equation 8 that O and H have joint covariance matrix

    Σ = [ ΛΛᵀ + Ψ   Λ
          Λᵀ        I ].    (9)

Taking the inverse of this block matrix, we get

    Σ⁻¹ = [ Ψ⁻¹       −Ψ⁻¹Λ
            −ΛᵀΨ⁻¹    I + ΛᵀΨ⁻¹Λ ].    (10)

4 RESTRICTED DIFFUSION NETWORKS

In this section we discuss a subclass of diffusion networks that have a PoE architecture. To begin with, we divide the state vector of a diffusion network into observable and hidden components, X = (O, H). It is easy to show that if the connectivity matrix is restricted so that there are no lateral connections from hidden
units to hidden units and no lateral connections from observable units to observable units, i.e., if

    A = [ α     −M
          −Mᵀ   β ],    (11)

where α and β are diagonal, then the limit distribution of X is a PoE. Hereafter we refer to a diffusion network with no hidden-hidden or observable-observable connections as a restricted diffusion network (RDN).

Suppose we are given an arbitrary linear RDN. The limit distribution p(o, h) has Gaussian marginal distributions p(o) and p(h), whose covariances can be found by inverting the block matrix in Equation 11:

    Σ_h = (β − Mᵀ α⁻¹ M)⁻¹,    (12)

    Σ_o = (α − M β⁻¹ Mᵀ)⁻¹.    (13)

Let D_{n_h} represent the set containing every distribution on ℝ^{n_o} that is the limit distribution of the observable units of some linear RDN with n_h hidden units. In some cases we can think of the hidden units of a network as the network's internal representation of the world. Some authors, such as Barlow (1994), have postulated that one of the organizing principles of early processing in the brain is for these internal representations to be as independent as possible. Thus it is of interest to study the set of linear RDNs whose hidden units are independent, i.e., for which Σ_h is diagonal. One might wonder, for example, whether restricting Σ_h to be the identity matrix imposes any constraints on p(o). Let D^I_{n_h} represent the set containing every distribution on ℝ^{n_o} that is the limit distribution of the observable units of some linear RDN with n_h independent hidden units, Σ_h = I. We now show that forcing the hidden units to be independent does not constrain the class of observable distributions that can be represented by a linear RDN.

Theorem 1: D^I_{n_h} = D_{n_h}.

Proof: Since D^I_{n_h} is a subset of D_{n_h}, we just need to show that any distribution in D_{n_h} is also in D^I_{n_h}. Let p(o, h) be the limit distribution of an arbitrary linear RDN with parameters M, α, and β. We will show there exists a new linear RDN with limit distribution p′(o, h) and parameters M′, α, and β′ such that Σ′_h = I and p′(o) = p(o).
First we take the eigenvalue decomposition

    I − β^(-1/2) Mᵀ α⁻¹ M β^(-1/2) = U D Uᵀ,    (14)

where U is a unitary matrix and D is diagonal. Equation 14 can be rewritten as β^(-1/2) (β − Mᵀα⁻¹M) β^(-1/2) = U D Uᵀ, which shows that D is positive definite, since it is obtained from the positive definite matrix Σ_h⁻¹ by a congruence transformation. We define

    M′ = M β^(-1/2) U D^(-1/2),    (15)

    β′ = D⁻¹.    (16)

By Equation 12 (and after some derivations) we find that

    Σ′_h = (β′ − M′ᵀ α⁻¹ M′)⁻¹ = I.    (17)

By Equation 13 (and after some derivations) we find that

    Σ′_o = (α − M′ β′⁻¹ M′ᵀ)⁻¹ = (α − M β⁻¹ Mᵀ)⁻¹ = Σ_o.    (18)

Theorem 1 will help to elucidate the relationship between factor analysis and linear RDNs. We will show that the class of distributions over observable variables that can be generated by factor analysis models with n_h hidden sources is the same as the class of distributions over observable units that can be generated by linear RDNs with n_h hidden units.

5 FACTOR ANALYSIS AND DIFFUSION NETS

In this section we examine the relationship between feedforward factor analysis models and feedback diffusion networks. Let F_{n_h} represent the class of probability distributions over observable units that can be generated by factor analysis models with n_h hidden units. In Theorem 2 we will show that this class of distributions is equivalent to D_{n_h}, the class of distributions generated by linear RDNs. To do so, we first need to prove a lemma regarding the subclass of factor analysis models for which the lower right block of the matrix in Equation 10, I + ΛᵀΨ⁻¹Λ, is diagonal. We define F^r_{n_h} as the class of probability distributions on ℝ^{n_o} that can be generated by such restricted factor analysis models.

Lemma: F^r_{n_h} = F_{n_h}.

Proof: Since F^r_{n_h} is a subset of F_{n_h}, we just need to show that any distribution in F_{n_h} is also in F^r_{n_h}. Let p(o) be the distribution generated by an arbitrary factor analysis model with parameters Λ and Ψ. We will show that there exists a new factor analysis model with joint distribution p′(o, h) and parameters Λ′ and Ψ such that p′(o) = p(o) and such that I + Λ′ᵀΨ⁻¹Λ′ is diagonal. First we take the eigenvalue decomposition

    ΛᵀΨ⁻¹Λ = R D Rᵀ,    (19)

where R is a unitary matrix and the eigenvalue matrix D is diagonal. Now define a
rotation of the loading matrix by R:

    Λ′ = Λ R.    (20)

Then

    Λ′ᵀ Ψ⁻¹ Λ′ = Rᵀ ΛᵀΨ⁻¹Λ R = D,    (21)

which is diagonal. Thus I + Λ′ᵀΨ⁻¹Λ′ is diagonal. Furthermore, from the equation O′ = Λ′H + η we can derive

    Cov(O′) = Λ′Λ′ᵀ + Ψ = ΛΛᵀ + Ψ = Cov(O),

so p′(o) = p(o).

Now we are ready to prove the main result of this paper: the fact that factor analysis models are equivalent to linear RDNs.

Theorem 2: F_{n_h} = D_{n_h}.

Proof: (F_{n_h} ⊆ D_{n_h}) Let p(o) be the distribution over observable units generated by any factor analysis model. By the Lemma, there exists a factor analysis model with parameters Λ and Ψ such that its marginal distribution over the observable units equals the given distribution p(o), and for which I + ΛᵀΨ⁻¹Λ is diagonal. Let p′(o, h) be the limit distribution of the linear diffusion network with parameters M, α, and β given by setting the connection matrix of Equation 11 equal to the inverse covariance matrix of Equation 10:

    A = Σ⁻¹.    (22)

Then α = Ψ⁻¹ and β = I + ΛᵀΨ⁻¹Λ are both diagonal, so this diffusion network is an RDN. By Equation 3, the limit distribution of the network is Gaussian with covariance A⁻¹ = Σ, from which it follows that p′(o, h) = p(o, h). In particular, p′(o) = p(o). Thus p(o) is the limit distribution over the observable units of a linear RDN.

(D_{n_h} ⊆ F_{n_h}) Let p(o) be the limit distribution over observable units of any linear RDN. By Theorem 1, there exists a linear RDN with limit distribution p′(o, h) and parameters M, α, and β such that its marginal distribution over the observable units equals the given distribution p(o), and for which Σ′_h = I. Consider the factor analysis model with parameters

    Ψ = α⁻¹,    Λ = α⁻¹ M.

Using reasoning similar to that used in the first part of this proof, one can verify that this factor analysis model generates a distribution over observable units that is equal to p(o).
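The constructions of Section 5 can be checked numerically. The sketch below is our own illustration (not code from the paper; the specific Λ and Ψ are invented): it applies the Lemma's diagonalizing rotation to random factor analysis parameters, assembles the precision matrix of Equation 10 as the RDN connectivity matrix of Equation 22, and verifies that the resulting network is a valid RDN whose limit covariance reproduces the factor analysis covariance ΛΛᵀ + Ψ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical factor analysis parameters (invented for illustration):
# 5 observable dimensions, 2 hidden sources, diagonal noise covariance Psi.
Lam = rng.standard_normal((5, 2))
Psi = np.diag(rng.uniform(0.5, 1.5, size=5))
Psi_inv = np.linalg.inv(Psi)

# Lemma: rotate the loadings so that Lam'^T Psi^{-1} Lam' is diagonal.
D, R = np.linalg.eigh(Lam.T @ Psi_inv @ Lam)  # eigenvalues D, rotation R
Lam = Lam @ R   # observable covariance Lam Lam^T + Psi is unchanged

# Equation 22: connectivity A = Sigma^{-1}, assembled from Equation 10.
alpha = Psi_inv                               # observable block (diagonal)
M = Psi_inv @ Lam                             # observable-to-hidden weights
beta = np.eye(2) + Lam.T @ Psi_inv @ Lam      # hidden block (diagonal by Lemma)
A = np.block([[alpha, -M], [-M.T, beta]])

# A valid RDN: beta diagonal and A symmetric positive definite.
assert np.allclose(beta, np.diag(np.diag(beta)))
assert np.all(np.linalg.eigvalsh(A) > 0)

# The limit covariance A^{-1} has observable block Lam Lam^T + Psi.
Sigma = np.linalg.inv(A)
print(np.allclose(Sigma[:5, :5], Lam @ Lam.T + Psi))  # True
```

The final check is exactly the forward direction of Theorem 2: the feedback network with connectivity A has the same distribution over observable units as the feedforward factor analysis model it was built from.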
6 SIMULATIONS

RDNs have continuous states rather than binary states, so they can represent pixel intensities in a more natural manner than restricted Boltzmann machines (RBMs) can. We used contrastive divergence learning rules to train a linear RDN with 10 hidden units in an unsupervised manner on a database of 48 face images (16 people in three different lighting conditions) from the MIT Face Database (Turk & Pentland, 1991). The linear RDN had 3600 observable units (60 × 60 pixels). Each expert in the product of experts consists of one hidden unit and the learned connection weights between that hidden unit and all of the observable units. These learned connection weights can be thought of as the receptive field of that hidden unit. Figure 1 shows the receptive fields of the 10 hidden units after training.

Fig. 1: Receptive fields of the 10 hidden units of a linear RDN that was trained on a database of 48 face images by minimizing contrastive divergence.

Because of the correspondence between linear RDNs and factor analysis that was proven in previous sections, the receptive field of a hidden unit in the linear RDN should correspond to a column of the factor loading matrix Λ in Equation 8 (after the column is normalized by premultiplying by Ψ⁻¹, the inverse of the variance of the noise in each observable dimension). Because of the relationship between factor analysis and principal component analysis, one would expect the receptive fields of the hidden units of the linear RDN to resemble the eigenfaces (Turk & Pentland, 1991), which are the eigenvectors of the pixelwise covariance matrix of the same database. We calculated the eigenfaces of the database; those corresponding to the 10 largest eigenvalues are shown in Figure 2. The qualitative similarity between the linear RDN receptive fields in Figure 1 and the eigenfaces in Figure 2 is readily apparent.

Fig. 2: Eigenfaces calculated from the same database of 48 face images.

Partially occluded images correspond to splitting the observable units O into those O_k with known values and those O_u with unknown values. Given an image with known values o_k, reconstructing the values of the occluded units involves finding the posterior distribution p(o_u | o_k). The feedback architecture of diffusion networks provides a natural way to find such a posterior distribution: clamp the unoccluded observable units to the values o_k and let the network settle to equilibrium. The equilibrium distribution of the network will then be the posterior distribution p(o_u, h | O_k = o_k). Using the linear RDN that was trained on face images, we reconstructed occluded versions of the images in the training set by taking the mean of this posterior distribution. Figure 3 shows the results of reconstructing two occluded face images.

Fig. 3: Reconstruction of two occluded images. Left: Image from the training set for the linear RDN. Center: The observable units corresponding to the visible pixels were clamped, while the observable units corresponding to the occluded pixels were allowed to run free. Right: Reconstructed image; shown is the mean of the equilibrium distribution.

7 DISCUSSION

We proved the rather surprising result that a feedback model, the linear RDN, spans the same family of distributions as factor analysis, a feedforward model. Furthermore, we have shown that for this class of feedback models, restricting the marginal distribution of the hidden units to be independent does not restrict the distributions over the observable units that they can generate. Linear RDNs can be trained using the contrastive divergence learning method of Hinton (in press), which is relatively fast. A main advantage of feedback networks over feedforward models is that one can easily solve inference problems such as pattern completion for occluded images using the same architecture and algorithm that are used for generation. Most importantly, diffusion networks can also be defined with nonlinear activation functions (Movellan, Mineiro & Williams, in press), and we have begun to explore nonlinear extensions of the linear case outlined here. Extensions of this work to nonlinear diffusion networks open the door to possibilities of new ICA-like algorithms based on feedback rather than feedforward architectures.

References

Attias, H. (1999). Independent factor analysis. Neural Computation, 11(4).

Barlow, H. (1994). What is the computational goal of the neocortex? In C. Koch (Ed.), Large scale neuronal theories of the brain (pp. 1-22). Cambridge, MA: MIT Press.

Bell, A., & Sejnowski, T. (1995). An information-maximisation approach to blind separation and blind deconvolution. Neural Computation.

Gidas, B. (1986). Metropolis-type Monte Carlo simulation algorithms and simulated annealing. In J. L. Snell (Ed.), Topics in contemporary probability and its applications. Boca Raton: CRC Press.

Hinton, G. E. (in press). Training products of experts by minimizing contrastive divergence. Neural Computation.

Movellan, J. R., Mineiro, P., & Williams, R. J. (in press). Partially observable SDE models for image sequence recognition tasks. In T. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1).
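As a numerical footnote to the pattern-completion procedure of Section 6 (our own sketch with invented numbers, not the paper's experiment): for a linear RDN, clamping the known observable units and taking the mean of the equilibrium distribution is equivalent to computing the conditional mean of a zero-mean Gaussian, E[o_u | o_k] = Σ_uk Σ_kk⁻¹ o_k.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear RDN over 6 observable units: its limit distribution is a
# zero-mean Gaussian whose covariance Sigma_o we invent here in the factor
# analysis form Lam Lam^T + Psi (all numbers are illustrative assumptions).
Lam = rng.standard_normal((6, 2))
Sigma_o = Lam @ Lam.T + np.diag(rng.uniform(0.2, 0.5, size=6))

# Suppose units 0-3 are visible ("clamped") and units 4-5 are occluded.
known, unknown = [0, 1, 2, 3], [4, 5]
o_k = rng.multivariate_normal(np.zeros(6), Sigma_o)[known]  # observed pixels

# Mean of the equilibrium (posterior) distribution over the free units:
# E[o_u | o_k] = Sigma_uk Sigma_kk^{-1} o_k for a zero-mean Gaussian.
S_kk = Sigma_o[np.ix_(known, known)]
S_uk = Sigma_o[np.ix_(unknown, known)]
reconstruction = S_uk @ np.linalg.solve(S_kk, o_k)
print(reconstruction.shape)
```

In the linear case this conditional mean is available in closed form; the point of the feedback architecture is that the same clamp-and-settle procedure applies unchanged when the activation functions are nonlinear, where no closed form exists.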
More informationThe wake-sleep algorithm for unsupervised neural networks
The wake-sleep algorithm for unsupervised neural networks Geoffrey E Hinton Peter Dayan Brendan J Frey Radford M Neal Department of Computer Science University of Toronto 6 King s College Road Toronto
More informationDeep Belief Networks are compact universal approximators
1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation
More information1 Hoeffding s Inequality
Proailistic Method: Hoeffding s Inequality and Differential Privacy Lecturer: Huert Chan Date: 27 May 22 Hoeffding s Inequality. Approximate Counting y Random Sampling Suppose there is a ag containing
More informationDeep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)
CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information
More informationBayesian Mixtures of Bernoulli Distributions
Bayesian Mixtures of Bernoulli Distributions Laurens van der Maaten Department of Computer Science and Engineering University of California, San Diego Introduction The mixture of Bernoulli distributions
More informationClass Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning
1 Class Mean Vector Component and Discriminant Analysis for Kernel Suspace Learning Alexandros Iosifidis Department of Engineering, ECE, Aarhus University, Denmark alexandros.iosifidis@eng.au.dk arxiv:1812.05988v1
More informationReplicated Softmax: an Undirected Topic Model. Stephen Turner
Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems
More informationDensity Propagation for Continuous Temporal Chains Generative and Discriminative Models
$ Technical Report, University of Toronto, CSRG-501, October 2004 Density Propagation for Continuous Temporal Chains Generative and Discriminative Models Cristian Sminchisescu and Allan Jepson Department
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationGreedy Layer-Wise Training of Deep Networks
Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn
More informationTUTORIAL PART 1 Unsupervised Learning
TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew
More informationModeling human function learning with Gaussian processes
Modeling human function learning with Gaussian processes Thomas L. Griffiths Christopher G. Lucas Joseph J. Williams Department of Psychology University of California, Berkeley Berkeley, CA 94720-1650
More informationNatural Gradient Learning for Over- and Under-Complete Bases in ICA
NOTE Communicated by Jean-François Cardoso Natural Gradient Learning for Over- and Under-Complete Bases in ICA Shun-ichi Amari RIKEN Brain Science Institute, Wako-shi, Hirosawa, Saitama 351-01, Japan Independent
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationImage Analysis. PCA and Eigenfaces
Image Analysis PCA and Eigenfaces Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Svetlana
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationA732: Exercise #7 Maximum Likelihood
A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function
More informationLecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University
Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationReading Group on Deep Learning Session 4 Unsupervised Neural Networks
Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann
More informationDimensionality Reduction
Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball
More informationSystem 1 (last lecture) : limited to rigidly structured shapes. System 2 : recognition of a class of varying shapes. Need to:
System 2 : Modelling & Recognising Modelling and Recognising Classes of Classes of Shapes Shape : PDM & PCA All the same shape? System 1 (last lecture) : limited to rigidly structured shapes System 2 :
More informationKyle Reing University of Southern California April 18, 2018
Renormalization Group and Information Theory Kyle Reing University of Southern California April 18, 2018 Overview Renormalization Group Overview Information Theoretic Preliminaries Real Space Mutual Information
More informationLecture: Face Recognition
Lecture: Face Recognition Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1 What we will learn today Introduction to face recognition The Eigenfaces Algorithm Linear
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationInstantaneous Measurement of Nonlocal Variables
Instantaneous Measurement of Nonlocal Variales Richard A. Low richard@wentnet.com August 006 Astract We ask the question: is it possile to measure nonlocal properties of a state instantaneously? Relativistic
More informationLearning and Memory in Neural Networks
Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationFace Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition
ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationRegression with Input-Dependent Noise: A Bayesian Treatment
Regression with Input-Dependent oise: A Bayesian Treatment Christopher M. Bishop C.M.BishopGaston.ac.uk Cazhaow S. Qazaz qazazcsgaston.ac.uk eural Computing Research Group Aston University, Birmingham,
More informationBlind Machine Separation Te-Won Lee
Blind Machine Separation Te-Won Lee University of California, San Diego Institute for Neural Computation Blind Machine Separation Problem we want to solve: Single microphone blind source separation & deconvolution
More informationLarge-scale Collaborative Prediction Using a Nonparametric Random Effects Model
Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model Kai Yu Joint work with John Lafferty and Shenghuo Zhu NEC Laboratories America, Carnegie Mellon University First Prev Page
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationLecture 10: Dimension Reduction Techniques
Lecture 10: Dimension Reduction Techniques Radu Balan Department of Mathematics, AMSC, CSCAMM and NWC University of Maryland, College Park, MD April 17, 2018 Input Data It is assumed that there is a set
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More information