Generative models for amplitude modulation: Gaussian Modulation Cascade Processes
1 Generative models for amplitude modulation: Gaussian Modulation Cascade Processes. Richard Turner and Maneesh Sahani, Gatsby Computational Neuroscience Unit, 31/01/2007
2–3 Motivation [figures: sound waveforms; axis: time /s]
4–5 Motivation: Traditional AM [figure: signal = carrier × envelope; axis: time /s]
6 Motivation: Demodulate the Modulator [figure: the envelope is itself demodulated; axis: time /s]
7 Motivation: Demodulation Cascade [figure: repeated demodulation gives a cascade of envelopes; axis: time /s]
8 Goal and structure of the talk. Eventual GOAL: construct a latent variable model for sounds in order to understand auditory processing in the brain. Amplitude modulation is an important statistical dimension of sounds, and the auditory system listens attentively to amplitude modulation. First half of talk: statistical models for amplitude demodulation. Second half of talk: a generative model for sounds.
9 Applications for Amplitude Demodulation. Our focus: natural scene statistics and auditory processing. Also: psychophysics (manipulation of speech); signal processing applications; bearing-failure prediction (bearings account for half of all motor failures); financial models.
10–12 Traditional Demodulation Algorithms: Hilbert transform [figures: signal y with recovered envelope and carrier x; axis: time /s]
13–15 Traditional Demodulation Algorithms: square and lowpass [figures: signal y with recovered envelope and carrier x; axis: time /s]
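The two traditional algorithms above are easy to state concretely. A minimal sketch, assuming an illustrative test signal, sample rate, and lowpass cutoff (none of these come from the talk):

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000                                       # assumed sample rate (Hz)
t = np.arange(fs) / fs                           # one second of signal
y = np.sin(2 * np.pi * 1000 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))

# 1) Hilbert transform: envelope = magnitude of the analytic signal.
#    No tunable parameter; the recovered carrier has roughly constant variance.
env_h = np.abs(hilbert(y))
carrier_h = y / np.maximum(env_h, 1e-12)

# 2) Square and lowpass: smooth y**2 with a lowpass filter, take the root.
#    The cutoff (10 Hz here) is the tunable parameter.
b, a = butter(4, 10 / (fs / 2), btype="low")
env_s = np.sqrt(np.maximum(filtfilt(b, a, y ** 2), 0.0))
carrier_s = y / np.maximum(env_s, 1e-12)
```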
16 Traditional Demodulation Algorithms are not sufficient. Hilbert transform: recovers a carrier of constant variance; recovers a modulator that may not correspond to the envelope we want; has no tunable parameter. Square and lowpass: recovers a good-looking smooth envelope; recovers a carrier of high variance, i.e. not demodulated; has a tunable parameter.
17 Advantages of the probabilistic approach. Representing a signal as a quickly varying carrier times a slowly varying positive envelope is fundamentally an ill-posed problem: we need prior information, and therefore the probabilistic setting is the natural one. It makes assumptions explicit, allowing them to be criticised and improved. It allows us to put error bars on our inferences. We can develop a family of related algorithms: cheap and cheerful, slow and Bayesian, with(out) hand-tunable parameters. We can tap into existing theory and methods, e.g. to deal with missing data, model comparison, etc.
18–21 A simple generative model for AM: $z^{(2)}_t \sim \mathrm{Norm}(\lambda z^{(2)}_{t-1}, \sigma^2)$, $x^{(2)}_t = f_a(z^{(2)}_t)$, $x^{(1)}_t \sim \mathrm{Norm}(0, 1)$, $y_t = x^{(1)}_t x^{(2)}_t$; the effective length-scale of the envelope is $l_{\mathrm{eff}} = -1/\log(\lambda)$. [diagrams: envelope $x^{(2)}_{1:T}$, carrier $x^{(1)}_{1:T}$, and observed signal $y_{1:T}$]
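Ancestral sampling from this model is direct. A minimal sketch; the positive link $f_a$ is not pinned down on the slides, so a softplus with gain $a$ is assumed here purely for illustration:

```python
import numpy as np

def sample_am(T=10000, lam=0.999, sigma=0.05, a=4.0, seed=0):
    rng = np.random.default_rng(seed)
    z = np.zeros(T)
    for t in range(1, T):                    # AR(1): z_t ~ Norm(lam * z_{t-1}, sigma^2)
        z[t] = lam * z[t - 1] + sigma * rng.standard_normal()
    x2 = np.log1p(np.exp(a * z)) / a         # f_a: assumed softplus link -> positive envelope
    x1 = rng.standard_normal(T)              # white Gaussian carrier, x1_t ~ Norm(0, 1)
    return x1 * x2, x1, x2                   # observation y_t = x1_t * x2_t

y, x1, x2 = sample_am()
print(-1.0 / np.log(0.999))                  # l_eff = -1/log(lam) ~ 1000 samples
```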
22–26 A cheap and cheerful learning algorithm: maximise over the envelope and other parameters.
$p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a) = p(\lambda)\, p(y_{1:T} \mid x^{(1)}_{1:T}, x^{(2)}_{1:T})\, p(x^{(1)}_{1:T})\, p(x^{(2)}_{1:T})$, where $p(x^{(2)}_{1:T}) = p(z_{1:T}) \prod_{t=1}^{T} \left| \frac{dz_t}{dx^{(2)}_t} \right|$.
$p(y_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a) = \int dx^{(1)}_{1:T}\; p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a)$
$p(y_{1:T}, x^{(2)}_{1:T} \mid \sigma^2, a) = \int d\lambda\, dx^{(1)}_{1:T}\; p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a)$
$\arg\max_{x^{(2)}_{1:T}, \sigma^2, a}\; p(y_{1:T}, x^{(2)}_{1:T} \mid \sigma^2, a)$
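Since $y_t = x^{(1)}_t x^{(2)}_t$ with $x^{(1)}_t \sim \mathrm{Norm}(0,1)$, the carrier integrates out analytically to give $y_t \mid x^{(2)}_t \sim \mathrm{Norm}(0, (x^{(2)}_t)^2)$, and the remaining maximisation can be handed to a generic optimiser. A sketch under simplifying assumptions ($\lambda$ held fixed rather than marginalised, and the link taken to be $\exp$), not the slide's exact recipe:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_joint(z, y, lam=0.99, sigma=0.1):
    x2 = np.exp(z)                                        # assumed link: envelope = exp(z)
    ll = -0.5 * np.sum(np.log(x2 ** 2) + (y / x2) ** 2)   # y_t | x2_t ~ Norm(0, x2_t^2)
    dz = z[1:] - lam * z[:-1]                             # AR(1) prior on z, lam fixed
    return -(ll - 0.5 * np.sum(dz ** 2) / sigma ** 2)

def map_envelope(y):
    z0 = np.log(np.abs(y) + 1e-3)            # crude initialisation from |y|
    res = minimize(neg_log_joint, z0, args=(y,), method="L-BFGS-B")
    return np.exp(res.x)                     # MAP estimate of the envelope
```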
27–29 Results [figures: signal y with inferred envelope and carrier x; time axis ×10^4 samples]
30 Desirable features of a new generative model. We could use this algorithm to collect low-level statistics of amplitude modulation in the multidimensional setting. The alternative is to build a model which attempts to learn this structure in a higher-level prior.
31 Desirable features of a new generative model 1. Sparse outputs. 2. Explicit temporal dimension, smooth latent variables. 3. Hierarchical prior that captures the AM statistics of sounds at different time scales: cascade of modulatory processes, with slowly varying processes at the top modulating more rapidly varying signals towards the bottom. 4. Learnable; and we would like to preserve information about the uncertainty, and possibly correlations, in our inferences.
32 [cascade diagram] $K_2 = 1$, $K_1 = 1$: $y_t = x^{(1)}_{1,t}\, x^{(2)}_{1,t}$
33 [cascade diagram] $K_2 = 1$, $K_1 = 2$, weights $g_{11}, g_{21}$: $y_t = (g_{11} x^{(1)}_{1,t} + g_{21} x^{(1)}_{2,t})\, x^{(2)}_{1,t}$
34 [cascade diagram] as above with unit weights: $y_t = (x^{(1)}_{1,t} + x^{(1)}_{2,t})\, x^{(2)}_{1,t}$
35 [cascade diagram] $K_3 = 1$, $K_2 = 2$, $K_1 = 3$: $y_t = x^{(3)}_{t,1} \big[ x^{(2)}_{t,1} (g_{111} x^{(1)}_{t,1} + g_{211} x^{(1)}_{t,2} + g_{311} x^{(1)}_{t,3}) + x^{(2)}_{t,2} (g_{121} x^{(1)}_{t,1} + g_{221} x^{(1)}_{t,2} + g_{321} x^{(1)}_{t,3}) \big]$
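The emission on slide 35 is a full tensor contraction: one weight index per level, and the output is the weighted product of the latent vectors at each time step. A minimal numpy sketch for the three-level case; all array values are random stand-ins for learned parameters and inferred latents:

```python
import numpy as np

T, K1, K2, K3 = 100, 3, 2, 1
rng = np.random.default_rng(0)
x1 = rng.standard_normal((T, K1))        # fastest latents, x^(1)_{t,k1}
x2 = rng.standard_normal((T, K2))        # intermediate latents
x3 = rng.standard_normal((T, K3))        # slowest latents
g = rng.standard_normal((K1, K2, K3))    # weight tensor g_{k1 k2 k3}

# y_t = sum_{k1,k2,k3} g_{k1 k2 k3} x1_{t,k1} x2_{t,k2} x3_{t,k3}
y = np.einsum("abc,ta,tb,tc->t", g, x1, x2, x3)
```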
36 [figure: draw from a model with $M = 3$, $K_1 = 3$, $K_2 = 2$, $K_3 = 1$, $D = 80$]
37 Inference and learning. Inference and learning by EM has an intractable E-step, but due to the structure of the non-linearity variational EM can be used. Key point: if we freeze all the latent time-series bar one, the distribution over the unfrozen chain is Gaussian. This leads to a family of efficient variational approximations, $p(x \mid Y) \approx \prod_m q(x^{(m)}_{1:T,1:K_m})$, where each approximating distribution is a Gaussian.
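To see why freezing all chains but one helps: with the other chains fixed, their products collapse into a known time-varying weight $c_t$ on the remaining chain, leaving a linear-Gaussian state-space model whose posterior is exactly Gaussian. A sketch for a single AR(1) chain with illustrative parameters (this is the standard Kalman recursion, not the talk's full variational update):

```python
import numpy as np

def kalman_filter(y, c, lam=0.99, q=0.01, r=0.1):
    """Filter x_t = lam*x_{t-1} + Norm(0, q), observed as y_t = c_t*x_t + Norm(0, r)."""
    T = len(y)
    mu, v = np.zeros(T), np.zeros(T)
    m, p = 0.0, 1.0                                  # prior mean/variance at t = 0
    for t in range(T):
        m_pred, p_pred = lam * m, lam ** 2 * p + q   # predict one step ahead
        k = p_pred * c[t] / (c[t] ** 2 * p_pred + r) # Kalman gain
        m = m_pred + k * (y[t] - c[t] * m_pred)      # correct the mean
        p = (1.0 - k * c[t]) * p_pred                # correct the variance
        mu[t], v[t] = m, p
    return mu, v                                     # exact Gaussian posterior marginals
```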
38 Inference and Learning: proof of concept. Model: $K_2 = 1$, $K_1 = 2$, $y_t = (g_{11} x^{(1)}_{1,t} + g_{21} x^{(1)}_{2,t})\, x^{(2)}_{1,t}$ [cascade diagram]
39 Inference and Learning: proof of concept [figure: true and inferred $X^{(2)}_{1,t}$, $X^{(1)}_{1,t}$, $X^{(1)}_{2,t}$ and data $Y$ against time]
40 Current work and future directions. Apply to natural sounds (requires good initialisation) and single-microphone source separation. Generalise the model to have: non-local features, to capture sweeps; correlations in the prior, to capture the mutual exclusivity of voiced and unvoiced sections of speech. Representation: real sounds live on a hyper-plane in STFT space: can we project our model onto this manifold? How do we map posteriors to spike trains: $p(\text{spike trains} \mid y) = f[q(x \mid y)]$?
41 Extra slides...
42 AM: a candidate organising principle in the auditory system? The auditory system listens attentively to amplitude modulation. Examples include comodulation masking release in psychophysics, and a possible topographic mapping of AM in the IC from electrophysiology. Main goal: discover computational principles underpinning auditory processing. But armed with a generative model you can also do source segregation, fill in missing data, remove artifacts, denoise sounds, etc.
43–46 What's up with current generative models? [table: each model scored on whether its latents are sparse, whether they share power, whether they are slowly varying, and whether the model is learnable]
SC: assumes latent variables are sparse.
GSM: assumes latents are sparse and share power; $x = \lambda u$, with $\lambda \geq 0$ a positive scalar random variable and $u \sim G(0, Q)$.
SFA: assumes latents are slow.
Bubbles: assumes latents are sparse, slow (and share power).
47 What's the point in a generative model for sounds? Theoretical neuroscience: understand neural receptive fields: if a latent variable model provides a computationally effective representation of sounds, neurons might encode $p(\text{latent} \mid \text{sound})$. Psychophysics: play sounds to subjects (possibly drawn from the forward model), and compare their inferences with inference in the model. Machine learning: fill in missing data, e.g. to fill in an intermittent recording or remove artifacts; denoise sounds; stream segregation; compression.
48 Outline. Natural auditory scene statistics - amplitude modulation is a key component. Amplitude modulation in the auditory system - a candidate organising principle (the auditory system is short on such things). Previous statistical models of natural scenes - Gaussian Scale Mixture models and AM. A new model for sounds: the Gaussian Modulation Cascade Process. Ongoing issues...
49 Natural Auditory Scene statistics: acoustic ecology. The marginal distribution is very sparse (more so than in vision). WHY? Lots of soft sounds, e.g. the pauses between utterances in speech, and rare, highly structured, localised events that carry substantial parts of the stimulus energy.
50 Statistics of Amplitude Modulation (Attias & Schreiner). Methods: pass the sound through a filterbank; Hilbert transform to find the envelope $a(t, \omega)$; take logs and transform to zero mean and unit variance, giving $\hat{a}(t, \omega)$. Marginal statistics: $p[\hat{a}(t,\omega)] \propto \frac{\exp[-\gamma \hat{a}(t,\omega)]}{(\beta^2 + \hat{a}(t,\omega)^2)^{\alpha/2}}$. Spectrum: $FT[\hat{a}(t,\omega)] \propto \frac{1}{(\omega_0^2 + \omega^2)^{\alpha/2}}$.
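A sketch of this measurement pipeline; Butterworth bandpass filters and the band edges stand in for whatever filterbank Attias & Schreiner actually used:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def am_statistics(y, fs, bands=((200, 400), (400, 800), (800, 1600))):
    a_hat = []
    for lo, hi in bands:                                 # crude stand-in filterbank
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        env = np.abs(hilbert(filtfilt(b, a, y)))         # envelope a(t, w)
        log_env = np.log(env + 1e-12)                    # take logs
        a_hat.append((log_env - log_env.mean()) / log_env.std())  # zero mean, unit variance
    return np.array(a_hat)    # rows are a_hat(t, w): histogram/FFT these per band
```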
51 Results summary. The marginal distribution is very sparse (finite probability of arbitrarily soft sounds), independent of filter centre frequency, filter bandwidth, and time resolution. If AM statistics were uncorrelated over time and frequency, the CLT would predict that increasing the filter bandwidth/time resolution would make the distribution more Gaussian. The spectrum of the amplitude modulations is independent of filter centre frequency; it follows a modified power law, indicating long temporal correlations (scale invariance); and it is independent of filter bandwidth (cf. Voss and Clarke's 1/f). Implications: 1. A good generative model of sounds should capture long correlations in AM across frequency bands and time (> 100 ms). 2. Each location on the cochlea sees the same AM stats.
52 AM in the Auditory System - highlights from lots of work. Psychophysics: comodulation masking release. Task: detect a tone in noise; alter the bandwidth of the noise and measure the detection threshold; repeat, but amplitude-modulate the masker.
53 Electrophysiological data. Type I AN fibres phase-lock to AM (implying a temporal code); as we move up the neuraxis the tuning moves from temporal to rate. There is evidence in the IC for topographic AM tuning (Schreiner and Langner, 1988). Cortex: AM processing seems filter-independent (a modulation filter bank?). The jury is still out, but AM may be a fundamental organising principle.
54 What's wrong with statistical models for natural scenes? ICA and sparse coding: $p(x) = \prod_{i=1}^{I} p(x_i)$ with $p(x_i)$ sparse; $p_{ICA}(y \mid x) = \delta(Gx - y)$; $p_{SC}(y \mid x) = \mathrm{Norm}(Gx, \sigma^2 I)$; recognition weights: $p_{ICA}(x \mid y) = \delta(x - Ry)$. [figures: learned features for images and sounds] Problems: 1. Extracted latents are not independent: there are correlations in their power. 2. No explicit temporal dimension - not a true generative model for sounds or movies.
55 Problem 1: empirical distribution of the latents - power correlations [figure: joint and conditional distributions of latent pairs]
56 Caption for the previous figure. The expected joint distribution of latents was starry (top left). Empirically the joint is found to be elliptical: many more instances of two latents being high than expected (top right). Another way of seeing this is to look at the conditionals (bottom): if one latent has high power then nearby latents also tend to have high power. How do we improve the model? 1. Fix the bottom level $p(y \mid x) = \delta(y - Gx)$. 2. This fixes the recognition distribution $p(x \mid y) = \delta(x - Ry)$. 3. $p(x) = \int dy\, p(x \mid y) p(y) \approx \frac{1}{N} \sum_n p(x \mid y_n)$. 4. These are exactly the distributions plotted on the previous page. 5. No matter what $R$ is, we get similar empirical distributions. 6. So choose a new prior to match them...
57 Gaussian Scale Mixtures (GSMs): $x = \lambda u$, with $\lambda \geq 0$ a scalar random variable, $u \sim G(0, Q)$, and $\lambda$ and $u$ independent. The density of these semi-parametric models can be expressed as an integral: $p(x) = \int p(x \mid \lambda)\, p(\lambda)\, d\lambda = \int |2\pi\lambda^2 Q|^{-1/2} \exp\!\left(-\frac{x^T Q^{-1} x}{2\lambda^2}\right) p(\lambda)\, d\lambda$. E.g. $p(\lambda)$ discrete gives a MOG (components zero-mean); $p(\lambda)$ Gamma gives a Student-t. Imagine generalising this to the temporal setting: $x(t) = \lambda(t) u(t)$ = positive envelope × carrier. Are we seeing the hallmarks of AM in a non-temporal setting?
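A quick way to see the sparsity a GSM produces is to sample one. A sketch of the Gamma-to-Student-t case mentioned above, drawing a Gamma precision and using its inverse square root as the multiplier; the shape parameter is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 100_000, 3.0
lam = rng.gamma(shape=a, scale=1.0 / a, size=n) ** -0.5  # positive scalar multiplier
u = rng.standard_normal(n)                               # Gaussian carrier
x = lam * u                                              # Student-t marginal (6 dof here)

print(np.mean(x ** 4) / np.mean(x ** 2) ** 2)            # kurtosis > 3: sparser than Gaussian
```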
58 Learning a neighbourhood (Karklin and Lewicki 2003, 05, 06). The statistical dependencies between filters depend on their separation (in space, scale, and orientation); we'd like to learn these neighbourhoods of dependence. Solution: share multipliers in a linear fashion using a GSM with a generalised log-normal over the variances: $p(z_j) = \mathrm{Norm}(0, 1)$; $\lambda_i^2 = \exp\!\left(\sum_j h_{ij} z_j\right)$; $p(x_i \mid z, B) = \mathrm{Norm}(0, \lambda_i^2)$; $p(y \mid G, x) = \mathrm{Norm}(Gx, \sigma^2 I)$.
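A sketch of ancestral sampling in this construction: higher-level Gaussians $z$ set per-coefficient variances through the neighbourhoods $h$, and the coefficients $x$ then generate the data. The shapes and the random $h$ and $G$ are illustrative stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
J, I, D, sigma_y = 5, 20, 30, 0.1
h = 0.5 * rng.standard_normal((I, J))        # neighbourhoods of dependence (learned in reality)
G = rng.standard_normal((D, I))              # generative weights (learned in reality)

z = rng.standard_normal(J)                   # p(z_j) = Norm(0, 1)
lam2 = np.exp(h @ z)                         # lambda_i^2 = exp(sum_j h_ij z_j)
x = np.sqrt(lam2) * rng.standard_normal(I)   # p(x_i | z) = Norm(0, lambda_i^2)
y = G @ x + sigma_y * rng.standard_normal(D) # p(y | G, x) = Norm(Gx, sigma^2 I)
```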
59 A temporal GSM: the proof-of-concept Bubbles model. $p(z_{i,t})$ = sparse point process; $\lambda_{i,t}^2 = f\!\left(\sum_j h_{ij}\, \Phi(t) * z_j(t)\right)$; $p(x_{i,t} \mid \lambda_{i,t}) = \mathrm{Norm}(0, \lambda_{i,t}^2)$; $p(y_t \mid G, x_t) = \delta(Gx_t - y_t)$. Temporal correlations between the multipliers are captured by the moving average $\Phi(t) * z_j(t)$ (creating a SLOW ENVELOPE). Bubbles of activity in latent space (both in space and time). Columns of $h_{ij}$ are fixed and change smoothly to induce topographic structure - computationally useful.
60 Learning. It is common to learn the parameters using zero-temperature EM: $q(x) = \delta(x - X_{MAP}) \approx p(x \mid Y)$, so uncertainty and correlational information is lost. 1. This affects learning. 2. To compare to neural data we need to specify a mapping: $p(\text{spike trains} \mid y) = f[q(x \mid y)] = f(X_{MAP})$ for this approximation. 3. BUT we believe neural populations will represent uncertainty and correlations in latent variables. We'd like to retain variance and correlational information, both for learning and for comparison to biology.
61 Motivations for the GMCP 1. Sparse outputs. 2. Explicit temporal dimension, smooth latent variables. 3. Hierarchical prior that captures the AM statistics of sounds at different time scales: cascade of modulatory processes, with slowly varying processes at the top modulating more rapidly varying signals towards the bottom. 4. Learnable; and we would like to preserve information about the uncertainty, and possibly correlations, in our inferences.
62 Dynamics: The Gaussian Modulation Cascade Process. $p(x^{(m)}_{k,t} \mid x^{(m)}_{k,t-1:t-\tau_m}, \lambda^{(m)}_{k,1:\tau_m}, \sigma^2_{m,k}) = \mathrm{Norm}\!\left(\sum_{t'=1}^{\tau_m} \lambda^{(m)}_{k,t'}\, x^{(m)}_{k,t-t'},\; \sigma^2_{m,k}\right)$. Emission distribution: $p(y_t \mid x^{(1:M)}_t, g_{k_1:k_M}, \sigma^2_y) = \mathrm{Norm}\!\left(\sum_{k_1:k_M} g_{k_1:k_M} \prod_{m=1}^{M} x^{(m)}_{k_m,t},\; \sigma^2_y I\right)$. Time-frequency representation: $y_t$ = filter bank outputs.
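A sketch of ancestral sampling from the GMCP, simplified to AR(1) dynamics ($\tau_m = 1$) with slower coefficients at higher levels; the sizes, AR coefficients, and noise level are illustrative, not the talk's values:

```python
import numpy as np

rng = np.random.default_rng(0)
T, Ks, lams = 500, (3, 2, 1), (0.9, 0.99, 0.999)   # K_m and AR coefficient per level

X = []
for K, lam in zip(Ks, lams):                       # level m = 1 (fast) ... M (slow)
    x = np.zeros((T, K))
    for t in range(1, T):                          # AR(1) chains with unit stationary variance
        x[t] = lam * x[t - 1] + np.sqrt(1 - lam ** 2) * rng.standard_normal(K)
    X.append(x)

g = rng.standard_normal(Ks)                        # weight tensor g_{k1 k2 k3}
y = np.einsum("abc,ta,tb,tc->t", g, *X)            # sum_k g * prod_m x^(m)_{km,t}
y += 0.01 * rng.standard_normal(T)                 # emission noise (D = 1 for clarity)
```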
63 E.g. $M = 2$, $K_1 = 2$, $K_2 = 1$, $D = 1$: $y_t = (g_{11} x^{(1)}_{1,t} + g_{21} x^{(1)}_{2,t})\, x^{(2)}_{1,t}$ [cascade diagram with weights $g_{11}$, $g_{21}$]
64 E.g. $M = 4$, $K_1 = 4$, $K_2 = 2$, $K_3 = 2$, $K_4 = 1$, $D = 80$ [figure: draw from the model]
65 Comments on the model. A GSM: latent = rectified Gaussian × Gaussian. The emission distribution is related to Grimes & Rao and Tenenbaum & Freeman (for $M = 2$) and Vasilescu & Terzopoulos for general $M$, but the temporal priors allow us to be fully unsupervised. As $p(x^{(m)}_{1:K,1:T} \mid x^{(1:M \setminus m)}_{1:K,1:T}, y_{1:T}, \theta)$ is Gaussian, there is a family of variational EM algorithms we can use to learn the model.
66 Inference and Learning: proof of concept [figure: true and inferred $X^{(2)}_{1,t}$, $X^{(1)}_{1,t}$, $X^{(1)}_{2,t}$ and data $Y$ against time]
67 Future directions and current issues. DIRECTIONS: correlated prior: speech is usually periodic (voiced) or unvoiced - mutually exclusive; non-local features: make $G$ temporal, making sweeps easier to represent. ISSUES: Representation: real sounds live on a hyper-plane in filter-bank space - can we project our model onto this manifold? Initialisation: the free energy has lots of local minima; use slow-modulation analysis to initialise: $\arg\min_{g_n} \left\langle \left[\frac{d\, a(g_n^T y)}{dt}\right]^2 \right\rangle$ such that $\langle a(g_n^T y)\, a(g_m^T y) \rangle = \delta_{mn}$. How do we map posteriors to spike trains: $p(\text{spike trains} \mid y) = f[q(x \mid y)]$?
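The slow-modulation initialisation above is, in its linear form, ordinary slow feature analysis, which reduces to a generalised eigenproblem. A sketch that drops the nonlinearity $a(\cdot)$ as a simplification:

```python
import numpy as np
from scipy.linalg import eigh

def sfa_init(Y, n_components=3):
    """Y: (T, D) filterbank outputs. Returns the n slowest projections g_n as columns."""
    Yc = Y - Y.mean(axis=0)
    dY = np.diff(Yc, axis=0)           # temporal derivatives
    A = dY.T @ dY / len(dY)            # <dy dy^T>: slowness objective
    B = Yc.T @ Yc / len(Yc)            # <y y^T>: unit-variance/decorrelation constraint
    evals, evecs = eigh(A, B)          # generalised eigenproblem, eigenvalues ascending
    return evecs[:, :n_components]     # slowest directions first
```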
68–71 What's up with current generative models? The assumptions, one model at a time: ICA and SC assume latent variables are sparse; the GSM assumes latents are sparse and share power ($x^{(1)} = \lambda u$, $\lambda \geq 0$ a positive scalar random variable, $u \sim G(0, Q)$); SFA assumes latents are slow; Bubbles assumes latents are sparse, slow (and share power). Summary table:

Model   | p(x^(2))      | p(x^(1) | x^(2))                                    | p(y | x^(1))
ICA     | N/A           | sparse                                              | δ(y − Gx^(1))
SC      | N/A           | sparse                                              | Norm(Gx^(1), σ²_y I)
GSM     | Norm(0, I)    | Norm(0, λ_i²), λ_i² = exp(h_iᵀ x^(2))               | Norm(Gx^(1), σ²_y I)
SFA     | N/A           | Norm(γ x_{t−1}, σ²)                                 | δ(y − Gx^(1))
Bubbles | point-process | Norm(0, λ²_{i,t}), λ²_{i,t} = f(h_iᵀ x^(2)_t ∗ φ_t) | δ(y − Gx^(1))