Generative models for amplitude modulation: Gaussian Modulation Cascade Processes


1 Generative models for amplitude modulation: Gaussian Modulation Cascade Processes. Richard Turner and Maneesh Sahani, Gatsby Computational Neuroscience Unit, 31/01/2007

2 Motivation [figure: sound waveform plotted against time /s]

3 Motivation [figure: sound waveform plotted against time /s]

4 Motivation: Traditional AM [figure: waveform = carrier × envelope, plotted against time /s]

5 Motivation: Traditional AM [figure: waveform = carrier × envelope, plotted against time /s]

6 Motivation: Demodulate the Modulator [figure: the recovered modulator is itself split into a carrier and an envelope, plotted against time /s]

7 Motivation: Demodulation Cascade [figure: repeated demodulation of modulators, plotted against time /s]

8 Goal and structure of the talk. Eventual GOAL: construct a latent variable model for sounds in order to understand auditory processing in the brain. Amplitude modulation is an important statistical dimension of sounds, and the auditory system listens attentively to amplitude modulation. First half of talk: statistical models for amplitude demodulation. Second half of talk: a generative model for sounds.

9 Applications for Amplitude Demodulation
- Our focus: natural scene statistics and auditory processing
- Psychophysics: manipulation of speech
- Signal processing applications
- Bearing failure prediction (bearings account for half of all motor failures)
- Financial models

10 Traditional Demodulation Algorithms: Hilbert transform [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

11 Traditional Demodulation Algorithms: Hilbert transform [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

12 Traditional Demodulation Algorithms: Hilbert transform [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

13 Traditional Demodulation Algorithms: square and lowpass [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

14 Traditional Demodulation Algorithms: square and lowpass [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

15 Traditional Demodulation Algorithms: square and lowpass [figure: data y with extracted envelope, and residual carrier $x^{(1)}$, plotted against time/s]

16 Traditional Demodulation Algorithms are not sufficient.
Hilbert transform:
- recovers a carrier of constant variance
- recovers a modulator that may not correspond to the envelope we want
- has no tunable parameter
Square and lowpass:
- recovers a good-looking smooth envelope
- recovers a carrier of high variance, i.e. one that has not really been demodulated
- has a tunable parameter
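
As a concrete illustration (not part of the original slides), the two traditional schemes can be sketched in a few lines of Python; the test signal, cut-off frequency and filter order are arbitrary choices for the demo.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 8000                                             # sample rate (Hz); arbitrary for this demo
t = np.arange(0, 1.0, 1.0 / fs)
envelope = 1.0 + 0.8 * np.sin(2 * np.pi * 3 * t)      # slow positive modulator
carrier = np.random.randn(t.size)                     # noise carrier
y = envelope * carrier                                # amplitude-modulated signal

# Hilbert-transform demodulation: modulus of the analytic signal.
# Recovers a near unit-variance carrier, but the envelope tracks every wiggle of |y|.
env_hilbert = np.abs(hilbert(y))
carrier_hilbert = y / np.maximum(env_hilbert, 1e-12)

# Square-and-lowpass demodulation: smooth envelope, but the residual
# "carrier" y / envelope is far from unit variance. The cut-off is a tunable parameter.
b, a = butter(4, 20.0 / (fs / 2), btype="low")
env_sq = np.sqrt(np.maximum(filtfilt(b, a, y ** 2), 0.0))
carrier_sq = y / np.maximum(env_sq, 1e-12)

print("var(Hilbert carrier)        =", carrier_hilbert.var())
print("var(square+lowpass carrier) =", carrier_sq.var())
```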

17 Advantages of the probabilistic approach. Representing a signal as a quickly varying carrier times a slowly varying positive envelope is a fundamentally ill-posed problem: we need prior information, and therefore the probabilistic setting is the natural one. It makes assumptions explicit, allowing them to be criticised and improved. It allows us to put error-bars on our inferences. We can develop a family of related algorithms: cheap and cheerful, slow and Bayesian, with(out) hand-tunable parameters. And we can tap into existing theory and methods, e.g. to deal with missing data, model comparison, etc.

18 A simple generative model for AM:
$z^{(2)}_t \sim \mathrm{Norm}(\lambda z^{(2)}_{t-1}, \sigma^2)$, $x^{(2)}_t = f_a(z^{(2)}_t)$, $x^{(1)}_t \sim \mathrm{Norm}(0, 1)$, $y_t = x^{(1)}_t x^{(2)}_t$

19 A simple generative model for AM. Effective length scale of the envelope: $l_{\mathrm{eff}} = 1/\log(\lambda)$. [figure: sample of the envelope $x^{(2)}_{1:T}$]

20 A simple generative model for AM. [figure: samples of the envelope $x^{(2)}_{1:T}$ and the carrier $x^{(1)}_{1:T}$]

21 A simple generative model for AM. [figure: samples of $x^{(2)}_{1:T}$, $x^{(1)}_{1:T}$ and the resulting data $y_{1:T}$]
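
A forward sample from this model makes the roles of the variables concrete. This is a minimal sketch: the positive link $f_a$ is not specified on the slide, so a soft rectification $f_a(z) = \log(1 + e^{az})$ is assumed here, and all parameter values are placeholders.

```python
import numpy as np

def sample_simple_am(T=10000, lam=0.999, sigma=0.05, a=2.0, seed=0):
    """Forward sample of the simple AM model:
       z_t  ~ Norm(lam * z_{t-1}, sigma^2)   (slow AR(1) process)
       x2_t = f_a(z_t)                        (positive envelope)
       x1_t ~ Norm(0, 1)                      (carrier)
       y_t  = x1_t * x2_t
    f_a is assumed to be a soft rectification; the talk leaves it unspecified."""
    rng = np.random.default_rng(seed)
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = lam * z[t - 1] + sigma * rng.standard_normal()
    x2 = np.logaddexp(0.0, a * z)       # f_a(z) = log(1 + exp(a z)): positive, monotonic
    x1 = rng.standard_normal(T)
    y = x1 * x2
    return y, x1, x2, z

# lam close to 1 gives a slow envelope: |1/log(lam)| (about 1000 samples here)
# sets the envelope time scale.
y, x1, x2, z = sample_simple_am()
```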

22 A cheap and cheerful learning algorithm. Maximise over the envelope and other parameters:
$p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a) = p(\lambda)\, p(y_{1:T} \mid x^{(1)}_{1:T}, x^{(2)}_{1:T})\, p(x^{(1)}_{1:T})\, p(x^{(2)}_{1:T})$

23 A cheap and cheerful learning algorithm. The envelope prior comes from a change of variables:
$p(x^{(2)}_{1:T}) = p(z_{1:T}) \prod_{t=1}^{T} \frac{\mathrm{d}z_t}{\mathrm{d}x^{(2)}_t}$

24 A cheap and cheerful learning algorithm. Integrate out the carrier:
$p(y_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a) = \int \mathrm{d}x^{(1)}_{1:T}\; p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a)$

25 A cheap and cheerful learning algorithm. Integrate out $\lambda$ as well:
$p(y_{1:T}, x^{(2)}_{1:T} \mid \sigma^2, a) = \int \mathrm{d}\lambda\, \mathrm{d}x^{(1)}_{1:T}\; p(y_{1:T}, x^{(1)}_{1:T}, x^{(2)}_{1:T}, \lambda \mid \sigma^2, a)$

26 A cheap and cheerful learning algorithm. Then optimise:
$\arg\max_{x^{(2)}_{1:T},\, \sigma^2,\, a}\; p(y_{1:T}, x^{(2)}_{1:T} \mid \sigma^2, a)$
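
One way to implement the resulting maximisation is sketched below: the carrier is integrated out analytically ($y_t \mid x^{(2)}_t \sim \mathrm{Norm}(0, (x^{(2)}_t)^2)$) and the remaining log joint is climbed with L-BFGS in $z$-space so that positivity of the envelope is automatic. This is an illustrative simplification: $\lambda$, $\sigma$ and $a$ are held fixed rather than optimised or integrated out, and the soft-rectification link is again an assumption.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit   # logistic sigmoid

def map_envelope(y, lam=0.999, sigma=0.05, a=2.0, n_iter=200):
    """Cheap-and-cheerful demodulation sketch: maximise the log joint over the
    envelope, parameterised through z so that x2 = f_a(z) > 0."""
    T = y.size

    def f_a(z):                           # assumed positive link: soft rectification
        return np.logaddexp(0.0, a * z)

    def neg_log_joint_and_grad(z):
        x2 = f_a(z)
        dx2_dz = a * expit(a * z)
        # likelihood with x1 integrated out: sum_t [ log x2_t + y_t^2 / (2 x2_t^2) ]
        nll = np.sum(np.log(x2) + y ** 2 / (2 * x2 ** 2))
        grad = (1.0 / x2 - y ** 2 / x2 ** 3) * dx2_dz
        # AR(1) prior on z: sum_t (z_t - lam z_{t-1})^2 / (2 sigma^2)
        innov = z[1:] - lam * z[:-1]
        nll += np.sum(innov ** 2) / (2 * sigma ** 2)
        grad[1:] += innov / sigma ** 2
        grad[:-1] -= lam * innov / sigma ** 2
        return nll, grad

    z0 = np.zeros(T)
    res = minimize(neg_log_joint_and_grad, z0, jac=True,
                   method="L-BFGS-B", options={"maxiter": n_iter})
    return f_a(res.x)                     # MAP estimate of the envelope x2
```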

27 Results [figure: data y with the inferred envelope, and the inferred carrier $x^{(1)}$; time axis in samples, ×10^4]

28 Results [figure: data y with the inferred envelope, and the inferred carrier $x^{(1)}$; time axis in samples, ×10^4]

29 Results [figure: data y with the inferred envelope, and the inferred carrier $x^{(1)}$; time axis in samples, ×10^4]

30 Desirable features of a new generative model. We could use this algorithm to collect low-level statistics of amplitude modulation in the multidimensional setting. The alternative is to build a model which attempts to learn this structure in a higher-level prior.

31 Desirable features of a new generative model:
1. Sparse outputs.
2. Explicit temporal dimension, smooth latent variables.
3. Hierarchical prior that captures the AM statistics of sounds at different time scales: a cascade of modulatory processes, with slowly varying processes at the top modulating more rapidly varying signals towards the bottom.
4. Learnable; and we would like to preserve information about the uncertainty, and possibly the correlations, in our inferences.

32 [graphical model: $K_2 = 1$ chain $x^{(2)}_{t,1}$ modulating $K_1 = 1$ chain $x^{(1)}_{t,1}$] Emission: $y_t = x^{(1)}_{1,t}\, x^{(2)}_{1,t}$

33 [graphical model: $K_2 = 1$, $K_1 = 2$, weights $g_{11}$, $g_{21}$] Emission: $y_t = (g_{11} x^{(1)}_{1,t} + g_{21} x^{(1)}_{2,t})\, x^{(2)}_{1,t}$

34 [graphical model: $K_2 = 1$, $K_1 = 2$, unit weights] Emission: $y_t = (x^{(1)}_{1,t} + x^{(1)}_{2,t})\, x^{(2)}_{1,t}$

35 [graphical model: $K_3 = 1$, $K_2 = 2$, $K_1 = 3$] Emission: $y_t = x^{(3)}_{1,t} \big[ x^{(2)}_{1,t} (g_{111} x^{(1)}_{1,t} + g_{211} x^{(1)}_{2,t} + g_{311} x^{(1)}_{3,t}) + x^{(2)}_{2,t} (g_{121} x^{(1)}_{1,t} + g_{221} x^{(1)}_{2,t} + g_{321} x^{(1)}_{3,t}) \big]$

36 [figure: $M = 3$, $K_1 = 3$, $K_2 = 2$, $K_3 = 1$, $D = 80$]

37 Inference and learning. Inference and learning by EM has an intractable E-step. But, due to the structure of the non-linearity, variational EM can be used. Key point: if we freeze all the latent time-series bar one, the distribution over the unfrozen chain is Gaussian. This leads to a family of efficient variational approximations, $p(x \mid Y) \approx \prod_m q(x^{(m)}_{1:T,1:K_m})$, where each approximating distribution is a Gaussian.
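
To see the key point concretely: in the two-level example below, freezing $x^{(2)}$ makes the emission linear in the level-1 chains, so their conditional posterior is an exact Gaussian. The following brute-force sketch computes it with dense linear-Gaussian algebra (a Kalman smoother would be the efficient choice); all parameters are placeholders, and this illustrates the structure rather than the talk's actual variational updates.

```python
import numpy as np

def gaussian_conditional_over_level1(y, x2, g, lam, sig_chain, sig_y):
    """With the level-2 chain x2 frozen, the emission
        y_t = (g[0] x1_{1,t} + ... + g[K-1] x1_{K,t}) * x2_t + noise
    is linear in the level-1 variables, so their posterior is Gaussian.
    Computed here by brute force (fine for short series)."""
    T, K = y.size, len(g)
    n = K * T
    idx = lambda k, t: k * T + t

    # Prior precision: independent AR(1) chains x_{k,t} ~ N(lam*x_{k,t-1}, sig_chain^2).
    P = np.zeros((n, n))
    for k in range(K):
        for t in range(T):
            P[idx(k, t), idx(k, t)] += 1.0 / sig_chain ** 2
            if t > 0:
                P[idx(k, t - 1), idx(k, t - 1)] += lam ** 2 / sig_chain ** 2
                P[idx(k, t), idx(k, t - 1)] -= lam / sig_chain ** 2
                P[idx(k, t - 1), idx(k, t)] -= lam / sig_chain ** 2

    # Likelihood terms: the effective weight on x1_{k,t} is g[k] * x2[t].
    b = np.zeros(n)
    for t in range(T):
        w = np.array([g[k] * x2[t] for k in range(K)])
        for k in range(K):
            b[idx(k, t)] += w[k] * y[t] / sig_y ** 2
            for kp in range(K):
                P[idx(k, t), idx(kp, t)] += w[k] * w[kp] / sig_y ** 2

    mean = np.linalg.solve(P, b)      # posterior mean of the unfrozen chains
    cov = np.linalg.inv(P)            # full posterior covariance
    return mean.reshape(K, T), cov
```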

38 Inference and Learning: Proof of concept. [graphical model: $K_2 = 1$, $K_1 = 2$, weights $g_{11}$, $g_{21}$] Emission: $y_t = (x^{(1)}_{1,t} + x^{(1)}_{2,t})\, x^{(2)}_{1,t}$

39 Inference and Learning: Proof of concept. [figure: inferred $X^{(2)}_{1,t}$, $X^{(1)}_{1,t}$, $X^{(1)}_{2,t}$ and the data $Y_d$, plotted against time]

40 Current work and future directions:
- Apply to natural sounds (requires good initialisation) and to single-microphone source separation.
- Generalise the model to have non-local features (to capture sweeps) and correlations in the prior (to capture the mutual exclusivity of voiced and unvoiced sections of speech).
- Representation: real sounds live on a hyper-plane in STFT space; can we project our model onto this manifold?
- How do we map posteriors to spike-trains: $p(\text{spike trains} \mid y) = f[q(x \mid y)]$?

41 Extra slides...

42 AM: a candidate organising principle in the Auditory System? The auditory system listens attentively to amplitude modulation. Examples include comodulation masking release in psychophysics, and a possible topographic mapping of AM in the IC from electrophysiology. Main goal: discover computational principles underpinning auditory processing. But armed with a generative model you can also do source segregation, fill in missing data / remove artifacts, denoise sounds, etc.

43 What's up with current generative models? [table comparing models (ICA, SC) on the properties: latents sparse, share power, slowly varying, learnable] Assumption: Latent variables are sparse.

44 What's up with current generative models? [table: ICA, SC, GSM vs. the same properties] Assumption: Latents are sparse, and share power. $x = \lambda u$, with $\lambda \geq 0$ a positive scalar random variable and $u \sim G(0, Q)$.

45 What's up with current generative models? [table: ICA, SC, GSM, SFA vs. the same properties] Assumption: Latents are slow.

46 What's up with current generative models? [table: ICA, SC, GSM, SFA, Bubbles vs. the same properties] Assumption: Latents are sparse, slow (and share power).

47 What's the point of a generative model for sounds? Theoretical neuroscience: understand neural receptive fields; if a latent variable model provides a computationally effective representation of sounds, neurons might encode $p(\text{latent} \mid \text{sound})$. Psychophysics: play sounds to subjects (possibly drawn from the forward model), and compare their inferences with inference in the model. Machine learning: fill in missing data (e.g. to fill in an intermittent recording or remove artifacts), denoise sounds, stream segregation, compression.

48 Outline:
- Natural auditory scene statistics: amplitude modulation is a key component
- Amplitude modulation in the auditory system: a candidate organising principle (the auditory system is short on such things)
- Previous statistical models of natural scenes: Gaussian Scale Mixture models and AM
- A new model for sounds: the Gaussian Modulation Cascade Process
- Ongoing issues...

49 Natural Auditory Scene Statistics: acoustic ecology. The marginal distribution is very sparse (more so than in vision). WHY? Lots of soft sounds, e.g. the pauses between utterances in speech, and rare, highly structured, localised events that carry substantial parts of the stimulus energy.

50 Statistics of Amplitude Modulation (Attias & Schreiner). Methods: sound → filterbank → Hilbert transform → find envelope $a(t, \omega)$ → take logs and transform to zero mean and unit variance: $\hat{a}(t, \omega)$. Marginal statistics of $\hat{a}(t, \omega)$: $p[\hat{a}(t, \omega)] \propto \frac{\exp[-\gamma \hat{a}(t,\omega)]}{(\beta^2 + \hat{a}(t,\omega)^2)^{\alpha/2}}$. Spectrum of $\hat{a}(t, \omega)$: $FT[\hat{a}(t, \omega)] \propto \frac{1}{(\omega_0^2 + \omega^2)^{\alpha/2}}$.
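
The pipeline for a single channel might look like the following sketch; the Butterworth filter shape and order are arbitrary stand-ins for the (unspecified) filterbank, and the sample rate is assumed to be well above twice the channel's centre frequency.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def log_envelope_stats(y, fs, centre_hz=1000.0, bandwidth_hz=400.0):
    """Sketch of the Attias & Schreiner pipeline for one filter channel:
    band-pass filter, Hilbert envelope, log, standardise, then look at the
    marginal histogram and the power spectrum of the log-envelope."""
    lo = (centre_hz - bandwidth_hz / 2) / (fs / 2)
    hi = (centre_hz + bandwidth_hz / 2) / (fs / 2)
    b, a = butter(4, [lo, hi], btype="band")
    band = filtfilt(b, a, y)

    env = np.abs(hilbert(band))                            # a(t, w)
    log_env = np.log(env + 1e-12)
    a_hat = (log_env - log_env.mean()) / log_env.std()     # zero mean, unit variance

    hist, edges = np.histogram(a_hat, bins=100, density=True)   # marginal p[a_hat]
    spectrum = np.abs(np.fft.rfft(a_hat)) ** 2                  # power spectrum of a_hat
    freqs = np.fft.rfftfreq(a_hat.size, d=1.0 / fs)
    return a_hat, (hist, edges), (freqs, spectrum)
```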

51 Results summary. The marginal distribution is very sparse (finite probability of arbitrarily soft sounds) and is independent of filter centre frequency, filter bandwidth, and time resolution. If AM statistics were uncorrelated over time and frequency, the CLT would predict that increasing the filter bandwidth / time resolution would make the distribution more Gaussian. Spectrum of the amplitude modulations: independent of filter centre frequency; a modified power law, indicating long temporal correlations (scale invariance); independent of filter bandwidth (cf. Voss and Clarke's 1/f). Implications: 1. A good generative model of sounds should capture long correlations in AM across frequency bands and time (> 100 ms). 2. Each location on the cochlea sees the same AM statistics.

52 AM in the Auditory System: highlights from a lot of work. Psychophysics: the comodulation masking release task is to detect a tone in noise. Alter the bandwidth of the noise and measure the threshold; then repeat, but amplitude modulate the masker.

53 Electrophysiological data. Type 1 AN fibres phase-lock to AM (implying a temporal code); as we move up the neuraxis the tuning moves from temporal to rate. There is evidence in the IC for topographic AM tuning (Schreiner and Langner, 1988). In cortex, AM processing seems filter independent (a modulation filter bank?). The jury is still out, but AM may be a fundamental organising principle.

54 What's wrong with statistical models for natural scenes? ICA and sparse coding:
$p(x) = \prod_{i=1}^{I} p(x_i)$ (1), with $p(x_i)$ sparse (2)
$p_{\mathrm{ICA}}(y \mid x) = \delta(Gx - y)$ (3), $p_{\mathrm{SC}}(y \mid x) = \mathrm{Norm}(Gx, \sigma^2 I)$ (4)
Recognition weights: $p_{\mathrm{ICA}}(x \mid y) = \delta(x - Ry)$.
[figure: image and sound examples]
Problems: 1. The extracted latents are not independent: there are correlations in their power. 2. There is no explicit temporal dimension, so this is not a true generative model for sounds or movies.

55 1. Empirical distribution of the latents - power correlations

56 Caption for the previous figure: the expected joint distribution of the latents was "starry" (top left). Empirically the joint is found to be elliptical: there are many more instances of two latents being high simultaneously than expected (top right). Another way of seeing this is to look at the conditionals (bottom): if one latent has high power then nearby latents also tend to have high power. How do we improve the model? 1. Fix the bottom level: $p(y \mid x) = \delta(y - Gx)$. 2. This fixes the recognition distribution: $p(x \mid y) = \delta(x - Ry)$. 3. Then $p(x) = \int \mathrm{d}y\, p(x \mid y) p(y) \approx \frac{1}{N} \sum_n p(x \mid y_n)$. 4. These are exactly the distributions plotted on the previous page. 5. No matter what $R$ is, we get similar empirical distributions. 6. So choose a new prior to match them...

57 Gaussian Scale Mixtures (GSMs): $x = \lambda u$, with $\lambda \geq 0$ a scalar random variable, $u \sim G(0, Q)$, and $\lambda$ and $u$ independent. The density of these semi-parametric models can be expressed as an integral:
$p(x) = \int p(x \mid \lambda)\, p(\lambda)\, \mathrm{d}\lambda = \int |2\pi\lambda^2 Q|^{-1/2} \exp\!\left(-\frac{x^T Q^{-1} x}{2\lambda^2}\right) p(\lambda)\, \mathrm{d}\lambda$ (5)
e.g. a discrete $p(\lambda)$ gives a MOG (with zero-mean components); $p(\lambda)$ = Gamma gives a Student-t. Imagine generalising this to the temporal setting: $x(t) = \lambda(t) u(t)$ = positive envelope × carrier. Are we seeing the hallmarks of AM in a non-temporal setting?
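
A quick numerical illustration of why a GSM gives sparse marginals: randomising the scale of a Gaussian produces excess kurtosis. The Gamma multiplier and parameter values below are arbitrary choices for the demo.

```python
import numpy as np

def sample_gsm(n=100000, shape=2.0, scale=1.0, seed=0):
    """Draws from a Gaussian Scale Mixture x = lambda * u with a Gamma
    multiplier (one of the choices of p(lambda) mentioned on the slide)."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape, scale, size=n)      # lambda >= 0
    u = rng.standard_normal(n)                 # u ~ Norm(0, 1), independent of lambda
    x = lam * u
    kurtosis = np.mean(x ** 4) / np.mean(x ** 2) ** 2   # equals 3 for a Gaussian
    return x, kurtosis

x, k = sample_gsm()
print("excess kurtosis:", k - 3.0)   # > 0: sparse / heavy-tailed marginal
```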

58 Learning a neighbourhood (Karklin and Lewicki 2003, 05, 06). The statistical dependencies between filters depend on their separation (in space, scale, and orientation). We'd like to learn these neighbourhoods of dependence. Solution: share multipliers in a linear fashion, using a GSM with a generalised log-normal over the variances:
$p(z_j) = \mathrm{Norm}(0, 1)$ (6)
$\lambda_i^2 = \exp\!\big(\sum_j h_{ij} z_j\big)$ (7)
$p(x_i \mid z, B) = \mathrm{Norm}(0, \lambda_i^2)$ (8)
$p(y \mid G, x) = \mathrm{Norm}(Gx, \sigma^2 I)$ (9)
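
A draw from this kind of model is easy to sketch; here $h$ and $G$ are random placeholders rather than learned neighbourhoods, so this only illustrates the generative structure of equations (6)-(9).

```python
import numpy as np

def sample_karklin_lewicki(N=5000, J=3, I=16, sigma_y=0.1, seed=1):
    """Sketch of N independent draws from a Karklin & Lewicki-style GSM:
    Gaussian higher-level units z set the log-variances of the first-level
    coefficients x through h; y is a noisy linear combination of the x."""
    rng = np.random.default_rng(seed)
    h = 0.5 * rng.standard_normal((I, J))      # neighbourhood weights (placeholder)
    G = rng.standard_normal((I, I))            # generative basis (placeholder)
    z = rng.standard_normal((N, J))            # p(z_j) = Norm(0, 1)
    lam2 = np.exp(z @ h.T)                     # lambda_i^2 = exp(sum_j h_ij z_j)
    x = np.sqrt(lam2) * rng.standard_normal((N, I))        # p(x_i | z) = Norm(0, lambda_i^2)
    y = x @ G.T + sigma_y * rng.standard_normal((N, I))    # p(y | G, x) = Norm(Gx, sigma^2 I)
    return y, x, z
```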

59 A temporal GSMM: the proof-of-concept Bubbles model.
$p(z_{i,t})$ = sparse point process (10)
$\lambda_{i,t}^2 = f\!\big(\sum_j h_{ij}\, [\Phi(t) * z_j(t)]\big)$ (11)
$p(x_{i,t} \mid \lambda_{i,t}) = \mathrm{Norm}(0, \lambda_{i,t}^2)$ (12)
$p(y_t \mid G, x_t) = \delta(Gx_t - y_t)$ (13)
Temporal correlations between the multipliers are captured by the moving average $\Phi(t) * z_j(t)$ (this creates a SLOW ENVELOPE). Bubbles of activity appear in latent space (both in space and time). The columns of $h_{ij}$ are fixed and change smoothly, inducing topographic structure, which is computationally useful.
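
A rough forward sample of a bubbles-style model is sketched below. The slide does not pin down the point-process rate, the window $\Phi$ or the nonlinearity $f$, so those are all arbitrary assumptions here.

```python
import numpy as np

def sample_bubbles(T=2000, I=8, J=4, seed=2):
    """Rough draw from a bubbles-style temporal GSM (eqs 10-13): sparse point
    processes z_j(t), smoothed by a moving-average window Phi to give slow
    envelopes, combined through fixed weights h to set time-varying variances."""
    rng = np.random.default_rng(seed)
    z = (rng.random((J, T)) < 0.01).astype(float)           # sparse point processes
    phi = np.hanning(101)                                   # moving-average window Phi(t)
    slow = np.array([np.convolve(z_j, phi, mode="same") for z_j in z])
    h = np.abs(rng.standard_normal((I, J)))                 # fixed mixing columns
    lam2 = 0.01 + (h @ slow) ** 2                           # f(.): squared, with a floor
    x = np.sqrt(lam2) * rng.standard_normal((I, T))         # Norm(0, lambda_{i,t}^2)
    G = rng.standard_normal((I, I))
    y = G @ x                                               # noiseless emission (eq 13)
    return y, x, lam2
```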

60 Learning. It is common to learn the parameters using zero-temperature EM: $q(x) = \delta(x - x_{\mathrm{MAP}}) \approx p(x \mid Y)$ (14), so uncertainty and correlational information is lost. 1. This affects learning. 2. To compare to neural data we need to specify a mapping, $p(\text{spike trains} \mid y) = f[q(x \mid y)] = f(x_{\mathrm{MAP}})$ for this approximation. 3. BUT we believe neural populations will represent uncertainty and correlations in latent variables. We'd like to retain variance and correlational information, both for learning and for comparison to biology.

61 Motivations for the GMCP:
1. Sparse outputs.
2. Explicit temporal dimension, smooth latent variables.
3. Hierarchical prior that captures the AM statistics of sounds at different time scales: a cascade of modulatory processes, with slowly varying processes at the top modulating more rapidly varying signals towards the bottom.
4. Learnable; and we would like to preserve information about the uncertainty, and possibly the correlations, in our inferences.

62 Dynamics: The Gaussian Modulation Cascade Process.
$p(x^{(m)}_{k,t} \mid x^{(m)}_{k,t-1:t-\tau_m}, \lambda^{(m)}_{k,1:\tau_m}, \sigma^2_{m,k}) = \mathrm{Norm}\!\Big(\sum_{t'=1}^{\tau_m} \lambda^{(m)}_{k,t'}\, x^{(m)}_{k,t-t'},\; \sigma^2_{m,k}\Big)$
Emission distribution:
$p(y_t \mid x^{(1:M)}_t, g_{k_1:k_M}, \sigma^2_y) = \mathrm{Norm}\!\Big(\sum_{k_1:k_M} g_{k_1:k_M} \prod_{m=1}^{M} x^{(m)}_{k_m,t},\; \sigma^2_y I\Big)$
Time-frequency representation: $y_t$ = filter bank outputs.
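
A small forward sample of a GMCP shows how the cascade produces amplitude-modulated output. For simplicity the sketch uses $M = 2$ and AR(1) dynamics per chain (the model above allows AR($\tau_m$)); the weight tensor $g$ and all parameters are placeholders.

```python
import numpy as np

def sample_gmcp(T=5000, K=(2, 1), lam=(0.9, 0.999), sig=(0.3, 0.05),
                sig_y=0.01, seed=3):
    """Forward sample from a toy two-level Gaussian Modulation Cascade Process."""
    rng = np.random.default_rng(seed)
    x = []
    for m in range(len(K)):
        xm = np.zeros((K[m], T))
        for t in range(1, T):                       # AR(1) dynamics per chain
            xm[:, t] = lam[m] * xm[:, t - 1] + sig[m] * rng.standard_normal(K[m])
        x.append(xm)
    # Emission: y_t = sum_{k1,k2} g[k1,k2] * x1[k1,t] * x2[k2,t] + noise
    g = rng.standard_normal(K)                      # weight tensor g_{k1 k2}
    y = np.einsum("ab,at,bt->t", g, x[0], x[1]) + sig_y * rng.standard_normal(T)
    return y, x

y, chains = sample_gmcp()   # the slow level-2 chain modulates the faster level-1 chains
```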

63 e.g. $M = 2$, $K_1 = 2$, $K_2 = 1$, $D = 1$: [figure: chains $x^{(2)}_{t,1}$, $x^{(1)}_{t,1}$, $x^{(1)}_{t,2}$ and weights $g_{11}$, $g_{21}$] Emission: $y_t = (g_{11} x^{(1)}_{1,t} + g_{21} x^{(1)}_{2,t})\, x^{(2)}_{1,t}$

64 e.g. $M = 4$, $K_1 = 4$, $K_2 = 2$, $K_3 = 2$, $K_4 = 1$, $D = 80$ [figure]

65 Comments on the model. It is a GSM: latent = rectified Gaussian × Gaussian. The emission distribution is related to Grimes & Rao and Tenenbaum & Freeman (for $M = 2$), and to Vasilescu & Terzopoulos for general $M$, but the temporal priors allow us to be fully unsupervised. As $p(x^{(m)}_{1:K,1:T} \mid x^{(1:M \setminus m)}_{1:K,1:T}, y_{1:T}, \theta)$ is Gaussian, there is a family of variational EM algorithms we can use to learn the model.

66 Inference and Learning: Proof of concept. [figure: inferred $X^{(2)}_{1,t}$, $X^{(1)}_{1,t}$, $X^{(1)}_{2,t}$ and the data $Y_d$, plotted against time]

67 Future directions and current issues.
DIRECTIONS:
- Correlated prior: speech is usually either periodic (voiced) or unvoiced, and these are mutually exclusive; this also makes it easier to represent sweeps.
- Non-local features: make $G$ temporal to capture sweeps.
ISSUES:
- Representation: real sounds live on a hyper-plane in filter-bank space; can we project our model onto this manifold?
- Initialisation: the free energy has lots of local minima. Use slow-modulation analysis to initialise: $\arg\min_{g_n} \big\langle [\tfrac{\mathrm{d}}{\mathrm{d}t} a(g_n^T y)]^2 \big\rangle$ such that $\langle a(g_n^T y)\, a(g_m^T y) \rangle = \delta_{mn}$.
- How do we map posteriors to spike-trains: $p(\text{spike trains} \mid y) = f[q(x \mid y)]$?

68 What's up with current generative models?
Model | $p(x^{(1)})$ | $p(y \mid x^{(1)})$
ICA | sparse | $\delta(y - Gx^{(1)})$
SC | sparse | $\mathrm{Norm}(Gx^{(1)}, \sigma_y^2 I)$
Assumption: Latent variables are sparse.

69 What's up with current generative models? The table gains a $p(x^{(2)})$ column (N/A for ICA and SC) and a GSM row:
GSM | $\mathrm{Norm}(0, I)$ | $\mathrm{Norm}(0, \lambda_i^2)$ with $\lambda_i^2 = \exp(h_i^T x^{(2)})$ | $\mathrm{Norm}(Gx^{(1)}, \sigma_y^2 I)$
Assumption: Latents are sparse, and share power. $x^{(1)} = \lambda u$, with $\lambda \geq 0$ a positive scalar random variable and $u \sim G(0, Q)$.

70 What's up with current generative models? Adds an SFA row:
SFA | N/A | $\mathrm{Norm}(\gamma x_{t-1}, \sigma^2)$ | $\delta(y - Gx^{(1)})$
Assumption: Latents are slow.

71 What's up with current generative models? Adds a Bubbles row:
Bubbles | point-process | $\mathrm{Norm}(0, \lambda_{i,t}^2)$ with $\lambda_{i,t}^2 = f(h_i^T x^{(2)}_t * \phi_t)$ | $\delta(y - Gx^{(1)})$
Assumption: Latents are sparse, slow (and share power).
