Learning and Meta-learning


1 Learning and Meta-learning
computation: making predictions; choosing actions; acquiring the statistics of episodes
algorithm: gradient ascent (e.g. of the likelihood); correlation; Kalman filtering
implementation: Hebbian synaptic plasticity; neuromodulation

2 Adaptation and Learning
Two strategies:
bottom-up: rules of neural plasticity (Hebbian; sliding threshold); mathematical application (development); computational significance (PCA; projection pursuit; probabilistic model fitting)
top-down: objective function(s) $E(w)$ and gradient adaptation $w \to w - \epsilon_w \nabla_w E(w)$; delta rule; basis functions

3 Types of Learning
supervised: inputs $u$ and desired or target outputs $v$ both provided
reinforcement: input $u$ and scalar evaluation $r$; maximise $\langle r \rangle$; often with a temporal credit assignment problem
unsupervised (or self-supervised): input $u$ only; learn structure from statistics
These are closely related: supervised learns $P[v|u]$; unsupervised learns $P[v,u]$

4 Hebb
Famously suggested: if cell A consistently contributes to the activity of cell B, then the synapse from A to B should be strengthened.
strong element of causality
what about weakening (LTD)?
multiple timescales: STP to protein synthesis
multiple biochemical mechanisms
systems: hippocampus (multiple sub-areas); neocortex (layer and area differences); cerebellum (LTD is the norm)

5 Neural Rules
[Figure: field potential amplitude (mV) versus time (min). High-frequency stimulation induces LTP to a potentiated level; subsequent low-frequency stimulation induces LTD, leaving a depressed, partially depotentiated level, all relative to the control level.]

6 Stability and Competition
Hebbian learning involves positive feedback. Control by:
LTD: usually not enough; covariance versus correlation
saturation: prevent synaptic weights from getting too big (or too small); triviality beckons
competition: spike-time dependent learning rules; normalization over pre-synaptic or post-synaptic arbors
  subtractive: decrease all synapses by the same amount, whether large or small
  multiplicative: decrease large synapses by more than small synapses

7 Preamble
Linear firing rate model:
$\tau_r \frac{dv}{dt} = -v + w \cdot u = -v + \sum_{b=1}^{N_u} w_b u_b$
Assume that $\tau_r$ is small compared with the rate of change of the weights; then during plasticity $v = w \cdot u$.
Then have
$\tau_w \frac{dw}{dt} = f(v, u, w)$
Supervised rules use targets to specify $v$ (neural basis in ACh?)
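
A minimal numerical sketch of this preamble (the sizes, time constants and random inputs are illustrative assumptions, not taken from the slides): integrate the rate equation until it relaxes and check that the steady state matches $v = w \cdot u$, the approximation used throughout the plasticity analysis.

```python
import numpy as np

# Illustrative sizes and parameters (assumptions, not from the slides)
rng = np.random.default_rng(0)
N_u = 10
tau_r = 10e-3          # rate time constant (s)
dt = 1e-4              # Euler integration step (s)

w = rng.normal(size=N_u)      # weights, fixed on this fast timescale
u = rng.normal(size=N_u)      # one static input pattern

# Integrate tau_r dv/dt = -v + w.u
v = 0.0
for _ in range(int(0.2 / dt)):            # 200 ms, much longer than tau_r
    v += dt / tau_r * (-v + w @ u)

print("relaxed v       :", v)
print("steady state w.u:", w @ u)         # the value assumed during plasticity
```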

8 The Basic Hebb Rule
$\tau_w \frac{dw}{dt} = v u$
Averaged over the input statistics this gives
$\tau_w \frac{dw}{dt} = \langle v u \rangle = \langle u u^T \rangle \cdot w = Q \cdot w$
where $Q$ is the input correlation matrix.
Positive feedback instability:
$\tau_w \frac{d}{dt} |w|^2 = 2 \tau_w\, w \cdot \frac{dw}{dt} = 2 v^2$
Also have the discretised version $w \to w + \frac{T}{\tau_w} Q \cdot w$, integrating over time, presenting patterns for $T$ seconds.
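
A sketch of the discretised rule (the synthetic input statistics, $\tau_w$ and $T$ are assumptions): repeated averaged updates $w \to w + (T/\tau_w) Q \cdot w$ make $|w|$ grow without bound, illustrating the positive-feedback instability.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a correlation matrix Q = <u u^T> from synthetic, correlated inputs
N_u, N_pat = 5, 2000
U = rng.normal(size=(N_pat, N_u)) @ rng.normal(size=(N_u, N_u)) * 0.3
Q = U.T @ U / N_pat

tau_w, T = 1.0, 0.01      # plasticity time constant and presentation time (assumed)
w = rng.normal(size=N_u) * 0.1

for step in range(2001):
    w = w + (T / tau_w) * Q @ w          # discretised basic Hebb rule
    if step % 500 == 0:
        print(f"step {step:4d}  |w| = {np.linalg.norm(w):.3e}")
# |w| diverges: the instability noted on the slide
```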

9 Covariance Rule
Since LTD really exists, contra Hebb:
$\tau_w \frac{dw}{dt} = u (v - \theta_v)$  or  $\tau_w \frac{dw}{dt} = (u - \theta_u) v$
If $\theta_v = \langle v \rangle$ or $\theta_u = \langle u \rangle$ then
$\tau_w \frac{dw}{dt} = C \cdot w$
where $C = \langle (u - \langle u \rangle)(u - \langle u \rangle)^T \rangle$ is the input covariance matrix.
Still unstable:
$\tau_w \frac{d}{dt} |w|^2 = 2 v (v - \langle v \rangle)$
which averages to the (positive) variance of $v$.

10 BCM Rule
Odd to have LTD with $v = 0$ or $u = 0$. Evidence for
$\tau_w \frac{dw}{dt} = v u (v - \theta_v)$
[Figure: weight change per unit $u$ as a function of $v$, crossing from depression to potentiation at $\theta_v$.]
If $\theta_v$ slides to match a high power of $v$,
$\tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v$
with a fast $\tau_\theta$, then get competition between synapses: intrinsic stabilization.

11 Subtractive Normalisation
Could normalise $|w|^2$ or $\sum_b w_b = n \cdot w$, with $n = (1, 1, \ldots, 1)$.
For subtractive normalisation of $n \cdot w$:
$\tau_w \frac{dw}{dt} = v u - \frac{v (n \cdot u)}{N_u} n$
with dynamic subtraction, since
$\tau_w \frac{d(n \cdot w)}{dt} = v\, (n \cdot u) \left(1 - \frac{n \cdot n}{N_u}\right) = 0$
as $n \cdot n = N_u$.
Strongly competitive: typically all the weights bar one go to 0. Therefore use an upper saturating limit.

12 The Oja Rule
A multiplicative way to ensure $|w|^2$ is constant:
$\tau_w \frac{dw}{dt} = v u - \alpha v^2 w$
gives
$\tau_w \frac{d |w|^2}{dt} = 2 v^2 (1 - \alpha |w|^2)$
so $|w|^2 \to 1/\alpha$.
Dynamic normalisation could also enforce normalisation all the time.
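
A sketch of the Oja rule on synthetic correlated data (the mixing matrix, $\alpha$ and the learning rate are assumptions): the weight vector converges towards the leading eigenvector of the input correlation matrix, with $|w|^2 \approx 1/\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated zero-mean inputs; the leading eigenvector of Q is the first PC
N_u, N_pat = 5, 5000
mix = rng.normal(size=(N_u, N_u))
U = rng.normal(size=(N_pat, N_u)) @ mix
Q = U.T @ U / N_pat

alpha, eps = 1.0, 0.002            # Oja decay and learning rate (assumed)
w = rng.normal(size=N_u) * 0.1

for u in U:                        # one pass, pattern by pattern
    v = w @ u
    w += eps * (v * u - alpha * v**2 * w)   # Oja rule

evals, evecs = np.linalg.eigh(Q)
e1 = evecs[:, -1]                  # principal eigenvector of Q
print("|w|^2        :", w @ w)                              # ~ 1/alpha
print("|cos(w, e1)| :", abs(w @ e1) / np.linalg.norm(w))    # ~ 1
```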

13 Timing-Based Rules
[Figure: (A) EPSP amplitude (% of control) versus time (min) for pre-before-post (+10 ms) and post-before-pre (-10 ms) pairing; (B) percent potentiation versus $t_{post} - t_{pre}$ (ms), from slice cortical pyramidal cells and the Xenopus retinotectal system.]
A window of about 50 ms gets Hebbian causality right.
Rate description:
$\tau_w \frac{dw}{dt} = \int_0^\infty d\tau \left( H(\tau)\, v(t)\, u(t - \tau) + H(-\tau)\, v(t - \tau)\, u(t) \right)$
A spike-based description is necessary if an input spike can have a measurable impact on an output spike.
The critical factor is the overall integral: net LTD with local LTP; partially self-stabilizing.
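
A sketch of an asymmetric timing window $H(\tau)$ and the pairwise weight change it implies for two spike trains. The exponential shape and the constants A_plus, A_minus, tau_plus, tau_minus are illustrative assumptions; the slide only requires a negative overall integral (net LTD) with local LTP.

```python
import numpy as np

# Asymmetric exponential STDP window (parameters are assumptions)
A_plus, A_minus = 1.0, 1.05         # LTD slightly stronger => negative integral
tau_plus, tau_minus = 20.0, 20.0    # ms

def H(delta_t):
    """Weight change for one pre/post pair, delta_t = t_post - t_pre (ms)."""
    return np.where(delta_t >= 0,
                    A_plus * np.exp(-delta_t / tau_plus),     # pre before post: LTP
                    -A_minus * np.exp(delta_t / tau_minus))   # post before pre: LTD

# Integral of the window: negative => overall depression, local potentiation
print("integral of H:", A_plus * tau_plus - A_minus * tau_minus)

# Total pairwise weight change for two Poisson-like spike trains (all pairs)
rng = np.random.default_rng(3)
pre = np.sort(rng.uniform(0, 1000, size=30))    # spike times in ms
post = np.sort(rng.uniform(0, 1000, size=30))
dw = H(post[:, None] - pre[None, :]).sum()
print("pairwise dw  :", dw)
```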

14 Single Postsynaptic Neuron
Basic Hebb rule: $\tau_w \frac{dw}{dt} = Q \cdot w$.
Analyse using an eigendecomposition of $Q$: $Q \cdot e_\mu = \lambda_\mu e_\mu$, with $\lambda_1 \ge \lambda_2 \ge \ldots$
Since $Q$ is symmetric and positive (semi-)definite, it has a complete set of real orthonormal eigenvectors with non-negative eigenvalues, whose growth is decoupled.
Write
$w(t) = \sum_{\mu=1}^{N_u} c_\mu(t)\, e_\mu$
then
$c_\mu(t) = c_\mu(0) \exp\left(\frac{\lambda_\mu t}{\tau_w}\right)$
and $w(t) \to \alpha(t)\, e_1$ as $t \to \infty$.
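
A sketch checking this eigenmode solution (the synthetic $Q$, its size and $\tau_w$ are assumptions): each coefficient $c_\mu$ grows as $\exp(\lambda_\mu t / \tau_w)$, so the direction of $w$ rotates towards the leading eigenvector $e_1$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Symmetric positive semi-definite correlation matrix Q (synthetic)
A = rng.normal(size=(4, 4))
Q = A @ A.T / 4

tau_w = 1.0
evals, evecs = np.linalg.eigh(Q)          # ascending eigenvalues
w0 = rng.normal(size=4)

def w_of_t(t):
    """Closed form w(t) = sum_mu c_mu(0) exp(lambda_mu t / tau_w) e_mu."""
    c0 = evecs.T @ w0
    return evecs @ (c0 * np.exp(evals * t / tau_w))

e1 = evecs[:, -1]                         # leading eigenvector
for t in (0.0, 5.0, 20.0):
    w = w_of_t(t)
    print(f"t={t:5.1f}  |cos(w, e1)| = {abs(w @ e1) / np.linalg.norm(w):.4f}")
```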

15 Constraints
Here $\alpha(t) = \exp(\lambda_1 t / \tau_w)$.
Oja makes $w(t) \to e_1 / \sqrt{\alpha}$.
Saturation can disturb the outcome.
[Figure: (A, B) weight trajectories in the $(w_1, w_2)$ plane under different constraints.]
Subtractive constraint:
$\tau_w \dot{w} = Q \cdot w - \frac{(w \cdot Q \cdot n)}{N_u} n$
Sometimes $e_1 \propto n$, so its growth is stunted; and $e_\mu \cdot n = 0$ for $\mu \ne 1$, so
$w(t) = (w(0) \cdot e_1)\, e_1 + \sum_{\mu=2}^{N_u} \exp\left(\frac{\lambda_\mu t}{\tau_w}\right) (w(0) \cdot e_\mu)\, e_\mu$

16 Translation Invariance
A particularly important case for development has $\langle u_b \rangle = \langle u \rangle$ and $Q_{bb'} = Q(b - b')$.
Write $n = (1, \ldots, 1)$ and $J = n n^T$; under the subtractive constraint the effective matrix is $\hat{Q} = Q - J \cdot Q / N_u$.
1. $e_\mu \cdot n = 0$: AC modes are unaffected
2. $e_\mu \cdot n \ne 0$: DC modes are affected
3. $Q$ has discrete sines and cosines as eigenvectors
4. the Fourier spectrum of $Q$ gives the eigenvalues

17 PCA
What is the significance of $e_1$?
[Figure: (A, B, C) data clouds and weight vectors plotted against $u_1, w_1$ and $u_2, w_2$.]
optimal linear reconstruction: minimise $E(w, g) = \langle |u - g v|^2 \rangle$
information maximisation: $I[v; u] = H[v] - H[v|u]$ under a linear model
Assume $\langle u \rangle = 0$, or use $C$ instead of $Q$.

18 Linear Reconstruction
$E(w, g) = \langle |u - g v|^2 \rangle = K - 2\, w \cdot Q \cdot g + |g|^2\, w \cdot Q \cdot w$
This is quadratic in $w$, with a minimum at $w = g / |g|^2$, making
$E(w, g) = K - \frac{g \cdot Q \cdot g}{|g|^2}$
Look for a solution with $g = \sum_k (e_k \cdot g)\, e_k$ and $|g|^2 = 1$:
$E(w, g) = K - \sum_{k=1}^{N_u} (e_k \cdot g)^2 \lambda_k$
which clearly has its minimum at $e_1 \cdot g = 1$ and $e_2 \cdot g = e_3 \cdot g = \ldots = 0$.
Therefore $g$ and $w$ both point along the principal component.

19 Infomax (Linsker)
$\mathrm{argmax}_w\, I[v; u]$ where $I[v; u] = H[v] - H[v|u]$
A very general unsupervised learning suggestion.
$H[v|u]$ is not quite well defined unless $v = w \cdot u + \eta$ with noise $\eta$; otherwise $v$ is deterministic given $u$.
$H[v] = \frac{1}{2} \log 2\pi e \sigma^2$ for a Gaussian.
If $P[u] \sim \mathcal{N}[0, Q]$ then $v \sim \mathcal{N}[0,\, w \cdot Q \cdot w + \upsilon^2]$.
Maximise $w \cdot Q \cdot w$ subject to $|w|^2 = 1$ (note the normalisation).
Same problem as above: implies that $w \propto e_1$.
If non-Gaussian, this only maximises an upper bound on $I[v; u]$.

20 Ocular Dominance
[Figure: retina-thalamus-cortex pathway. A cortical unit $v$ receives weights $W^L$ and $W^R$ from left and right thalamic inputs $u^L$ and $u^R$, with competitive interaction; the outcome is an ocularity map with alternating L/R domains.]
OD develops around eye-opening
interaction with refinement of topography
interaction with orientation
interaction with ipsi/contra-innervation
effect of manipulations to input

21 Start Simple
Consider one input from each eye. Then
$v = w_R u_R + w_L u_L$
$Q = \langle u u^T \rangle = \begin{pmatrix} q_S & q_D \\ q_D & q_S \end{pmatrix}$
$e_1 = (1, 1)/\sqrt{2}$, $e_2 = (1, -1)/\sqrt{2}$, $\lambda_1 = q_S + q_D$, $\lambda_2 = q_S - q_D$
So if $w_+ = w_R + w_L$ and $w_- = w_R - w_L$, then
$\tau_w \frac{dw_+}{dt} = (q_S + q_D) w_+$ and $\tau_w \frac{dw_-}{dt} = (q_S - q_D) w_-$
Since $q_D \ge 0$, $w_+$ dominates, so use subtractive normalisation:
$\tau_w \frac{dw_+}{dt} = 0$ and $\tau_w \frac{dw_-}{dt} = (q_S - q_D) w_-$
so $w_- \to \pm\omega$ and one eye dominates.
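
A sketch of this two-input model (the values of $q_S$, $q_D$ and the saturation bound $\omega$ are assumptions): with the subtractive constraint the sum $w_+$ is frozen while the difference $w_-$ grows until one eye's weight saturates.

```python
import numpy as np

# Same- and different-eye correlations and saturation bound (assumed values)
q_S, q_D, omega = 1.0, 0.6, 1.0
tau_w, dt = 1.0, 0.01

rng = np.random.default_rng(5)
w = np.array([0.5, 0.5]) + rng.normal(scale=0.01, size=2)   # (w_R, w_L)
Q = np.array([[q_S, q_D], [q_D, q_S]])
n = np.ones(2)

for _ in range(5000):
    dw = Q @ w - (w @ Q @ n) / 2 * n       # Hebb + subtractive constraint (N_u = 2)
    w = np.clip(w + dt / tau_w * dw, 0.0, omega)   # saturation limits

print("final (w_R, w_L):", w)              # one weight near omega, the other near 0
print("w_+ = w_R + w_L :", w.sum())        # conserved until saturation
```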

22 Orientation Selectivity
The model is exactly the same; the input correlations come from ON/OFF cells.
[Figure: (C) the correlation function $Q(b)$ and (D) its constrained version $\tilde{Q}(b)$ as functions of separation $b$.]
Now the dominant mode of $\tilde{Q}$ has spatial structure.
A centre-surround version is also possible, but is usually dominated because of non-linear effects.

23 Temporal Hebbian Rules
Look at the rate-based temporal model: over a trial of duration $T$,
$\Delta w = \frac{1}{\tau_w} \int_0^T dt\; v(t) \int d\tau\, H(\tau)\, u(t - \tau)$
ignoring some edge effects.
That is, correlate the output $v(t)$ with a filtered version of the input, $\int d\tau\, H(\tau)\, u(t - \tau)$: look for structure at the scale of the temporal filter.

24 Multiple Output Neurons
[Figure: output units $v$ with fixed recurrent connections $M$, receiving feedforward weights $W$ from inputs $u_1, u_2, u_3, \ldots, u_{N_u}$.]
Fixed recurrent connections lead to
$\tau_r \frac{dv}{dt} = -v + W \cdot u + M \cdot v$
so at steady state $v = W \cdot u + M \cdot v = K \cdot W \cdot u$, where $K = (I - M)^{-1}$.
Thus with Hebbian learning
$\tau_w \frac{dW}{dt} = \langle v u \rangle = K \cdot W \cdot Q$
and we can analyse the eigeneffect of $K$.

25 Ocular Dominance Revisited
[Figure: (A, B) arrangement of left- and right-eye inputs $u_L$, $u_R$ projecting to the cortical sheet.]
Write $w^+ = w^R + w^L$ and $w^- = w^R - w^L$ for the projective weights; then
$\tau_w \frac{dw^+}{dt} = (q_S + q_D)\, K \cdot w^+$ and $\tau_w \frac{dw^-}{dt} = (q_S - q_D)\, K \cdot w^-$
Since $w^+$ is clamped by subtractive normalisation, we are just interested in the pattern of $\pm$ in $w^-$.
Since $K$ is Töplitz, its eigenvectors are waves and its eigenvalues come from the Fourier transform.
[Figure: (A) the recurrent kernel $K$ against cortical distance (mm); (B) its Fourier transform $\tilde{K}$ against spatial frequency $k$ (1/mm).]

26 Competitive Hebbian Learning
Use a competitive non-linearity
$z_a = \frac{\left(\sum_b W_{ab} u_b\right)^\delta}{\sum_{a'} \left(\sum_b W_{a'b} u_b\right)^\delta}$
in conjunction with a positive interaction term
$v_a = \sum_{a'} M_{aa'} z_{a'}$
and standard Hebbian learning.
[Figure: (A, B, C) left- and right-eye weights $W^L$, $W^R$ plotted against output unit $a$ and input unit $b$, showing ocular segregation and topography.]
Features: ocularity; topography.

27 Feature-Based Models
Reduced descriptions $(x, y, z, r\cos\theta, r\sin\theta)$:
$x, y$: topographic location
$z$: ocularity ($\in [-1, 1]$)
$r$: orientation strength
$\theta$: orientation
Matching: replace $[W \cdot u]_a$ by $\exp\left(-\sum_b (u_b - W_{ab})^2 / 2\sigma_b^2\right)$, plus softmax competition and cortical interaction.
Learning: self-organizing map
$\tau_w \frac{dW_{ab}}{dt} = \left\langle v_a (u_b - W_{ab}) \right\rangle$
or elastic net (only competition):
$\tau_w \frac{dW_{ab}}{dt} = \left\langle v_a (u_b - W_{ab}) \right\rangle + \beta \sum_{a' \in N(a)} (W_{a'b} - W_{ab})$
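
A sketch of the self-organizing-map update on a one-dimensional map (all constants and the toy feature space are assumptions, and a hard best-matching unit plus a Gaussian activity profile stands in for the softmax competition and cortical interaction): after training, neighbouring units come to represent neighbouring input features.

```python
import numpy as np

rng = np.random.default_rng(6)

# 1-D map of 20 units learning 2-D features on a half-circle (illustrative setup)
N_a, N_b = 20, 2
W = rng.normal(scale=0.1, size=(N_a, N_b))
sigma, eps = 2.0, 0.05                     # neighbourhood width, learning rate

def cortical_interaction(winner):
    """Gaussian activity profile v_a centred on the winning unit."""
    a = np.arange(N_a)
    return np.exp(-(a - winner) ** 2 / (2 * sigma ** 2))

for t in range(4000):
    theta = rng.uniform(0, np.pi)
    u = np.array([np.cos(theta), np.sin(theta)])      # input feature
    winner = np.argmin(((u - W) ** 2).sum(axis=1))    # best-matching unit
    v = cortical_interaction(winner)
    W += eps * v[:, None] * (u - W)                   # SOM update: dW ~ v_a (u_b - W_ab)

# Neighbouring units should now code nearby angles (roughly monotonic ordering)
angles = np.arctan2(W[:, 1], W[:, 0])
print(np.round(angles, 2))
```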

28 Large-Scale Results
Meshing of the patterns of OD and OR: pinwheels; linear zones; ocular dominance boundaries.
Overall pattern of OD stripes versus elastic net simulation.

29 Redundancy
Multiple units bring redundancy: under Hebbian learning all units become the same; fixed output connections are inadequate.
One possibility is decorrelation: $\langle v v^T \rangle = I$. If Gaussian, then complete factorisation.
Approaches:
Atick & Redlich: force an $n \to n$ mapping and decorrelate using anti-Hebbian learning.
Földiák: use Hebbian and anti-Hebbian learning to learn feedforward and lateral weights.
Sanger: explicitly subtract off the first component from subsequent ones.
Williams: subtract off the predicted portion of $u$.

30 Goodall
$v = W \cdot u + M \cdot v$
Anti-Hebbian learning is ideal for the lateral weights: if $v_a$ and $v_b$ are correlated, make $M_{ab} = M_{ba}$ more negative, which reduces the correlation.
Goodall: $n \to n$ with $W = I$, so
$v = (I - M)^{-1} u = K \cdot u$
with the plasticity rule
$\tau_M \dot{M} = -u v^T + I - M$
At $\dot{M} = 0$: $\langle u v^T \rangle = I - M = K^{-1}$, i.e. $Q \cdot K = K^{-1}$, so $K \cdot Q \cdot K = I$.
Hence $\langle v v^T \rangle = K \langle u u^T \rangle K = I$, as required.
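
A sketch of this decorrelating scheme (the synthetic input covariance and learning rate are assumptions): the lateral weights $M$ adapt by the anti-Hebbian rule above until the outputs $v = (I - M)^{-1} u$ are approximately decorrelated with unit variance.

```python
import numpy as np

rng = np.random.default_rng(7)

# Correlated zero-mean inputs with a chosen covariance (synthetic)
N = 3
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
U = rng.normal(size=(40000, N)) @ np.linalg.cholesky(C).T

M = np.zeros((N, N))
eps = 0.005                                  # learning rate for M (assumed)

for u in U:
    K = np.linalg.inv(np.eye(N) - M)         # v = (I - M)^{-1} u
    v = K @ u
    M += eps * (-np.outer(u, v) + np.eye(N) - M)   # sample-level anti-Hebbian rule

K = np.linalg.inv(np.eye(N) - M)
V = U @ K.T
print("output covariance <v v^T>:")
print(np.round(V.T @ V / len(V), 2))         # close to the identity
```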

31 Temporal Plasticity
Using the temporal rule:
$\tau_w \frac{dw}{dt} = \int_0^\infty d\tau \left( H(\tau)\, v(t)\, u(t - \tau) + H(-\tau)\, v(t - \tau)\, u(t) \right)$
[Figure: (A) place-cell activity $v$ against position $s$ before and after experience; (B) place field location (cm) against lap number.]
If $s_a = 2$ is active before $s_a = 0$, the synapse $2 \to 0$ gets strengthened, so $s_a = 0$ extends its firing field backwards.

32 Supervised Learning
Consider the case of learning pairs $u^m, v^m$:
classification: binary $v^m$ to classify real-valued $u^m$
regression: real-valued mapping from $u^m$ to $v^m$
storage: learn the relationships in the data
generalisation: infer a functional relationship from limited examples
error-correction: mistakes drive adaptation
Hebbian plasticity:
$\tau_w \frac{dw}{dt} = \langle v u \rangle = \frac{1}{N_S} \sum_{m=1}^{N_S} v^m u^m$
and with (multiplicative) weight decay,
$\tau_w \frac{dw}{dt} = \langle v u \rangle - \alpha w$
makes $w \to \langle v u \rangle / \alpha$. No positive feedback.

33 Classification and the Perceptron
Classification rule:
$v = \begin{cases} 1 & \text{if } w \cdot u - \gamma \ge 0 \\ 0 & \text{if } w \cdot u - \gamma < 0 \end{cases}$
Can use supervised Hebbian learning, $w \propto \sum_{m=1}^{N_S} v^m u^m$, but this works quite poorly for random patterns.
[Figure: (A) the perceptron: inputs $u_1, \ldots, u_{N_u}$ with weights $w$; (B) probability correct against $N_S$.]

34 Function Approximation
Basis function network:
[Figure: stimulus $s$ drives basis-function activities $u = f(s)$, which are weighted by $w$ to give the output $v$.]
output: $v(s) = w \cdot u = w \cdot f(s)$
error: $E = \frac{1}{2} \left\langle (h(s) - w \cdot f(s))^2 \right\rangle$
which reaches a minimum at (the normal equations)
$\langle f(s) f(s)^T \rangle \cdot w = \langle f(s)\, h(s) \rangle$
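
A sketch of this basis-function fit (the target function, the Gaussian basis functions and their placement are all illustrative assumptions): estimate the averages from samples and solve the normal equations for $w$.

```python
import numpy as np

rng = np.random.default_rng(8)

def f(s, centres, width=0.5):
    """Gaussian basis-function activities u = f(s) for an array of stimuli s."""
    return np.exp(-(s[:, None] - centres[None, :]) ** 2 / (2 * width ** 2))

h = np.sin                                  # target function h(s) (assumed)
centres = np.linspace(-3, 3, 12)

S = rng.uniform(-3, 3, size=400)            # training stimuli
F = f(S, centres)                           # N_S x N_b matrix of f(s^m)

# Normal equations: <f f^T> w = <f h>
FFt = F.T @ F / len(S)
Fh = F.T @ h(S) / len(S)
w = np.linalg.solve(FFt, Fh)

S_test = np.linspace(-3, 3, 7)
print("v(s):", np.round(f(S_test, centres) @ w, 3))
print("h(s):", np.round(h(S_test), 3))
```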

35 Hebbian Function Approximation
When does the Hebbian $w = \langle f(s)\, h(s) \rangle / \alpha$ satisfy the normal equations $\langle f(s) f(s)^T \rangle \cdot w = \langle f(s)\, h(s) \rangle$?
1. The input patterns are orthogonal: $\langle f(s) f(s)^T \rangle = I$.
2. The tight frame condition, $f(s^m) \cdot f(s^{m'}) = c\, \delta_{mm'}$, as then
$\langle f(s) f(s)^T \rangle \cdot w = \frac{1}{\alpha N_S^2} \sum_{m m'} f(s^m) \left( f(s^m) \cdot f(s^{m'}) \right) h(s^{m'}) = \frac{c}{\alpha N_S^2} \sum_m f(s^m)\, h(s^m) = \frac{c}{\alpha N_S} \langle f(s)\, h(s) \rangle$
V1 forms an approximate tight frame.

36 Error-Correcting Rules
Hebbian plasticity is independent of the performance of the network.
Perceptron learning rule: if $v(u^m) = 0$ when $v^m = 1$, modify $w$ and $\gamma$ to increase $w \cdot u^m - \gamma$. The easiest rule:
$w \to w + \epsilon_w (v^m - v(u^m))\, u^m$ and $\gamma \to \gamma - \epsilon_w (v^m - v(u^m))$
which implies that
$\Delta (w \cdot u^m - \gamma) = \epsilon_w (v^m - v(u^m)) \left( |u^m|^2 + 1 \right)$
which has just the right sign. In fact, it is guaranteed to converge.
Note the discrete nature of the weight update.
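
A sketch of the perceptron learning rule on a linearly separable toy problem (the data generation and learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)

# Linearly separable random patterns (synthetic)
N_u, N_S = 10, 60
U = rng.normal(size=(N_S, N_u))
w_true = rng.normal(size=N_u)
labels = (U @ w_true >= 0.0).astype(float)       # targets v^m in {0, 1}

w = np.zeros(N_u)
gamma = 0.0
eps_w = 0.1

def classify(u):
    return float(w @ u - gamma >= 0.0)

for epoch in range(100):
    errors = 0
    for u, target in zip(U, labels):
        delta = target - classify(u)             # 0 when correct, +/-1 when wrong
        if delta != 0.0:
            w += eps_w * delta * u               # perceptron rule for the weights
            gamma -= eps_w * delta               # and for the threshold
            errors += 1
    if errors == 0:
        print("converged after epoch", epoch)
        break
```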

37 The Delta Rule
Definition of the task in $E(w)$: how well (or poorly) do the synaptic weights $w$ perform?
Gradient descent:
$w \to w - \epsilon_w \nabla_w E(w)$
since if $w' = w - \epsilon_w \nabla_w E(w)$, then to first order in $\epsilon_w$:
$E(w - \epsilon_w \nabla_w E) = E(w) - \epsilon_w |\nabla_w E|^2 \le E(w)$
[Figure: the error surface $E(w)$ against $w$, with descent towards the minimum.]

38 Stochastic Gradient Descent
$E(w) = \frac{1}{2} \left\langle (h(s) - w \cdot f(s))^2 \right\rangle$ is an average over many examples.
Use random input-output pairs $s^m, h(s^m)$ and change
$w \to w - \epsilon_w \nabla_w \left( h(s^m) - v(s^m) \right)^2 / 2 = w + \epsilon_w \left( h(s^m) - v(s^m) \right) f(s^m)$
called stochastic gradient descent.
[Figure: (A, B, C) the fit $v(s)$ against $s$ at successive stages of learning.]
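
A sketch of the delta rule as stochastic gradient descent, reusing the basis-function setup from the normal-equations sketch above (the target function, basis functions and learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)

centres = np.linspace(-3, 3, 12)
def f(s, width=0.5):
    """Gaussian basis-function activities for a single stimulus s."""
    return np.exp(-(s - centres) ** 2 / (2 * width ** 2))

h = np.sin                     # target function (assumed)
w = np.zeros(len(centres))
eps_w = 0.05

for m in range(20000):
    s = rng.uniform(-3, 3)                 # random example s^m
    v = w @ f(s)
    w += eps_w * (h(s) - v) * f(s)         # delta rule: one stochastic gradient step

S_test = np.linspace(-3, 3, 7)
print("v(s):", np.round([w @ f(s) for s in S_test], 3))
print("h(s):", np.round(h(S_test), 3))
```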

39 Contrastive Hebbian Learning
The delta rule involves
$w \to w + \epsilon_w \left( v^m u^m - v(u^m)\, u^m \right)$
Hebbian learning: $v^m u^m$, based on the target
anti-Hebbian learning: $-v(u^m)\, u^m$, based on the outcome
learning stops when outcome = target.
Generalize to a stochastic network:
$P[v|u; W] = \frac{\exp(-E(u, v))}{Z(u)}$ with $Z(u) = \sum_v \exp(-E(u, v))$
The weights $W$ generate a conditional distribution, e.g. with the quadratic form $E(u, v) = -u \cdot W \cdot v$.

40 Goal of Learning
Natural quality measure for $u$:
$D_{KL}(P[v|u], P[v|u;W]) = \sum_v P[v|u] \ln\left(\frac{P[v|u]}{P[v|u;W]}\right) = -\sum_v P[v|u] \ln(P[v|u;W]) + K$
Averaging over $u^m$, with $v^m$ a sample of $P[v|u^m]$:
$\left\langle D_{KL}(P[v|u], P[v|u;W]) \right\rangle \approx -\frac{1}{N_S} \sum_{m=1}^{N_S} \ln(P[v^m|u^m;W]) + K$
which amounts to maximum likelihood learning.
$\frac{\partial \ln P[v^m|u^m;W]}{\partial W_{ab}} = \frac{\partial}{\partial W_{ab}} \left( -E(u^m, v^m) - \ln Z(u^m) \right) = v_a^m u_b^m - \sum_v P[v|u^m;W]\, v_a u_b^m$
which is also Hebb (positive) and anti-Hebb (negative).
Use Gibbs sampling for $v \sim P[v|u^m;W]$.
The unsupervised version is just the same.
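
A sketch of this Hebb/anti-Hebb likelihood gradient with the quadratic energy above, on a problem small enough that the sum over binary $v$ can be enumerated exactly in place of Gibbs sampling; the data-generating weights, sizes and learning rate are all assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(11)

N_u, N_v = 4, 3
V_all = np.array(list(product([0, 1], repeat=N_v)), dtype=float)   # all binary v

def cond_prob(u, W):
    """P[v|u;W] with energy E(u,v) = -u.W.v, enumerated over all v."""
    logits = V_all @ (W.T @ u)                 # -E(u,v) for every candidate v
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Generate data from a "true" W, then fit W by the Hebb / anti-Hebb gradient
W_true = rng.normal(size=(N_u, N_v))
U = (rng.random((400, N_u)) < 0.5).astype(float)
V = np.array([V_all[rng.choice(len(V_all), p=cond_prob(u, W_true))] for u in U])

W = np.zeros((N_u, N_v))
eps = 0.05
for _ in range(500):                           # batch gradient ascent on log likelihood
    grad = np.zeros_like(W)
    for u, v in zip(U, V):
        p = cond_prob(u, W)
        grad += np.outer(u, v)                 # Hebbian term: clamped (target) phase
        grad -= np.outer(u, p @ V_all)         # anti-Hebbian term: free (model) phase
    W += eps * grad / len(U)

print("correlation(W, W_true):",
      np.corrcoef(W.ravel(), W_true.ravel())[0, 1])
```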

41 Function Approximation
Two ignored concerns:
1. examples may be corrupted by noise: $D = \{(s^m, v^m)\}$ with $v^m = h(s^m) + \eta^m$
2. want $v(s) = h(s)$ even away from the training set; in general impossible; need prior information, e.g. smoothness of $h(s)$
A Gaussian process says that, given $s^1, s^2, \ldots, s^K$, the values $h(s^1), h(s^2), \ldots, h(s^K)$ are jointly Gaussian with 0 mean (short of prior expectations about bias) and a covariance matrix determined by prior expectations, e.g. $\Sigma_{\mu\nu} = \Sigma(|s^\mu - s^\nu|)$.
Then the distribution of $h(s)$ given $D$ is Gaussian, with a mean that depends linearly on $v^1, \ldots, v^{N_S}$.
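
A sketch of Gaussian-process regression with a squared-exponential covariance (the kernel, its length scale, the noise level and the data are all assumptions): the posterior mean at a test point is a linear combination of the observed $v^m$.

```python
import numpy as np

rng = np.random.default_rng(12)

def Sigma(s1, s2, ell=0.7):
    """Prior covariance Sigma(|s - s'|): squared-exponential kernel (assumed)."""
    return np.exp(-(s1[:, None] - s2[None, :]) ** 2 / (2 * ell ** 2))

# Noisy observations of an underlying function h(s)
h = np.sin
sigma2 = 0.05                                 # observation noise variance (assumed)
S = rng.uniform(-3, 3, size=25)
v = h(S) + rng.normal(scale=np.sqrt(sigma2), size=S.shape)

# GP posterior at test points: the mean is linear in the observed v
S_test = np.linspace(-3, 3, 7)
K_dd = Sigma(S, S) + sigma2 * np.eye(len(S))
K_td = Sigma(S_test, S)
mean = K_td @ np.linalg.solve(K_dd, v)
var = Sigma(S_test, S_test).diagonal() - np.einsum(
    'ij,ji->i', K_td, np.linalg.solve(K_dd, K_td.T))

print("posterior mean:", np.round(mean, 3))
print("posterior var :", np.round(var, 3))
print("h(s_test)     :", np.round(h(S_test), 3))
```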

42 Functional Priors
Alternatively, specify
$P[h] \propto e^{-\frac{1}{2}\phi[h]}$
where $\phi[h]$ is a functional, e.g.
$\phi[h] = \int d\tilde{s}\; |\tilde{h}(\tilde{s})|^2\, \tilde{G}(\tilde{s})$
in the Fourier domain. $\tilde{G}(\tilde{s})$ penalises high-frequency wobbles in $h(s)$, e.g. $\tilde{G}(\tilde{s}) = e^{|\tilde{s}|^2/\delta}$.
If the $\eta^\mu$ are iid Gaussian, then the likelihood is
$P[D|h] \propto \exp\left( -\frac{1}{2\sigma^2} \sum_{m=1}^{N_S} (v^m - h(s^m))^2 \right)$
So the posterior distribution over $h(s)$ is
$\log P[h|D] = \log P[D|h] P[h] + K = -\frac{1}{2\sigma^2} \sum_{m=1}^{N_S} (v^m - h(s^m))^2 - \frac{1}{2}\phi[h] + K$

43 Regularization Theory
Find the MAP solution by minimising
$E[h] = \frac{1}{\sigma^2} \sum_{m=1}^{N_S} (v^m - h(s^m))^2 + \phi[h]$
It turns out that
$h(s) = \sum_m w_m G(s - s^m) + k(s)$
where $G(s - s^\mu)$ is the inverse Fourier transform associated with the smoothness prior and $k(s)$ satisfies $\phi[k] = 0$.
The weights are given by
$w = (G + \sigma^2 I)^{-1} v$
where $G_{mn} = G(s^m - s^n)$, recovering the linear dependence on $v$.
For a Gaussian $\tilde{G}(\tilde{s})$, $G(s - s^m) \propto e^{-\delta |s - s^m|^2}$ is a radial basis function.

44 Regularization Theory
[Figure: the solution $v(s) = \hat{h}(s)$ built as a weighted sum of basis functions $G(s - s^m)$, one per example, with weights $w_1, \ldots, w_N$.]
one basis function for each example
basis functions determined by the prior
as $\sigma^2 \to 0$, the MAP solution interpolates the $v^\mu$
use Bayesian hyperparameters to manipulate the norm
Fixed basis functions with inference over the weights:
$E(w) = \frac{1}{\sigma^2} \sum_{m=1}^{N_S} (v^m - h(s^m))^2 + \phi(w)$
e.g. $\phi(w) = |w|^2$ corresponds to a Gaussian prior.
Or learn the basis functions themselves using backprop.
