Learning and Meta-learning
1 Learning and Meta-learning
computation: making predictions; choosing actions; acquiring episodes; statistics
algorithm: gradient ascent (e.g. of the likelihood); correlation; Kalman filtering
implementation: Hebbian synaptic plasticity; neuromodulation
2 Adaptation and Learning
Two strategies:
bottom-up: rules of neural plasticity (Hebbian; sliding threshold); mathematical application (development); computational significance (PCA; projection pursuit; probabilistic model fitting)
top-down: objective function(s) E(w); adjust w by following the gradient of E(w); delta rule; basis functions
3 Types of Learning
supervised: inputs u and desired or target outputs v both provided
reinforcement (max r): input u and scalar evaluation r, often with a temporal credit assignment problem
unsupervised or self-supervised: learn structure from the statistics of u alone
These are closely related: supervised learns P[v|u]; unsupervised learns P[v, u]
4 Hebb famously suggested: if cell A consistently contributes to the activity of cell B, then the synapse from A to B should be strengthened
strong element of causality
what about weakening (LTD)?
multiple timescales: STP to protein synthesis
multiple biochemical mechanisms
systems: hippocampus (multiple sub-areas); neocortex (layer and area differences); cerebellum (LTD is the norm)
5 Neural Rules
[Figure: field potential amplitude (mV) versus time (min); high-frequency stimulation gives LTP, low-frequency stimulation gives LTD; potentiated, depressed/partially depotentiated, and control levels are marked]
6 Stability and Competition
Hebbian learning involves positive feedback. Control by:
LTD: usually not enough; covariance versus correlation
saturation: prevent synaptic weights from getting too big (or too small); triviality beckons
competition: spike-time-dependent learning rules; normalization over pre-synaptic or post-synaptic arbors
  subtractive: decrease all synapses by the same amount, whether large or small
  multiplicative: decrease large synapses by more than small synapses
7 Preamble
Linear firing rate model: τ_r dv/dt = −v + w·u = −v + Σ_{b=1}^{N_u} w_b u_b
Assume that τ_r is small compared with the rate of change of the weights; then during plasticity v = w·u
Then have τ_w dw/dt = f(v, u, w)
Supervised rules use targets to specify v (neural basis in ACh?)
8 The Basic Hebb Rule
τ_w dw/dt = v u; averaged over the input statistics this gives
τ_w dw/dt = ⟨v u⟩ = ⟨u u⟩·w = Q·w
where Q is the input correlation matrix.
Positive feedback instability: τ_w d|w|²/dt = 2 τ_w w·(dw/dt) = 2 v² ≥ 0
Also have the discretised version w → w + (T/τ_w) Q·w, integrating over time, presenting patterns for T seconds.
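A minimal numerical sketch (not part of the slides) of the discretised rule w → w + (T/τ_w) Q·w; the input statistics, τ_w and T are illustrative assumptions. It shows |w| growing without bound while the direction of w aligns with the principal eigenvector of Q.

```python
import numpy as np

# A made-up set of correlated inputs; Q is their correlation matrix <u u^T>.
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
u_samples = rng.normal(size=(1000, 3)) @ mix
Q = u_samples.T @ u_samples / len(u_samples)

# Discretised Hebb rule: w -> w + (T / tau_w) Q w (illustrative tau_w and T).
tau_w, T = 100.0, 1.0
w = rng.normal(scale=0.1, size=3)
for _ in range(2000):
    w = w + (T / tau_w) * (Q @ w)

# |w| grows without bound (positive feedback) while the direction of w
# aligns with the principal eigenvector e1 of Q.
e1 = np.linalg.eigh(Q)[1][:, -1]
print("norm of w:", np.linalg.norm(w))
print("cosine with e1:", abs(w @ e1) / np.linalg.norm(w))
```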
9 Covariance Rule
Since LTD really exists, contra Hebb:
τ_w dw/dt = u (v − θ_v)   or   τ_w dw/dt = (u − θ_u) v
If θ_v = ⟨v⟩ or θ_u = ⟨u⟩, then τ_w ⟨dw/dt⟩ = C·w
where C = ⟨(u − ⟨u⟩)(u − ⟨u⟩)⟩ is the input covariance matrix.
Still unstable: τ_w d|w|²/dt = 2 v (v − ⟨v⟩), which averages to twice the (positive) variance of v.
10 BCM Rule
Odd to have LTD with v = 0 or u = 0. Evidence for
τ_w dw/dt = v u (v − θ_v)
[Figure: weight change per unit u as a function of v, crossing zero at the threshold θ_v]
If θ_v slides to match a high power of v,
τ_θ dθ_v/dt = v² − θ_v
with a fast τ_θ, then we get competition between synapses: intrinsic stabilization.
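A rough simulation sketch of the BCM rule with its sliding threshold; the input statistics, saturation limits, learning rates and time constants below are illustrative assumptions rather than values from the slides.

```python
import numpy as np

# BCM rule with a sliding threshold; all constants are illustrative.
rng = np.random.default_rng(1)
n_in = 4
w = np.full(n_in, 0.5)
theta_v = 0.1
tau_w, tau_theta = 500.0, 50.0              # theta slides fast relative to w

for _ in range(20000):
    u = rng.exponential(scale=1.0, size=n_in)   # non-negative input rates
    v = max(w @ u, 0.0)                          # output rate
    # LTD for v < theta_v, LTP for v > theta_v
    w += (v * u * (v - theta_v)) / tau_w
    w = np.clip(w, 0.0, 5.0)                     # saturation limits
    # threshold tracks a high power of v (here v^2)
    theta_v += (v ** 2 - theta_v) / tau_theta

print("final weights:", np.round(w, 3), " threshold:", round(theta_v, 3))
```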
11 Subtractive Normalisation
Could normalise |w|² or Σ_b w_b = n·w, with n = (1, 1, ..., 1).
For subtractive normalisation of n·w:
τ_w dw/dt = v u − v (n·u) n / N_u
with dynamic subtraction, since
τ_w d(n·w)/dt = v (n·u) (1 − n·n/N_u) = 0, as n·n = N_u.
Strongly competitive: typically all the weights bar one go to 0. Therefore use an upper saturating limit.
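A small sketch of the averaged rule with subtractive normalisation and saturating limits; the input statistics and constants are made-up examples. It illustrates that n·w is conserved by the rule itself (until weights hit the limits) and that the outcome is strongly competitive.

```python
import numpy as np

# Averaged Hebb rule with subtractive normalisation of n.w plus saturating
# weight limits; input statistics and constants are made up.
rng = np.random.default_rng(2)
N_u = 5
n = np.ones(N_u)
u_samples = rng.normal(size=(2000, N_u)) + 0.5
Q = u_samples.T @ u_samples / len(u_samples)     # input correlation matrix

w = rng.uniform(0.4, 0.6, size=N_u)
dt_over_tau = 0.01
for _ in range(5000):
    dw = Q @ w - (w @ Q @ n) * n / N_u           # n.w conserved by this rule
    w = np.clip(w + dt_over_tau * dw, 0.0, 1.0)  # upper/lower saturating limits

print("sum of weights:", round(w.sum(), 3))
print("weights:", np.round(w, 3))  # most weights pinned at 0 or the upper limit
```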
12 The Oja Rule
A multiplicative way to ensure |w|² is constant:
τ_w dw/dt = v u − α v² w
so τ_w d|w|²/dt = 2 v² (1 − α |w|²), and |w|² → 1/α.
Dynamic normalisation could also enforce normalisation all the time.
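A sketch of the Oja rule run online on correlated Gaussian inputs (the input statistics and learning rate are illustrative assumptions); with α = 1 the squared norm settles near 1/α and w aligns with the principal eigenvector.

```python
import numpy as np

# Online Oja rule: tau_w dw/dt = v u - alpha v^2 w, with alpha = 1.
# Inputs are zero-mean correlated Gaussians (illustrative).
rng = np.random.default_rng(3)
A = np.array([[2.0, 0.8], [0.8, 1.0]])
alpha, lr = 1.0, 0.005

w = rng.normal(scale=0.1, size=2)
for _ in range(20000):
    u = A @ rng.normal(size=2)
    v = w @ u
    w += lr * (v * u - alpha * v ** 2 * w)

Q = A @ A.T                      # <u u^T> for u = A z, z ~ N(0, I)
e1 = np.linalg.eigh(Q)[1][:, -1]
print("|w|^2, expected ~ 1/alpha:", round(w @ w, 3))
print("alignment with e1:", round(abs(w @ e1) / np.linalg.norm(w), 4))
```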
13 Timing-Based Rules
[Figure: A: EPSP amplitude (% of control) versus time (min) for pairings at +10 ms and −10 ms; B: percent potentiation versus t_post − t_pre (ms)]
slice cortical pyramidal cells; Xenopus retinotectal system
window of ~50 ms gets Hebbian causality right
rate description: τ_w dw/dt = ∫_0^∞ dτ (H(τ) v(t) u(t−τ) + H(−τ) v(t−τ) u(t))
spike-based description necessary if an input spike can have a measurable impact on an output spike
critical factor is the overall integral: net LTD with local LTP
partially self-stabilizing
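A sketch of a pair-based spike-timing window with exponential LTP and LTD lobes; the amplitudes and time constants are illustrative assumptions, chosen so that the overall integral is negative (net LTD) with local LTP for small positive delays.

```python
import numpy as np

# Pair-based spike-timing window; A_plus, A_minus, tau_plus, tau_minus are
# illustrative values giving a net-negative integral with local LTP.
A_plus, tau_plus = 0.010, 20.0     # LTP for pre-before-post (dt > 0)
A_minus, tau_minus = 0.012, 20.0   # LTD for post-before-pre (dt < 0)

def stdp(dt_ms):
    """Weight change for a single pre/post spike pair, dt = t_post - t_pre."""
    if dt_ms > 0:
        return A_plus * np.exp(-dt_ms / tau_plus)
    return -A_minus * np.exp(dt_ms / tau_minus)

for dt in (-50, -10, 10, 50):
    print(f"dt = {dt:+4d} ms  ->  dw = {stdp(dt):+.4f}")

# overall integral of the window: negative here, so uncorrelated spiking depresses
integral = A_plus * tau_plus - A_minus * tau_minus
print("window integral (arbitrary units):", integral)
```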
14 Single Postsynaptic Neuron
Basic Hebb rule: τ_w dw/dt = Q·w
Analyse using an eigendecomposition of Q: Q·e_µ = λ_µ e_µ, with λ_1 ≥ λ_2 ≥ ...
Since Q is symmetric and positive (semi-)definite, it has a complete set of real orthonormal eigenvectors with non-negative eigenvalues, whose growth is decoupled.
Write w(t) = Σ_{µ=1}^{N_u} c_µ(t) e_µ; then c_µ(t) = c_µ(0) exp(λ_µ t / τ_w)
and w(t) → α(t) e_1 as t → ∞.
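A sketch that integrates the averaged rule for an arbitrary symmetric positive-definite Q (made up here) and checks numerically that w rotates onto the principal eigenvector e_1 while its norm grows; the matrix, time constant and step size are assumptions.

```python
import numpy as np

# Integrate tau_w dw/dt = Q w for a random symmetric positive-definite Q and
# check that w(t) ends up pointing along the principal eigenvector e1.
rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
Q = B @ B.T / 4.0

lam, E = np.linalg.eigh(Q)        # eigenvalues ascending; e1 = E[:, -1]
tau_w, dt = 50.0, 0.1
w = rng.normal(size=4)
for _ in range(20000):
    w = w + (dt / tau_w) * (Q @ w)

cosine = abs(w @ E[:, -1]) / np.linalg.norm(w)
print("eigenvalues of Q:", np.round(lam, 3))
print("|w| after integration:", np.linalg.norm(w))
print("cosine between w and e1:", round(cosine, 6))   # -> 1 as t grows
```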
15 Constraints
α(t) = exp(λ_1 t/τ_w). Oja makes w(t) → e_1/√α.
Saturation can disturb the outcome.
[Figure: A, B: trajectories of (w_1, w_2) under different normalisation constraints]
Subtractive constraint: τ_w dw/dt = Q·w − (w·Q·n) n / N_u
Sometimes e_1 ∝ n, so its growth is stunted; and e_µ·n = 0 for µ ≠ 1, so
w(t) = (w(0)·e_1) e_1 + Σ_{µ=2}^{N_u} exp(λ_µ t/τ_w) (w(0)·e_µ) e_µ
16 Translation Invariance
A particularly important case for development has ⟨u_b⟩ = ⟨u⟩ and Q_{bb'} = Q(b − b').
Write n = (1, ..., 1) and J = n nᵀ; then Q splits into a DC part (proportional to J) and the remainder.
1. e_µ·n = 0: AC modes are unaffected
2. e_µ·n ≠ 0: DC modes are affected
3. Q has discrete sines and cosines as eigenvectors
4. the Fourier spectrum of Q gives the eigenvalues
17 PCA
What is the significance of e_1?
[Figure: A, B, C: input data clouds and the weight vector w in the (u_1, u_2) plane]
optimal linear reconstruction: minimise E(w, g) = ⟨|u − g v|²⟩
information maximisation: I[v, u] = H[v] − H[v|u] under a linear model
assume ⟨u⟩ = 0, or use C instead of Q
18 Linear Reconstruction
E(w, g) = ⟨|u − g v|²⟩ = K − 2 w·Q·g + |g|² w·Q·w
quadratic in w, with minimum at w = g/|g|², making E(w, g) = K − g·Q·g / |g|².
Look for a solution with g = Σ_k (e_k·g) e_k and |g|² = 1:
E(w, g) = K − Σ_{k=1}^{N} (e_k·g)² λ_k
The minimum clearly has e_1·g = 1 and e_2·g = e_3·g = ... = 0.
Therefore g and w both point along the principal component.
19 Infomax (Linsker)
argmax_w I[v, u] = H[v] − H[v|u]
Very general unsupervised learning suggestion. H[v|u] is not quite well defined unless v = w·u + η for some noise η (otherwise v is arbitrarily deterministic given u).
H[v] = ½ log(2πeσ²) for a Gaussian.
If P[u] ~ N[0, Q] then v ~ N[0, w·Q·w + υ²]
maximise w·Q·w subject to |w|² = 1 (note the normalisation)
Same problem as above: implies that w ∝ e_1.
If non-Gaussian, this only maximises an upper bound on I[v, u].
20 Ocular Dominance
[Figure: retina-thalamus-cortex wiring: a cortical unit v receives weights w^L and w^R from left and right thalamic inputs u^L and u^R, with competitive interaction; ocular-dominance pattern alternating L and R]
OD develops around eye-opening
interaction with refinement of topography
interaction with orientation
interaction with ipsi/contra-innervation
effect of manipulations to input
21 Start Simple
Consider one input from each eye. Then v = w_R u_R + w_L u_L and
Q = ⟨u uᵀ⟩ = [[q_S, q_D], [q_D, q_S]]
e_1 = (1, 1)/√2, e_2 = (1, −1)/√2, λ_1 = q_S + q_D, λ_2 = q_S − q_D
so if w_+ = w_R + w_L and w_− = w_R − w_L, then
τ_w dw_+/dt = (q_S + q_D) w_+ and τ_w dw_−/dt = (q_S − q_D) w_−.
Since q_D ≥ 0, w_+ dominates, so use subtractive normalisation:
τ_w dw_+/dt = 0, τ_w dw_−/dt = (q_S − q_D) w_−
so w_− grows until it saturates at one of its limits and one eye dominates.
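A sketch of this two-input model with subtractive normalisation and saturating weight limits; q_S, q_D, the limits and the small initial asymmetry are illustrative assumptions.

```python
import numpy as np

# Two-input ocular dominance model: w_+ is clamped by subtractive
# normalisation while w_- grows, so one eye's weight saturates at the upper
# limit and the other falls to zero.  q_S, q_D and the limits are illustrative.
q_S, q_D = 1.0, 0.4          # same-eye and between-eye correlations, q_D >= 0
dt_over_tau = 0.01
w = np.array([0.52, 0.48])   # [w_R, w_L], slightly asymmetric start
Q = np.array([[q_S, q_D], [q_D, q_S]])
n = np.ones(2)

for _ in range(5000):
    dw = Q @ w - (w @ Q @ n) * n / 2.0      # Hebb plus subtractive normalisation
    w = np.clip(w + dt_over_tau * dw, 0.0, 1.0)

print("w_R, w_L:", np.round(w, 3))          # one eye dominates
print("w_+ = w_R + w_L:", round(w.sum(), 3))
```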
22 Orientation Selectivity
The model is exactly the same; the input correlations come from ON/OFF cells.
[Figure: C, D: the correlation function Q(b) and its Fourier transform]
Now the dominant mode of Q has spatial structure.
A centre-surround version is also possible, but is usually dominated because of non-linear effects.
23 Temporal Hebbian Rules
Look at the rate-based temporal model as
Δw = (1/τ_w) ∫_0^T dt v(t) ∫ dτ H(τ) u(t − τ)
ignoring some edge effects.
Correlate the output v(t) with a filtered version of the input, ∫ dτ H(τ) u(t − τ),
i.e. look for structure at the scale of the temporal filter.
24 Multiple Output Neurons
[Figure: output units v with fixed recurrent connections M and feedforward weights W from inputs u_1, u_2, u_3, ..., u_{N_u}]
Fixed recurrent connections lead to
τ_r dv/dt = −v + W·u + M·v, so v = W·u + M·v = K·W·u
where K = (I − M)^{-1}.
Thus with Hebbian learning
τ_w dW/dt = ⟨v uᵀ⟩ = K·W·Q
and we can analyse the eigeneffect of K.
25 Ocular Dominance Revisited
[Figure: A, B: left and right inputs u^L, u^R projecting to an array of cortical units]
Write w_+ = w_R + w_L and w_− = w_R − w_L for the projective weights; then
τ_w dw_+/dt = (q_S + q_D) K·w_+ and τ_w dw_−/dt = (q_S − q_D) K·w_−
Since w_+ is clamped by subtractive normalisation, we are just interested in the pattern of ± in w_−.
Since K is Toeplitz, its eigenvectors are waves; the eigenvalues come from the Fourier transform.
[Figure: A: K and the leading eigenvector versus cortical distance (mm); B: the Fourier transform of K versus k (1/mm)]
26 Competitive Hebbian Learning
Use a competitive non-linearity
z_a = (Σ_b W_ab u_b)^δ / Σ_{a'} (Σ_b W_{a'b} u_b)^δ
in conjunction with a positive interaction term v_a = Σ_{a'} M_{aa'} z_{a'}
and standard Hebbian learning.
[Figure: A, B, C: weights W as a function of output a and left/right input b, showing ocularity and topography]
27 Feature-Based Models
Reduced descriptions: (x, y, z, r cos(θ), r sin(θ))
x, y: topographic location; z: ocularity (∈ [−1, 1]); r: orientation strength; θ: orientation
matching: replace [W·u]_a by exp(−Σ_b (u_b − W_ab)²/2σ_b²), plus softmax competition and cortical interaction
learning: self-organizing map
τ_w dW_ab/dt = v_a (u_b − W_ab)
or elastic net (competition only):
τ_w dW_ab/dt = v_a (u_b − W_ab) + β Σ_{a' ∈ N(a)} (W_{a'b} − W_ab)
28 Large-Scale Results
Meshing of the patterns of OD and OR: pinwheels; linear zones; ocular dominance boundaries
Overall pattern of OD stripes versus an elastic net simulation
29 Redundancy
Multiple units bring redundancy: under Hebbian learning all units become the same, and fixed output connections are inadequate.
One possibility is decorrelation: ⟨v vᵀ⟩ = I. If Gaussian, then complete factorisation.
Approaches:
Atick & Redlich: force an n → n mapping and decorrelate using anti-Hebbian learning
Földiák: use Hebbian and anti-Hebbian learning to learn feedforward and lateral weights
Sanger: explicitly subtract off the first component from subsequent ones
Williams: subtract off the predicted portion of u
30 Goodall
v = W·u + M·v
Anti-Hebbian learning is ideal for lateral weights: if v_a and v_b are correlated, make M_ab = M_ba negative, which reduces the correlation.
Goodall: n → n with W = I, so v = (I − M)^{-1}·u = K·u.
Then τ_M dM/dt = −⟨u vᵀ⟩ + I − M.
At dM/dt = 0: I − M = ⟨u vᵀ⟩, i.e. K^{-1} = Q·Kᵀ, so K·Q·Kᵀ = I.
So ⟨v vᵀ⟩ = K·⟨u uᵀ⟩·Kᵀ = I as required.
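A sketch of this decorrelation scheme run online; the input statistics, learning rate and network size are illustrative assumptions, and the check at the end verifies K·Q·Kᵀ ≈ I.

```python
import numpy as np

# Goodall decorrelation: with W = I, v = u + M v = K u where K = (I - M)^{-1},
# and the lateral weights follow tau_M dM/dt = -<u v^T> + I - M.
# Input statistics and learning rate are illustrative.
rng = np.random.default_rng(5)
n = 3
A = rng.normal(size=(n, n))
Q = A @ A.T / n + np.eye(n)          # correlated input covariance
L = np.linalg.cholesky(Q)

M = np.zeros((n, n))
lr = 0.02
for _ in range(20000):
    u = L @ rng.normal(size=n)
    K = np.linalg.inv(np.eye(n) - M)
    v = K @ u
    M += lr * (-np.outer(u, v) + np.eye(n) - M)   # anti-Hebbian lateral update

K = np.linalg.inv(np.eye(n) - M)
print("K Q K^T (should be close to the identity):")
print(np.round(K @ Q @ K.T, 2))
```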
31 Temporal Plasticity
Using the temporal rule:
τ_w dw/dt = ∫_0^∞ dτ (H(τ) v(t) u(t−τ) + H(−τ) v(t−τ) u(t))
[Figure: A: output v versus position s; B: place field location (cm) versus lap number]
The cell at s_a = 2 is active before the cell at s_a = 0, so the synapse 2 → 0 gets strengthened and the cell at s_a = 0 extends its firing field backwards.
32 Supervised Learning
Consider the case of learning pairs u^m, v^m:
classification: binary v^m to classify real-valued u^m
regression: real-valued mapping from u^m to v^m
storage: learn the relationships in the data
generalisation: infer a functional relationship from limited examples
error-correction: mistakes drive adaptation
Hebbian plasticity: τ_w dw/dt = ⟨v u⟩ = (1/N_S) Σ_{m=1}^{N_S} v^m u^m
with (multiplicative) weight decay: τ_w dw/dt = ⟨v u⟩ − α w, which makes w → ⟨v u⟩/α. No positive feedback.
33 Classification and the Perceptron
Classification rule:
v = 1 if w·u − γ ≥ 0; v = 0 if w·u − γ < 0
Can use supervised Hebbian learning, w = (2/N_S) Σ_{m=1}^{N_S} v^m u^m,
but this works quite poorly for random patterns.
[Figure: A: geometry of the weight vector w and the inputs; B: probability correct as a function of the number of patterns]
34 Function Approximation
[Figure: basis function network: stimulus s drives basis activities u = f(s), which are weighted by w to give the output v]
output: v(s) = w·u = w·f(s)
error: E = ½ ⟨(h(s) − w·f(s))²⟩
reaches a minimum at (the normal equations): ⟨f(s) f(s)ᵀ⟩·w = ⟨f(s) h(s)⟩
35 Hebbian Function Approximation
When does the Hebbian w = ⟨f(s) h(s)⟩/α satisfy the normal equations ⟨f(s) f(s)ᵀ⟩·w = ⟨f(s) h(s)⟩?
1. input patterns are orthogonal: ⟨f(s) f(s)ᵀ⟩ = I
2. tight frame condition: f(s^m)·f(s^{m'}) = c δ_{mm'}, as then
⟨f(s) f(s)ᵀ⟩·w = (1/(α N_S²)) Σ_{mm'} f(s^m) (f(s^m)·f(s^{m'})) h(s^{m'})
             = (c/(α N_S²)) Σ_m f(s^m) h(s^m)
             = (c/(α N_S)) ⟨f(s) h(s)⟩
V1 forms an approximate tight frame.
36 Error-Correcting Rules
Hebbian plasticity is independent of the performance of the network.
Perceptron learning rule: if v(u^m) = 0 when v^m = 1, modify w and γ to increase w·u^m − γ.
Easiest rule:
w → w + ε_w (v^m − v(u^m)) u^m
γ → γ − ε_w (v^m − v(u^m))
which implies Δ(w·u^m − γ) = ε_w (v^m − v(u^m)) (|u^m|² + 1), which has just the right sign.
In fact, guaranteed to converge (when the patterns are separable).
Note the discrete nature of the weight update.
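A sketch of this error-correcting loop on a made-up linearly separable problem; the data, learning rate and epoch limit are illustrative assumptions.

```python
import numpy as np

# Perceptron error-correcting rule on a toy linearly separable problem.
rng = np.random.default_rng(6)
N_u, N_S = 10, 40
w_true = rng.normal(size=N_u)
U = rng.normal(size=(N_S, N_u))
V = (U @ w_true >= 0).astype(float)        # separable targets in {0, 1}

w = np.zeros(N_u)
gamma = 0.0
eps = 0.1
for epoch in range(100):
    errors = 0
    for u, v_target in zip(U, V):
        v_out = float(w @ u - gamma >= 0)
        if v_out != v_target:
            w += eps * (v_target - v_out) * u      # discrete weight update
            gamma -= eps * (v_target - v_out)
            errors += 1
    if errors == 0:                                # all patterns correct
        print("converged after", epoch + 1, "epochs")
        break
print("training accuracy:", np.mean((U @ w - gamma >= 0) == V.astype(bool)))
```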
37 The Delta Rule
Definition of the task in E(w): how well (or poorly) do synaptic weights w perform?
Gradient descent: w → w − ε_w ∇_w E(w)
since if w' = w − ε_w ∇_w E(w), then to first order in ε_w:
E(w − ε_w ∇_w E) = E(w) − ε_w |∇_w E|² ≤ E(w)
[Figure: E(w) as a function of w, with gradient descent steps towards the minimum]
38 Stochastic Gradient Descent
E(w) = ½ ⟨(h(s) − w·f(s))²⟩ is an average over many examples.
Use random input-output pairs s^m, h(s^m) and change
w → w − ε_w ∇_w (h(s^m) − v(s^m))²/2 = w + ε_w (h(s^m) − v(s^m)) f(s^m)
called stochastic gradient descent.
[Figure: A, B, C: the fitted function v(s) versus s at successive stages of learning]
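A sketch of stochastic gradient descent on a basis-function network; the Gaussian basis functions, the target h(s) = sin(s) and the learning rate are all illustrative assumptions.

```python
import numpy as np

# SGD on a basis-function network v(s) = w . f(s) with Gaussian basis
# functions; target, centres and learning rate are illustrative.
rng = np.random.default_rng(7)
centres = np.linspace(-np.pi, np.pi, 15)
sigma = 0.5

def f(s):
    return np.exp(-(s - centres) ** 2 / (2 * sigma ** 2))   # basis activities

def h(s):
    return np.sin(s)                                          # target function

w = np.zeros_like(centres)
eps = 0.05
for _ in range(20000):
    s = rng.uniform(-np.pi, np.pi)          # random example s^m
    v = w @ f(s)
    w += eps * (h(s) - v) * f(s)            # delta-rule / SGD update

test = np.linspace(-np.pi, np.pi, 200)
err = np.mean([(h(s) - w @ f(s)) ** 2 for s in test])
print("mean squared error on a test grid:", round(err, 5))
```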
39 Contrastive Hebbian Learning
The delta rule involves: w → w + ε_w (v^m u^m − v(u^m) u^m)
Hebbian learning: v^m u^m, based on the target
anti-Hebbian learning: −v(u^m) u^m, based on the outcome
learning stops when outcome = target
Generalize to a stochastic network:
P[v|u; W] = exp(−E(u, v)) / Z(u), with Z(u) = Σ_v exp(−E(u, v))
The weights W generate a conditional distribution, e.g. with the quadratic form E(u, v) = −v·W·u.
40 Goal of Learning
Natural quality measure for each u:
D_KL(P[v|u], P[v|u; W]) = Σ_v P[v|u] ln(P[v|u] / P[v|u; W])
                        = −Σ_v P[v|u] ln(P[v|u; W]) + K
Average over the u^m, with v^m a sample of P[v|u^m]:
⟨D_KL(P[v|u], P[v|u; W])⟩ ≈ −(1/N_S) Σ_{m=1}^{N_S} ln(P[v^m|u^m; W]) + K
which amounts to maximum likelihood learning.
∂ ln P[v^m|u^m; W]/∂W_ab = −∂(E(u^m, v^m) + ln Z(u^m))/∂W_ab = v_a^m u_b^m − Σ_v P[v|u^m; W] v_a u_b^m
which is again Hebb (positive) and anti-Hebb (negative).
Use Gibbs sampling for v ~ P[v|u^m; W]. The unsupervised version is just the same.
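A sketch of this maximum-likelihood (Hebb minus anti-Hebb) gradient for a tiny conditional model with binary v; because the network is small, the model average over v is computed by exact enumeration rather than Gibbs sampling. The sizes, data and learning rate are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Maximum-likelihood learning for P[v|u; W] proportional to exp(v.W.u) over
# binary v; enumeration replaces Gibbs sampling for this toy size.
rng = np.random.default_rng(8)
n_v, n_u = 3, 4
V_STATES = np.array(list(product([0, 1], repeat=n_v)), dtype=float)

def cond_probs(W, u):
    logits = V_STATES @ W @ u                 # -E(u, v) = v.W.u for each v
    p = np.exp(logits - logits.max())
    return p / p.sum()

# training pairs (u^m, v^m) sampled from a made-up "true" model
W_true = rng.normal(size=(n_v, n_u))
U = (rng.random((200, n_u)) < 0.5).astype(float)
V = np.array([V_STATES[rng.choice(len(V_STATES), p=cond_probs(W_true, u))]
              for u in U])

W = np.zeros((n_v, n_u))
eps = 0.02
for _ in range(200):
    for u, v in zip(U, V):
        v_model = cond_probs(W, u) @ V_STATES               # <v> under the model
        W += eps * (np.outer(v, u) - np.outer(v_model, u))  # Hebb minus anti-Hebb

avg_ll = np.mean([np.log(cond_probs(W, u)[np.argmax((V_STATES == v).all(axis=1))])
                  for u, v in zip(U, V)])
print("average log-likelihood of the training pairs:", round(avg_ll, 3))
```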
41 Function Approximation
Two ignored concerns:
1. examples may be corrupted by noise: D = {(s^m, v^m)} with v^m = h(s^m) + η^m
2. we want v(s) = h(s) even away from the training set: in general impossible; need prior information, e.g. smoothness of h(s)
A Gaussian process says that given s^1, s^2, ..., s^K, the values h(s^1), h(s^2), ..., h(s^K) are jointly Gaussian with 0 mean (short of expectations about bias), with a covariance matrix determined by prior expectations, e.g. Σ_µν = Σ(|s^µ − s^ν|).
Then the distribution of h(s) given D is Gaussian, with a mean that depends linearly on v^1, ..., v^{N_S}.
42 Functional Priors
Alternatively, specify P[h] ∝ e^{−½ φ[h]}, where φ[h] is a functional, e.g.
φ[h] = ∫ ds̃ |h̃(s̃)|² / G̃(s̃)
in the Fourier domain. Dividing by G̃(s̃) = e^{−|s̃|²/δ} penalises high-frequency wobbles in h(s).
If the η^µ are iid Gaussians, then the likelihood is
P[D|h] ∝ exp(−(1/2σ²) Σ_{m=1}^{N_S} (v^m − h(s^m))²)
So the posterior distribution over h(s) is
log P[h|D] = log P[D|h] P[h] + K = −(1/2σ²) Σ_{m=1}^{N_S} (v^m − h(s^m))² − ½ φ[h] + K
43 Regularization Theory
Find the MAP solution by minimising
E[h] = (1/σ²) Σ_{m=1}^{N_S} (v^m − h(s^m))² + φ[h]
Turns out that
h(s) = Σ_{m=1}^{N_S} w_m G(s − s^m) + k(s)
where G(s − s^µ) is the inverse Fourier transform of the smoothness prior and k(s) satisfies φ[k] = 0.
The weights are given by w = (G + σ² I)^{-1} v, where G_mn = G(s^m − s^n), recovering the linear dependence on v.
For Gaussian G̃(s̃), G(s − s^m) ∝ e^{−δ|s − s^m|²} is a radial basis function.
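A sketch of this solution on made-up one-dimensional data: a Gaussian radial basis function, weights w = (G + σ²I)^{-1} v, and the MAP estimate h(s) = Σ_m w_m G(s − s^m). The target function, noise level and kernel width are illustrative assumptions.

```python
import numpy as np

# Regularization-theory / GP regression with a Gaussian radial basis function.
rng = np.random.default_rng(9)

def G(d, delta=2.0):
    return np.exp(-delta * d ** 2)          # Gaussian RBF from the smoothness prior

# noisy training data v^m = h(s^m) + eta^m, with an illustrative target
s_train = np.linspace(-2.0, 2.0, 12)
sigma = 0.1
v_train = np.sin(2 * s_train) + sigma * rng.normal(size=s_train.size)

Gram = G(s_train[:, None] - s_train[None, :])            # G_mn = G(s^m - s^n)
w = np.linalg.solve(Gram + sigma ** 2 * np.eye(len(s_train)), v_train)

def h_map(s):
    """MAP estimate h(s) = sum_m w_m G(s - s^m)."""
    return G(s - s_train) @ w

print("s      h_MAP(s)   sin(2s)")
for s in np.linspace(-2.0, 2.0, 9):
    print(f"{s:+.2f}   {h_map(s):+.3f}   {np.sin(2 * s):+.3f}")
```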
44 Regularization Theory
[Figure: RBF network: the output v(s) = h(s) is built from basis functions G(s − s^m) with weights w_1, ..., w_N]
one basis function for each example; the basis functions are determined by the prior
as σ² → 0, the MAP solution interpolates the v^µ
use Bayesian hyperparameters to manipulate the norm
fixed basis functions, inference over the weights:
E(w) = (1/σ²) Σ_{m=1}^{N_S} (v^m − h(s^m))² + φ(w), e.g. φ(w) = |w|² is Gaussian
or learn the basis functions themselves using backprop.