Neural networks: Unsupervised learning

Previously: The supervised learning paradigm. Given example inputs x and target outputs t, we learn the mapping between them; the trained network is supposed to give the correct response to any given input stimulus. Training is equivalent to learning the appropriate weights to achieve this; an objective function (or error function) is defined, which is minimized during training.

Previously: Optimization with respect to an objective function M(w), composed of an error function and a regularizer.

Previously: Interpret y(x,w) as a probability. Then the likelihood of the input data can be expressed with the original error function; the regularizer has the form of a prior; and what we get in the objective function M(w) is the posterior distribution of w. The neuron's behavior is faithfully translated into probabilistic terms!
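
One way to write out the correspondence this slide alludes to (a sketch, assuming the usual exponential reading of the error function E_D and regularizer E_W; the exact formulas on the slide did not survive transcription):

```latex
\[
P(D \mid \mathbf{w}) \propto e^{-E_D(\mathbf{w})}, \qquad
P(\mathbf{w})        \propto e^{-E_W(\mathbf{w})}, \qquad
P(\mathbf{w} \mid D) \propto e^{-M(\mathbf{w})},
\]
% so minimizing M(w) = E_D(w) + E_W(w) finds the most probable weights
\[
\mathbf{w}_{\mathrm{MP}} = \arg\max_{\mathbf{w}} P(\mathbf{w} \mid D).
\]
```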

Previously: When making predictions... Original estimate vs. Bayesian estimate. The probabilistic interpretation makes our assumptions explicit: with the regularizer we imposed a soft constraint on the learned parameters, which expresses our prior expectations. An additional plus: beyond getting w_MP we also get a measure of uncertainty for the learned parameters.

What's coming? Networks & probabilistic framework: from the Hopfield network to the Boltzmann machine. What do we learn? Density estimation, neural architecture and optimization principles: principal component analysis (PCA). How do we learn? Hebb et al.: learning rules. Any biology? Simple cells & ICA.

Learning data...

Neural networks: Unsupervised learning. The capacity of a single neuron is limited: certain data cannot be learned. So far we used a supervised learning paradigm: a teacher was necessary to teach an input-output relation. Hopfield networks try to cure both. Unsupervised learning: what is it about? The Hebb rule is an enlightening example: assuming 2 neurons and a weight modification process, this simple rule realizes an associative memory!
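
A minimal written form of the two-neuron Hebb rule the slide refers to (an assumption about the missing formula; x_1 is the presynaptic activity, x_2 the postsynaptic activity, eta a learning rate):

```latex
\[
\Delta w_{21} = \eta \, x_1 x_2
\]
% "neurons that fire together wire together": simultaneous activity strengthens the
% synapse, so later presentation of x_1 alone can recall the associated x_2.
```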

Neural networks: The Hopfield network. Architecture: a set of I neurons connected by symmetric synapses of weight w_ij; no self-connections (w_ii = 0); the output of neuron i is x_i. Activity rule and learning rule, with synchronous or asynchronous update; alternatively, a continuous-valued network can be defined.
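
The rules themselves were figures on the original slide; the sketch below shows the standard binary (±1) Hopfield forms, which is an assumption about what the slide contained: Hebbian weights w_ij proportional to the sum over memories of x_i x_j, and the threshold activity rule x_i = sign(sum_j w_ij x_j), applied asynchronously.

```python
import numpy as np

def hopfield_train(patterns):
    """Hebbian learning rule: W is the (scaled) sum of outer products of the
    +/-1 memory patterns; symmetric weights, no self-connections."""
    n_patterns, n_units = patterns.shape
    W = patterns.T @ patterns / n_units
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, x, n_steps=500, seed=0):
    """Asynchronous activity rule: pick a neuron at random and set
    x_i = sign(sum_j w_ij x_j); repeat until (hopefully) a memory is reached."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for _ in range(n_steps):
        i = rng.integers(len(x))
        x[i] = 1 if W[i] @ x >= 0 else -1
    return x
```

Storing a few random ±1 patterns and calling hopfield_recall on a corrupted copy of one of them illustrates the associative-memory behaviour discussed on the following slides.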

Neural networks: Stability of the Hopfield network. Are the memories stable? The activation and the activity rule together define a Lyapunov function. Necessary conditions: symmetric weights and asynchronous update. The network is robust against perturbation of a subset of weights.
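
The Lyapunov (energy) function mentioned here is, in the standard formulation (again an assumption about the slide's missing formula), E(x) = -1/2 sum_ij w_ij x_i x_j; with symmetric weights and asynchronous updates every flip can only lower or preserve E, so the dynamics settle into fixed points.

```python
def hopfield_energy(W, x):
    # Lyapunov function of the binary Hopfield net (no bias terms assumed):
    # E(x) = -1/2 * sum_ij w_ij x_i x_j; non-increasing under asynchronous updates.
    return -0.5 * x @ W @ x
```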

Neural networks: Capacity of the Hopfield network. How many traces can be memorized by a network of I neurons?

Neural networks: Capacity of the Hopfield network. Failure modes of Hopfield networks: corrupted bits; missing memory traces; spurious states not directly related to the training data.

Neural networks: Capacity of the Hopfield network. Activation rule: the activation of a neuron splits into the trace of the desired memory plus contributions from the additional random memories, i.e. a desired-state term plus a random contribution.
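
The missing formula presumably decomposed the activation into signal and crosstalk; in the standard analysis (an assumption, for Hebbian weights built from N random ±1 memories) it reads:

```latex
\[
a_i \;=\; \sum_j w_{ij}\, x_j^{(m)}
    \;=\; \underbrace{x_i^{(m)}}_{\text{desired state}}
    \;+\; \underbrace{\frac{1}{I}\sum_{n \neq m} x_i^{(n)} \sum_j x_j^{(n)} x_j^{(m)}}_{\text{random contribution}}
\]
```

The crosstalk term behaves approximately like a zero-mean Gaussian of variance N/I, which is what limits how many memories can be stored reliably.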

Neural networks: Capacity of the Hopfield network. Failure in operation: avalanches. N/I > 0.138: only spin-glass states; N/I in (0, 0.138): states close to the desired memories exist; N/I in (0, 0.05): the desired states have lower energy than the spurious states; N/I in (0.05, 0.138): spurious states dominate; N/I in (0, 0.03): mixture states also appear. The Hebb rule determines how well the network performs; other learning rules might do a better job (e.g. reinforcement learning).
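
A small numerical probe of these regimes (a toy sketch reusing hopfield_train from the earlier example; the 0.138 figure is the theoretical limit quoted above, not something this check reproduces exactly):

```python
def fraction_stable(n_units=200, n_patterns=20, seed=0):
    """Store n_patterns random +/-1 memories and report the fraction that
    are exact fixed points of the synchronous activity rule."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
    W = hopfield_train(patterns)
    stable = sum(np.array_equal(np.where(W @ p >= 0, 1, -1), p) for p in patterns)
    return stable / n_patterns
```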

Hopfield network for optimization

The Boltzmann machine. The optimization performed by the Hopfield network: minimizing the energy. Again, we can make a correspondence with a probabilistic model. What do we gain by this? More transparent functioning and better performance than the Hebb rule. Activity rule: stochastic updates. How is learning performed?
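
A sketch of the probabilistic model and the stochastic activity rule in their standard forms (the slide's own formulas are not in the transcript): the network assigns P(x | W) proportional to exp(-E(x)) with the same energy as the Hopfield net, and each unit is updated stochastically with a sigmoid probability.

```python
def boltzmann_energy(W, x):
    # Same quadratic energy as the Hopfield network; the Boltzmann machine
    # turns it into a distribution P(x | W) = exp(-E(x)) / Z.
    return -0.5 * x @ W @ x

def gibbs_update(W, x, seed=None):
    """Stochastic activity rule for +/-1 units:
    P(x_i = +1 | rest) = 1 / (1 + exp(-2 * a_i)), with a_i = sum_j w_ij x_j."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(x))
    a = W[i] @ x
    x[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * a)) else -1
    return x
```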

Boltzmann machine -- EM. Likelihood function; estimating the parameters; minimizing for w; sleeping and waking.
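
The learning step behind "sleeping and waking" is, in the standard derivation (an assumption about the slide's formulas), gradient ascent on the log likelihood; the gradient is a difference of correlations measured with the data clamped (waking) and with the network running freely (sleeping):

```latex
\[
\frac{\partial}{\partial w_{ij}} \ln P\big(\{x^{(n)}\} \mid W\big)
  = N\Big( \langle x_i x_j \rangle_{\text{data}} - \langle x_i x_j \rangle_{\text{model}} \Big),
\qquad
\Delta w_{ij} \propto \langle x_i x_j \rangle_{\text{data}} - \langle x_i x_j \rangle_{\text{model}}.
\]
```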

Learning data...

Summary: The Boltzmann machine translates the neural network mechanisms into a probabilistic framework. Its capabilities are limited. We learned that the probabilistic framework clarifies assumptions, and that within the world constrained by our assumptions the probabilistic approach gives clear answers.

Learning data... Hopfield/Boltzmann?

Principal Component Analysis. Let's try to find linearly independent filters: set the basis along the eigenvectors of the data.
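
A minimal sketch of what "set the basis along the eigenvectors of the data" amounts to in practice (assuming centred data and the covariance matrix as the relevant second-order statistic):

```python
def pca(U, n_components):
    """Principal component analysis: project the data onto the eigenvectors
    of its covariance matrix with the largest eigenvalues.
    U has one data point u per row."""
    U = U - U.mean(axis=0)                    # centre the data
    C = np.cov(U, rowvar=False)               # data covariance
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]                 # principal directions
    return U @ basis, basis                   # projections and the basis itself
```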

Principal Component Analysis (figures from Olshausen & Field, Nature, 1996).

Relation between PCA and learning rules. A single neuron driven by multiple inputs: basic Hebb rule, averaged Hebb rule, correlation-based rule (the standard forms are written out below).
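
The three rules are usually written as follows (a reconstruction in the Dayan & Abbott style, which this part of the lecture appears to follow; v = w·u is the output of the single neuron, u the input vector, Q = <u uᵀ> the input correlation matrix):

```latex
\[
\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u} \quad\text{(basic Hebb)}, \qquad
\tau_w \frac{d\mathbf{w}}{dt} = \langle v\,\mathbf{u} \rangle \quad\text{(averaged Hebb)}, \qquad
\tau_w \frac{d\mathbf{w}}{dt} = Q\,\mathbf{w} \quad\text{(correlation-based rule)}.
\]
```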

Relation between PCA and learning rules. Making both LTP and LTD possible: with a postsynaptic threshold, homosynaptic depression; with a presynaptic threshold, heterosynaptic depression. Setting the threshold to the average postsynaptic activity: the BCM rule.
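
Written out in the same style (again a reconstruction of the missing formulas), the thresholded rules and the BCM rule with a sliding threshold are:

```latex
\begin{align*}
\tau_w \frac{d\mathbf{w}}{dt} &= (v - \theta_v)\,\mathbf{u}
  &&\text{(postsynaptic threshold: homosynaptic depression)} \\
\tau_w \frac{d\mathbf{w}}{dt} &= v\,(\mathbf{u} - \theta_u)
  &&\text{(presynaptic threshold: heterosynaptic depression)} \\
\tau_w \frac{d\mathbf{w}}{dt} &= v\,(v - \theta_v)\,\mathbf{u},
\qquad \tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v
  &&\text{(BCM rule, sliding threshold tracking the postsynaptic activity)}
\end{align*}
```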

Relation between PCA and learning rules. Regularization again: the BCM rule; the Hebb rule with normalization (Oja rule).
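
A sketch of the Oja rule named on the slide, in its standard discrete-time form Δw = η v (u − v w): the decay term keeps |w| bounded, and w converges (up to sign) to the first principal eigenvector of the input correlations, which is the link back to PCA.

```python
def oja_rule(U, eta=0.01, n_epochs=50, seed=0):
    """Hebbian learning with Oja's normalization; U has one input u per row."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=U.shape[1])
    for _ in range(n_epochs):
        for u in U:
            v = w @ u                    # postsynaptic activity
            w += eta * v * (u - v * w)   # Hebb term with implicit weight decay
    return w
```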

Relation between PCA and learning rules

The architecture of the network and the learning rule hand in hand determine the learned representation.

Density estimation. Empirical distribution / input distribution; latent variables v; generative model with generative distribution; recognition distribution p[v|u]. Kullback-Leibler divergence: the match between our model distribution and the input distribution.
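
In symbols (a reconstruction of the missing formulas, with u the inputs, v the latent variables and G the generative parameters): the generative distribution marginalizes over the latents, and learning minimizes the KL divergence between the input distribution and the model, which is the same as maximizing the average log likelihood:

```latex
\[
p(\mathbf{u} \mid G) = \sum_{\mathbf{v}} p(\mathbf{u} \mid \mathbf{v}, G)\, p(\mathbf{v} \mid G),
\qquad
D_{\mathrm{KL}}\big(p_{\text{data}}(\mathbf{u}) \,\|\, p(\mathbf{u} \mid G)\big)
 = \big\langle \ln p_{\text{data}}(\mathbf{u}) \big\rangle - \big\langle \ln p(\mathbf{u} \mid G) \big\rangle .
\]
```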

How do we solve density estimation? EM.
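
EM can be phrased in the same language (a sketch of the standard argument, not the slide's own derivation): both steps increase a lower bound F on the average log likelihood, the E-step by matching the recognition distribution q(v|u) to the posterior, the M-step by updating the generative parameters G:

```latex
\[
F(q, G) = \Big\langle \ln p(\mathbf{v}, \mathbf{u} \mid G) - \ln q(\mathbf{v} \mid \mathbf{u}) \Big\rangle_{q}
 \;\le\; \big\langle \ln p(\mathbf{u} \mid G) \big\rangle,
\qquad
\text{E-step: } q \leftarrow p(\mathbf{v} \mid \mathbf{u}, G), \quad
\text{M-step: } G \leftarrow \arg\max_G F(q, G).
\]
```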

Sparse coding. PCA: trying to find linearly independent filters, setting the basis along the eigenvectors of the data. Sparse coding: make the reconstruction faithful while keeping the units/neurons quiet.
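
The contrast can be made concrete with an Olshausen & Field style objective (an assumed form consistent with the two requirements on the slide; G is the basis/generative matrix, v the unit activities, g a sparsity-inducing cost):

```latex
\[
E(\mathbf{v}, G) \;=\;
  \underbrace{\tfrac{1}{2}\,\|\mathbf{u} - G\,\mathbf{v}\|^2}_{\text{faithful reconstruction}}
  \;+\;
  \underbrace{\lambda \sum_a g(v_a)}_{\text{keep the units quiet}}
\]
```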

Sparse coding. Neural dynamics: gradient descent.
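
A sketch of the "neural dynamics as gradient descent" idea, assuming the energy above with g(v) = |v|: the reconstruction error drives the units while the sparsity term pushes them back towards zero.

```python
def sparse_code(u, G, lam=0.1, eta=0.05, n_steps=200):
    """Infer sparse activities v by gradient descent on
    E(v) = 0.5 * ||u - G v||^2 + lam * sum_a |v_a|."""
    v = np.zeros(G.shape[1])
    for _ in range(n_steps):
        residual = u - G @ v                               # reconstruction error
        v += eta * (G.T @ residual - lam * np.sign(v))     # step along -dE/dv
    return v
```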

Sparse coding