Neural Networks. Mark van Rossum. January 15, 2018. School of Informatics, University of Edinburgh. 1 / 28


2 / 28 Goals:
- Understand how (recurrent) networks behave.
- Find a way to teach networks to do a certain computation (e.g. ICA).
Network choices:
- Neuron models: spiking, binary, or rate (and its input-output relation).
- Use separate inhibitory neurons (Dale's law)?
- Synaptic transmission dynamics?

Overview 3 / 28
- Feedforward networks: Perceptron, Multi-layer perceptron, Liquid state machines, Deep layered networks
- Recurrent networks: Hopfield networks, Boltzmann machines

AI 4 / 28 [?]

AI 5 / 28

6 / 28 History
- McCulloch & Pitts (1943): binary neurons can implement any finite state machine.
- Rosenblatt: perceptron learning rule; learning of (some) classification problems.
- Backprop: universal function approximator. Generalizes, but has local minima.

7 / 28 Perceptrons
- Supervised binary classification of N-dimensional pattern vectors x.
- $y = H(w \cdot x + b)$, where $H$ is the step function.
- General trick: replace the bias $b$ with a weight $w_b$ from an always-on input.
- Perceptron learning algorithm: learnable if the patterns are linearly separable. If learnable, the rule converges.
- XXX Continuous input? XXX figure: cerebellum?
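A minimal sketch of the perceptron learning rule in NumPy, assuming ±1 labels, an illustrative learning rate, and the always-on bias input mentioned above (the names and the toy data are illustrative, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=100):
    """Perceptron learning rule. X: (P, N) patterns, y: (P,) labels in {-1, +1}."""
    P, N = X.shape
    Xb = np.hstack([X, np.ones((P, 1))])       # append always-on input for the bias
    w = np.zeros(N + 1)
    for _ in range(epochs):
        errors = 0
        for x_mu, y_mu in zip(Xb, y):
            y_hat = np.sign(w @ x_mu) or 1.0   # step function (treat 0 as +1)
            if y_hat != y_mu:                  # update only on misclassified patterns
                w += eta * y_mu * x_mu
                errors += 1
        if errors == 0:                        # converged: patterns are linearly separable
            break
    return w

# Example: a linearly separable (AND-like) problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = perceptron_train(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))    # recovers y
```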

8 / 28 Multi-layer perceptron (MLP)
- Overcomes the limited function class of the single perceptron.
- With continuous units, an MLP can approximate any function!
- Traditionally one hidden layer. More layers do not enhance the repertoire (but could help learning, see below).
- Learning: backpropagation of errors.
- Error: $E = \sum_{\mu=1}^{P} E^\mu = \sum_{\mu=1}^{P} \left( y^\mu_{\mathrm{goal}} - y^\mu_{\mathrm{actual}}(x^\mu; w) \right)^2$. Other cost functions are possible.
- Gradient descent (batch): $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$, where $w$ are all the weights (input-hidden, hidden-output, bias). XXX picture
- Stochastic descent: use $\Delta w_{ij} = -\eta \frac{\partial E^\mu}{\partial w_{ij}}$.
- Learning MLPs is slow, with local minima.
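A small sketch of an MLP with one hidden layer, trained on XOR by backpropagation using the squared error and the stochastic descent rule above; the network size, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

n_in, n_hid, n_out, eta = 2, 8, 1, 0.5
W1 = rng.normal(0, 1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 1, (n_hid, n_out)); b2 = np.zeros(n_out)

for step in range(30000):
    mu = rng.integers(len(X))                          # stochastic (per-pattern) descent
    x, y = X[mu], Y[mu]
    h = sigmoid(x @ W1 + b1)                           # hidden activities
    o = sigmoid(h @ W2 + b2)                           # output
    # backpropagate the squared-error gradient
    delta_o = (o - y) * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    W2 -= eta * np.outer(h, delta_o); b2 -= eta * delta_o
    W1 -= eta * np.outer(x, delta_h); b1 -= eta * delta_h

# training can occasionally get stuck in a local minimum; re-run with another seed
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```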

9 / 28 Deep MLPs
- Traditional MLPs are also called shallow.
- While deeper nets do not have more computational power, they can lead to better representations.
- Better representations lead to better generalization and better learning.
- Learning slows down in deep networks, as the transfer functions g() saturate at 0 or 1.
- Solutions: pre-training, convolutional networks, better representations by adding noisy/partial stimuli.
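A toy illustration of the saturation problem, assuming a chain of sigmoid units: each layer contributes a factor $g'(a) \le 1/4$ to the backpropagated gradient, which therefore shrinks multiplicatively with depth (the depth, pre-activation, and weight statistics are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
depth, grad, a = 20, 1.0, 2.0                # |a| > 0 means partly saturated units
for layer in range(depth):
    g = sigmoid(a)
    grad *= g * (1 - g) * rng.normal(0, 1)   # local derivative times a random weight
print(abs(grad))                             # typically vanishingly small after 20 layers
```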

10 / 28 Liquid state machines [?]
- Motivation: arbitrary spatio-temporal computation without precise design.
- Create a pool of spiking neurons with random connections. This results in very complex dynamics if the weights are strong enough.
- Similar to echo state networks (but those are rate based). Both are known as reservoir computing.
- Similar theme as the HMAX model: only learn at the output layer.
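A rate-based (echo-state style) reservoir sketch in which only a linear readout is trained, here by ridge regression; the network size, weight scaling, and the toy prediction task are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200, 1000
u = np.sin(np.arange(T) * 0.1)                    # input signal
target = np.roll(u, -5)                           # toy task: predict the input 5 steps ahead

W = rng.normal(0, 1, (N, N)) / np.sqrt(N) * 0.9   # random recurrent weights, scaled below 1
W_in = rng.normal(0, 1, N)
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):                                # run the fixed, untrained reservoir
    x = np.tanh(W @ x + W_in * u[t])
    states[t] = x

# only the readout is learned (ridge regression), as in reservoir computing
reg = 1e-4
W_out = np.linalg.solve(states.T @ states + reg * np.eye(N), states.T @ target)
pred = states @ W_out
print("readout MSE:", np.mean((pred - target) ** 2))
```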

11 / 28

Optimal reservoir?
- The best reservoir has rich yet predictable dynamics: the edge of chaos [?].
- Network of 250 binary nodes, $w_{ij} \sim N(0, \sigma^2)$ (the x-axis of the figure is the recurrent strength).
12 / 28

13 / 28 Optimal reservoir?
- Task: Parity(in(t), in(t-1), in(t-2)).
- Performance is best (darkest in the plot) at the edge of chaos.
- Does chaos exist in the brain? In spiking network models: yes [?]. In real brains: ?

Relation to Support Vector Machines 14 / 28
- Map the problem into a high-dimensional space F; there it often becomes linearly separable.
- This can be done without much computational overhead (kernel trick).
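A minimal illustration of the mapping idea, assuming an explicit quadratic feature map (what a polynomial kernel computes implicitly): XOR is not linearly separable in the input space, but it is in F, where a perceptron like the one sketched earlier could solve it. The feature map and weights are illustrative:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                      # XOR labels: not linearly separable in 2D

def phi(x):
    """Explicit quadratic-style feature map into F."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2])

Phi = np.array([phi(x) for x in X])
w = np.array([0.0, 0.0, -1.0])                    # a separating direction in F: sign(-x1*x2)
print(np.sign(Phi @ w) == y)                      # all True: linearly separable in F
```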

Hopfield networks 15 / 28
- All-to-all connected network (can be relaxed).
- Binary units $s_i = \pm 1$, or rate units with a sigmoidal transfer function.
- Dynamics: $s_i(t+1) = \mathrm{sign}\left( \sum_j w_{ij} s_j(t) \right)$
- Using symmetric weights $w_{ij} = w_{ji}$, we can define an energy $E = -\frac{1}{2} \sum_{ij} s_i w_{ij} s_j$.

16 / 28
- Under these conditions the network moves from the initial condition (stimulus, $s(t=0) = x$) into the closest attractor state ("memory").
- Auto-associative: pattern completion.
- Simple (suboptimal) learning rule: $w_{ij} = \sum_{\mu=1}^{M} x^\mu_i x^\mu_j$ ($\mu$ indexes the patterns $x^\mu$).
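A small Hopfield sketch, assuming the Hebbian rule above (normalized by N, with no self-connections) and asynchronous sign-updates; the pattern size and the corruption level used to demonstrate pattern completion are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 100, 5
patterns = rng.choice([-1, 1], size=(M, N))        # x^mu, mu = 1..M

# Hebbian storage: w_ij proportional to sum_mu x_i^mu x_j^mu, no self-connections
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0)

def recall(x0, sweeps=10):
    """Asynchronous updates s_i <- sign(sum_j w_ij s_j)."""
    s = x0.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# pattern completion: start from a corrupted version of pattern 0
noisy = patterns[0].copy()
flip = rng.choice(N, size=20, replace=False)       # flip 20% of the bits
noisy[flip] *= -1
recovered = recall(noisy)
print("overlap with stored pattern:", recovered @ patterns[0] / N)   # close to 1 if recalled
```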

Indirect experimental evidence using maze deformation [?] 17 / 28

Winnerless competition 18 / 28 How to escape from attractor states? Noise, asymmetric connections, adaptation. From [?].

Boltzmann machines
- The Hopfield network is not smart. In a Hopfield network it is impossible to learn only (1, 1, 1), (-1, -1, 1), (1, -1, -1), (-1, 1, -1) but not (-1, 1, 1), (1, -1, 1), (-1, -1, -1), (1, 1, -1) (XOR again)... because $\langle x_i \rangle = \langle x_i x_j \rangle = 0$ for both sets.
- Two, somewhat unrelated, modifications:
  - Introduce hidden units; these can extract features.
  - Stochastic updating: $p(s_i = 1) = \frac{1}{1 + e^{-\beta E_i}}$, with $E_i = \sum_j w_{ij} s_j - \theta_i$ and $E = \sum_i E_i$. $T = 1/\beta$ is the temperature (set to some arbitrary value).
19 / 28
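A sketch of the stochastic updating rule as asynchronous sweeps over a symmetric network, assuming no thresholds; the network size and temperature are illustrative (for ±1 units the exact Gibbs factor of 2 in the exponent can be absorbed into $\beta$):

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta = 10, 1.0                                  # T = 1/beta
W = rng.normal(0, 1, (N, N))
W = (W + W.T) / 2                                  # symmetric weights
np.fill_diagonal(W, 0)
s = rng.choice([-1, 1], size=N).astype(float)

def stochastic_sweep(s, W, beta):
    """One asynchronous sweep: p(s_i = +1) = 1 / (1 + exp(-beta * E_i)), E_i = sum_j w_ij s_j."""
    for i in rng.permutation(len(s)):
        E_i = W[i] @ s
        p_plus = 1.0 / (1.0 + np.exp(-beta * E_i))
        s[i] = 1.0 if rng.random() < p_plus else -1.0
    return s

for sweep in range(100):                           # relax towards the Boltzmann distribution
    s = stochastic_sweep(s, W, beta)
print(s)
```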

Learning in Boltzmann machines
The generated probability for state $s^\alpha$, after equilibrium is reached, is given by the Boltzmann distribution
$P_\alpha = \frac{1}{Z} \sum_\gamma e^{-\beta H_{\alpha\gamma}}$, with $H_{\alpha\gamma} = -\frac{1}{2} \sum_{ij} w_{ij} s_i s_j$ and $Z = \sum_{\alpha\gamma} e^{-\beta H_{\alpha\gamma}}$,
where $\alpha$ labels the states of the visible units and $\gamma$ the hidden states.
20 / 28

- As in other generative models, we match the true distribution to the generated one: minimize the KL divergence between the input and generated distributions, $C = \sum_\alpha G_\alpha \log \frac{G_\alpha}{P_\alpha}$.
- Minimizing gives [?] $\Delta w_{ij} = \eta \beta \left[ \langle s_i s_j \rangle_{\mathrm{clamped}} - \langle s_i s_j \rangle_{\mathrm{free}} \right]$ (note, $w_{ij} = w_{ji}$).
- Wake ("clamped") phase vs. sleep ("dreaming") phase.
- Clamped phase: Hebbian-type learning; average over input patterns and hidden states.
- Sleep phase: unlearn erroneous correlations.
- The hidden units will discover statistical regularities.
21 / 28
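A sketch of this weight update, assuming the clamped and free correlations have already been estimated by sampling (e.g. with stochastic sweeps like the ones above); the learning rate and temperature are illustrative:

```python
import numpy as np

def boltzmann_update(W, samples_clamped, samples_free, eta=0.01, beta=1.0):
    """Delta w_ij = eta * beta * (<s_i s_j>_clamped - <s_i s_j>_free).
    samples_*: arrays of shape (n_samples, N) of +/-1 states (visible and hidden units)."""
    corr_clamped = samples_clamped.T @ samples_clamped / len(samples_clamped)
    corr_free = samples_free.T @ samples_free / len(samples_free)
    dW = eta * beta * (corr_clamped - corr_free)
    np.fill_diagonal(dW, 0)          # no self-connections
    return W + dW                    # both correlation matrices are symmetric, so w_ij = w_ji holds
```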

22 / 28 Boltzmann machines: applications
- Shifter circuit.
- Learning symmetry [?]: create a network that categorizes horizontal, vertical, and diagonal symmetry (a 2nd-order predicate).

23 / 28 Restricted Boltzmann machines
- The need for multiple relaxation runs for every weight update (a triple loop) makes training Boltzmann networks very slow.
- Speed up learning in restricted Boltzmann machines:
  - No hidden-hidden connections.
  - Don't wait for the sleep phase to fully settle.
  - Stack multiple layers (deep learning).
- Application: high-quality autoencoders (i.e. compression) [?] [also good web talks by Hinton on this].
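A minimal restricted Boltzmann machine sketch with one step of contrastive divergence (CD-1), in the spirit of not waiting for the sleep phase to settle; it assumes binary 0/1 units and logistic activations, and the layer sizes, learning rate, and toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n_vis, n_hid, eta = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0):
    """One CD-1 step: up, down, up again; compare data and reconstruction statistics."""
    global W, b_vis, b_hid
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(n_hid) < p_h0).astype(float)      # sample hidden units given the data
    p_v1 = sigmoid(h0 @ W.T + b_vis)                   # reconstruct the visible units
    v1 = (rng.random(n_vis) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)
    # approximate gradient: <v h>_data - <v h>_reconstruction
    W += eta * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_vis += eta * (v0 - v1)
    b_hid += eta * (p_h0 - p_h1)

# toy data: two binary prototypes; the hidden units learn features that separate them
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
for step in range(2000):
    cd1_update(data[rng.integers(2)])
print(sigmoid(data @ W + b_hid))
```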

Le et al., ICML 2012: a deep auto-encoder network with $10^9$ weights learns high-level features from images without supervision. 24 / 28

Relation to schema learning? 25 / 28
Maria Shippi & MvR
- The cortex learns semantic/schema (i.e. statistical) information.
- The presence of a schema can speed up subsequent fact learning.

26 / 28 Discussion
- Networks are still very challenging:
  - Can we predict activity?
  - What is the network trying to do?
  - What are the learning rules?

References I 27 / 28