Dynamical Systems and Deep Learning: Overview
Abbas Edalat


Dynamical Systems

The notion of a dynamical system includes the following:
A phase or state space, which may be continuous, e.g. the real line, or discrete, e.g. strings of bits 0 or 1, whose elements represent the states of the system.
Time, which may be discrete, e.g. recursive equations, or continuous, e.g. differential equations or stochastic processes. At any given point in time the system is in exactly one state.
An evolution law that determines the state at time t from the states at all previous times. This defines the orbit or trajectory of a state in the phase space.
We are interested in the long-term behaviour of the orbits of points.

Dynamical Systems: A simple example

Let R be the state space and let time be discrete: t = 0, 1, 2, ....
The time-independent law is Q : R → R with Q(x) = x^2: if at any time t the state is x ∈ R, then at time t + 1 the state becomes Q(x) = x^2 ∈ R.
Starting at time t = 0 in state x_0 ∈ R, the orbit of x_0 is
x_0, Q(x_0), Q(Q(x_0)), Q(Q(Q(x_0))), ..., Q(Q...(Q(x_0))...), ...
also written as x_0, Q(x_0), Q^2(x_0), Q^3(x_0), ..., Q^n(x_0), ....
What is the long-term behaviour of such an orbit? If |x_0| < 1 then Q^n(x_0) → 0 as n → ∞. If |x_0| > 1 then Q^n(x_0) → ∞ as n → ∞. What happens when |x_0| = 1?
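As a quick illustration, here is a minimal Python sketch (not from the original notes) that iterates Q(x) = x^2 and prints a few orbits, showing convergence to 0 for |x_0| < 1 and divergence for |x_0| > 1:

# Iterate the quadratic map Q(x) = x^2 and print the first few points of each orbit.
def orbit(x0, n_steps=6):
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] ** 2)
    return xs

for x0 in (0.9, 1.0, 1.1):
    print(x0, orbit(x0))
# |x_0| < 1: the orbit tends to 0; |x_0| > 1: it blows up; x_0 = 1 stays fixed at 1.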

Basic concepts in dynamical systems

We study attractors, repellers, bifurcations, etc.
Bifurcation diagram of the quadratic family F_c : x ↦ cx(1 − x), F_c : R → R, for 2.5 ≤ c ≤ 4:
(i) Fix c and a random x_0 ∈ [0, 1].
(ii) Plot F_c^n(x_0) for 20 ≤ n ≤ 100.
For c > 3.57 the map F_c can exhibit chaotic dynamics: the orbit of a typical point in [0, 1] wanders erratically in [0, 1].
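The following Python sketch (assuming numpy and matplotlib are available; details such as the number of c values are illustrative choices) generates the bifurcation diagram by the recipe above:

import numpy as np
import matplotlib.pyplot as plt

# For each c, iterate F_c(x) = c*x*(1-x) from a random x_0, discard the first 20
# iterates as a transient, and plot the remaining iterates against c.
cs, xs = [], []
for c in np.linspace(2.5, 4.0, 1200):
    x = np.random.rand()
    for n in range(100):
        x = c * x * (1 - x)
        if n >= 20:
            cs.append(c)
            xs.append(x)
plt.plot(cs, xs, ',k')
plt.xlabel('c')
plt.ylabel('x')
plt.show()

For c below about 3 the plotted points collapse onto a single attracting fixed point; as c grows the attractor period-doubles and eventually fills whole intervals, which is the chaotic regime mentioned above.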

Koch curve: an example of a self-similar fractal

[Figure: successive construction stages E_0, E_1, E_2, E_3 and the limiting Koch curve F.]

Agent-Based Models

Agent-based models are systems in which, at any given point in time, there are many interacting agents present. They can be considered as dynamical systems with many concurrent states. Agent-based models deal directly with spatially distributed agents such as neurons, animals or autonomous agents, and they can be used for learning. The actions and interactions of individual agents or units are taken into account with a view to assessing their effects on the system as a whole. We are interested in the patterns that emerge in the long-term evolution of the interacting agents. These emergent long-term patterns cannot be deduced by applying ordinary mathematical analysis to the local rules for the interacting agents.

A simple deterministic example: Hopfield networks

Hopfield networks were the first model of associative memory in neural networks and are used for pattern recognition. There are N neurons with values ±1. The network has a connection or synaptic weight, a real number, between any two neurons. The connection weights can be chosen so as to store images in the network's memory. The state of the network is given by the values of its neurons. There is a time-independent updating rule that updates the value of each neuron, either asynchronously or synchronously, using the network's synaptic weights. With the asynchronous updating rule, the orbit of any given initial state converges to an attractor: the pattern in the network's memory closest to the initial state. A minimal sketch of these two ingredients is given below.
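Here is a minimal Python sketch of storage and asynchronous updating for a small Hopfield network; the Hebbian weight choice and the function names are illustrative assumptions, not prescribed by the notes:

import numpy as np

def train_hebbian(patterns):
    # Hebbian rule: W = (1/N) * sum over stored patterns of p p^T, with zero diagonal.
    N = patterns.shape[1]
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, n_sweeps=10):
    # Asynchronous updates: visit neurons one at a time, setting s_i = sign(sum_j W_ij s_j).
    s = state.astype(float)
    for _ in range(n_sweeps):
        for i in np.random.permutation(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]], dtype=float)
W = train_hebbian(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted copy of the first pattern
print(recall(W, noisy))                  # recovers [ 1. -1.  1. -1.  1. -1.]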

A simple stochastic example: Markov chains

[Figure: transition diagram of a Markov chain with states 1, 2, 3, 4 and transition probabilities 1/2, 1/3, 1/6 and 1 on its edges.]
Markov chains have a finite state space, e.g. a Markov chain with states labelled 1, 2, 3, 4 as above. At each time t there is a time-independent probability of transition from any state to any other state. We are interested in the long-term behaviour of the system. What can be said about orbits in the above example?
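A small Python sketch of simulating such a chain and estimating its long-run state occupancies; the transition matrix below is purely illustrative, since the exact probabilities in the diagram are not fully recoverable from the transcript:

import numpy as np

# P[i, j] = probability of moving from state i+1 to state j+1; each row sums to 1.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [1/3, 1/6, 1/3, 1/6],
              [0.0, 0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(4)
for _ in range(10_000):
    counts[state] += 1
    state = rng.choice(4, p=P[state])
print(counts / counts.sum())   # empirical fraction of time spent in each state

With this particular matrix, state 4 is absorbing, so in the long run the orbit ends up there with probability 1; changing the last row makes the chain keep exploring all states.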

Other Agent-Based Models in this course

Boltzmann Machines: a stochastic extension of Hopfield networks with hidden units; they learn the probability distribution associated with a data set, but their training is very inefficient.
Restricted Boltzmann Machines (RBMs): have connections only between hidden units and visible units. A simple training algorithm for them has revolutionised machine learning.
Deep Belief Nets: obtained by stacking RBMs; they have consistently outperformed many rival techniques.
Small World Networks: model social and biological networks and are distinguished by low average path length, high clustering and scale-free properties.
Kauffman Networks: Boolean networks that model gene mutation and evolution.

Computational complexity: Big O Notation

Let f(x) and g(x) be two real-valued functions defined on some subset of R (e.g. N). One writes f(x) = O(g(x)), or f(x) ∈ O(g(x)), as x → ∞ if, for sufficiently large values of x, the value of f(x) is at most a constant times g(x) in absolute value. That is, f(x) = O(g(x)) if there exist a positive real number M and a real number x_0 such that |f(x)| ≤ M|g(x)| for all x > x_0.
Examples:
2x^3 log x + x^2 (log x)^4 − 4x = O(x^3 log x).
x^4 + 6·2^(x+7) − 3x^13 = O(3^x).
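As a quick sanity check on the first example, this Python snippet (illustrative only) prints the ratio f(x)/g(x), which stays bounded, as the definition with the constants M and x_0 requires:

import math

# f(x) = 2x^3 log x + x^2 (log x)^4 - 4x   and   g(x) = x^3 log x
f = lambda x: 2 * x**3 * math.log(x) + x**2 * math.log(x)**4 - 4 * x
g = lambda x: x**3 * math.log(x)

for x in (10, 100, 1000, 10_000):
    print(x, f(x) / g(x))   # the ratio stays bounded (and tends to 2), so f = O(g)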

Little o Notation and Equivalence

Given a real-valued function f defined on some subset of R and a ∈ R, we say f(x) → a as x → ∞ if for any ε > 0 there exists K > 0 such that |f(x) − a| < ε for all x > K.
Let f(x) and g(x) be two functions defined on some subset of the real numbers. One writes f(x) = o(g(x)), or f(x) ∈ o(g(x)), as x → ∞ if f(x)/g(x) → 0 as x → ∞. So, in words, f(x) = o(g(x)) if f(x) is negligible compared to g(x) for large enough x.
We write f ∼ g (i.e. f and g are equivalent) as x → ∞ if f(x)/g(x) → 1 as x → ∞.

Examples

(log x)^n = o(x^a) as x → ∞ for any n > 0 and any a > 0.
P(x) = o(2^x) for any polynomial P as x → ∞.
sin(1/x) ∼ 1/x as x → ∞.
3x^4 + 10x^3 + 8x^2 ∼ 3x^4 as x → ∞.
(x^2 + 7·2^x)/(7x^9 + 3^x + 5^x) ∼ 7·(2/5)^x as x → ∞.
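A quick numerical check (illustrative, not part of the notes) of two of the equivalences above, confirming that the relevant ratios tend to 1:

import math

# For f ~ g we need f(x)/g(x) -> 1 as x -> infinity.
checks = [
    ("3x^4 + 10x^3 + 8x^2 ~ 3x^4", lambda x: 3*x**4 + 10*x**3 + 8*x**2, lambda x: 3*x**4),
    ("sin(1/x) ~ 1/x",             lambda x: math.sin(1/x),             lambda x: 1/x),
]
for name, f, g in checks:
    print(name, [round(f(x) / g(x), 6) for x in (10, 100, 1000)])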

Asymptotic behaviour

We can use the Big O notation to describe space complexity (how the memory required varies with the algorithm's input size) as well as time complexity (how the time taken for the algorithm to complete varies with its input size). We may be interested in the best, worst and average cases; by default the stated complexity usually refers to the average case, on random data. The frequently encountered O values are: constant O(1), logarithmic O(log n), linear O(n), O(n log n), quadratic O(n^2), cubic O(n^3), and polynomial O(n^d) for some d ∈ N. We also use the notation to describe the asymptotic behaviour of characteristic quantities in dynamical and complex systems.

Organisation

This course consists of 20 lectures, 8 tutorials and 2 assessed courseworks. Paragraphs, pages, subsections or exercises labelled with (*) are non-examinable, although they are useful for following the course. Some of the pictures in the notes have been reproduced from the books listed as references.

Suggested Reading

Murphy, K. P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
Hinton, G. A Practical Guide to Training Restricted Boltzmann Machines, 2010. Available online.
Devaney, R. L. An Introduction to Chaotic Dynamical Systems. Westview Press, 2003.
Gros, C. Complex and Adaptive Dynamical Systems. Springer, 2008.
Hertz, J., Krogh, A. and Palmer, R. G. Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.