On Computational Limitations of Neural Network Architectures


On Computational Limitations of Neural Network Architectures. Achim Hoffmann

In short: A powerful method for analyzing the computational abilities of neural networks, based on algorithmic information theory, is introduced. It is shown that the idea of many interacting computing units does not essentially facilitate the task of constructing intelligent systems. Furthermore, it is shown that the same holds for building powerful learning systems. This holds independently of the epistemological problems of inductive inference.

Overview:
- Describing neural networks
- Algorithmic information theory
- The complexity measure for neural networks
- Computational limits of a particular net structure
- Limitations of learning in neural networks
- Conclusions

Describing neural networks. In general, the following two aspects can be distinguished: a) the functionality of a single neuron; often a threshold function applied to the weighted sum of the neuron's inputs is proposed. b) the topological organization of the complete network, consisting of a large number of neurons; often nets are organized in layers, so nets can be distinguished by their number of layers.

Describing neural networks. Each node ν in a neural network can be described by the following items:
- the number i of input signals of the particular node,
- the nodes in the network whose output signals are connected to each input of ν,
- the specification of the I/O behavior of ν.

Describing neural networks. The specification of the I/O behavior of ν: ν may be in different internal states; let the set of all possible internal states be S_ν. In each computation step of the network, ν computes its output value by a function f : {0,1}^i × S_ν → {0,1}. Furthermore, ν possibly changes its internal state, as determined by a function g : {0,1}^i × S_ν → S_ν. Both functions f and g are encoded as programs p_f, p_g of minimal length.
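This node model can be written down concretely in a few lines. Below is a minimal sketch, assuming a hypothetical Python class DiscreteNode and an arbitrary illustrative choice of f and g; none of these names come from the talk.

```python
# A minimal sketch of the node model described above. The class name and the
# particular choice of f and g are illustrative, not from the original talk.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class DiscreteNode:
    f: Callable[[Tuple[int, ...], int], int]    # output: ({0,1}^i, S_nu) -> {0,1}
    g: Callable[[Tuple[int, ...], int], int]    # update: ({0,1}^i, S_nu) -> S_nu
    state: int = 0                               # current internal state in S_nu

    def step(self, inputs: Tuple[int, ...]) -> int:
        out = self.f(inputs, self.state)          # compute this step's output value
        self.state = self.g(inputs, self.state)   # then move to the next state
        return out

# Illustrative node: the output is the parity of the inputs plus a 1-bit
# memory of the previous input parity.
node = DiscreteNode(
    f=lambda x, s: (sum(x) + s) % 2,
    g=lambda x, s: sum(x) % 2,
)
print(node.step((1, 0, 1)))   # one computation step
```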

[Figure: two neural networks with a similar structure.]

Algorithmic Information Theory. The amount of information necessary for printing certain strings is measured. Only binary strings consisting of 0s and 1s are considered. The length of the shortest program that prints a certain string s is called its Kolmogorov complexity K(s).

Examples. Strings of small Kolmogorov complexity: 11111111111111, 0000000000000000000, 1010101010101010, etc. Strings of rather large Kolmogorov complexity: 1000100111011001011101010, 1001111010010110110111001, etc.
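A quick way to see the difference is to compress the strings: K(s) itself is uncomputable, but the length of a compressed encoding is an upper bound on it (up to a constant). A small sketch, using zlib purely as an illustrative compressor:

```python
# Kolmogorov complexity is uncomputable, but compressed length gives a rough
# upper bound and makes the contrast between the two kinds of strings visible.
import random
import zlib

def compressed_bits(s: str) -> int:
    # length (in bits) of a zlib encoding; an upper-bound proxy for K(s)
    return 8 * len(zlib.compress(s.encode()))

random.seed(0)
regular   = "10" * 1000                                        # 101010... pattern
irregular = "".join(random.choice("01") for _ in range(2000))  # no visible pattern

print(compressed_bits(regular), compressed_bits(irregular))
# the regular string compresses to far fewer bits than the irregular one
```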

The complexity of a neural net N. Definition: Let descr(N) be the binary encoded description of an arbitrary discrete neural net N. Then the complexity comp(N) of N is given by the Kolmogorov complexity of descr(N): comp(N) = K(descr(N)). Note: comp(N) reflects the minimal amount of engineering work necessary for designing the network N.

Computational Limitations. Definition: Let N be a static discrete neural network with i binary input signals s_1, ..., s_i and one binary output signal. Then the output behavior of N is in accordance with a binary string s of length 2^i iff, for any binary number b whose i digits are applied as binary input values to N, N outputs exactly the value at the b-th position in s. Theorem: Let N be an arbitrary static discrete neural network. Then N's output behavior must be in accordance with some binary sequence s with Kolmogorov complexity K(s) ≤ comp(N) + const, for a small constant const.

[Figure: a static discrete neural network with four binary inputs A, B, C, D and one binary output.] Its output behavior is in accordance with the string S over all 2^4 = 16 input combinations:
A: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
B: 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
C: 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
D: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
S: 0 0 1 0 0 0 1 1 0 1 1 0 1 1 1 0
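To make the notion of the behavior string concrete, here is a small sketch that enumerates every binary number b, applies its i digits as inputs, and records the output at position b. The function parity_net is an illustrative stand-in, not the network from the figure; a real construction would query the network itself.

```python
# Sketch: reading off the 2^i-bit behavior string s of a static net by
# enumerating all binary input combinations in the order of their value b.
def behaviour_string(net, i):
    bits = []
    for b in range(2 ** i):
        # digits of b, least significant first (the fastest-varying input,
        # like A in the table above)
        inputs = [(b >> k) & 1 for k in range(i)]
        bits.append(str(net(inputs)))
    return "".join(bits)

parity_net = lambda xs: sum(xs) % 2       # outputs 1 iff an odd number of 1s
print(behaviour_string(parity_net, 4))    # a string of length 2^4 = 16
```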

Learning in Neural Networks. We consider a set of objects X. The learning task: determine for each object in X whether or not it belongs to the class to be learned. A concept is a subset of X. A concept class C is a set of concepts (subsets) of X. For any learning system L there is exactly one concept class C ⊆ 2^X that underlies L.

[Figure: the objects 1, ..., 9 of X and the concepts c_1, ..., c_6 drawn as regions.] X = {1, 2, 3, 4, 5, 6, 7, 8, 9}, C = {c_1, c_2, c_3, c_4, c_5, c_6}. c_1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}, c_2 = {}, c_3 = {1, 3, 5, 7, 9}, c_4 = {1, 4, 6, 7}, c_5 = {2, 4, 6, 8}, c_6 = {1, 2, 4, 6, 9}.

The binary string representation s(c) of a concept c ⊆ X indicates for each object, by a corresponding 1, whether it belongs to c. Definition: The complexity K_max(C) of a concept class C is given by the Kolmogorov complexity of the most complex concept in C, i.e. K_max(C) = max_{c ∈ C} K(s(c)).

Example. X = {1, 2, 3, 4, 5, 6, 7, 8, 9}, C = {c_1, c_2, c_3, c_4, c_5, c_6}.
c_1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}; s(c_1) = 111111111
c_2 = {}; s(c_2) = 000000000
c_3 = {1, 3, 5, 7, 9}; s(c_3) = 101010101
c_4 = {1, 4, 6, 7}; s(c_4) = 100101100
c_5 = {2, 4, 6, 8}; s(c_5) = 010101010
c_6 = {1, 2, 4, 6, 9}; s(c_6) = 110101001
K_max(C) = K(s(c_6)) = K(110101001)
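The representation s(c) for this example is easy to reproduce in code. Below is a small sketch; the compression-based stand-in for K_max(C) is purely schematic, since K is uncomputable and compressor overhead dominates for strings this short.

```python
# Sketch of the string representation s(c) for the example above, plus a
# crude compression-based proxy for K_max(C) (illustrative only).
import zlib

X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
C = {
    "c1": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "c2": set(),
    "c3": {1, 3, 5, 7, 9},
    "c4": {1, 4, 6, 7},
    "c5": {2, 4, 6, 8},
    "c6": {1, 2, 4, 6, 9},
}

def s(c):
    # position j is 1 iff the j-th element of X belongs to c
    return "".join("1" if x in c else "0" for x in X)

for name, c in C.items():
    print(name, s(c))                     # reproduces the strings listed above

# worst compressed length over C, in bits, as a stand-in for K_max(C)
k_max_proxy = 8 * max(len(zlib.compress(s(c).encode())) for c in C.values())
```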

Learning complex concepts. Theorem: Let N be a neural net and comp(N) its complexity. Let C be the concept class underlying N. Then there are at least 2^(K_max(C) − comp(N) − const) concepts in C, where const is a small constant integer. (Intuitively: the most complex concept in C can be specified by describing N plus an index into C, so a class containing a complex concept must contain many concepts.)
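For a sense of scale, one can plug hypothetical sizes into this bound (these numbers are not from the talk):

```python
# Illustrative only: hypothetical sizes. If the hardest concept in C needs
# K_max(C) = 10_000 bits, the net description is comp(N) = 1_000 bits, and
# const = 100, then C must contain at least 2**(10_000 - 1_000 - 100) concepts.
n_concepts_lower_bound = 2 ** (10_000 - 1_000 - 100)
print(len(str(n_concepts_lower_bound)))   # about 2680 decimal digits
```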

Probably approximately correct learning. Assumptions: Each x ∈ X appears with a fixed probability according to some probability distribution D on X. This holds during the learning phase as well as during the classification phase. Goals: achieving a high probability of correct classification, and achieving this with a high confidence probability.

Probably approximately correct learning. Definition: Let C be a concept class. We say a learning system L pac-learns C iff (∀ c_t ∈ C)(∀ D)(∀ ε > 0)(∀ δ > 0): L correctly classifies an object x randomly chosen according to D with probability at least 1 − ε, and this has to hold with a confidence probability of at least 1 − δ.

Probably approximately correct learning. Theorem: Let N be a neural network and let C be the concept class underlying N. Let 0 < ε ≤ 1/4 and 0 < δ ≤ 1/100. Then, for pac-learning C, N requires at least (K_max(C) − comp(N) − const) / (32 ε log_2 |X|) examples randomly chosen according to D, where const is a small constant integer.
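To get a feel for the bound, one can plug in hypothetical numbers (they are made up for illustration and are not from the talk):

```python
# Evaluating the sample-size lower bound above with hypothetical values.
from math import log2

K_max, comp_N, const = 10_000, 1_000, 100   # bits (hypothetical)
eps, X_size = 0.1, 2 ** 20                  # accuracy and |X| (hypothetical)

lower_bound = (K_max - comp_N - const) / (32 * eps * log2(X_size))
print(f"at least {lower_bound:.0f} examples")   # about 139 with these numbers
```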

Conclusions. The potential of neural networks for modeling intelligent behavior is essentially limited by the complexity of their architectures. The ability of a system to behave intelligently, as well as to learn, does not increase by simply using many interacting computing units. Instead, the topology of the network has to be rather irregular!

Conclusions. With any approach, intelligent neural network architectures require much engineering work. Simple principles cannot embody the essential features necessary for building intelligent systems. Any potential advantage of neural nets for cognitive modeling will become more and more negligible as the complexity of the system increases.