VC-dimension of a context-dependent perceptron

Piotr Ciskowski
Institute of Engineering Cybernetics, Wrocław University of Technology,
Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
cis@vectra.ita.pwr.wroc.pl

(Paper supported by Wrocław University of Technology grant no. 332291.)

Abstract. In this paper we present the model of a context-dependent neural net - a net which may change the way it works according to external conditions. The information about the environmental conditions is fed to the net through the context inputs, which are used to calculate the net's weights and, as a consequence, modify the way the net reacts to the traditional inputs. We discuss the Vapnik-Chervonenkis dimension of such a neuron and show that the separating power of a context-dependent neuron and multilayer net grows with the number of adjustable parameters. We present the difference in the way traditional and context-dependent nets work and compare the input space transformations both of them are able to perform. We also show that context-dependent nets learn faster than traditional ones with the same VC-dimension.

1 Introduction

The notion of context in computer science appeared some time ago, first in the area of formal languages. It has since been introduced to many areas of machine learning, classification, robotics and neural nets [5]. Medical applications are an intuitive example of decisions depending on external parameters. One of the first medical applications of context-sensitive neural networks was presented in [9], where a neural network is tuned to the parameters of a monitored patient.

The paper presents a model of a context-dependent neural network - a network which may change the way it works according to the environmental conditions. In other words, such a network may react differently to the same values of its inputs, depending on external conditions, later called context variables. The problem of defining and identifying primary, context-sensitive and irrelevant features among the input data is presented well in [1]. In this paper we assume that the division of the net's inputs into primary ones and context-sensitive ones (for simplicity called context inputs) has already been done.

Different strategies for managing context-sensitive features are presented in [2]. The neural network model presented in this paper corresponds to strategy 3 (contextual classifier selection) or strategy 5 (contextual weighting).

2 Model of a Context-Dependent Neuron

Consider a neuron model of the form

    y = \Phi\Big( w_0 + \sum_{s=1}^{S} w_s x_s \Big) = \Phi\Big( \sum_{s=0}^{S} w_s x_s \Big) = \Phi\big( W^T X \big),                (1)

where y is the neuron's output, w_s is its weight on the input x_s and w_0 is the threshold, which is included in the weight vector W, while the input vector X includes the bias x_0 = 1. Φ is the neuron's activation function - for example the sigmoidal function Φ(u) = 1 / (1 + e^{-βu}).

The dependence of the neuron's weights on the context vector Z is modeled by

    w_s = A_s^T V(Z) = [a_{s1}, a_{s2}, \ldots, a_{sM}] \, [v_1(Z), v_2(Z), \ldots, v_M(Z)]^T,                (2)

where V(Z) is the vector of M linearly independent base functions spanning the weights' dependence on the context vector, and A_s is the vector of coefficients approximating the dependence of the s-th weight on the context. The number of adjustable parameters in each neuron is M(S + 1), whereas for the traditional neuron the number of parameters equals the number of weights, S + 1. This number is crucial for estimating the Vapnik-Chervonenkis dimension of the context-dependent perceptron.
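For illustration, the forward pass defined by (1) and (2) can be sketched as follows in NumPy; the monomial base functions, the sigmoid slope β = 1 and the dimensions S = 2, M = 3 are assumptions made only for this example.

```python
import numpy as np

# Illustrative base functions of the context vector Z (assumption: monomials
# of the first context variable); the model only requires M linearly
# independent functions v_1(Z), ..., v_M(Z).
def base_functions(Z):
    return np.array([1.0, Z[0], Z[0] ** 2])          # V(Z), M = 3

def sigmoid(u, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * u))           # activation Phi

def context_dependent_neuron(X, Z, A):
    """Forward pass of a context-dependent neuron, eqs. (1)-(2).

    X : inputs with bias, shape (S+1,), X[0] = 1
    Z : context vector
    A : coefficient matrix, shape (S+1, M); row s holds A_s
    """
    V = base_functions(Z)        # V(Z), shape (M,)
    W = A @ V                    # weights w_s = A_s^T V(Z), eq. (2)
    return sigmoid(W @ X)        # y = Phi(W^T X), eq. (1)

# Example: S = 2 primary inputs (plus bias), M = 3 base functions,
# so the neuron has M * (S + 1) = 9 adjustable parameters.
A = np.random.randn(3, 3)
X = np.array([1.0, 0.5, -1.2])   # bias x_0 = 1 included
print(context_dependent_neuron(X, Z=np.array([0.3]), A=A))
print(context_dependent_neuron(X, Z=np.array([0.9]), A=A))  # same X, different context
```

The two prints show the behaviour described in the introduction: the same primary input X produces different outputs under different contexts.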

3 The VC-dimension of a Context-Dependent Neuron

The Vapnik-Chervonenkis dimension is the main quantity used for measuring the capacity of a learning machine, its generalization ability and the number of learning examples needed to obtain the required accuracy of predictions. In the following we compare the results for the traditional and the context-dependent neuron. For more details on the VC-dimension of neural nets, see [6].

Theorem 1. [6] Consider a standard real-weight perceptron with S ∈ N real inputs and denote the set of functions it computes by H_stand. Then a set S_stand = {X_1, X_2, ..., X_n} ⊂ R^S is shattered by H_stand iff S_stand is affinely independent, that is, iff the set {(X_1^T, 1), (X_2^T, 1), ..., (X_n^T, 1)} is linearly independent in R^{S+1}. It follows that

    \mathrm{VCdim}(H_{\mathrm{stand}}) = S + 1.                (3)

For lack of space we omit the proofs of the following theorems, which may however be reconstructed by analogy to those presented in [6].

Theorem 2. Consider a context-dependent real-weight perceptron with S ∈ N real inputs, P ∈ N real context inputs and M ∈ N base functions. Denote the set of functions it computes by H_cont. Then a set S_cont = {(X_1, Z_1), (X_2, Z_2), ..., (X_n, Z_n)} ⊂ R^S × R^P is shattered by H_cont only if, in each subset of S_cont containing points with the same value of the context, e.g. Z = Z_z, the points {(X_{z,1}^T, 1), (X_{z,2}^T, 1), ..., (X_{z,n_z}^T, 1)} are linearly independent in R^{S+1}. It follows that

    \mathrm{VCdim}(H_{\mathrm{cont}}) = M(S + 1).                (4)

It is known [6] that for standard feed-forward linear threshold networks with a total of W weights the VC-dimension grows as O(W^2).

Theorem 3. Suppose N_cont is a context-dependent feed-forward linear threshold network consisting of context-dependent neurons given by (1), with a total of W weights, where each weight is given by a combination of M coefficients and base functions as in (2). Let H_cont be the class of functions computed by this network. Then

    \mathrm{VCdim}(H_{\mathrm{cont}}) = O(M W^2).                (5)

The difference in the way traditional and context-dependent nets work can be seen in the following example. Suppose we have a traditional neuron with S + 1 inputs (including the bias) and add another P contextual variables as traditional inputs. We thereby expand the neuron's input space from R^{S+1} to R^{S+1+P} (the same expansion applies to its parameter space), and the transformation performed by the neuron, R^{S+1+P} -> R, still divides the input space with a hyperplane, only in a higher-dimensional space. When we instead add these P inputs as context inputs and expand the base function vector to M functions (M may be greater than P), the neuron's input space remains R^{S+1}, while its parameter space grows to R^{M(S+1)}, and the division R^{S+1+P} -> R performed by the neuron is no longer a hyperplane but a hypersurface - the larger M, the more complicated it may be - which still reduces to a hyperplane for any fixed value of the context. This is the reason why the separating power of a context-dependent net is greater for sets of points lying in different contexts.
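A small numerical check of this last observation, under the same assumed monomial base functions as in the earlier sketch: the pre-activation u(X, Z) = W(Z)^T X is additive (hence linear) in the input X for any fixed context, while the analogous test fails for the context vector.

```python
import numpy as np

def base_functions(Z):
    return np.array([1.0, Z[0], Z[0] ** 2])   # assumed base functions V(Z), M = 3

def pre_activation(X, Z, A):
    """u(X, Z) = W(Z)^T X with W(Z) = A V(Z)."""
    return (A @ base_functions(Z)) @ X

A = np.random.randn(3, 3)                      # S = 2 inputs plus bias

# Additivity in X for a fixed context (bias coordinates set to 0 so the
# test is about directions in the input space): holds exactly.
Z = np.array([0.7])
X1, X2 = np.array([0.0, 1.0, 2.0]), np.array([0.0, -0.5, 0.3])
print(np.isclose(pre_activation(X1 + X2, Z, A),
                 pre_activation(X1, Z, A) + pre_activation(X2, Z, A)))   # True

# The same additivity test applied to the context generally fails,
# because W(Z) depends nonlinearly on Z through the base functions.
X = np.array([1.0, 0.5, -1.2])
Z1, Z2 = np.array([0.2]), np.array([0.5])
print(np.isclose(pre_activation(X, Z1 + Z2, A),
                 pre_activation(X, Z1, A) + pre_activation(X, Z2, A)))   # generally False
```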

4 Learning of Context-Dependent Nets

An interesting learning algorithm for context-dependent nets is presented in [7]. It uses the properties of the Kronecker product and allows the net to be trained on examples from all the different contexts together. It is a gradient descent algorithm, in which the gradient of the quality function

    Q(A) = E_{X,Z,Y}\big[ \big( \Phi^{-1}(Y) - W^T X \big)^2 \big]                (6)
         = E_{X,Z,Y}\big[ \big( \Phi^{-1}(Y) - A^T (X \otimes V(Z)) \big)^2 \big]                (7)

is given by

    \mathrm{grad}_A Q = -2\, E_{X,Z,Y}\big[ \big( \Phi^{-1}(Y) - A^T (X \otimes V(Z)) \big) \, (X \otimes V(Z)) \big],                (8)

where A is the vector of all the coefficients a_{sm} of the neuron and ⊗ denotes the Kronecker product.

It should be emphasized that the neuron's output is calculated directly from the input vector X and the vector of base functions V(Z), without having to calculate the neuron's weights. The same Kronecker product X ⊗ V(Z) is then used for calculating the gradient of the target function with respect to the coefficient vector A. If all the net's layers have the same base function vectors, this calculation is also done only once per epoch. These facts result in considerably fewer calculations in each learning epoch of the context-dependent net. This estimate may be slightly disturbed by the necessity of calculating the weights for the backpropagation algorithm - but in that case it is only necessary to calculate the weights of the neurons in all layers except the first one, which usually contains the most neurons.
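As an illustration of this scheme, the sketch below performs empirical gradient descent on (6)-(8) for a single context-dependent neuron; the toy data, the base functions and the learning rate are assumptions made for this example only, and np.kron computes the product X ⊗ V(Z).

```python
import numpy as np

def base_functions(Z):
    return np.array([1.0, Z[0], Z[0] ** 2])        # assumed base functions V(Z), M = 3

def sigmoid(u, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * u))

def inv_sigmoid(y, beta=1.0):
    return np.log(y / (1.0 - y)) / beta            # Phi^{-1}(Y) in eqs. (6)-(8)

def train(samples, S, M, lr=0.1, epochs=1000):
    """Empirical gradient descent on Q(A) for one neuron, using eq. (8).

    samples : list of (X, Z, Y) with X including the bias x_0 = 1
    """
    A = np.zeros(M * (S + 1))                      # coefficient vector A
    for _ in range(epochs):
        grad = np.zeros_like(A)
        for X, Z, Y in samples:
            XV = np.kron(X, base_functions(Z))     # Kronecker product X (x) V(Z)
            err = inv_sigmoid(Y) - A @ XV          # Phi^{-1}(Y) - A^T (X (x) V(Z))
            grad += -2.0 * err * XV                # sample gradient, eq. (8)
        A -= lr * grad / len(samples)
    return A

# Toy data: the target output depends on the context z as well as on x_1.
rng = np.random.default_rng(0)
samples = []
for _ in range(100):
    x1, z = rng.uniform(-1, 1), rng.uniform(-1, 1)
    y = sigmoid(2.0 * z * x1 - 0.5)                # context-modulated target
    samples.append((np.array([1.0, x1]), np.array([z]), y))

A = train(samples, S=1, M=3)
X, Z, Y = samples[0]
print(Y, sigmoid(A @ np.kron(X, base_functions(Z))))   # target vs. learned output
```

Note that the weights W are never formed during training; the coefficient vector A acts directly on X ⊗ V(Z), which is the point emphasized above.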

5 Conclusions

The model of a context-dependent perceptron has been presented in the paper, as well as a learning algorithm for it. It has been shown that, similarly to traditional neurons, the Vapnik-Chervonenkis dimension of a context-dependent neuron and net grows with the number of adjustable parameters; since this number is greater than for a traditional neuron, the separating power of such a neuron is much greater, and it depends not on the context variables themselves but on the way the network designer uses them by choosing the base functions v_m. The growth of the Vapnik-Chervonenkis dimension is both a benefit and a problem - the number of examples needed for the learning algorithm to achieve the desired error is larger. The advantage of context-dependent nets over traditional ones is that, when comparing nets with the same number of parameters (and thus the same VC-dimension), the context-dependent ones learn faster, and this difference becomes more significant as the size of the nets grows.

References

1. Turney, P.: The Identification of Context-Sensitive Features: A Formal Definition of Context for Concept Learning. Proc. of the 13th International Conference on Machine Learning (ICML-96), Workshop on Learning in Context-Sensitive Domains, Bari, Italy, 1996
2. Turney, P.: The Management of Context-Sensitive Features: A Review of Strategies. Proc. of ICML-96, Workshop on Learning in Context-Sensitive Domains, Bari, Italy, 1996
3. Turney, P.: Exploiting Context when Learning to Classify. Proc. of ICML-93, Springer-Verlag, 1993
4. Harries, M., Sammut, C., Horn, K.: Extracting Hidden Context. Machine Learning 32, 1998
5. Yeung, D.-T., Bekey, G.A.: Using a Context-Sensitive Learning Network for Robot Arm Control. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1441-1447, 1989
6. Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, 1999
7. Rafajłowicz, E.: Context Dependent Neural Nets - Problem Statement and Examples (Part 1), Learning (Part 2). Proc. of the 3rd Conference on Neural Networks and Their Applications, Zakopane, Poland, 1999
8. Ciskowski, P., Rafajłowicz, E.: Context Dependent Neural Nets - Structures and Learning. To be published in IEEE Trans. on Neural Networks
9. Watrous, R.L., Towell, G.: A Patient-Adaptive Neural Network ECG Patient Monitoring Algorithm. Proc. Computers in Cardiology 1995, Vienna, Austria