On The Equivalence of Hierarchical Temporal Memory and Neural Nets
Bedeho Mesghina Wolde Mender

December 7, 2009

Abstract

In this paper we present a rigorous definition of classification in a common family of Hierarchical Temporal Memory networks, and provide a formal analysis of their classification power. We determine the classification power of a single unit, and subsequently demonstrate how to reduce a network of arbitrary topology and size to such a single unit. This reduction is shown to require no worse than an exponential increase in memory, and finally we show how such a network, via a reduction to a single unit, is equivalent to a simple 3 layer feed-forward network of threshold units.

1 Introduction

With our maturing understanding of cortical computation in mammals, particularly in the visual cortex, a number of biologically inspired vision systems and models of cortex have been developed, e.g. by K. Fukushima [3], M. Riesenhuber and T. Poggio [2], T. S. Lee and D. Mumford [4], G. Wallis and E. T. Rolls [1], T. Dean [9] and finally D. George and J. Hawkins [5]. These models generally mimic the hierarchical organization and the alternating simple and complex cell structure of the visual cortex [6]. In the context of machine learning, this family of models may be called hierarchical classifiers, and it is from this perspective that they are studied in this paper. We consider the model of D. George and J. Hawkins, called Hierarchical Temporal Memory (HTM), first outlined in the 2004 book On Intelligence by J. Hawkins as a biological model of cortex. We chose this model not only because of the significant empirical performance it has demonstrated [5], or because it subsumes similar models [7], but also because it is being actively developed and enjoys a growing community of developers using a general purpose computing platform based on it.¹ The goal of this paper is to demonstrate the usefulness of formal analysis of biologically inspired models of cortex.
It is also the opinion of the author of this paper that if such models aspire to be not only devices of utility but also accurate descriptions of cortical computation, then foundational insight into their computational mechanics is a necessary step towards a full understanding of the cortex.

bmmender@ifi.uio.no, Institute of Informatics, University of Oslo, Norway.
¹ Numenta Inc. is a private company, founded by D. George and J. Hawkins, developing a general purpose computing platform, called NuPIC, based on the HTM model.

2 Hierarchical Temporal Memory

In this paper we consider fully trained HTM networks ready for classification tasks; for more information on how learning, prediction and other features work, please consult [7, 5, 8]. An HTM network is a tree of nodes where the input is fed into the leaf nodes and the result is output from the top node. The leaf nodes are given nonoverlapping sections of the total network input, and the internal nodes are given input from their children. A leaf node has a finite list of vectors from its input space, called spatial patterns, and these are partitioned into groups called temporal groups.² Given some input x, the output of the node is a vector with a score for each of its temporal groups, where the score of a group is the maximum score among all spatial patterns in that group. The score of each individual spatial pattern is a simple matching score between the pattern and the input x. An internal node also has a finite list of vectors, called coincidence vectors. These are as long as the number of children of the node, and each position in a coincidence vector references a temporal group in the corresponding child node. So for example if a node has a coincidence vector (2, 4, 6), then the node has three children and this specific coincidence references the second, fourth and sixth temporal group in the children respectively. As in the leaf nodes, these are partitioned into temporal groups and the score of each group is the maximum score among all coincidence vectors in that group. The score of each coincidence is calculated by taking the product of the scores of the temporal groups in the children referenced by the coincidence. With this in mind we give a formal definition, but first some preliminaries. Let x = (x_1, ..., x_n) ∈ R^n and y = (y_1, ..., y_m) ∈ R^m.
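Before the formal definitions, the informal scoring scheme just described can be sketched in code. This is a hedged illustration, not the reference implementation: all names are invented, and the Gaussian match score e^(−‖x−c‖²/σ) used for leaf patterns is the one formalized later in the text.

```python
import math

def leaf_output(x, temporal_groups, sigma=1.0):
    """One score per temporal group: the best Gaussian match among
    the spatial patterns in that group."""
    def pattern_score(c):
        return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / sigma)
    return [max(pattern_score(c) for c in group) for group in temporal_groups]

def internal_output(child_outputs, temporal_groups):
    """Each coincidence references one temporal group per child; its
    score is the product of the referenced child group scores, and a
    group's score is the maximum over its coincidences."""
    def coincidence_score(coincidence):
        prod = 1.0
        for child, group_idx in enumerate(coincidence):
            prod *= child_outputs[child][group_idx]
        return prod
    return [max(coincidence_score(c) for c in group) for group in temporal_groups]

# Toy example: a leaf with two temporal groups of 1-D patterns.
groups = [[(0.0,), (0.2,)], [(1.0,)]]
out = leaf_output((0.1,), groups)
assert out[0] > out[1]  # input 0.1 is closer to the first group
```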
Vector concatenation of y onto x is denoted x ⊕ y, that is x ⊕ y = (x_1, ..., x_n, y_1, ..., y_m). We may project out a contiguous section, or just one position, of a vector by writing x[i, j] = (x_i, ..., x_j) or x[i] = x_i respectively. We use the standard norm, that is ‖x‖² = Σ_{i=1}^n (x_i)². Finally, a cover for some set T is a family of sets {T_1, ..., T_p} such that T = ∪_i T_i, and this cover is called a partitioning if T_i ∩ T_j = ∅ for i ≠ j.

Definition 1. A gaussian node is defined as G = (l, r, σ, S, T) where
- 1 ≤ l ≤ r, and (l, r) is called the receptive field,
- σ ∈ R is called the spread,
- S ⊆ R^{r−l+1} is finite and called the spatial set,
- T = {T_1, ..., T_p} is a partitioning of S, and the partitions are called temporal groups.

² These names (spatial pattern, temporal group, coincidence) are due to how this information is learned during training, and while this is not relevant in this presentation, we choose to use the original names for the sake of consistency.

We may indicate the receptive field of a node by writing G_{l,r}. We overload the notation and allow interpreting G as a function, so for any x ∈ R^d where d ≥ r we let

G(x) = (z_1(x[l, r]), ..., z_p(x[l, r])), where z_i(u) = max_{c ∈ T_i} e^{−‖u − c‖²/σ}.

Definition 2. A network with receptive field (l, r) is inductively defined as a gaussian node G_{l,r}, or as H = (H^1_{r_1,r_2}, ..., H^n_{r_{2n−1},r_{2n}}, C, T) where
- H^1_{r_1,r_2}, ..., H^n_{r_{2n−1},r_{2n}} are networks such that l = r_1 < ... < r_{2n} = r and r_3 − r_2 = ... = r_{2n−1} − r_{2n−2} = 1,
- C ⊆ {(c_1, ..., c_n) | 1 ≤ c_i ≤ d_i} is called the coincidence set, where d_i is the number of temporal groups in H^i,
- T = {T_1, ..., T_p} is a partitioning of C, and the partitions are called temporal groups.

As before we overload the notation and allow interpreting H as a function, so for any x ∈ R^d where d ≥ r we let

H(x) = (z_1(x), ..., z_p(x)), where z_i(x) = max_{c ∈ T_i} ∏_{j=1}^n H^j(x)[c[j]].

We first consider a gaussian node by itself. Recall that each spatial pattern is a point in the input space, and that groups of these points constitute temporal groups. Given two spatial patterns a, b ∈ R^n and some input x ∈ R^n, it is easily observable that the score for a dominates the score for b as long as x is closer to a than to b. You may observe

e^{−‖x − a‖²/σ} = e^{−‖x − b‖²/σ} ⟺ ‖x − a‖² = ‖x − b‖² ⟺ ‖x − a‖ = ‖x − b‖,

hence the score boundary, where they have equal score, is the hyperplane of inputs equidistant to a and b. This is the case for any pair of spatial patterns, so if we fix some node spatial pattern a and consider all other node patterns b_1, ..., b_n, we see that the region of the input space where a has the greatest score among all patterns is the convex polytope³ arising from the intersection of all score boundary hyperplanes between a and each of b_1, ..., b_n.

³ A polytope is an n-dimensional generalization of the two-dimensional polygon, and a convex polytope is one which defines a convex set. They are often defined as intersections of a finite number of half-spaces, which again are characterized by hyperplanes.

More generally, this means that the temporal group {a_1, ..., a_m} dominates all other temporal groups exactly for all points residing in some convex polytope around some a_i. Moreover we see that the input space of any gaussian node is covered by a finite number of convex polytopes. We emphasize that it is covered and not partitioned, since the polytopes overlap exactly at the score boundaries. A convenient way of representing a convex polytope is by keeping the coefficients of the bounding hyperplanes in matrix form; this allows us to efficiently represent each temporal group as a finite number of such matrices.

Definition 3. A cover {T_1, ..., T_p} for R^n is said to be a finite union of convex polytopes when for each i = 1, ..., p there exist m × n matrices A^i_1, ..., A^i_{k_i} and vectors b^i_1, ..., b^i_{k_i} ∈ R^m such that x ∈ T_i ⟺ A^i_j x ≤ b^i_j for some j = 1, ..., k_i.

In general, both for a full network and a single gaussian node, it is natural to consider how the network classifies the input space, that is how it gives category labels to different input points. As we already know, in the case of gaussian nodes the score boundaries for different categories assign each point in the input space some fixed label set. In practice, when some point x is assigned more than one label, the network has given the same maximum score to two or more categories.

Definition 4. For any n, m > 0, a function f : R^n → P({l_1, ..., l_m}) is called a classification in dimension n of size m when for each l_i there exists an x ∈ R^n such that l_i ∈ f(x). Given a classification f, we define R^n_{l_i} = {x ∈ R^n | l_i ∈ f(x)}, and R^n_f = {R^n_{l_1}, ..., R^n_{l_m}}, which is a cover for R^n. For any network H_{l,r} with n temporal groups we define the classification f_H : R^{r−l+1} → P({l_1, ..., l_n}) by

f_H(x) = { l_i | i ∈ argmax_{1 ≤ j ≤ n} H(x)[j] }.

Now we can precisely summarize our recent discussion in the form of the next theorem.

Theorem 5.
For any gaussian node G_{1,n}, R^n_{f_G} is a finite union of convex polytopes.

Proof. We demonstrate this by providing the matrices that characterize each partition. So let G_{1,n} = (l, r, σ, S, {T_1, ..., T_p}), let S = {s_1, ..., s_m} and let T_i = {s^i_1, ..., s^i_{k_i}} for any 1 ≤ i ≤ p. For all i = 1, ..., p and j = 1, ..., k_i let A^i_j be the m × n matrix whose u-th row is

(s_u[1] − s^i_j[1], ..., s_u[n] − s^i_j[n]),   u = 1, ..., m,

and let b^i_j ∈ R^m be the vector with entries

b^i_j[u] = (1/2) Σ_{v=1}^n (s_u[v]² − s^i_j[v]²),   u = 1, ..., m.

Now fix any 1 ≤ i ≤ p and observe that for any x = (x_1, ..., x_n) ∈ R^n

x ∈ R^n_{l_i}
⟺ l_i ∈ f_G(x)   (def. of R^n_{l_i})
⟺ i ∈ argmax_{1 ≤ u ≤ p} G(x)[u]   (def. of f_G)
⟺ ∀u: G(x)[i] ≥ G(x)[u]
⟺ ∀u: max_{c ∈ T_i} e^{−‖x − c‖²/σ} ≥ max_{c ∈ T_u} e^{−‖x − c‖²/σ}
⟺ ∃j ∀c ∈ S: e^{−‖x − s^i_j‖²/σ} ≥ e^{−‖x − c‖²/σ}
⟺ ∃j ∀c ∈ S: ‖x − s^i_j‖² ≤ ‖x − c‖²
⟺ ∃j ∀c ∈ S: Σ_{v=1}^n (x[v] − s^i_j[v])² ≤ Σ_{v=1}^n (x[v] − c[v])²
⟺ ∃j ∀c ∈ S: Σ_{v=1}^n (x[v]² − 2x[v]s^i_j[v] + s^i_j[v]²) ≤ Σ_{v=1}^n (x[v]² − 2x[v]c[v] + c[v]²)
⟺ ∃j ∀c ∈ S: Σ_{v=1}^n (−2x[v]s^i_j[v] + s^i_j[v]²) ≤ Σ_{v=1}^n (−2x[v]c[v] + c[v]²)
⟺ ∃j ∀c ∈ S: Σ_{v=1}^n x[v](c[v] − s^i_j[v]) ≤ (1/2) Σ_{v=1}^n (c[v]² − s^i_j[v]²)
⟺ A^i_j x ≤ b^i_j for some j = 1, ..., k_i.

This proves that the cover R^n_{f_G} is a finite union of convex polytopes.

With our understanding of the classification in a gaussian node we can try to relate this to a more complicated network, so consider a two level network with one top node and two gaussian children, both with the same spread σ. Let c ∈ T_i be some coincidence in the top node which references temporal groups T_L, T_R in the left and right child nodes respectively. Let's say that for some particular network input x, the score of T_L, T_R is the score of some a ∈ T_L, b ∈ T_R respectively, which also means that the score of c is the score of a times the score of b. In general, for any input x there exist such â ∈ T_L, b̂ ∈ T_R as mentioned above, and observe then that the score of c is

e^{−‖x[l, r_1] − â‖²/σ} · e^{−‖x[r_2, r] − b̂‖²/σ} = e^{−(‖x[l, r_1] − â‖² + ‖x[r_2, r] − b̂‖²)/σ} = e^{−‖x − â ⊕ b̂‖²/σ}.
We recognize the last term as the score of the spatial pattern â ⊕ b̂ in a gaussian node with spread σ. We also see that the score of c will always be this term, by way of the combination of â ∈ T_L, b̂ ∈ T_R which yields the maximum score for the particular x. This suggests a method for reducing a two layer network to a single node, and by repeatedly doing this from below we can reduce any network to a single gaussian node.

Theorem 6. For any network H_{l,r} where all gaussian nodes have the same spread σ, there exists a gaussian node G_{l,r} with spread σ such that for any x ∈ R^d where d ≥ r we have H(x) = G(x).

Proof. We prove this by induction on the structure of H. The base case is trivial, since then H itself is a gaussian node. Now let H = (H^1_{r_1,r_2}, ..., H^n_{r_{2n−1},r_{2n}}, C, T). For each H^i let G^i be a gaussian node with spread σ such that H^i(x) = G^i(x) for all x ∈ R^d. Such a gaussian node must exist for each i: if H^i itself is a gaussian node it has spread σ by assumption, and if it is not a gaussian node we apply the induction hypothesis to it and obtain the necessary gaussian node. We can now, based on H, G^1, ..., G^n, construct the desired gaussian node G. Let T^j_i and T_i denote the i-th temporal group of G^j and the i-th temporal group of H respectively. For each c ∈ C let

S_c = { t_1 ⊕ ... ⊕ t_n | t_1 ∈ T^1_{c[1]}, ..., t_n ∈ T^n_{c[n]} },

and for each i = 1, ..., p let

β_i = ∪_{c ∈ T_i} S_c,

letting us define our gaussian node

G = (l, r, σ, ∪_{c ∈ C} S_c, {β_1, ..., β_p}).

To confirm that H and G agree, we consult their i-th components and establish equality. So for any 1 ≤ i ≤ p observe that

H(x)[i] = max_{c ∈ T_i} ∏_{j=1}^n H^j(x)[c[j]]   (def. H)
= max_{c ∈ T_i} ∏_{j=1}^n G^j(x)[c[j]]   (H^j(x) = G^j(x))
= max_{c ∈ T_i} ∏_{j=1}^n max_{c_j ∈ T^j_{c[j]}} e^{−‖x[r_{2j−1}, r_{2j}] − c_j‖²/σ}   (def. G^j)
= max_{c ∈ T_i} max_{c_1 ∈ T^1_{c[1]}, ..., c_n ∈ T^n_{c[n]}} ∏_{j=1}^n e^{−‖x[r_{2j−1}, r_{2j}] − c_j‖²/σ}   (*)
= max_{c ∈ T_i} max_{c_1 ∈ T^1_{c[1]}, ..., c_n ∈ T^n_{c[n]}} e^{−‖x − c_1 ⊕ ... ⊕ c_n‖²/σ}   (**)
= max_{c ∈ T_i} max_{u ∈ S_c} e^{−‖x − u‖²/σ}   (def. S_c)
= max_{u ∈ β_i} e^{−‖x − u‖²/σ}   (def. β_i)
= G(x)[i].   (def. G)

For (*) observe that, the scores being nonnegative, taking the product of a sequence of function maximums is the same as taking the maximum over all ordered products of the functions, that is

max_{i_1} {F_1(i_1)} · ... · max_{i_n} {F_n(i_n)} = max_{i_1, ..., i_n} {F_1(i_1) · ... · F_n(i_n)}.
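The product-of-maxima identity invoked at (*) is easy to check numerically. A small sketch with invented score lists (any nonnegative values would do):

```python
from itertools import product

# Scores of two children's candidates; arbitrary nonnegative values.
F = [0.2, 0.9, 0.5]
G = [0.7, 0.3]

# Product of per-child maxima ...
lhs = max(F) * max(G)
# ... equals the maximum over all ordered combinations of products.
rhs = max(f * g for f, g in product(F, G))
assert abs(lhs - rhs) < 1e-12  # both pick 0.9 * 0.7
```

Nonnegativity matters here: with mixed signs, two large negative factors could yield a product exceeding the product of the maxima.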
For (**) observe that

∏_{j=1}^n e^{−‖x[r_{2j−1}, r_{2j}] − c_j‖²/σ}
= e^{−(1/σ) Σ_{j=1}^n ‖x[r_{2j−1}, r_{2j}] − c_j‖²}
= e^{−(1/σ) Σ_{j=1}^n Σ_{l=1}^{r_{2j} − r_{2j−1} + 1} (x[r_{2j−1}, r_{2j}][l] − c_j[l])²}
= e^{−(1/σ) Σ_{j=1}^n Σ_{l=r_{2j−1}}^{r_{2j}} (x[l] − c_j[l − r_{2j−1} + 1])²}
= e^{−(1/σ) Σ_{l=r_1}^{r_{2n}} (x[l] − (c_1 ⊕ ... ⊕ c_n)[l])²}
= e^{−‖x − c_1 ⊕ ... ⊕ c_n‖²/σ}.

Observe that the process suggested in the previous theorem can also be performed in reverse, giving a means of decomposing a single node into a network. The practical feasibility of such a reduction depends on the increase in the number of spatial patterns, since they must be saved in memory and iterated over when computing the score. Therefore we give a very loose upper bound on the size of G_H, the node built in the previous theorem, by way of an inductive function counting the number of spatial patterns in a network, so let

Spatial((l, r, σ, S, T)) = |S|,
Spatial((H^1, ..., H^n, C, T)) = |C| + Σ_i Spatial(H^i).

Theorem 7. For any network H we have that Spatial(G_H) ≤ 2^{Spatial(H)}.

Proof. We prove this by induction on the structure of H. The base case is trivial, since then H = G_H.
For the induction step we have

Spatial(G_H) = |∪_{c ∈ C} S_c|   (def. G_H)
≤ Σ_{c ∈ C} |S_c|
= Σ_{c ∈ C} |T^1_{c[1]}| · ... · |T^n_{c[n]}|   (def. S_c)
≤ Σ_{c ∈ C} Spatial(G_{H^1}) · ... · Spatial(G_{H^n})   (*)
≤ Σ_{c ∈ C} 2^{Spatial(H^1)} · ... · 2^{Spatial(H^n)}   (IH)
= Σ_{c ∈ C} 2^{Spatial(H^1) + ... + Spatial(H^n)}
= |C| · 2^{Spatial(H^1) + ... + Spatial(H^n)}
= 2^{lg|C| + Spatial(H^1) + ... + Spatial(H^n)}
≤ 2^{Spatial(H)}.

For (*) recall that the temporal groups of a node partition its spatial pattern set, so obviously |T^i_{c[i]}| ≤ Spatial(G_{H^i}). The last step holds since lg|C| ≤ |C| and Spatial(H) = |C| + Σ_i Spatial(H^i).

It is well known that a single linear threshold unit, or perceptron, can solve linearly separable classification tasks. This means we can use a collection of such units to capture the convex polytope surrounding each spatial pattern in a gaussian node, by having one perceptron for each hyperplane supporting the polytope. Further, we can combine many such collections of units to determine whether the input is in at least one of the polytopes in a temporal group, thereby deciding if that temporal group is the winning group for that particular input. This can be done for each temporal group, giving us a simple feed-forward network of perceptrons which fully captures the classification the node induces on the input space. This can also be done for full networks via Theorem 6.

Theorem 8. For any finite union of convex polytopes cover {T_1, ..., T_p} for R^n there exists a 3 layer feed-forward neural network classifying it. Moreover, such a network exists for any network H_{1,n} where all gaussian nodes have the same spread.

Proof. We provide a neural network as described which takes any x ∈ R^n and has p outputs, one for each element T_i in the cover. For any x ∈ R^n the network will spike on output i if and only if x ∈ T_i. We use simple neurons with a threshold activation function and an input bias. We construct one network N_i for each T_i which spikes if and only if x ∈ T_i, and each such network provides one of the p outputs of the total network.
Fix some i and recall that, by assumption on the cover, there exist m × n matrices A^i_1, ..., A^i_k and vectors b^i_1, ..., b^i_k ∈ R^m such that x ∈ T_i ⟺ A^i_j x ≤ b^i_j for some j = 1, ..., k. So we may decide x ∈ T_i by examining A^i_j x ≤ b^i_j for at most all 1 ≤ j ≤ k, and therefore we build a neural network A^i_j for each j which does exactly this. Observe that testing the condition amounts to testing on which side of m hyperplanes, one for each row in A^i_j combined with the corresponding cell from b^i_j, x lies. This is easily achieved by having m neurons V^{i,j}_1, ..., V^{i,j}_m, each deciding x for one hyperplane. The network A^i_j is then simply an AND-gate which spikes when all of V^{i,j}_1, ..., V^{i,j}_m spike. Finally, the network N_i is then simply an OR-gate which spikes when at least one of A^i_1, ..., A^i_k spikes. We now have a 3 layer feed-forward network where layer three is p OR nodes, layer two is AND nodes and layer one is a layer of hyperplane deciders.

3 Discussion

As we alluded to in the abstract, the flavor of networks considered here by no means represents the full range of networks that one can have in the HTM model. The model has matured significantly, although the fundamental concepts of hierarchical temporal and spatial pooling remain the same. The two primary alternative forms are letting the score of a temporal group be a weighted sum of pattern/coincidence scores instead of the maximum, and letting the score of a coincidence pattern be a weighted sum of child temporal group scores instead of a product. More exotic features include mixing node types, overlapping receptive fields and level skipping; these are however much less used in practice. A glaring limitation in our results is that we require all gaussian nodes to share the same spread; this is however not as severe a restriction as it may seem. In principle, if all gaussian nodes are receiving input from the same domain, for example as bottom nodes in a vision network, we expect that they should learn the same low level spatial patterns (e.g. horizontal, vertical and diagonal lines and shapes), and thus having the same spread is reasonable.
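Returning briefly to Theorem 8, the three-layer construction (hyperplane deciders, AND-gates per polytope, OR-gates per temporal group) can be sketched with threshold units. This is a hedged illustration with invented toy matrices, not values from a trained node:

```python
def threshold(v):
    # Simple threshold activation with the bias folded into v.
    return 1 if v >= 0 else 0

def decider(a_row, b_entry, x):
    # Layer one: spikes when the row constraint a_row . x <= b_entry holds.
    return threshold(b_entry - sum(ai * xi for ai, xi in zip(a_row, x)))

def and_gate(spikes):
    # Layer two: spikes only when every decider of one polytope spikes.
    return threshold(sum(spikes) - len(spikes))

def or_gate(spikes):
    # Layer three: spikes when at least one polytope of the group matches.
    return threshold(sum(spikes) - 1)

def in_union(polytopes, x):
    # polytopes: list of (A, b) pairs, each A given as a list of rows.
    return or_gate([and_gate([decider(row, bi, x)
                              for row, bi in zip(A, b)])
                    for A, b in polytopes])

# The unit square as one polytope: -x1 <= 0, x1 <= 1, -x2 <= 0, x2 <= 1.
square = ([(-1, 0), (1, 0), (0, -1), (0, 1)], [0, 1, 0, 1])
assert in_union([square], (0.5, 0.5)) == 1
assert in_union([square], (2.0, 0.5)) == 0
```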
At present the spread is a hand-tuned parameter which cannot be learned, so it is therefore often left the same across all nodes. Additionally, a common technique in training such networks is what is called node cloning, which means that you only train a single node per network level, and then just replicate that node when you do classification. Lastly, given that the upper bound in Theorem 7 cannot be significantly improved, this can be seen as evidence that the hierarchical organization of this and other models is a nontrivial construct in terms of computational feasibility. This may also, as mentioned, mean that large gaussian nodes may be decomposed to increase classification speed.

References

[1] G. Wallis and E. T. Rolls. A model of invariant object recognition in the visual system. Prog. Neurobiol., 51.
[2] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci., 2.
[3] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cyb., 36.
[4] T. S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A Opt. Image Sci. Vis., 20(7), July.
[5] D. George and J. Hawkins. Towards a mathematical theory of cortical microcircuits. PLoS Comput. Biol., 5(10), 2009.
[6] D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture of monkey striate cortex. J. Phys., 195.
[7] D. George. How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition. PhD thesis, Stanford University, June 2008.
[8] Numenta Inc. Zeta1 Algorithms Reference, Version 1.5. Available upon request.
[9] T. Dean. A computational model of the cerebral cortex. In Proceedings of AAAI-05, Cambridge, Massachusetts. MIT Press.
More informationIntroduction: The Perceptron
Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron
More information) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.
1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2011 Recitation 8, November 3 Corrected Version & (most) solutions
More informationThe error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural
1 2 The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural networks. First we will look at the algorithm itself
More informationLast update: October 26, Neural networks. CMSC 421: Section Dana Nau
Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationLecture 4: Feed Forward Neural Networks
Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationCMSC 421: Neural Computation. Applications of Neural Networks
CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationUnit 8: Introduction to neural networks. Perceptrons
Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad
More informationArtificial Neural Network and Fuzzy Logic
Artificial Neural Network and Fuzzy Logic 1 Syllabus 2 Syllabus 3 Books 1. Artificial Neural Networks by B. Yagnanarayan, PHI - (Cover Topologies part of unit 1 and All part of Unit 2) 2. Neural Networks
More informationName (NetID): (1 Point)
CS446: Machine Learning Fall 2016 October 25 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains four
More informationCSC Neural Networks. Perceptron Learning Rule
CSC 302 1.5 Neural Networks Perceptron Learning Rule 1 Objectives Determining the weight matrix and bias for perceptron networks with many inputs. Explaining what a learning rule is. Developing the perceptron
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationBACKPROPAGATION. Neural network training optimization problem. Deriving backpropagation
BACKPROPAGATION Neural network training optimization problem min J(w) w The application of gradient descent to this problem is called backpropagation. Backpropagation is gradient descent applied to J(w)
More informationThe XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic
The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The
More informationIntroduction to Machine Learning (67577) Lecture 3
Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz
More informationIntroduction Biologically Motivated Crude Model Backpropagation
Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationArtificial Neural Networks. Part 2
Artificial Neural Netorks Part Artificial Neuron Model Folloing simplified model of real neurons is also knon as a Threshold Logic Unit x McCullouch-Pitts neuron (943) x x n n Body of neuron f out Biological
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationNeural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture
Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are
More informationLinear Classifiers: Expressiveness
Linear Classifiers: Expressiveness Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture outline Linear classifiers: Introduction What functions do linear classifiers express?
More informationAn Analytical Comparison between Bayes Point Machines and Support Vector Machines
An Analytical Comparison between Bayes Point Machines and Support Vector Machines Ashish Kapoor Massachusetts Institute of Technology Cambridge, MA 02139 kapoor@mit.edu Abstract This paper analyzes the
More informationVisual Motion Analysis by a Neural Network
Visual Motion Analysis by a Neural Network Kansai University Takatsuki, Osaka 569 1095, Japan E-mail: fukushima@m.ieice.org (Submitted on December 12, 2006) Abstract In the visual systems of mammals, visual
More informationDeep Learning. Convolutional Neural Network (CNNs) Ali Ghodsi. October 30, Slides are partially based on Book in preparation, Deep Learning
Convolutional Neural Network (CNNs) University of Waterloo October 30, 2015 Slides are partially based on Book in preparation, by Bengio, Goodfellow, and Aaron Courville, 2015 Convolutional Networks Convolutional
More informationHow to make computers work like the brain
How to make computers work like the brain (without really solving the brain) Dileep George a single special machine can be made to do the work of all. It could in fact be made to work as a model of any
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationLecture 8. Instructor: Haipeng Luo
Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationINTRODUCTION TO DATA SCIENCE
INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationShort Term Memory and Pattern Matching with Simple Echo State Networks
Short Term Memory and Pattern Matching with Simple Echo State Networks Georg Fette (fette@in.tum.de), Julian Eggert (julian.eggert@honda-ri.de) Technische Universität München; Boltzmannstr. 3, 85748 Garching/München,
More informationTHE MOST IMPORTANT BIT
NEURAL NETWORKS THE MOST IMPORTANT BIT A neural network represents a function f : R d R d 2. Peter Orbanz Applied Data Mining 262 BUILDING BLOCKS Units The basic building block is a node or unit: φ The
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationConvolutional neural networks
11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationNatural Image Statistics
Natural Image Statistics A probabilistic approach to modelling early visual processing in the cortex Dept of Computer Science Early visual processing LGN V1 retina From the eye to the primary visual cortex
More information