On The Equivalence of Hierarchical Temporal Memory and Neural Nets


Bedeho Mesghina Wolde Mender
bmmender@ifi.uio.no, Institute of Informatics, University of Oslo, Norway

December 7, 2009

Abstract

In this paper we present a rigorous definition of classification in a common family of Hierarchical Temporal Memory networks and provide a formal analysis of their classification power. We determine the classification power of a single unit, and subsequently demonstrate how to reduce a network of arbitrary topology and size to such a single unit. This reduction is shown to require no worse than an exponential increase in memory, and we finally show how such a network, via the reduction to a single unit, is equivalent to a simple 3-layer feed-forward network of threshold units.

1 Introduction

With our maturing understanding of cortical computation in mammals, particularly in the visual cortex, a number of biologically inspired vision systems and models of cortex have been developed, e.g. K. Fukushima [3], M. Riesenhuber and T. Poggio [2], T.S. Lee and D. Mumford [4], G. Wallis and E. T. Rolls [1], T. Dean [9] and finally D. George and J. Hawkins [5]. These models generally mimic the hierarchical organization and the alternating simple and complex cell structure of the visual cortex [6]. In the context of machine learning, this family of models may be called hierarchical classifiers, and it is from this perspective that they are studied in this paper. We consider the model of D. George and J. Hawkins, called Hierarchical Temporal Memory (HTM), first outlined as a biological model of cortex in the 2004 book On Intelligence by J. Hawkins. We chose this model not only because of the significant empirical performance it has demonstrated [5], or because it subsumes similar models [7], but also because it is being developed actively and enjoys a growing community of developers using a general purpose computing platform based on it. (Numenta Inc. is a private company, founded by D. George and J. Hawkins, developing a general purpose computing platform, called NuPIC, based on the HTM model.)

The goal of this paper is to demonstrate the usefulness of formal analysis of biologically inspired models of cortex. It is also the opinion of the author that if such models aspire to be not only devices of utility but also accurate descriptions of cortical computation, then foundational insight into their computational mechanics is a necessary step towards a full understanding of the cortex.

2 Hierarchical Temporal Memory

In this paper we consider fully trained HTM networks ready for classification tasks; for more information on how learning, prediction and other features work, please consult [7, 5, 8]. An HTM network is a tree of nodes where the input is fed into the leaf nodes and the result is output from the top node. The leaf nodes are given nonoverlapping sections of the total network input, and the internal nodes are given input from their children.

A leaf node has a finite list of vectors from its input space, called spatial patterns, and these are partitioned into groups called temporal groups. (These names, spatial pattern, temporal group and coincidence, are due to how this information is learned during training; while that is not relevant in this presentation, we choose to use the original names for the sake of consistency.) Given some input x, the output of the node is a vector with a score for each of its temporal groups, where the score of a group is the maximum score among all spatial patterns in that group. The score of each individual spatial pattern is a simple matching score between the pattern and the input x.

An internal node also has a finite list of vectors called coincidence vectors. These are as long as the number of children of the node, and each position in a coincidence vector references a temporal group in the corresponding child node. So for example, if a node has a coincidence vector (2, 4, 6), then the node has three children and this specific coincidence references the second, fourth and sixth temporal group in the children respectively. As in the leaf nodes, these are partitioned into temporal groups and the score of each group is the maximum score among all coincidence vectors in that group. The score of each coincidence is calculated by taking the product of the scores of the temporal groups in the children referenced by the coincidence.

With this in mind we give a formal definition, but first some preliminaries. Let $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and $y = (y_1, \dots, y_m) \in \mathbb{R}^m$. Vector concatenation of $y$ onto $x$ is denoted $x \frown y$, that is $x \frown y = (x_1, \dots, x_n, y_1, \dots, y_m)$. We may project out a contiguous section, or just one position, of a vector by writing $x[i, j] = (x_i, \dots, x_j)$ or $x[i] = x_i$ respectively. We use the standard norm, that is $\|x\|^2 = \sum_{i=1}^{n} (x_i)^2$. Finally, a cover for some set $T$ is a family of sets $\{T_1, \dots, T_p\}$ such that $T = \bigcup_i T_i$, and this cover is called a partitioning if $T_i \cap T_j = \emptyset$ for $i \neq j$.

Definition 1. A gaussian node is defined as $G = (l, r, \sigma, S, T)$ where

- $1 \leq l \leq r$, and $(l, r)$ is called the receptive field,
- $\sigma \in \mathbb{R}$ is called the spread,
- $S \subseteq \mathbb{R}^{r-l+1}$ is finite and called the spatial set,
- $T = \{T_1, \dots, T_p\}$ is a partitioning of $S$, and the partitions are called temporal groups.
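
To make the definition concrete, the following is a minimal sketch of how a gaussian node scores an input. The Python encoding, the function name and the explicit matching score $e^{-\|u-c\|^2/\sigma^2}$ (anticipating the function form given in the next paragraph) are illustrative assumptions of this example, not part of the paper.

```python
import math

def gaussian_node_output(x, l, r, sigma, temporal_groups):
    """Score an input against a gaussian (leaf) node G = (l, r, sigma, S, T).

    temporal_groups is a list of temporal groups, each a list of spatial
    patterns (tuples of length r - l + 1).  The output has one score per
    temporal group: the best match among the group's patterns.
    """
    u = x[l - 1:r]  # project out the receptive field x[l, r] (paper indices are 1-based)
    scores = []
    for group in temporal_groups:
        best = max(
            math.exp(-sum((ui - ci) ** 2 for ui, ci in zip(u, c)) / sigma ** 2)
            for c in group)
        scores.append(best)
    return scores

# Toy node over inputs of length 2, with two temporal groups.
groups = [[(0.0, 0.0), (1.0, 1.0)],   # temporal group 1
          [(5.0, 5.0)]]               # temporal group 2
print(gaussian_node_output((0.9, 1.1), l=1, r=2, sigma=1.0, temporal_groups=groups))
```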

We may indicate the receptive field of a node by writing $G_{l,r}$. We overload the notation and allow interpreting $G$ as a function, so for any $x \in \mathbb{R}^d$ where $d \geq r$ we let

$$G(x) = (z_1(x[l, r]), \dots, z_p(x[l, r])), \qquad \text{where} \quad z_i(u) = \max_{c \in T_i} e^{-\frac{\|u - c\|^2}{\sigma^2}}.$$

Definition 2. A network with receptive field $(l, r)$ is inductively defined as a gaussian node $G_{l,r}$, or as $H = (H^1_{r_1,r_2}, \dots, H^n_{r_{2n-1},r_{2n}}, C, T)$ where

- $H^1_{r_1,r_2}, \dots, H^n_{r_{2n-1},r_{2n}}$ are networks such that $l = r_1 < \dots < r_{2n} = r$ and $r_3 - r_2 = \dots = r_{2n-1} - r_{2n-2} = 1$,
- $C \subseteq \{(c_1, \dots, c_n) \mid 1 \leq c_i \leq d_i\}$ is called the coincidence set, where $d_i$ is the number of temporal groups in $H^i$,
- $T = \{T_1, \dots, T_p\}$ is a partitioning of $C$, and the partitions are called temporal groups.

As before we overload the notation and allow interpreting $H$ as a function, so for any $x \in \mathbb{R}^d$ where $d \geq r$ we let

$$H(x) = (z_1(x), \dots, z_p(x)), \qquad \text{where} \quad z_i(x) = \max_{c \in T_i} \prod_{j=1}^{n} H^j(x)[c[j]].$$
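
The recursion in Definition 2 is easy to state as code. Below is a minimal sketch of how a network evaluates an input, reusing gaussian_node_output from the previous sketch; the tuple-based node encoding is again an assumption of the example rather than the paper's notation.

```python
import math  # for math.prod

def network_output(x, node):
    """Evaluate a network (Definition 2) on input x.

    A node is either
      ("gaussian", l, r, sigma, temporal_groups)            -- a leaf, or
      ("internal", children, coincidence_temporal_groups)   -- where each
        temporal group is a list of coincidences, and a coincidence c is a
        tuple with c[j] referencing a temporal group of child j (1-based).
    """
    if node[0] == "gaussian":
        _, l, r, sigma, groups = node
        return gaussian_node_output(x, l, r, sigma, groups)
    _, children, coincidence_groups = node
    child_outputs = [network_output(x, child) for child in children]
    scores = []
    for group in coincidence_groups:
        # Score of a coincidence: product of the referenced child group scores.
        best = max(
            math.prod(child_outputs[j][c[j] - 1] for j in range(len(children)))
            for c in group)
        scores.append(best)
    return scores
```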

We first consider a gaussian node by itself. Recall that each spatial pattern is a point in the input space, and that groups of these points constitute temporal groups. Given two spatial patterns $a, b \in \mathbb{R}^n$ and some input $x \in \mathbb{R}^n$, it is easily observed that the score for $a$ dominates the score for $b$ as long as $x$ is closer to $a$ than to $b$. You may observe

$$e^{-\frac{\|x-a\|^2}{\sigma^2}} = e^{-\frac{\|x-b\|^2}{\sigma^2}} \iff \|x-a\|^2 = \|x-b\|^2 \iff \|x-a\| = \|x-b\|,$$

hence the score boundary, where they have equal score, is the hyperplane of inputs equidistant to $a$ and $b$. This is the case for any pair of spatial patterns, so if we fix some node spatial pattern $a$ and consider all other node patterns $b_1, \dots, b_n$, we see that the region of the input space where $a$ has the greatest score among all patterns is the convex polytope arising from the intersection of all score boundary hyperplanes between $a$ and each of $b_1, \dots, b_n$. (A polytope is an $n$-dimensional generalization of the two-dimensional polygon, and a convex polytope is one which defines a convex set; polytopes are often defined as intersections of a finite number of half-spaces, which in turn are characterized by hyperplanes.) More generally, this means that the temporal group $\{a_1, \dots, a_m\}$ dominates all other temporal groups exactly for all points residing in some convex polytope around some $a_i$. Moreover, we see that the input space of any gaussian node is covered by a finite number of convex polytopes. We emphasize that it is covered and not partitioned, since the polytopes overlap exactly at the score boundaries. A convenient way of representing a convex polytope is to keep the coefficients of the bounding hyperplanes in matrix form; this allows us to efficiently represent each temporal group as a finite number of such matrices.

Definition 3. A cover $\{T_1, \dots, T_p\}$ for $\mathbb{R}^n$ is said to be a finite union of convex polytopes when for each $i = 1, \dots, p$ there exist $m \times n$ matrices $A^i_1, \dots, A^i_{k_i}$ and vectors $b^i_1, \dots, b^i_{k_i} \in \mathbb{R}^m$ such that

$$x \in T_i \iff A^i_j x \geq b^i_j \text{ for some } j = 1, \dots, k_i.$$

In general, both for a full network and a single gaussian node, it is natural to consider how the network classifies the input space, that is, how it assigns category labels to different input points. As we already know from the discussion of score boundaries, in the case of gaussian nodes each point in the input space ends up with some fixed label set. In practice, when some point $x$ is assigned more than one label, the network has given the same maximum score to two or more categories.

Definition 4. For any $n, m > 0$, a function $f : \mathbb{R}^n \to \mathcal{P}(\{l_1, \dots, l_m\})$ is called a classification in dimension $n$ of size $m$ when for each $l_i$ there exists an $x \in \mathbb{R}^n$ such that $l_i \in f(x)$. Given a classification $f$, we define $\mathbb{R}^n_{l_i} = \{x \in \mathbb{R}^n \mid l_i \in f(x)\}$ and $\mathbb{R}^n_f = \{\mathbb{R}^n_{l_1}, \dots, \mathbb{R}^n_{l_m}\}$, which is a cover for $\mathbb{R}^n$. For any network $H_{l,r}$ with $n$ temporal groups we define the classification $f_H : \mathbb{R}^{r-l+1} \to \mathcal{P}(\{l_1, \dots, l_n\})$ by

$$f_H(x) = \Big\{ l_i \;\Big|\; i \in \operatorname*{argmax}_{1 \leq j \leq n} \{H(x)[j]\} \Big\}.$$

Now we can precisely summarize our recent discussion in the form of the next theorem.

Theorem 5. For any gaussian node $G_{1,n}$, $\mathbb{R}^n_{f_G}$ is a finite union of convex polytopes.

Proof. We demonstrate this by providing the matrices that characterize each partition. So let $G_{1,n} = (l, r, \sigma, S, \{T_1, \dots, T_p\})$, and let $S = \{s_1, \dots, s_m\}$ and $T_i = \{s^i_1, \dots, s^i_{k_i}\}$ for any $1 \leq i \leq p$. For all $i = 1, \dots, p$ and $j = 1, \dots, k_i$ let

$$A^i_j = \begin{pmatrix} s^i_j[1] - s_1[1] & \cdots & s^i_j[n] - s_1[n] \\ \vdots & & \vdots \\ s^i_j[1] - s_m[1] & \cdots & s^i_j[n] - s_m[n] \end{pmatrix} \qquad \text{and} \qquad b^i_j = \frac{1}{2}\begin{pmatrix} \sum_{v=1}^{n} (s^i_j[v])^2 - (s_1[v])^2 \\ \vdots \\ \sum_{v=1}^{n} (s^i_j[v])^2 - (s_m[v])^2 \end{pmatrix}.$$

Now fix any $1 \leq i \leq p$ and observe that for any $x = (x_1, \dots, x_n) \in \mathbb{R}^n$

$$
\begin{aligned}
x \in \mathbb{R}^n_{l_i}
&\iff l_i \in f_G(x) && (\mathbb{R}^n_{l_i} \text{ def.})\\
&\iff i \in \operatorname*{argmax}_{1 \leq u \leq p}\{G(x)[u]\} && (f_G \text{ def.})\\
&\iff \forall u:\; G(x)[i] \geq G(x)[u]\\
&\iff \forall u:\; \max_{c \in T_i} e^{-\frac{\|x-c\|^2}{\sigma^2}} \geq \max_{c \in T_u} e^{-\frac{\|x-c\|^2}{\sigma^2}}\\
&\iff \exists j\; \forall c \in S:\; e^{-\frac{\|x-s^i_j\|^2}{\sigma^2}} \geq e^{-\frac{\|x-c\|^2}{\sigma^2}}\\
&\iff \exists j\; \forall c \in S:\; \|x - s^i_j\|^2 \leq \|x - c\|^2\\
&\iff \exists j\; \forall c \in S:\; \sum_{v=1}^{n}(x[v] - s^i_j[v])^2 \leq \sum_{v=1}^{n}(x[v] - c[v])^2\\
&\iff \exists j\; \forall c \in S:\; \sum_{v=1}^{n} x[v]^2 - 2x[v]s^i_j[v] + (s^i_j[v])^2 \leq \sum_{v=1}^{n} x[v]^2 - 2x[v]c[v] + (c[v])^2\\
&\iff \exists j\; \forall c \in S:\; \sum_{v=1}^{n} -2x[v]s^i_j[v] + (s^i_j[v])^2 \leq \sum_{v=1}^{n} -2x[v]c[v] + (c[v])^2\\
&\iff \exists j\; \forall c \in S:\; \sum_{v=1}^{n} 2x[v](s^i_j[v] - c[v]) \geq \sum_{v=1}^{n} (s^i_j[v])^2 - (c[v])^2\\
&\iff \exists j\; \forall c \in S:\; \sum_{v=1}^{n} x[v](s^i_j[v] - c[v]) \geq \frac{1}{2}\sum_{v=1}^{n} (s^i_j[v])^2 - (c[v])^2\\
&\iff A^i_j x \geq b^i_j \text{ for some } j = 1, \dots, k_i.
\end{aligned}
$$

This proves that the cover $\mathbb{R}^n_{f_G}$ is a finite union of convex polytopes.

With our understanding of the classification in a gaussian node we can try to relate this to a more complicated network, so consider a two-level network with one top node and two gaussian children, both with the same spread $\sigma$. Let $c \in T_i$ be some coincidence in the top node which references temporal groups $T_L, T_R$ in the left and right child nodes respectively. Let us say that for some particular network input $x$, the score of $T_L, T_R$ is the score of $a \in T_L, b \in T_R$ respectively, which also means that the score of $c$ is the score of $a$ times the score of $b$. In general, for any input $x$ there exist $\acute{a} \in T_L, \acute{b} \in T_R$ as mentioned above, and observe then that the score of $c$ is

$$e^{-\frac{\|x[l, r_1] - \acute{a}\|^2}{\sigma^2}} \cdot e^{-\frac{\|x[r_2, r] - \acute{b}\|^2}{\sigma^2}} = e^{-\frac{\|x[l, r_1] - \acute{a}\|^2 + \|x[r_2, r] - \acute{b}\|^2}{\sigma^2}} = e^{-\frac{\|x - \acute{a} \frown \acute{b}\|^2}{\sigma^2}}.$$

We recognize the last term as the score of the spatial pattern $\acute{a} \frown \acute{b}$ in a gaussian node with spread $\sigma$. We also see that the score of $c$ will always be this term, by way of the combination of $\acute{a} \in T_L, \acute{b} \in T_R$ which yields the maximum score for a particular $x$. This suggests a method for reducing a two-layer network to a single node, and by repeatedly doing this from below we can reduce any network to a single gaussian node.
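
A quick numerical check of this identity (toy numbers chosen purely for illustration; gauss_score below is simply the matching score written out) shows the product of the two child scores agreeing with the score a single gaussian node assigns to the concatenated pattern:

```python
import math

def gauss_score(u, c, sigma):
    return math.exp(-sum((ui - ci) ** 2 for ui, ci in zip(u, c)) / sigma ** 2)

sigma = 1.3
x = (0.2, 0.7, 1.5, -0.4)   # network input, split between children as x[1,2] and x[3,4]
a = (0.0, 1.0)              # best spatial pattern in the left child's group T_L
b = (1.0, 0.0)              # best spatial pattern in the right child's group T_R

left = gauss_score(x[:2], a, sigma)
right = gauss_score(x[2:], b, sigma)
joint = gauss_score(x, a + b, sigma)  # score of the concatenated pattern a ⌢ b
print(left * right, joint)            # the two values agree (up to rounding)
```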

Theorem 6. For any network $H_{l,r}$ where all gaussian nodes have the same spread $\sigma$, there exists a gaussian node $G_{l,r}$ with spread $\sigma$ such that for any $x \in \mathbb{R}^d$ where $d \geq r - l + 1$ we have $H(x) = G(x)$.

Proof. We prove this by induction on the structure of $H$. The base case is trivial, since then $H$ itself would be a gaussian node. Now let $H = (H^1_{r_1,r_2}, \dots, H^n_{r_{2n-1},r_{2n}}, C, T)$. For each $H^i$ let $G^i$ be a gaussian node with spread $\sigma$ such that $H^i(x) = G^i(x)$ for all $x \in \mathbb{R}^d$. Such a gaussian node must exist for each $i$: if $H^i$ itself is a gaussian node, it has spread $\sigma$ by assumption; if it is not a gaussian node, we apply the induction hypothesis to it and obtain the necessary gaussian node.

We can now, based on $H, G^1, \dots, G^n$, construct the desired gaussian node $G$. Let $T^j_i$ and $T_i$ denote the $i$-th temporal group of $G^j$ and the $i$-th temporal group of $H$ respectively. For each $c \in C$ let

$$S_c = \left\{ t_1 \frown \dots \frown t_n \;\middle|\; t_1 \in T^1_{c[1]}, \dots, t_n \in T^n_{c[n]} \right\},$$

and for each temporal group $T_i$ of $H$ let $\beta_i = \bigcup_{c \in T_i} S_c$, letting us define our gaussian node

$$G = \Big(l, r, \sigma, \bigcup_{c \in C} S_c, \{\beta_i\}_i\Big).$$

To confirm that $H$ and $G$ agree, we consider their $i$-th component and establish equality.

So for any component $i$ observe that

$$
\begin{aligned}
H(x)[i] &= \max_{c \in T_i} \prod_{j=1}^{n} H^j(x)[c[j]] && (H \text{ def.})\\
&= \max_{c \in T_i} \prod_{j=1}^{n} G^j(x)[c[j]] && (H^j(x) = G^j(x))\\
&= \max_{c \in T_i} \prod_{j=1}^{n} \max_{c' \in T^j_{c[j]}} e^{-\frac{\|x[r_{2j-1}, r_{2j}] - c'\|^2}{\sigma^2}} && (G^j \text{ def.})\\
&= \max_{c \in T_i} \; \max_{c_1 \in T^1_{c[1]}, \dots, c_n \in T^n_{c[n]}} \prod_{j=1}^{n} e^{-\frac{\|x[r_{2j-1}, r_{2j}] - c_j\|^2}{\sigma^2}} && (*)\\
&= \max_{c \in T_i} \; \max_{c_1 \in T^1_{c[1]}, \dots, c_n \in T^n_{c[n]}} e^{-\frac{1}{\sigma^2}\|x - c_1 \frown \dots \frown c_n\|^2} && (**)\\
&= \max_{c \in T_i} \; \max_{u \in S_c} e^{-\frac{1}{\sigma^2}\|x - u\|^2} && (S_c \text{ def.})\\
&= \max_{u \in \beta_i} e^{-\frac{1}{\sigma^2}\|x - u\|^2} && (\beta_i \text{ def.})\\
&= G(x)[i]. && (G \text{ def.})
\end{aligned}
$$

For $(*)$ observe that taking the product of a sequence of function maximums is the same as taking the maximum over all ordered products of the functions (the functions here are non-negative), that is

$$\max_{i_1}\{F_1(i_1)\} \cdots \max_{i_n}\{F_n(i_n)\} = \max_{i_1, \dots, i_n}\{F_1(i_1) \cdots F_n(i_n)\}.$$

For $(**)$ observe that

$$
\begin{aligned}
\prod_{j=1}^{n} e^{-\frac{1}{\sigma^2}\|x[r_{2j-1}, r_{2j}] - c_j\|^2}
&= e^{-\frac{1}{\sigma^2}\sum_{j=1}^{n}\sum_{l=1}^{r_{2j} - r_{2j-1} + 1}(x[r_{2j-1}, r_{2j}][l] - c_j[l])^2}\\
&= e^{-\frac{1}{\sigma^2}\sum_{j=1}^{n}\sum_{l=r_{2j-1}}^{r_{2j}}(x[l] - c_j[l - r_{2j-1} + 1])^2}\\
&= e^{-\frac{1}{\sigma^2}\sum_{j=1}^{n}\sum_{l=r_{2j-1}}^{r_{2j}}(x[l] - (c_1 \frown \dots \frown c_n)[l])^2}\\
&= e^{-\frac{1}{\sigma^2}\sum_{l=r_1}^{r_{2n}}(x[l] - (c_1 \frown \dots \frown c_n)[l])^2}\\
&= e^{-\frac{1}{\sigma^2}\|x - c_1 \frown \dots \frown c_n\|^2}.
\end{aligned}
$$
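
The construction in this proof can be phrased directly as code. The sketch below handles the special case of a two-level network (a top node with gaussian children); the data layout and all names are assumptions carried over from the earlier sketches, not notation from the paper.

```python
from itertools import product

def flatten_two_level(children, coincidence_groups):
    """Reduce a two-level network to a single gaussian node (cf. Theorem 6).

    children: gaussian nodes as ("gaussian", l, r, sigma, temporal_groups)
        tuples, ordered left to right so their receptive fields tile the
        top node's receptive field.
    coincidence_groups: the top node's temporal groups, each a list of
        coincidences c with c[j] referencing a temporal group of child j (1-based).
    Returns the temporal groups (lists of concatenated spatial patterns) of the
    equivalent single node; l, r and sigma carry over from the children.
    """
    flat_groups = []
    for group in coincidence_groups:
        patterns = []
        for c in group:
            referenced = [children[j][4][c[j] - 1] for j in range(len(children))]
            # S_c: every way of picking one spatial pattern per referenced child
            # group, concatenated in child order.
            patterns.extend(sum(combo, ()) for combo in product(*referenced))
        flat_groups.append(patterns)
    return flat_groups
```

On such a two-level network, network_output on the original top node and gaussian_node_output on the flattened node should return the same score vector, which is the content of the theorem; repeating the flattening from the leaves upwards handles arbitrary depth.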

Observe that the process suggested in the previous theorem can also be performed in reverse, giving a means of decomposing a single node into a network. The practical feasibility of such a reduction depends on the increase in the number of spatial patterns, since they must be stored in memory and iterated over when computing the score. Therefore we give a very loose upper bound on the size of $G_H$, the node built in the previous theorem, by way of an inductive function counting the number of spatial patterns in a network, so let

$$\mathrm{Spatial}((l, r, \sigma, S, T)) = |S|, \qquad \mathrm{Spatial}((H^1, \dots, H^n, C, T)) = |C| + \sum_i \mathrm{Spatial}(H^i).$$
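
A direct transcription of this count (reusing the tuple-based node encoding from the earlier sketches, which is an assumption of these examples rather than the paper's notation):

```python
def spatial(node):
    """Spatial(H): |S| for a gaussian node, |C| plus the children's counts otherwise."""
    if node[0] == "gaussian":
        _, _, _, _, groups = node
        return sum(len(group) for group in groups)  # the groups partition S, so this is |S|
    _, children, coincidence_groups = node
    num_coincidences = sum(len(group) for group in coincidence_groups)  # |C|
    return num_coincidences + sum(spatial(child) for child in children)
```

By contrast, the flattened node of Theorem 6 stores, for each coincidence, the product of the sizes of the referenced child groups, and it is this blow-up that Theorem 7 below bounds.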

Theorem 7. For any network $H$ we have that $\mathrm{Spatial}(G_H) \leq 2^{\mathrm{Spatial}(H)}$.

Proof. We prove this by induction on the structure of $H$. The base case is trivial, since then $H = G_H$. For the induction step we have

$$
\begin{aligned}
\mathrm{Spatial}(G_H) &= \Big|\bigcup_{c \in C} S_c\Big| && (G_H \text{ def.})\\
&\leq \sum_{c \in C} |S_c|\\
&= \sum_{c \in C} |T^1_{c[1]}| \cdots |T^n_{c[n]}| && (S_c \text{ def.})\\
&\leq \sum_{c \in C} \mathrm{Spatial}(G_{H^1}) \cdots \mathrm{Spatial}(G_{H^n}) && (*)\\
&\leq \sum_{c \in C} 2^{\mathrm{Spatial}(H^1)} \cdots 2^{\mathrm{Spatial}(H^n)} && (\text{IH})\\
&= \sum_{c \in C} 2^{\mathrm{Spatial}(H^1) + \dots + \mathrm{Spatial}(H^n)}\\
&= |C| \cdot 2^{\mathrm{Spatial}(H^1) + \dots + \mathrm{Spatial}(H^n)}\\
&= 2^{\lg|C| + \mathrm{Spatial}(H^1) + \dots + \mathrm{Spatial}(H^n)}\\
&\leq 2^{\mathrm{Spatial}(H)} && (\lg|C| \leq |C|, \; \mathrm{Spatial} \text{ def.})
\end{aligned}
$$

For $(*)$ recall that the temporal groups of a node partition its spatial pattern set, so obviously $|T^i_{c[i]}| \leq \mathrm{Spatial}(G_{H^i})$.

It is well known that a single linear threshold unit, or perceptron, can solve linearly separable classification tasks. This means we can use a collection of such units to capture the convex polytope surrounding each spatial pattern in a gaussian node, by having one perceptron for each hyperplane supporting the polytope. Further, we can combine many such collections of units to determine whether the input is in at least one of the polytopes in a temporal group, and thereby decide whether that temporal group is the winning group for that particular input. This can be done for each temporal group, giving us a simple feed-forward network of perceptrons which fully captures the classification the node induces on the input space. This can also be done for full networks via Theorem 6.
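
As an illustration of this construction, the three layers can be simulated directly. The sketch below is a minimal example; the (A, b) matrix-pair representation of each polytope and all function names are assumptions made here, not notation from the paper.

```python
import numpy as np

def classify_with_threshold_net(x, polytope_groups):
    """Three-layer feed-forward threshold network over a finite union of convex polytopes.

    polytope_groups[i] is a list of (A, b) pairs describing label i: the region
    for label i is the union over j of {x : A_j x >= b_j}.
    Layer 1: one threshold unit per hyperplane (one per row of some A_j).
    Layer 2: one AND unit per (A_j, b_j); an AND over m inputs is itself a
             threshold unit that fires when the sum of its inputs reaches m.
    Layer 3: one OR unit per label; an OR is a threshold unit that fires when
             the sum of its inputs reaches 1.
    """
    x = np.asarray(x, dtype=float)
    outputs = []
    for polytopes in polytope_groups:
        and_spikes = []
        for A, b in polytopes:
            hyperplane_spikes = np.asarray(A) @ x >= np.asarray(b)  # layer 1
            and_spikes.append(bool(hyperplane_spikes.all()))        # layer 2
        outputs.append(any(and_spikes))                             # layer 3
    return outputs
```

This mirrors the construction used in the proof of the theorem below.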

Theorem 8. For any cover $\{T_1, \dots, T_p\}$ for $\mathbb{R}^n$ which is a finite union of convex polytopes, there exists a 3-layer feed-forward neural network classifying it. Moreover, such a network exists for any network $H_{1,n}$ where all gaussian nodes have the same spread.

Proof. We provide a neural network as described which takes any $x \in \mathbb{R}^n$ and has $p$ outputs, one for each element $T_i$ in the cover. For any $x \in \mathbb{R}^n$ the network will spike on output $i$ if and only if $x \in T_i$. We use simple neurons with a threshold activation function and an input bias. We construct one network $N_i$ for each $T_i$ which spikes if and only if $x \in T_i$, and each such network provides one of the $p$ outputs of the total network. Fix some $i$ and recall that, by the assumption on the cover, there exist $m \times n$ matrices $A^i_1, \dots, A^i_k$ and vectors $b^i_1, \dots, b^i_k \in \mathbb{R}^m$ such that

$$x \in T_i \iff A^i_j x \geq b^i_j \text{ for some } j = 1, \dots, k.$$

So we may decide $x \in T_i$ by examining $A^i_j x \geq b^i_j$ for at most all $1 \leq j \leq k$, and therefore we build a neural network $A^i_j$ for each $j$ which does exactly this. Observe that testing the condition amounts to testing on which side of $m$ hyperplanes, one for each row of $A^i_j$ combined with the corresponding cell of $b^i_j$, the point $x$ lies. This is easily achieved by having $m$ neurons $V^{i,j}_1, \dots, V^{i,j}_m$, each deciding $x$ for one hyperplane. The network $A^i_j$ is then simply an AND-gate which spikes when all of $V^{i,j}_1, \dots, V^{i,j}_m$ spike. Finally, the network $N_i$ is simply an OR-gate which spikes when at least one of $A^i_1, \dots, A^i_k$ spikes. We now have a 3-layer feed-forward network where layer three is $p$ OR nodes, layer two is AND nodes and layer one is a layer of hyperplane deciders.

3 Discussion

As we alluded to in the abstract, the flavor of networks considered here by no means represents the full range of networks one can have in the HTM model. The model has matured significantly, although the fundamental concepts of hierarchical temporal and spatial pooling remain the same. The two primary alternative forms are letting the score of a temporal group be a weighted sum of pattern/coincidence scores instead of the maximum, and letting the score of a coincidence be a weighted sum of child temporal group scores instead of a product. More exotic features include mixing node types, overlapping receptive fields and level skipping; these are however much less used in practice.

A glaring limitation in our results is that we require all gaussian nodes to share the same spread; this is however not as severe a restriction as it may seem. In principle, if all gaussian nodes receive input from the same domain, for example as bottom nodes in a vision network, we expect that they should learn the same low-level spatial patterns (e.g. horizontal, vertical and diagonal lines and shapes), and thus having the same spread is reasonable. At present the spread is a hand-tuned parameter which cannot be learned, so it is therefore often left the same across all nodes. Additionally, a common technique in training such networks is what is called node cloning, which means that only a single node is trained per network level, and that node is then replicated when doing classification.

Lastly, if the upper bound in Theorem 7 cannot be significantly improved, then this can be seen as evidence that the hierarchical organization of this and other models is a nontrivial construct in terms of computational feasibility. This may also, as mentioned, mean that large gaussian nodes can be decomposed to increase classification speed.

References

[1] G. Wallis and E. T. Rolls. A model of invariant object recognition in the visual system. Prog. Neurobiol., 51.

[2] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci., 2.

[3] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern., 36.

[4] T. S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A, 20(7).

[5] D. George and J. Hawkins. Towards a mathematical theory of cortical microcircuits. PLoS Comput. Biol., 5(10), 2009.

[6] D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture of monkey striate cortex. J. Physiol., 195.

[7] D. George. How the Brain Might Work: A Hierarchical and Temporal Model for Learning and Recognition. PhD thesis, Stanford University, June 2008.

[8] Numenta Inc. Zeta1 Algorithms Reference, Version 1.5. Available upon request.

[9] T. Dean. A computational model of the cerebral cortex. In Proceedings of AAAI-05. MIT Press, Cambridge, Massachusetts.
