Neural Network Models in Statistical Learning
Stephen Talley
April 25, 2014

Abstract

Neural network models can solve problems more easily than traditional methods by emulating the human brain. We examine a basic neural network to model regression and to classify data. We conclude with an example of basic ZIP code character recognition.

1 Introduction

1.1 Definition

Neural network models were originally developed in two separate yet equally important fields: statistics and artificial intelligence [1]. However, despite the connotations that the term "neural network" carries, there is nothing highly technical or mysterious about such a model. Rather, a neural network is defined by the following:

Definition 1. A neural network is a nonlinear statistical model that emulates the human brain on a very basic level by adapting to or learning from a set of training patterns [1, 2].

Because a neural network requires a set of training patterns and targets to function properly, it may be characterized as a supervised system, as opposed to an unsupervised system, which infers trends from unlabeled data.

Definition 2. A supervised system is a system or algorithm that infers trends from labeled training data, that is, from inputs paired with known targets.

1.2 History

1.2.1 Hebb's Rule

The origins of today's neural network models can be traced back to one man's contribution. Dr. Donald Hebb, widely regarded as the father of neuropsychology, outlined an initial theory of biological neural networking in his seminal work The Organization of Behavior (1949) [3].

Theorem 1. Hebb's Rule: When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.
Put simply, "Cells that fire together, wire together." We can also express a simplified version of Hebb's Rule mathematically:

$$\Delta\alpha_{ij} = \eta x_i x_j. \tag{1}$$

In Equation (1), $\Delta\alpha_{ij}$ is the change in connection strength between two given nodes $i$ and $j$, and $\eta$ is a constant learning rate such that $0 \le \eta \le 1$. This neurological rule not only proposed an explanation for associative learning in humans, but also provided the basis for adaptive learning algorithms in computer science.

1.2.2 Early Developments

While the advent of Hebb's Rule is considered the beginning of computational neuroscience, the first neural network model had in fact been created six years earlier [4]. In 1943, Walter Pitts and Warren McCulloch created the first computational neural network using basic algorithms. Because of its simplicity, however, their model could solve only simple arithmetic and logic problems [4]. In 1958, Frank Rosenblatt developed the first successful neurocomputer: a single-layer neural network, or perceptron model (called single layer since it had only one hidden layer between input and output), which could receive multiple inputs and create a single output from a linear combination of these inputs [2]. The single-layer perceptron, shown in Figure 1, was more adaptable than other models at the time and could solve problems more quickly and reliably despite its simplicity [8].

Figure 1: Diagram of a single layer perceptron neural network.

Despite the progress of the perceptron model, it suffered from two limitations. First, the perceptron model could not solve the exclusive-or problem, a logical operation that outputs true only when its inputs differ in truth value [2]. Second, as problems increased in complexity, progressively more inputs were required for classification, and the computer hardware of the time was simply too limited to handle these problems. Further advancement in the field largely stagnated until technology could meet the perceptron model's computational demands [1].
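Equation (1) can be made concrete with a few lines of code. The following is a minimal Python sketch, not something taken from the paper: the function name `hebbian_update`, the three-node activity pattern, and the learning rate of 0.1 are all illustrative assumptions. It applies the update $\Delta\alpha_{ij} = \eta x_i x_j$ repeatedly and shows that only connections between simultaneously active nodes are strengthened.

```python
def hebbian_update(weights, activity, eta=0.1):
    """Return new weights after one Hebbian step: w_ij += eta * x_i * x_j.

    `weights` is an n-by-n list of lists and `activity` a list of n node
    activities. Both names are hypothetical, chosen for this sketch only.
    Self-connections (i == j) are updated too, purely for simplicity.
    """
    n = len(activity)
    return [
        [weights[i][j] + eta * activity[i] * activity[j] for j in range(n)]
        for i in range(n)
    ]


# Three nodes; nodes 0 and 1 fire together, node 2 stays silent.
w = [[0.0] * 3 for _ in range(3)]
pattern = [1.0, 1.0, 0.0]

for _ in range(5):
    w = hebbian_update(w, pattern, eta=0.1)

print("strength between nodes 0 and 1:", w[0][1])
print("strength between nodes 0 and 2:", w[0][2])
```

The connection between the two co-active nodes grows with every presentation of the pattern, while connections to the silent node stay at zero, which is the "fire together, wire together" behavior in miniature.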
1.2.3 Recent Developments

Computer capabilities did not reach the level required for more complex neural network models until the early 1980s, and interest in the field revived soon after [1]. In particular, the discovery of the back-propagation algorithm in 1986 was crucial for further developments, since it gave a practical way to minimize the error function of a multi-layer neural network model, at least to a local minimum [4]. Since then, researchers have found many new applications for neural network models, including mathematical finance, data mining, handwriting recognition, and modeling biological neural systems [1].

1.3 Basics of a Neural Network Model

All neural network models, regardless of application, share some common elements, though the number and complexity of these elements vary with the model used [5]. For a basic model, as shown in Figure 1, each red node represents an individual input $x_i$ in the vector of $p$ inputs $X^T = [x_1, x_2, x_3, \ldots, x_p]$. These inputs form a layer unto themselves, simply called the input layer [2]. Each input is connected to the nodes in the second, hidden layer, and each of these connections has an associated value, called a weight. Each weight is initially assigned a random value between 0 and 1, depending on the context of the problem. Then, by using the inputs and weights, the model determines the value of the hidden layer node $Z_m$ by forming the linear combination

$$\sum_{i=1}^{p} \alpha_{mi} x_i = \alpha_{m1} x_1 + \alpha_{m2} x_2 + \cdots + \alpha_{mp} x_p.$$

Once the value of the linear combination is found, it is passed through a nonlinear activation function $\sigma$. Usually, this nonlinear function is the sigmoid function

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{2}$$

The sigmoid function is frequently used, particularly for regression models, because it combines nearly linear, curvilinear, and nearly constant behavior depending on the input value [5]. As Figure 2 illustrates, the sigmoid function is nearly linear for domain values $-1 < x < 1$. For extreme values of $x$, $\sigma(x)$ becomes nearly constant.

Figure 2: Plot for the general sigmoid function [6].
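The hidden-node computation described above is short enough to try directly. The following is a minimal Python sketch under stated assumptions: the helper names `sigmoid` and `hidden_node` and the example weights and inputs are hypothetical, chosen only to illustrate Equation (2) and the linear combination that feeds it.

```python
import math


def sigmoid(x):
    """Sigmoid activation from Equation (2): 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_node(weights, inputs):
    """Value of one hidden node: sigmoid of the weighted sum of the inputs."""
    linear_combination = sum(a * x for a, x in zip(weights, inputs))
    return sigmoid(linear_combination)


# Hypothetical weights alpha_m1..alpha_m3 and inputs x_1..x_3.
alpha_m = [0.2, 0.7, 0.5]
x = [1.0, -0.5, 2.0]
z_m = hidden_node(alpha_m, x)  # linear combination is 0.2 - 0.35 + 1.0 = 0.85
print("Z_m =", z_m)

# Near zero the curve is close to the line 0.5 + x/4; far from zero it flattens.
for v in (-6.0, -0.5, 0.0, 0.5, 6.0):
    print(v, sigmoid(v), 0.5 + v / 4.0)
```

Comparing the last two printed columns shows the behavior the text describes: the sigmoid tracks a straight line for small inputs and saturates toward 0 or 1 for large ones.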
1.4 Applications

Because a neural network model generalizes a linear model through a nonlinear function and can learn from data, it can be used for a variety of practical applications. In particular, neural networks are best used for four types of problems [7]:

1. function prediction or approximation,
2. complex data classification (with nonlinear classification boundaries),
3. using internal properties of data for clustering, and
4. time-series forecasting.

1.5 Advantages and Disadvantages of a Neural Network Model

The neural network model offers a few distinct advantages over other types of machine learning algorithms. Because a neural network is a supervised system (i.e., it requires a standard or basis for classification), it requires less formal training to determine a proper algorithm for a given data set [5]. Furthermore, neural networks can detect more complex relationships and interactions among variables thanks to their aforementioned property of deriving parameters from data [8]. One last advantage of the neural network is the ubiquity of training algorithms for working with data, most likely stemming from its variety of applications.

Unfortunately, neural networks also have several disadvantages. Though computer technology has advanced substantially since the neural network's introduction, more complex models still have heavy computational demands that sometimes cannot be met within a reasonable time. Another disadvantage involves the sheer quantity of connections and weights. Since nearly every node in one layer is connected to every node in the next, with a weight for each connection, overfitting can be an issue; however, this problem can be controlled either by early stopping or by a process called weight decay, which uses a penalty function to shrink all weights toward zero, thereby pushing the model toward a linear one [1].

2 Body

2.1 Advanced Neural Networks

With more advanced computers come more advanced neural networks. Since the transformation functions of the hidden layers are fairly simple, a typical neural network model can have up to 100 nodes encompassing multiple hidden layers [1]. In this case, the formula for determining the outputs becomes a multi-step transformation:

$$
\begin{aligned}
Z_m &= \sigma(\alpha_{0m} + \alpha_m^T X), && \text{where } X = [x_1, x_2, \ldots, x_p], \\
T_k &= \beta_{0k} + \beta_k^T Z, \\
f_k(X) &= g_k(T), && \text{where } T = [T_1, T_2, \ldots, T_k].
\end{aligned}
\tag{3}
$$
Typically, the complexity of these neural network models depends on the following variables: $p$, the number of inputs; $m$, the total number of neurons; and $k$, the number of classes or outputs. Each step of this algorithm alternates between linear and nonlinear transformations of the data. Initially, the neural network forms linear combinations of the original inputs. Then, each linear combination is plugged into the activation function $\sigma$. Unlike the single-layer perceptron model, a multi-layer network forms an additional linear combination $T_k$ from the nonlinear hidden layer values $Z_m$ and subsequently passes that linear combination into another, different nonlinear function $g_k(T)$.

Note that $g_k(T)$ in Equation (3) is an additional, often final activation function brought about by the inclusion of multiple hidden layers. In some of the earliest multi-layer neural network models (and in some current regression models), $g_k(T) = T_k$; thus, the output was simply a linear combination of the hidden layer values [5]. Classification models later replaced the identity function with the softmax function

$$g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}. \tag{4}$$

The softmax function (Equation (4)) was chosen for its probabilistic properties: each output is between zero and one, and all outputs sum to one [7].

2.2 Overparameterization and Prevention

2.2.1 The Weight Problem

Because the scale of the neural network model depends on both the number of neurons and the number of inputs, the quantity of connections increases as these two variables increase. The weights on these connections are designated by two key parameters, $\alpha$ and $\beta$, the complete sets of which are given by the matrices below [1]:

$$
\alpha =
\begin{bmatrix}
\alpha_{01} & \alpha_{11} & \alpha_{21} & \cdots & \alpha_{p1} \\
\alpha_{02} & \alpha_{12} & \alpha_{22} & \cdots & \alpha_{p2} \\
\alpha_{03} & \alpha_{13} & \alpha_{23} & \cdots & \alpha_{p3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\alpha_{0m} & \alpha_{1m} & \alpha_{2m} & \cdots & \alpha_{pm}
\end{bmatrix},
\qquad
\beta =
\begin{bmatrix}
\beta_{01} & \beta_{11} & \beta_{21} & \cdots & \beta_{m1} \\
\beta_{02} & \beta_{12} & \beta_{22} & \cdots & \beta_{m2} \\
\beta_{03} & \beta_{13} & \beta_{23} & \cdots & \beta_{m3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\beta_{0k} & \beta_{1k} & \beta_{2k} & \cdots & \beta_{mk}
\end{bmatrix}
$$

Even if errors are minimized, the neural network may overfit the data due to the sheer quantity of weights accounted for in the algorithm [1]. An overfitted model becomes excessively complex, and it often exaggerates minor or random errors in the data. A simple and efficient way to prevent overfitting is to establish an early stopping rule [5]. An early stopping rule trains the model for only a limited time, stopping well before the error function reaches its minimum, so that the weights stay close to their starting values and the fitted model is effectively simpler than a fully trained network. This simplifies the model while limiting the potential effect of random error.
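To make Equations (3) and (4) and the growth in the number of weights concrete, the following is a minimal Python sketch of a forward pass through one hidden layer followed by a softmax output. It is an illustration under stated assumptions, not the paper's implementation: the function names, the tiny example weights, and the row layout (each row of `alpha` holds $[\alpha_{0m}, \alpha_{1m}, \ldots, \alpha_{pm}]$, mirroring the matrices above) are all choices made for this sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(t):
    """Equation (4); subtracting max(t) avoids overflow without changing the result."""
    shift = max(t)
    exps = [math.exp(v - shift) for v in t]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, alpha, beta):
    """Equation (3) with a softmax output layer.

    alpha: one row per hidden node, [alpha_0m, alpha_1m, ..., alpha_pm]
    beta:  one row per output,      [beta_0k,  beta_1k,  ..., beta_mk]
    """
    z = [sigmoid(row[0] + sum(a * xi for a, xi in zip(row[1:], x))) for row in alpha]
    t = [row[0] + sum(b * zm for b, zm in zip(row[1:], z)) for row in beta]
    return softmax(t)


def count_weights(p, m, k):
    """Entries in the alpha matrix (m rows of p + 1) plus the beta matrix (k rows of m + 1)."""
    return m * (p + 1) + k * (m + 1)


# Hypothetical tiny network: p = 2 inputs, m = 2 hidden nodes, k = 3 classes.
alpha = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
beta = [[0.0, 1.0, -1.0], [0.2, -0.5, 0.5], [-0.1, 0.3, 0.3]]
probs = forward([1.0, 2.0], alpha, beta)
print("class probabilities:", probs, "sum:", sum(probs))

# Sizes borrowed from the ZIP code example in Section 2.4:
# 256 pixel inputs, 60 hidden neurons, 10 digit classes.
print("weights to estimate:", count_weights(256, 60, 10))
```

Even this modest single-hidden-layer configuration carries roughly sixteen thousand weights, which illustrates why overfitting is a practical concern.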
2.2.2 Error Functions and Minima

Aside from the problem of having too many weights, a neural network may also have problems associated with the weights' values. Consequently, we must adjust the values of the initially random weights such that they fit the data well enough to make predictions [1]. For regression models, we use a sum of squares as our error function

$$R(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{N} \left(y_{ik} - f_k(x_i)\right)^2. \tag{5}$$

Note that $R(\theta)$ measures the total difference between the actual class or value and the predicted class or value across all classes $K$ and across all observations $N$. For a classification neural network, we can also use the cross-entropy

$$R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i)$$

to determine the minimum amount of information needed for categorizing a given observation [5].

2.2.3 Weight Decay

While the aforementioned early-stopping technique can be effective for limiting the effective complexity of the model, there exists a more explicit method that controls the size of the weights directly: a process known as weight decay [1]. By adding an additional term to the error function, the error equation becomes $R(\theta) + \lambda J(\theta)$, where

$$J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2$$

and $\lambda \ge 0$ represents a tuning parameter [7]. The larger the value of $\lambda$, the more strongly the weights shrink toward 0. As the weights shrink toward 0, the activation (sigmoid) function, and by extension the entire model, reduces to an approximately linear function. Because too large a value shrinks the weights too far, the value of $\lambda$ is generally estimated using cross-validation [7]. Weight decay is especially important as it helps to improve prediction on any type of neural network [1].

2.3 Back-propagation

Regardless of the equation used for $R(\theta)$, it is an error term; therefore, we want to keep the value of $R(\theta)$ small. In neural network design, the most popular method for minimizing $R(\theta)$ is gradient descent, which is called back-propagation in this setting [7].
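Before the mechanics of back-propagation, it may help to see the quantities being minimized. The following is a minimal Python sketch of the sum-of-squares error in Equation (5), the cross-entropy error, and the weight-decay penalty $J(\theta)$. The function names and the two-observation example are hypothetical, and the sketch penalizes every entry of the weight lists it is given, which is an assumption rather than something the text specifies.

```python
import math


def sum_of_squares(y, f):
    """Equation (5): squared differences summed over observations i and outputs k."""
    return sum((yik - fik) ** 2 for yi, fi in zip(y, f) for yik, fik in zip(yi, fi))


def cross_entropy(y, f):
    """Negative sum of y_ik * log f_k(x_i); terms with y_ik == 0 contribute nothing."""
    return -sum(
        yik * math.log(fik)
        for yi, fi in zip(y, f)
        for yik, fik in zip(yi, fi)
        if yik != 0
    )


def weight_decay_penalty(alpha, beta):
    """J(theta): sum of all squared alpha and beta weights passed in."""
    return sum(a * a for row in alpha for a in row) + sum(b * b for row in beta for b in row)


def penalized_error(error, alpha, beta, lam):
    """R(theta) + lambda * J(theta) for a tuning parameter lam >= 0."""
    return error + lam * weight_decay_penalty(alpha, beta)


# Two observations, two classes: one-hot targets and hypothetical network outputs.
y = [[1, 0], [0, 1]]
f = [[0.8, 0.2], [0.4, 0.6]]
alpha = [[0.5, -1.0]]
beta = [[2.0]]

sse = sum_of_squares(y, f)
ce = cross_entropy(y, f)
print("sum of squares:", sse)
print("cross-entropy:", ce)
print("penalized sum of squares (lambda = 0.1):", penalized_error(sse, alpha, beta, 0.1))
```

Raising `lam` makes large weights more expensive, so a minimizer of the penalized error is pulled toward smaller weights, which is the shrinkage effect described in Section 2.2.3.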
Quite simply, back-propagation is the process of working backwards from an estimated point using a function's rate of change. Once we have the rate of change and the estimated point, we estimate another, lower point on the function, and we repeat until we reach a minimum. While the network is training with this algorithm, its weights are continually modified to reduce mean-squared error across all classes and observations [5]. The back-propagation method can be applied to either single-variable or multivariate functions. In this case we only have two sets of parameters that we have any degree of control over, $\alpha$ and $\beta$. Using Equation (5) as our error function and writing $R(\theta) = \sum_{i=1}^{N} R_i$, where $R_i$ is the contribution of observation $i$, we obtain the beta and alpha derivatives

$$\frac{\partial R_i}{\partial \beta_{km}} = -2\left(y_{ik} - f_k(x_i)\right) g_k'(\beta_k^T z_i)\, z_{mi},$$

$$\frac{\partial R_i}{\partial \alpha_{ml}} = -\sum_{k=1}^{K} 2\left(y_{ik} - f_k(x_i)\right) g_k'(\beta_k^T z_i)\, \beta_{km}\, \sigma'(\alpha_m^T x_i)\, x_{il}.$$

Once the rates of change are determined for the error function, a gradient descent update for the $(r+1)$st iteration takes the form

$$\beta_{km}^{(r+1)} = \beta_{km}^{(r)} - \gamma \sum_{i=1}^{N} \frac{\partial R_i}{\partial \beta_{km}^{(r)}},$$

$$\alpha_{ml}^{(r+1)} = \alpha_{ml}^{(r)} - \gamma \sum_{i=1}^{N} \frac{\partial R_i}{\partial \alpha_{ml}^{(r)}}.$$

The term $\gamma$ in both equations denotes the step size for back-propagation, and it is a constant such that $0 \le \gamma \le 1$. The actual value of the step size should be chosen carefully, as problems may arise if $\gamma$ is either too large or too small. If the step size is too large, the algorithm may overstep the local minimum and arrive at a larger, inaccurate result. If the step size is too small, the algorithm will take a long time to reach the local minimum, sacrificing efficiency in the process.

Due to its simplicity, back-propagation is considered the textbook approach to minimizing error; however, there are other methods that can converge to minima more quickly [1]. Newton's method can be used for optimization, but because the second derivatives with respect to both sets of parameters can be very complex, it is usually avoided. One more efficient method is a variation of traditional back-propagation called conjugate gradient back-propagation. It is similar to back-propagation, but rather than always following the negative gradient (steepest descent), the algorithm performs a line search along conjugate directions [7]. While this method tends to converge in fewer iterations, each iteration is more computationally demanding because of the search required at each step.

Figure 3: Examples of handwritten characters from training data [1].
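The derivative and update formulas of Section 2.3 can be exercised on a toy problem. The following is a minimal Python sketch under several assumptions that are not in the paper: a single hidden layer with two nodes, an identity output function (so $g_k' = 1$), hand-picked starting weights instead of random ones, the exclusive-or data as a four-point training set, and a step size of 0.02. All function names are hypothetical. Each weight row stores its bias first, so a leading 1 is prepended to the inputs and to the hidden values.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, alpha, beta):
    """Return inputs with bias, hidden values with bias, and outputs (identity g_k)."""
    xb = [1.0] + list(x)
    z = [sigmoid(sum(a * v for a, v in zip(row, xb))) for row in alpha]
    zb = [1.0] + z
    f = [sum(b * v for b, v in zip(row, zb)) for row in beta]
    return xb, zb, f


def total_error(X, Y, alpha, beta):
    """Sum-of-squares error R(theta) over all observations and outputs."""
    err = 0.0
    for x, y in zip(X, Y):
        f = forward(x, alpha, beta)[2]
        err += sum((yk - fk) ** 2 for yk, fk in zip(y, f))
    return err


def gradients(X, Y, alpha, beta):
    """Sum over observations of dR_i/d(alpha) and dR_i/d(beta)."""
    g_alpha = [[0.0] * len(row) for row in alpha]
    g_beta = [[0.0] * len(row) for row in beta]
    for x, y in zip(X, Y):
        xb, zb, f = forward(x, alpha, beta)
        delta = [-2.0 * (yk - fk) for yk, fk in zip(y, f)]  # output-layer errors
        for k, dk in enumerate(delta):
            for m, zv in enumerate(zb):
                g_beta[k][m] += dk * zv
        for m in range(len(alpha)):
            zm = zb[m + 1]
            back = sum(beta[k][m + 1] * delta[k] for k in range(len(beta)))
            s = zm * (1.0 - zm) * back  # sigma'(.) = z * (1 - z)
            for l, xv in enumerate(xb):
                g_alpha[m][l] += s * xv
    return g_alpha, g_beta


def descend(X, Y, alpha, beta, gamma=0.02, steps=500):
    """Repeat the gradient descent update and record the error after each step."""
    errors = [total_error(X, Y, alpha, beta)]
    for _ in range(steps):
        g_alpha, g_beta = gradients(X, Y, alpha, beta)
        alpha = [[a - gamma * g for a, g in zip(ra, rg)] for ra, rg in zip(alpha, g_alpha)]
        beta = [[b - gamma * g for b, g in zip(rb, rg)] for rb, rg in zip(beta, g_beta)]
        errors.append(total_error(X, Y, alpha, beta))
    return alpha, beta, errors


# Exclusive-or data and hypothetical starting weights.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0.0], [1.0], [1.0], [0.0]]
alpha0 = [[0.1, 0.3, -0.2], [-0.1, -0.4, 0.25]]
beta0 = [[0.05, 0.2, -0.3]]

alpha1, beta1, errors = descend(X, Y, alpha0, beta0)
print("error before training:", errors[0])
print("error after training: ", errors[-1])
```

The sketch only demonstrates that the updates lower the error; it makes no claim about how close the result is to a global minimum, and a different step size or starting point may behave quite differently, as the discussion of $\gamma$ above suggests.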
2.4 Example: ZIP Code Character Recognition

2.4.1 The Setup

One of the earliest and best-known problems in neural networks is handwritten character recognition. Because recognizing characters is essentially a classification task (into categories A-Z for letters and 0-9 for numbers), it is an ideal test of a neural network's capabilities. The data set for this example is the same one used in a similar neural network test in 1989 [9]; however, this example uses more advanced neural networks and error reduction techniques than that earlier experiment. Every digit was scanned from U.S. Postal Service envelopes and then standardized into $16 \times 16$ pixel grayscale images such as those in Figure 3. The digits were standardized this way to limit certain characteristics (such as the slant or rotation of the number) which could lead to misclassification [1]. Since the digits were $16 \times 16$, each digit, treated as one observation, had 256 inputs.

2.4.2 The Procedure

Since this example uses the same data as the 1989 neural network test, the total data set consisted of 9298 handwritten digits, each one an individual observation. This data was divided into two main subsets: a practice set of 7291 observations and a working set of 2007 observations [9]. The practice subset was further divided into randomly assigned training, validation, and test sets to prepare the neural network. Both the inputs and the actual targets (the ground truth of which input belongs to which class) were entered into a MATLAB program, which then constructed the neural network.

The MATLAB program relies on user input for only two parts of a single-layer neural network model: the number of neurons in the hidden layer and the allocation of the training data. Given the subdivisions of the training data mentioned before, we determined that an allocation of 90% training, 5% validation, and 5% testing for the 7291 observations in the practice data set yielded the best performance for a single-layer network.
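The 90% / 5% / 5% allocation can be sketched as a random partition of observation indices. The following Python sketch is an illustration only: the function name `split_indices`, the fixed seed, and the rounding rule are assumptions, and MATLAB's own division routine may round or shuffle differently, so the exact subset sizes it produced in the experiment may not match these.

```python
import random


def split_indices(n, fractions=(0.90, 0.05, 0.05), seed=0):
    """Randomly partition range(n) into training, validation, and test index lists.

    The last subset receives whatever remains after rounding the first two,
    so the three lists always cover every observation exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


# The practice set described above has 7291 observations.
train_idx, val_idx, test_idx = split_indices(7291)
print(len(train_idx), len(val_idx), len(test_idx))
```

With these proportions, roughly 6560 of the 7291 practice observations are used to fit the weights, leaving only a few hundred each for validation and testing.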
While using the entirety of the practice set for training would likely have been ideal, restrictions in MATLAB's neural network scripts prevented us from doing so. The primary reason this allocation was so effective was the high number of observations reserved strictly for training, i.e., for preparing the neural network. Since the network itself was actually being prepared for the working data set, the testing and validation portions could be relatively small.

2.4.3 The Results

Once we determined the best allocation to use, the next part to consider was the size of the neural network or, more specifically, the number of neurons in the hidden layer. For the sake of simplicity, we investigated network sizes in multiples of 10 neurons. After running each network five times, an average accuracy rating was taken, and the initial findings are given in Table 1. Note that ultimately a 60-neuron network was the most accurate by a slight margin. While neural networks with more than 100 neurons would possibly be more accurate, these would take a considerable amount of time to compute for each iteration of the network.

Table 1: Data for initial neural network tests.
Training % | Validation % | Test % | # of Neurons | Accuracy %

Once we determined the most accurate allocation and network size, the network was ready for the data. Because MATLAB uses different initial conditions for each neural network test, the best way to gather information was to run the test several times; thus, we decided on running the network one hundred times. This reduced the actual test to setting up a for loop to evaluate the working data set one hundred times. The program would then plot a histogram of the classification error and display both the mean and the standard deviation of that error. Regarding the issue of time, this particular loop required approximately two hours to complete all one hundred trials.

The resulting histogram, shown in Figure 4, showed some interesting results. For one, the error distribution was strongly right-skewed, with only one true outlier of 27% error. Furthermore, both the average error and the standard deviation were far smaller than the findings in Table 1 would have indicated. For this histogram, the mean error = 1.58% (meaning that the network had 98.42% accuracy) and the standard deviation = [value missing in source]. Though this accuracy may seem high, it is relatively low in comparison with other modern neural networks. For example, as of 2011, multi-layer networks have reported error rates as low as 0.7% [1].

3 Conclusion

Overall, neural network models are incredibly useful and versatile statistical learning tools. This paper examines only the basics of the models themselves, error detection, and possible applications that such models can have. Though other applications, such as regression modeling or time series analysis, along with more thorough multi-layer networks, may be examined at a later date, this paper should give the reader a proper overview of this fascinating subject.
Figure 4: Histogram of classification error on working set.

References

[1] T. Hastie, R. Tibshirani, J. Friedman, Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer: New York.
[2] K. Gurney, An Introduction to Neural Networks, UCL Press: London, 1997.
[3] D. Hebb, The Organization of Behavior, Wiley and Sons: New York, 1949.
[4] I.A. Basheer, M. Hajmeer, Artificial Neural Networks: Fundamentals, Computing, Design, and Application, Journal of Microbiological Methods, 43 issue 1, 2000.
[5] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann: Burlington, Massachusetts.
[6] Image found at svg
[7] S. Samarasinghe, Neural Networks for Applied Sciences and Engineering, Auerbach: Boca Raton, Florida.
[8] J.V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes, 49 issue 11, 1996.
[9] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Back-Propagation Applied to Handwritten ZIP Code Recognition, Neural Computation, 1 (1989).
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationArtificial Neural Networks. Edward Gatt
Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationCMSC 421: Neural Computation. Applications of Neural Networks
CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationArtifical Neural Networks
Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationCSC 411 Lecture 10: Neural Networks
CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationThe Perceptron. Volker Tresp Summer 2018
The Perceptron Volker Tresp Summer 2018 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationUnit 8: Introduction to neural networks. Perceptrons
Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationNeural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET
Unit-. Definition Neural network is a massively parallel distributed processing system, made of highly inter-connected neural computing elements that have the ability to learn and thereby acquire knowledge
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationSections 18.6 and 18.7 Analysis of Artificial Neural Networks
Sections 18.6 and 18.7 Analysis of Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline Univariate regression
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationNeural Networks Introduction
Neural Networks Introduction H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural Networks 1/22 Biological
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationPart 8: Neural Networks
METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationAnalysis of Fast Input Selection: Application in Time Series Prediction
Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,
More informationTopic 3: Neural Networks
CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationIntroduction to Machine Learning
Introduction to Machine Learning Neural Networks Varun Chandola x x 5 Input Outline Contents February 2, 207 Extending Perceptrons 2 Multi Layered Perceptrons 2 2. Generalizing to Multiple Labels.................
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationPairwise Neural Network Classifiers with Probabilistic Outputs
NEURAL INFORMATION PROCESSING SYSTEMS vol. 7, 1994 Pairwise Neural Network Classifiers with Probabilistic Outputs David Price A2iA and ESPCI 3 Rue de l'arrivée, BP 59 75749 Paris Cedex 15, France a2ia@dialup.francenet.fr
More informationNotes on Back Propagation in 4 Lines
Notes on Back Propagation in 4 Lines Lili Mou moull12@sei.pku.edu.cn March, 2015 Congratulations! You are reading the clearest explanation of forward and backward propagation I have ever seen. In this
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationMaster Recherche IAC TC2: Apprentissage Statistique & Optimisation
Master Recherche IAC TC2: Apprentissage Statistique & Optimisation Alexandre Allauzen Anne Auger Michèle Sebag LIMSI LRI Oct. 4th, 2012 This course Bio-inspired algorithms Classical Neural Nets History
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationIncremental Stochastic Gradient Descent
Incremental Stochastic Gradient Descent Batch mode : gradient descent w=w - η E D [w] over the entire data D E D [w]=1/2σ d (t d -o d ) 2 Incremental mode: gradient descent w=w - η E d [w] over individual
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron
More informationAlgorithms for Learning Good Step Sizes
1 Algorithms for Learning Good Step Sizes Brian Zhang (bhz) and Manikant Tiwari (manikant) with the guidance of Prof. Tim Roughgarden I. MOTIVATION AND PREVIOUS WORK Many common algorithms in machine learning,
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationIntroduction Biologically Motivated Crude Model Backpropagation
Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationSupervised (BPL) verses Hybrid (RBF) Learning. By: Shahed Shahir
Supervised (BPL) verses Hybrid (RBF) Learning By: Shahed Shahir 1 Outline I. Introduction II. Supervised Learning III. Hybrid Learning IV. BPL Verses RBF V. Supervised verses Hybrid learning VI. Conclusion
More informationAn artificial neural networks (ANNs) model is a functional abstraction of the
CHAPER 3 3. Introduction An artificial neural networs (ANNs) model is a functional abstraction of the biological neural structures of the central nervous system. hey are composed of many simple and highly
More informationLearning from Examples
Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble
More informationy(x n, w) t n 2. (1)
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationChapter 9: The Perceptron
Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More information