Deep neural networks
|
|
- Joel Roberts
- 5 years ago
- Views:
Transcription
1 Deep neural networks
2 On-line Resources Online book by Michael Nielsen - of convolutions demo/mnist.html - demo of CNN - 3D visualization Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition. MIT Press book in prep from Bengio
3 A history of neural networks 1940s-60 s: McCulloch & Pitts; Hebb: modeling real neurons Rosenblatt, Widrow-Hoff: : perceptrons 1969: Minskey & Papert, Perceptrons book showed formal limitations of one-layer linear network 1970 s-mid-1980 s: mid-1980 s mid-1990 s: backprop and multi-layer networks Rumelhart and McClelland PDP book set Sejnowski s NETTalk, BP-based text-to-speech Neural Info Processing Systems (NIPS) conference starts Mid 1990 s-early 2000 s: Mid-2000 s to current: More and more interest and experimental success
4
5
6 ANNs in the 90 s Mostly 2-layer networks or else carefully constructed deep networks Worked well but training was slow and einicky Nov 1998 Yann LeCunn, Bo2ou, Bengio, Haffner
7 ANNs in the 90 s Mostly 2-layer networks or else carefully constructed deep networks Worked well but training typically took weeks when guided by an expert SVM: % accurate CNNs: % accurate
8 A history of neural networks 1940s-60 s: McCulloch & Pitts; Hebb: modeling real neurons Rosenblatt, Widrow-Hoff: : perceptrons 1969: Minskey & Papert, Perceptrons book showed formal limitations of one-layer linear network 1970 s-mid-1980 s: mid-1980 s mid-1990 s: backprop and multi-layer networks Rumelhart and McClelland PDP book set Sejnowski s NETTalk, BP-based text-to-speech Neural Info Processing Systems (NIPS) conference starts Mid 1990 s-early 2000 s: Mid-2000 s to current: The next two lectures
9 Outline of lectures What s new in ANNs in the last 5-10 years? What types of ANNs are most successful and why? What are the hot research topics for deep learning?
10 Outline What s new in ANNs in the last 5-10 years? Deeper networks, more data, and faster training Scalability and use of GPUs Symbolic differentiation Some subtle changes to cost function, architectures, optimization methods What types of ANNs are most successful and why? Convolutional networks (CNNs) Long term/short term memory networks (LSTM) Word2vec and embeddings What are the hot research topics for deep learning?
11 PARALLEL TRAINING FOR ANNS
12 Recap: logistic regression with SGD P(Y =1 X = x) = p = 1 1+ e x w This part computes inner product <x,w> This part logisjc of <x,w> 12
13 Recap: logistic regression with SGD P(Y =1 X = x) = p = 1 1+ e x w a z On one example: computes inner product <x,w> There s some chance to compute this in parallel can we do more? 13
14 In ANNs we have many many logistic regression nodes
15 Recap: logistic regression with SGD Let x be an example Let w i be the input weights for the i-th hidden unit Then output a i = x. w i a i z i 15
16 Recap: logistic regression with SGD Let x be an example Let w i be the input weights for the i-th hidden unit Then a = x W is output for all m units w 1 w 2 w 3 w m W = a i z i 16
17 Recap: logistic regression with SGD Let X be a matrix with k examples Let w i be the input weights for the i-th hidden unit Then A = X W is output for all m units for all k examples w 1 w 2 w 3 w m x x 2 x k There s a lot of chances to do this in parallel XW = x 1.w 1 x 1.w 2 x 1.w m x k.w 1 x k.w m 17
18 ANNs and multicore CPUs Modern libraries (Matlab, numpy, ) do matrix operations fast, in parallel Many ANN implementations exploit this parallelism automatically Key implementation issue is working with matrices comfortably
19 ANNs and GPUs GPUs do matrix operations very fast, in parallel For dense matrixes, not sparse ones! Training ANNs on GPUs is common SGD and minibatch sizes of 128 Modern ANN implementations can exploit this GPUs are not super-expensive $500 for high-end one large models with O(10 7 ) parameters can eit in a large-memory GPU (12Gb) Speedups of 20x-50x have been reported
20 ANNs and multi-gpu systems There are ways to set up ANN computations so that they are spread across multiple GPUs Sometimes needed for very large networks Not especially easy to implement and do with most current tools
21 Outline What s new in ANNs in the last 5-10 years? Deeper networks, more data, and faster training Scalability and use of GPUs Symbolic differentiation Some subtle changes to cost function, architectures, optimization methods But eirst why? What types of ANNs are most successful and why? Convolutional networks (CNNs) Long term/short term memory networks (LSTM) Word2vec and embeddings What are the hot research topics for deep learning?
22 WHY ARE DEEPER NETWORKS USEFUL?
23 Recap: mul6layer networks Classifier is a muljlayer network of logis/c units Each unit takes some inputs and produces one output using a logisjc classifier Output of one unit can be the input of another Input layer Hidden layer Output layer 1 w 0,2 w 0,1 w 1,1 v 1 =S(w T X) w 1 x 1 w 1,2 w 2,1 w 2 z 1 =S(w T V) x 2 w 2,2 v 2 =S(w T X)
24 Recap: Learning a mul6layer network Define a loss which is squared error But over a network of logisjc units Minimize loss with gradient descent J X,y (w) = i ( y i y ˆ i ) 2 But output is network output
25 Recap: weight updates for mul6layer ANN For nodes k in output layer: δ k ( t k a ) k a ( k 1 a ) k For nodes j in hidden layer: δ j ( δ k w ) kj a j 1 a j k For all weights: ( ) w kj = w kj ε δ k a j Propagate errors backward BACKPROP Can carry this recursion out further if you have multiple hidden layers w ji = w ji ε δ j a i
26 ANNs are expressive One logisjc unit can implement and AND or an OR of a subset of inputs e.g., (x 3 AND x 5 AND AND x 19 ) Every boolean funcjon can be expressed as an OR of ANDs e.g., (x 3 AND x 5 ) OR (x 7 AND x 19 ) OR So one hidden layer can express any BF (But it might need lots and lots of hidden units)
27 ANNs are expressive One logisjc unit can implement and AND or an OR of a subset of inputs e.g., (x 3 AND x 5 AND AND x 19 ) Every boolean funcjon can be expressed as an OR of ANDs e.g., (x 3 AND x 5 ) OR (x 7 AND x 19 ) OR So one hidden layer can express any BF Example: parity(x1,,xn)=1 iff off number of xi s are set to one Parity(a,b,c,d) = (a & -b & -c & -d) OR (-a & b & -c & -d) OR #list all the 1s (a & b & c & -d) OR (a & b & -c & d) OR #list all the 3s Size in general is O(2 N )
28 Deeper ANNs are more expressive A two-layer network needs O(2 N ) units A two-layer network can express binary XOR A 2*logN layer network can express the parity of N inputs (even/odd number of 1 s) With O(logN) units in a binary tree Deep network + parameter tying ~= subroujnes x1 x2 x3 x4 x5 x6 x7 x8
29 Hypothe6cal code for face recogni6on..
30 WHY ARE DEEP NETWORKS HARD TO TRAIN?
31 Recap: weight updates for mul6layer ANN For nodes k in output layer L: ( ) a k ( 1 a k ) δ L k t k a k For nodes j in hidden layer h: ( ) δ h δ h+1 w j j kj a j 1 a j k ( ) What happens as the layers get further and further from the output layer? E.g., what s gradient for the bias term with several layers after it?
32 Gradients are unstable Max at 1/4 If weights are usually < 1 then we are multiplying by many numbers < 1 so the gradients get very small. The vanishing gradient problem What happens as the layers get further and further from the output layer? E.g., what s gradient for the bias term with several layers after it in a trivial net?
33 Gradients are unstable Max at 1/4 If weights are usually > 1 then we are multiplying by many numbers > 1 so the gradients get very big. The exploding gradient problem What happens (less common as the layers but possible) get further and further from the output layer? E.g., what s gradient for the bias term with several layers after it in a trivial net?
34 AI Stats 2010 input output Histogram of gradients in a 5-layer network for an artificial image recognition task
35 AI Stats 2010 We will get to these tricks eventually.
36 It s easy for sigmoid units to saturate Learning rate approaches zero and unit is stuck
37 It s easy for sigmoid units to saturate For a big network there are lots of weighted inputs to each neuron. If any of them are too large then the neuron will saturate. So neurons get stuck with a few large inputs OR many small ones.
38 It s easy for sigmoid units to saturate If there are 500 non-zero inputs inijalized with a Gaussian ~N(0,1) then the SD is
39 It s easy for sigmoid units to saturate SaturaJon visualizajon from Glorot & Bengio using a smarter inijalizajon scheme Bottom layer still stuck for first 100 epochs
40 Outline What s new in ANNs in the last 5-10 years? Deeper networks, more data, and faster training Scalability and use of GPUs Symbolic differenjajon Some subtle changes to cost func6on, architectures, op6miza6on methods What types of ANNs are most successful and why? ConvoluJonal networks (CNNs) Long term/short term memory networks (LSTM) Word2vec and embeddings What are the hot research topics for deep learning?
41 WHAT S DIFFERENT ABOUT MODERN ANNS?
42 Some key differences Use of soomax and entropic loss instead of quadrajc loss. Use of alternate non-linearijes relu and hyperbolic tangent Be2er understanding of weight inijalizajon Data augmentajon Especially for image data
43 Cross-entropy loss Compare to gradient for square loss when a~=1 y=0 and x=1 C w = σ (z) y a z
44 Cross-entropy loss
45 Cross-entropy loss
46 So\max output layer Softmax Network outputs a probability distribution! Cross-entropy loss after a softmax layer gives a very simple, numerically stable gradient Δw ij = (y i -z i )y j
47 Some key differences Use of soomax and entropic loss instead of quadrajc loss. Ooen learning is faster and more stable as well as gepng be2er accuracies in the limit Use of alternate non-linearijes Be2er understanding of weight inijalizajon Data augmentajon Especially for image data
48 Some key differences Use of soomax and entropic loss instead of quadrajc loss. Ooen learning is faster and more stable as well as gepng be2er accuracies in the limit Use of alternate non-linearijes relu and hyperbolic tangent Be2er understanding of weight inijalizajon Data augmentajon Especially for image data
49 Alterna6ve non-lineari6es Changes so far Changed the loss from square error to crossentropy (no effect at test Jme) Proposed adding another output layer (soomax) A new change: modifying the nonlinearity The logisjc is not widely used in modern ANNs
50 Alterna6ve non-lineari6es A new change: modifying the nonlinearity The logisjc is not widely used in modern ANNs Alternate 1: tanh Like logistic function but shifted to range [-1, +1]
51 AI Stats 2010 depth 4? We will get to these tricks eventually.
52 Alterna6ve non-lineari6es A new change: modifying the nonlinearity relu ooen used in vision tasks Alternate 2: rectified linear unit Linear with a cutoff at zero (Implement: clip the gradient when you pass zero)
53 Alterna6ve non-lineari6es A new change: modifying the nonlinearity relu ooen used in vision tasks Alternate 2: rectified linear unit Soft version: log(exp(x)+1) Doesn t saturate (at one end) Sparsifies outputs Helps with vanishing gradient
54 Some key differences Use of soomax and entropic loss instead of quadrajc loss. Ooen learning is faster and more stable as well as gepng be2er accuracies in the limit Use of alternate non-linearijes relu and hyperbolic tangent Be2er understanding of weight inijalizajon Data augmentajon Especially for image data
55 It s easy for sigmoid units to saturate For a big network there are lots of weighted inputs to each neuron. If any of them are too large then the neuron will saturate. So neurons get stuck with a few large inputs OR many small ones.
56 It s easy for sigmoid units to saturate If there are 500 non-zero inputs inijalized with a Gaussian ~N(0,1) then the SD is Common heurisjcs for inijalizing weights:! N # 0, " 1 $ & #inputs % " U $ # 1 #inputs, 1 #inputs % ' &
57 It s easy for sigmoid units to saturate SaturaJon visualizajon from Glorot & Bengio 2010 using " U $ # 1 #inputs, Bottom layer still stuck for first 100 epochs 1 #inputs % ' &
58 Ini6alizing to avoid satura6on In Glorot and Bengio they suggest weights if level j First (with breakthrough n j inputs) deep from learning results were based on clever pre-training initialization schemes, where deep networks were seeded with weights learned from unsupervised strategies This is not always the solution but good initialization is very important for deep nets!
59 Outline What s new in ANNs in the last 5-10 years? Deeper networks, more data, and faster training Scalability and use of GPUs Symbolic differenjajon Some subtle changes to cost funcjon, architectures, opjmizajon methods What types of ANNs are most successful and why? Convolu6onal networks (CNNs) Long term/short term memory networks (LSTM) Word2vec and embeddings What are the hot research topics for deep learning?
60 WHAT S A CONVOLUTIONAL NEURAL NETWORK?
61 Model of vision in animals
62 Vision with ANNs
63 What s a convolu6on? 1-D
64 What s a convolu6on? 1-D
65 What s a convolu6on?
66 What s a convolu6on?
67 What s a convolu6on?
68 What s a convolu6on?
69 What s a convolu6on?
70 What s a convolu6on?
71 What s a convolu6on? Basic idea: Pick a 3-3 matrix F of weights Slide this over an image and compute the inner product (similarity) of F and the corresponding field of the image, and replace the pixel in the center of the field with the output of the inner product operajon Key point: Different convolujons extract different types of low-level features from an image All that we need to vary to generate these different features is the weights of F
72 How do we convolve an image with an ANN? Note that the parameters in the matrix defining the convolution are tied across all places that it is used
73 How do we do many convolu6ons of an image with an ANN?
74 Example: 6 convolu6ons of a digit
75 CNNs typically alternate convolu6ons, non-linearity, and then downsampling Downsampling is usually averaging or (more common in recent CNNs) max-pooling
76 Why do max-pooling? Saves space Reduces overfipng? Because I m going to add more convolujons aoer it! Allows the short-range convolujons to extend over larger subfields of the images So we can spot larger objects Eg, a long horizontal line, or a corner, or
77 Another CNN visualiza6on
78
79
80 Why do max-pooling? Saves space Reduces overfipng? Because I m going to add more convolujons aoer it! Allows the short-range convolujons to extend over larger subfields of the images So we can spot larger objects Eg, a long horizontal line, or a corner, or At some point the feature maps start to get very sparse and blobby they are indicators of some semanjc property, not a recognizable transformajon of the image Then just use them as features in a normal ANN
81 Why do max-pooling? Saves space Reduces overfipng? Because I m going to add more convolujons aoer it! Allows the short-range convolujons to extend over larger subfields of the images So we can spot larger objects Eg, a long horizontal line, or a corner, or
82 Alterna6ng convolu6on and downsampling 5 layers up The subfield in a large dataset that gives the strongest output for a neuron
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationNeural Networks. William Cohen [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ]
Neural Networks William Cohen 10-601 [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ] WHAT ARE NEURAL NETWORKS? William s notation Logis;c regression + 1
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationNeural Networks and Deep Learning.
Neural Networks and Deep Learning www.cs.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts perceptrons the perceptron training rule linear separability hidden
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationArtificial Neural Networks
Artificial Neural Networks Oliver Schulte - CMPT 310 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will focus on
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationArtificial Neuron (Perceptron)
9/6/208 Gradient Descent (GD) Hantao Zhang Deep Learning with Python Reading: https://en.wikipedia.org/wiki/gradient_descent Artificial Neuron (Perceptron) = w T = w 0 0 + + w 2 2 + + w d d where
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationDeep Feedforward Networks. Han Shao, Hou Pong Chan, and Hongyi Zhang
Deep Feedforward Networks Han Shao, Hou Pong Chan, and Hongyi Zhang Deep Feedforward Networks Goal: approximate some function f e.g., a classifier, maps input to a class y = f (x) x y Defines a mapping
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationLecture 35: Optimization and Neural Nets
Lecture 35: Optimization and Neural Nets CS 4670/5670 Sean Bell DeepDream [Google, Inceptionism: Going Deeper into Neural Networks, blog 2015] Aside: CNN vs ConvNet Note: There are many papers that use
More informationPerceptron & Neural Networks
School of Computer Science 10-701 Introduction to Machine Learning Perceptron & Neural Networks Readings: Bishop Ch. 4.1.7, Ch. 5 Murphy Ch. 16.5, Ch. 28 Mitchell Ch. 4 Matt Gormley Lecture 12 October
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More informationConvolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More informationMachine Learning. Neural Networks
Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 3: Introduction to Deep Learning (continued)
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 3: Introduction to Deep Learning (continued) Course Logistics - Update on course registrations - 6 seats left now -
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationIntroduction Biologically Motivated Crude Model Backpropagation
Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationLast update: October 26, Neural networks. CMSC 421: Section Dana Nau
Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications
More informationCSC 411 Lecture 10: Neural Networks
CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationBack-Propagation Algorithm. Perceptron Gradient Descent Multilayered neural network Back-Propagation More on Back-Propagation Examples
Back-Propagation Algorithm Perceptron Gradient Descent Multilayered neural network Back-Propagation More on Back-Propagation Examples 1 Inner-product net =< w, x >= w x cos(θ) net = n i=1 w i x i A measure
More informationSupervised (BPL) verses Hybrid (RBF) Learning. By: Shahed Shahir
Supervised (BPL) verses Hybrid (RBF) Learning By: Shahed Shahir 1 Outline I. Introduction II. Supervised Learning III. Hybrid Learning IV. BPL Verses RBF V. Supervised verses Hybrid learning VI. Conclusion
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationDeep Feedforward Networks. Seung-Hoon Na Chonbuk National University
Deep Feedforward Networks Seung-Hoon Na Chonbuk National University Neural Network: Types Feedforward neural networks (FNN) = Deep feedforward networks = multilayer perceptrons (MLP) No feedback connections
More informationCSCI567 Machine Learning (Fall 2018)
CSCI567 Machine Learning (Fall 2018) Prof. Haipeng Luo U of Southern California Sep 12, 2018 September 12, 2018 1 / 49 Administration GitHub repos are setup (ask TA Chi Zhang for any issues) HW 1 is due
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationDEEP LEARNING PART ONE - INTRODUCTION CS/CNS/EE MACHINE LEARNING & DATA MINING - LECTURE 7
DEEP LEARNING PART ONE - INTRODUCTION CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING - LECTURE 7 INTRODUCTION & MOTIVATION data labels 3 y is continuous y is binary or categorical y y x x regression classification
More informationUnit III. A Survey of Neural Network Model
Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of
More informationIntroduction to Deep Learning
Introduction to Deep Learning Some slides and images are taken from: David Wolfe Corne Wikipedia Geoffrey A. Hinton https://www.macs.hw.ac.uk/~dwcorne/teaching/introdl.ppt Feedforward networks for function
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationWhat Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1
What Do Neural Networks Do? MLP Lecture 3 Multi-layer networks 1 Multi-layer networks Steve Renals Machine Learning Practical MLP Lecture 3 7 October 2015 MLP Lecture 3 Multi-layer networks 2 What Do Single
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationNeural Networks 2. 2 Receptive fields and dealing with image inputs
CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationMachine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016
Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text
More informationIntro to Neural Networks and Deep Learning
Intro to Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi UVA CS 6316 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs
More informationIntelligent Systems Discriminative Learning, Neural Networks
Intelligent Systems Discriminative Learning, Neural Networks Carsten Rother, Dmitrij Schlesinger WS2014/2015, Outline 1. Discriminative learning 2. Neurons and linear classifiers: 1) Perceptron-Algorithm
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationIntroduction to Deep Neural Networks
Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic
More informationMachine Learning Lecture 14
Machine Learning Lecture 14 Tricks of the Trade 07.12.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationDeep Feedforward Networks. Lecture slides for Chapter 6 of Deep Learning Ian Goodfellow Last updated
Deep Feedforward Networks Lecture slides for Chapter 6 of Deep Learning www.deeplearningbook.org Ian Goodfellow Last updated 2016-10-04 Roadmap Example: Learning XOR Gradient-Based Learning Hidden Units
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,
More informationNeural networks COMS 4771
Neural networks COMS 4771 1. Logistic regression Logistic regression Suppose X = R d and Y = {0, 1}. A logistic regression model is a statistical model where the conditional probability function has a
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationDeep Learning Recurrent Networks 2/28/2018
Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
More informationNeural Networks. Intro to AI Bert Huang Virginia Tech
Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Richard Socher Organization Main midterm: Feb 13 Alternative midterm: Friday Feb
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More information