Theano: A Few Examples


1 October 21, 2015

2 Theano in a Nutshell
Python library for creating, optimizing and evaluating mathematical expressions. Designed for machine-learning applications by the LISA Lab in Montreal, Canada, one of the world leaders in neural network research. Particularly good with multi-dimensional arrays, the basis of most neural-net representations. Automatic derivation of gradients (derivatives), a central aspect of many core neural-net computations. Seamless exploitation of GPUs for up to 140x performance improvements. NOT a drag-and-drop construction kit for neural nets, but a programming paradigm (that takes a little getting used to). The source of info:

3 Theano Variables
Theano variables all have the same Python type, TensorVariable. Each variable's type slot houses its Theano type, of which there are many. Theano variables have (optional) names that serve no important purpose in Theano itself but are useful for the user.

    import theano
    import theano.tensor as T

    >>> x = T.dscalar('x')
    >>> type(x)
    <class 'theano.tensor.var.TensorVariable'>
    >>> x.type
    TensorType(float64, scalar)
    >>> m = T.dmatrix('MyMatrix')
    >>> type(m)
    <class 'theano.tensor.var.TensorVariable'>
    >>> m.type
    TensorType(float64, matrix)
    >>> m.name
    'MyMatrix'
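There are constructors for many more Theano types than dscalar and dmatrix. A brief illustrative sketch (not from the slides); the 'd', 'i' and 'f' prefixes select float64, int32 and float32 element types:

    import theano.tensor as T

    v  = T.dvector('v')      # 1-d array of float64
    iv = T.ivector('iv')     # 1-d array of int32
    t3 = T.dtensor3('t3')    # 3-d array of float64
    fm = T.fmatrix('fm')     # 2-d array of float32

    print(v.type)            # TensorType(float64, vector)
    print(iv.type)           # TensorType(int32, vector)
    print(fm.type)           # TensorType(float32, matrix)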

4 Theano Functions

    import theano
    import theano.tensor as T

    def theg1():
        w = T.dscalar('w')
        x = T.dscalar('x')
        y = T.dscalar('y')
        z = w * x + y
        f = theano.function([w, x, y], z)
        return f

z is a TensorVariable of Theano type dscalar (double-precision floating-point scalar), built from the expression w * x + y. theano.function compiles the entire expression connecting the inputs [w, x, y] to the output z.

    >>> f = theg1()
    >>> type(f)
    <class 'theano.compile.function_module.Function'>
    >>> f(1, 2, 3)
    array(5.0)

5 A Theano Expression Graph for z
[Expression-graph figure: w and x feed an Elemwise{mul,no_inplace} node; its output and y feed an Elemwise{add,no_inplace} node that produces z. Every node is TensorType(float64, scalar).]
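Graphs like this one, and those on the following slides, can be generated with Theano's printing utilities. A minimal sketch, assuming the theg1 function from the previous slide is in scope; the output file name is arbitrary:

    import theano
    import theano.printing

    f = theg1()   # compiled Theano function from slide 4

    # Text rendering of the optimized expression graph
    theano.printing.debugprint(f)

    # Graphical rendering (requires pydot and graphviz); writes a PNG file
    theano.printing.pydotprint(f, outfile='theg1_graph.png')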

6 Expression Graph for Compiled Function f
[Expression-graph figure: after compilation the multiply and add are fused into a single Elemwise{Composite{((i0 * i1) + i2)}} node whose three inputs are w, x and y; all types are TensorType(float64, scalar).]

7 Calculating Derivatives

    def theg2():
        w = T.dscalar('w')
        x = T.dscalar('x')
        y = T.dscalar('y')
        z = 7 * w * x + y
        f = theano.function([w, x, y], z)
        dz = T.grad(z, x)
        g = theano.function([w, x, y], dz)
        return g

T.grad calculates the derivative of z with respect to x and stores it in dz, a scalar Theano variable. g is the function object that computes dz, given w, x and y.
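One way to see what T.grad provides is to compare the compiled symbolic derivative with a numerical finite-difference estimate. A small self-contained sketch (not from the slides), rebuilding the same expression:

    import theano
    import theano.tensor as T

    w, x, y = T.dscalars('w', 'x', 'y')
    z = 7 * w * x + y
    f = theano.function([w, x, y], z)             # computes z itself
    g = theano.function([w, x, y], T.grad(z, x))  # computes dz/dx = 7*w

    eps = 1e-6
    numeric  = (f(5, 1 + eps, 20) - f(5, 1, 20)) / eps
    symbolic = g(5, 1, 20)
    print(numeric, symbolic)   # both approximately 35.0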

8 Graphing the variable, dz
[Expression-graph figure for the unoptimized gradient variable dz: it contains the original 7 * w * x + y subgraph plus the extra Elemwise{second} and Elemwise{mul} nodes introduced by T.grad; the constant 7 enters as TensorType(int8, scalar), everything else is TensorType(float64, scalar).]

9 Graphing the Derivative Function, g
[Expression-graph figure for the compiled gradient function g: the constant 7.0 and w feed a single Elemwise{mul,no_inplace} node, so the compiled graph reduces to 7 * w; x and y no longer appear in it.]

    >>> g = theg2()
    >>> g(5, 1, 20)
    array(35.0)   # dz/dx = 7*w = 7*5

10 Working with Vectors and Matrices

    def theg3():
        w = T.dmatrix('weights')
        v = T.dvector('upstream_activations')
        b = T.dvector('biases')
        x = T.dot(v, w) + b        # T.dot = dot product
        x.name = 'integrated_signals'
        f = theano.function([v, w, b], x)
        return f

    >>> f = theg3()
    >>> v = [1, 1]
    >>> b = [0.5, -0.3]
    >>> w = [[2, 4], [3, 5]]
    >>> f(v, w, b)
    array([ 5.5,  8.7])
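As a sanity check (not on the slide), the same computation can be done directly in NumPy, and the compiled Theano function should agree:

    import numpy as np

    v = [1, 1]
    w = [[2, 4], [3, 5]]
    b = [0.5, -0.3]
    print(np.dot(v, w) + b)   # [ 5.5  8.7 ], matching f(v, w, b) above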

11 Graphing the variable, x
[Expression-graph figure: upstream activations and weights feed a dot node; its output and the biases feed an Elemwise{add,no_inplace} node that produces the variable named integrated signals.]

12 Graphing the compiled function, f
[Expression-graph figure for the compiled function: the weight matrix passes through an InplaceDimShuffle{1,0} node (producing weights.T), and the dot product plus bias addition is fused into a single CGemv{no_inplace} node producing integrated signals.]
This graph is compressed and simplified (due to compilation) but also complicated by an additional operation, DimShuffle, on the weight matrix, where {1,0} means transpose.
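DimShuffle is Theano's general axis-permutation operation; with pattern {1,0} on a matrix it is just a transpose. A small illustrative sketch (not from the slides):

    import theano
    import theano.tensor as T

    m  = T.dmatrix('m')
    mt = m.dimshuffle(1, 0)        # swap the two axes: same result as m.T
    f  = theano.function([m], [mt, m.T])

    a, b = f([[1, 2], [3, 4]])
    print(a)   # [[ 1.  3.]
               #  [ 2.  4.]]
    print(b)   # identical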

13 Matrix Operations with Memory
Create a shared accumulator variable: s = theano.shared(...). Update it as a side effect of each function call: ...updates=[(s, s+x)].

    def theg4(n=10):
        w = T.dmatrix('weights')
        v = T.dvector('upstream_activations')
        b = T.dvector('biases')
        s = theano.shared(np.zeros(2))
        x = T.dot(v, w) + b
        x.name = 'integrated_signals'
        f = theano.function([v, w, b], x, updates=[(s, s + x)])
        w0 = np.random.uniform(-0.1, 0.1, size=(2, 2))
        b0 = [1, 1]
        for i in range(n):            # Call f many times
            f([1 + i / n, 1 - i / n], w0, b0)
        return (f, s)
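A small usage sketch (not from the slides), assuming theg4 and its imports are in scope; get_value() is the standard way to read the current contents of a shared variable:

    f, s = theg4(n=10)
    print(s.get_value())   # sum of the ten integrated-signal vectors accumulated by the updates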

14 Expression Graph including Update
[Expression-graph figure: the same compiled CGemv{no_inplace} graph as on slide 12, plus an Elemwise{Add}[(0, 0)] node that computes the update s + x for the shared variable.]
Note the addition of the update, lower left.

15 Including an Activation Function and Error Term
Use Theano's NNET module for activation functions.

    import theano.tensor.nnet as Tann

    def theg5(target=[1, 1]):
        w = theano.shared(np.random.uniform(-0.1, 0.1, size=(2, 2)))
        v = T.dvector('V')
        b = theano.shared(np.ones(2))
        x = Tann.sigmoid(T.dot(v, w) + b)
        w.name = 'w'; x.name = 'x'
        error = T.sum((target - x) ** 2)
        de = T.grad(error, w)
        return (x, de)
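theg5 returns symbolic variables rather than callable functions. The following sketch (not from the slides) rebuilds the same expressions locally and compiles the error and its gradient so they can be evaluated on a concrete input:

    import numpy as np
    import theano
    import theano.tensor as T
    import theano.tensor.nnet as Tann

    target = [1, 1]
    w = theano.shared(np.random.uniform(-0.1, 0.1, size=(2, 2)))
    b = theano.shared(np.ones(2))
    v = T.dvector('V')
    x = Tann.sigmoid(T.dot(v, w) + b)
    error = T.sum((target - x) ** 2)
    de = T.grad(error, w)

    f_err  = theano.function([v], error)
    f_grad = theano.function([v], de)
    print(f_err([1.0, 0.0]))    # scalar error for this input
    print(f_grad([1.0, 0.0]))   # 2x2 gradient of the error w.r.t. the weights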

16 Expression Graph of x
[Expression-graph figure: V and w feed a dot node, the bias is added via Elemwise{add,no_inplace}, and the result passes through a sigmoid node to produce x.]

17 Expression Graph of error
[Expression-graph figure: the sigmoid graph for x from slide 16, followed by Elemwise{sub,no_inplace} against the target vector [1 1], Elemwise{pow,no_inplace} with exponent 2, and a Sum{acc_dtype=float64} node producing the scalar error.]

18 Expression Graph of d(error)/d(weight)
[Expression-graph figure: the backpropagation graph that T.grad builds for d(error)/dw. It reuses the forward sub/pow/Sum subgraph and adds Elemwise{scalar_sigmoid}, Elemwise{neg}, several Elemwise{mul} and DimShuffle nodes, and a final dot with v.T, yielding a float64 matrix of weight gradients.]

19 An Autoencoder Neural Network
The hidden-layer activation pattern is an encoding of the input. Target output = input: the network must reproduce its input at the output, but the signal has to pass through the hidden layer, so the hidden activation pattern acts as a compression of the input.

20 The Autoencoder Class
nb = number of bits = number of input nodes; nh = number of hidden nodes; lr = learning rate.

    def gen_all_bitcases(num_bits):
        def bits(n):
            s = bin(n)[2:]
            return [int(b) for b in '0' * (num_bits - len(s)) + s]
        return [bits(i) for i in range(2 ** num_bits)]

    class autoencoder():
        def __init__(self, nb=3, nh=2, lr=0.1):
            self.cases = gen_all_bitcases(nb)
            self.lrate = lr
            self.build_ann(nb, nh, lr)
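As a quick illustration (not on the slide), gen_all_bitcases enumerates every bit vector of the given length; these vectors serve as both inputs and targets:

    >>> gen_all_bitcases(2)
    [[0, 0], [0, 1], [1, 0], [1, 1]]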

21 The Core Theano Build

    def build_ann(self, nb, nh, lr):
        w1 = theano.shared(np.random.uniform(-0.1, 0.1, size=(nb, nh)))
        w2 = theano.shared(np.random.uniform(-0.1, 0.1, size=(nh, nb)))
        input = T.dvector('input')
        b1 = theano.shared(np.random.uniform(-0.1, 0.1, size=nh))
        b2 = theano.shared(np.random.uniform(-0.1, 0.1, size=nb))
        x1 = Tann.sigmoid(T.dot(input, w1) + b1)
        x2 = Tann.sigmoid(T.dot(x1, w2) + b2)
        error = T.sum((input - x2) ** 2)
        params = [w1, b1, w2, b2]
        gradients = T.grad(error, params)
        # Gradient-descent updates: each parameter p moves to p - lrate * dE/dp
        backprop_acts = [(p, p - self.lrate * g) for p, g in zip(params, gradients)]
        self.predictor = theano.function([input], [x2, x1])
        self.trainer = theano.function([input], error, updates=backprop_acts)

22 Training the Autoencoder

    def do_training(self, epochs=100):
        errors = []
        for i in range(epochs):
            error = 0
            for c in self.cases:
                error += self.trainer(c)
            errors.append(error)
        return errors

23 Testing the Autoencoder
For this example, the main purpose of testing is to find the hidden-node activation patterns for each input case. Ideally, they should be well separated in 2-d space.

    def do_testing(self):
        hidden_activations = []
        for c in self.cases:
            _, hact = self.predictor(c)
            hidden_activations.append(hact)
        return hidden_activations

[Figures: Evolving Separation of Hidden-Layer Patterns; Final Hidden-Layer Patterns.]
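Putting the pieces together, here is a small usage sketch (not from the slides), assuming the autoencoder class and the imports from the previous slides are in scope; the epoch count of 300 is an arbitrary choice:

    a = autoencoder(nb=3, nh=2, lr=0.1)   # 8 bit patterns, 2 hidden nodes
    errors = a.do_training(epochs=300)    # summed error per epoch; should decrease
    hidden = a.do_testing()               # one 2-d hidden-activation vector per input case

    print(errors[0], errors[-1])
    for case, h in zip(a.cases, hidden):
        print(case, h)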
