
Neural networks (NN)

Hedibert F. Lopes
Insper Institute of Education and Research, São Paulo, Brazil

Slides based on Chapter 11 of Hastie, Tibshirani and Friedman's book The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition).

Chapter introduction

In this chapter we describe a class of learning methods that was developed separately in two different fields, statistics and artificial intelligence, based on essentially identical models. The central idea is to extract linear combinations of the inputs as derived features, and then model the target as a nonlinear function of these features. The result is a powerful learning method, with widespread applications in many fields.

Single hidden layer, feed-forward neural network

[Figure: a feed-forward network with an input layer (the covariates, Inputs 1-5), a hidden layer (the derived features), and an output layer (the response).]

Neural network

[Figure: a single artificial neuron. Inputs x1, x2, x3 with weights w1, w2, w3 and a bias b are summed and passed through an activation function f to produce the output y. From http://mdtux89.github.io/2015/12/11/torch-tutorial.html]

Vanilla neural net

A single hidden layer back-propagation network (or single layer perceptron) is a two-stage regression or classification model. Derived features $z_m$ are created from linear combinations of the inputs, and the target $y$ (with $K$ categories or a $K$-dimensional response) is then modeled as a function of linear combinations of the $z_m$:

$$z_m = \sigma(\alpha_{0m} + \alpha_m^T x), \quad m = 1, \dots, M,$$
$$y_k = \beta_{0k} + \beta_k^T z, \quad k = 1, \dots, K,$$
$$f_k(x) = g_k(y), \quad k = 1, \dots, K,$$

where $x = (x_1, \dots, x_p)$, $z = (z_1, \dots, z_M)$ and $y = (y_1, \dots, y_K)$.
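To make the two-stage structure concrete, here is a minimal forward-pass sketch in R, with made-up dimensions and randomly initialized weights (nothing here is the fitted model from the chapter):

# Forward pass of a single-hidden-layer network with random weights (illustrative).
set.seed(1)
p <- 5; M <- 3; K <- 2                           # inputs, hidden units, outputs (arbitrary)
x <- rnorm(p)                                    # one input vector
alpha0 <- rnorm(M); alpha <- matrix(rnorm(M * p), M, p)
beta0  <- rnorm(K); beta  <- matrix(rnorm(K * M), K, M)

sigma <- function(v) 1 / (1 + exp(-v))           # sigmoid activation
z <- as.vector(sigma(alpha0 + alpha %*% x))      # derived features z_m
y <- as.vector(beta0 + beta %*% z)               # linear combinations of the z_m
f <- exp(y) / sum(exp(y))                        # g_k = softmax, for classification
f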

Activation function

The activation function $\sigma(v)$ is usually chosen to be the sigmoid

$$\sigma(v) = \frac{1}{1 + e^{-v}}.$$

Sometimes Gaussian radial basis functions are used for the $\sigma(v)$, producing what is known as a radial basis function network.
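Both choices are one-liners in R; the Gaussian radial basis version below uses an arbitrary center and scale purely for illustration:

sigmoid <- function(v) 1 / (1 + exp(-v))
rbf     <- function(v, mu = 0, s = 1) exp(-(v - mu)^2 / (2 * s^2))  # illustrative center/scale

v <- seq(-5, 5, by = 0.1)
plot(v, sigmoid(v), type = "l", ylab = "activation")
lines(v, rbf(v), lty = 2)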

Output function

The output function $g_k(y)$ allows a final transformation of the vector of outputs $y$. For regression we typically choose the identity function $g_k(y) = y_k$. Early work in $K$-class classification also used the identity function, but this was later abandoned in favor of the softmax function

$$g_k(y) = \frac{e^{y_k}}{\sum_{l=1}^{K} e^{y_l}}.$$

Linear regression and classification: notice that if $\sigma$ is the identity function, then the entire model collapses to a linear model in the inputs. Hence a neural network can be thought of as a nonlinear generalization of the linear model, both for regression and classification.
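A direct R translation of the softmax (with the standard max-subtraction trick, which leaves the result unchanged but avoids overflow in exp()) might look like this:

# Softmax output function g_k(y)
softmax <- function(y) {
  e <- exp(y - max(y))
  e / sum(e)
}
softmax(c(2, 1, 0.1))   # probabilities that sum to 1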

Weights

The neural network model has unknown parameters, often called weights, and we seek values for them that make the model fit the training data well. We denote the complete set of weights by $\theta$, which consists of $\{\alpha_{0m}, \alpha_m;\ m = 1, \dots, M\}$ and $\{\beta_{0k}, \beta_k;\ k = 1, \dots, K\}$, that is, $M(p+1)$ plus $K(M+1)$ weights.

Measures of fit

Regression: sum-of-squared errors,

$$R(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{N} (y_{ik} - f_k(x_i))^2.$$

Classification: cross-entropy,

$$R(\theta) = -\sum_{k=1}^{K} \sum_{i=1}^{N} y_{ik} \log f_k(x_i),$$

and the corresponding classifier is $G(x) = \arg\max_k f_k(x)$.
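Both criteria are one line each in R. The matrices below are hypothetical placeholders: yind holds one-hot targets $y_{ik}$ and fhat holds fitted class probabilities $f_k(x_i)$ for N = 4 cases and K = 3 classes:

yind <- diag(3)[c(1, 2, 3, 1), ]                 # one-hot targets y_ik (4 x 3)
fhat <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.2, 0.2, 0.6,
                 0.5, 0.3, 0.2), nrow = 4, byrow = TRUE)

sse  <- sum((yind - fhat)^2)                     # sum-of-squared errors
xent <- -sum(yind * log(fhat))                   # cross-entropy
Ghat <- max.col(fhat)                            # classifier: argmax_k f_k(x)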

Back-propagation

With the softmax activation function and the cross-entropy error function, the neural network model is exactly a linear logistic regression model in the hidden units, and all the parameters are estimated by maximum likelihood.

The generic approach to minimizing $R(\theta)$ is by gradient descent, called back-propagation in this setting. Because of the compositional form of the model, the gradient can be easily derived using the chain rule for differentiation. This can be computed by a forward and backward sweep over the network, keeping track only of quantities local to each unit.
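To make the forward-and-backward sweep concrete, here is a hedged sketch of a single back-propagation update in R for the simplest setting: one output, squared-error loss, sigmoid hidden units. The function name and learning rate lr are illustrative choices, not notation from the chapter:

# One gradient-descent (back-propagation) step for a single observation (x, y).
sigma <- function(v) 1 / (1 + exp(-v))

backprop_step <- function(x, y, alpha0, alpha, beta0, beta, lr = 0.01) {
  # forward sweep
  z <- as.vector(sigma(alpha0 + alpha %*% x))    # hidden features z_m
  f <- beta0 + sum(beta * z)                     # fitted value (identity output)
  # backward sweep (chain rule)
  delta <- -2 * (y - f)                          # output-layer error
  s     <- delta * beta * z * (1 - z)            # hidden-layer errors
  # gradient-descent updates, local to each unit
  list(alpha0 = alpha0 - lr * s,
       alpha  = alpha  - lr * (s %o% x),         # outer product gives dR/d alpha
       beta0  = beta0  - lr * delta,
       beta   = beta   - lr * delta * z)
}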

Back-propagation (part 1)

Back-propagation (part 2)

Back-propagation (part 3)

The advantage of back-propagation is its simple, local nature. In the back-propagation algorithm, each hidden unit passes and receives information only to and from units that share a connection. Hence it can be implemented efficiently on a parallel architecture computer.

Zip Code Data

This example is a character recognition task: classification of handwritten numerals scanned from envelopes by the U.S. Postal Service.

Input: 256 pixel values (16 x 16 image)
Output: digits {0, 1, 2, ..., 9}
Training set: 320 digits
Test set: 160 digits

Five different networks

Five different networks (continued)

Test performance curves

Test performance

Keras and RStudio

https://keras.rstudio.com

Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

dataset_mnist (from keras v2.1.6, by JJ Allaire): MNIST database of handwritten digits. A dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images. Lists of training and test data: train$x, train$y, test$x, test$y, where x is an array of grayscale image data with shape (num_samples, 28, 28) and y is an array of digit labels (integers in the range 0-9) with shape (num_samples).
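As a quick sanity check of those shapes (assuming keras and its Python backend are already installed), the dataset can be inspected directly:

library(keras)
mnist <- dataset_mnist()
dim(mnist$train$x)    # 60000 28 28
dim(mnist$test$x)     # 10000 28 28
range(mnist$train$y)  # digit labels 0 to 9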

R script

devtools::install_github("rstudio/keras")
library(keras)
install_keras()

mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

# reshape: flatten each 28x28 image into a length-784 vector
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))

# rescale pixel values from 0-255 to 0-1
x_train <- x_train / 255
x_test <- x_test / 255

# one-hot encode the digit labels
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")
summary(model)

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = c("accuracy")
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
plot(history)

model %>% evaluate(x_test, y_test)
model %>% predict_classes(x_test)