
Neural Networks
Haiming Zhou
Division of Statistics, Northern Illinois University
zhouh@niu.edu

Neural Networks
The term neural network has evolved to encompass a large class of models and learning methods. We focus on the most widely used "vanilla" neural net, sometimes called the single hidden layer back-propagation network, or single-layer perceptron. The central idea is to extract linear combinations of the inputs as derived features, and then model the target as a nonlinear function of these features. A neural network can thus be viewed as a multi-stage regression or classification model, typically represented by a network diagram.

Schematic of a Single Hidden Layer Network

Neural Networks
For K-class classification, there are K units at the top, with the kth unit modeling the probability of class k. There are K target measurements Y_k, k = 1, ..., K, each coded as a 0/1 variable for the kth class. Derived features Z_m are created from linear combinations of the inputs, and then the target Y_k is modeled as a function of linear combinations of the Z_m:

Z_m = σ(α_0m + α_m^T X),   m = 1, ..., M,
T_k = β_0k + β_k^T Z,      k = 1, ..., K,
f_k(X) = g_k(T),           k = 1, ..., K,

where Z = (Z_1, ..., Z_M)^T and T = (T_1, ..., T_K)^T. σ(·) is called the activation function; e.g., the logistic (sigmoid) function takes the form σ(v) = 1/(1 + e^{-v}).
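To make the mapping concrete, here is a minimal sketch in R (the language of the resources cited at the end) of the forward computation for a single input vector; the weight objects alpha0, alpha, beta0, beta and the function name forward are hypothetical placeholders, not from the slides.

```r
# Logistic activation sigma(v) = 1 / (1 + exp(-v))
sigma <- function(v) 1 / (1 + exp(-v))

# Forward computation T = beta0 + beta %*% sigma(alpha0 + alpha %*% x)
# for one input vector x (outputs before the final g_k transformation).
forward <- function(x, alpha0, alpha, beta0, beta) {
  Z  <- sigma(alpha0 + alpha %*% x)   # derived features Z_m, m = 1..M
  Tk <- beta0 + beta %*% Z            # linear combinations T_k, k = 1..K
  drop(Tk)
}

# Example with p = 3 inputs, M = 2 hidden units, K = 1 output (random weights)
set.seed(1)
x      <- rnorm(3)
alpha0 <- rnorm(2); alpha <- matrix(rnorm(2 * 3), nrow = 2)
beta0  <- rnorm(1); beta  <- matrix(rnorm(1 * 2), nrow = 1)
forward(x, alpha0, alpha, beta0, beta)
```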

Neural Networks
The output function g_k(T) allows a final transformation of the vector of outputs T. For regression (i.e., T = T_1) we typically choose the identity function g_1(T_1) = T_1. For K-class classification we can use the softmax function

g_k(T) = e^{T_k} / Σ_{l=1}^K e^{T_l}.

The units in the middle of the network, computing the derived features Z_m, are called hidden units because the values Z_m are not directly observed. In general there can be more than one hidden layer. We can think of the Z_m as a basis expansion of the original inputs X; the neural network is then a standard linear (multilogit) model using these transformations as inputs. Here, unlike before, the parameters of the basis functions are learned from the data.
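As a small illustrative sketch (not from the slides) of the two output functions in R; the function names are made up for the example:

```r
# Output functions g_k: identity for regression, softmax for K-class classification.
g_identity <- function(Tvec) Tvec
g_softmax  <- function(Tvec) {
  e <- exp(Tvec - max(Tvec))   # subtract max(Tvec) for numerical stability
  e / sum(e)                   # f_k(X): nonnegative and sums to 1 over k
}

g_softmax(c(2, 1, -1))   # e.g., class probabilities for K = 3
```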

Neural Networks
Notice that if σ is the identity function, the entire model collapses to a linear model in the inputs. Introducing the nonlinear transformation σ therefore greatly enlarges the class of linear models. The rate of activation of the sigmoid depends on α_m: if ‖α_m‖ is small, the unit will indeed be operating in the (approximately) linear part of its activation function.
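To see the collapse concretely (a short derivation, not on the slides): take σ(v) = v and regression with K = 1. Stacking the hidden units gives Z = α_0 + A X, where A is the M × p matrix with rows α_m^T, so

f(X) = β_0 + β^T Z = β_0 + β^T(α_0 + A X) = (β_0 + β^T α_0) + (β^T A) X,

which is just a linear model in X with a reparameterized intercept and slope.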

Fitting Neural Networks
The neural network model has unknown parameters, often called weights. Denote the complete set of weights by θ, which consists of

{α_0m, α_m : m = 1, ..., M} and {β_0k, β_k : k = 1, ..., K}.

For regression (typically K = 1), we can use the sum-of-squared errors as our measure of fit:

R(θ) = Σ_{i=1}^n Σ_{k=1}^K (y_ik − f_k(x_i))^2.

For classification we use either squared error or the cross-entropy (deviance):

R(θ) = − Σ_{i=1}^n Σ_{k=1}^K y_ik log f_k(x_i).
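A direct transcription of the two criteria as R functions (a sketch; the n × K matrix layout of y and f is an assumption for the example):

```r
# y and f are n x K matrices: y[i, k] is the 0/1 coding of class k for
# observation i, and f[i, k] = f_k(x_i) is the fitted value/probability.
sse_loss <- function(y, f) sum((y - f)^2)      # sum-of-squared errors
ce_loss  <- function(y, f) -sum(y * log(f))    # cross-entropy (deviance)
```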

Back-propagation
Here is back-propagation in detail for squared error loss. Let z_mi = σ(α_0m + α_m^T x_i) and z_i = (z_1i, ..., z_Mi)^T. Then

R(θ) = Σ_{i=1}^n Σ_{k=1}^K (y_ik − f_k(x_i))^2 = Σ_{i=1}^n R_i,

with derivatives

∂R_i/∂β_km = −2(y_ik − f_k(x_i)) g_k'(β_k^T z_i) z_mi ≡ δ_ki z_mi,
∂R_i/∂α_ml = − Σ_{k=1}^K 2(y_ik − f_k(x_i)) g_k'(β_k^T z_i) β_km σ'(α_m^T x_i) x_il ≡ s_mi x_il.

The quantities δ_ki and s_mi satisfy

s_mi = σ'(α_m^T x_i) Σ_{k=1}^K β_km δ_ki,   (1)

known as the back-propagation equations.
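The following is a minimal R sketch of these gradient formulas for one observation, assuming the identity output g_k (so g_k' = 1) and the same hypothetical weight objects as in the earlier forward-pass sketch:

```r
# Gradients of R_i for one observation (x, y), squared error loss and
# identity output g_k. Assumed shapes: alpha0 is length M, alpha is M x p,
# beta0 is length K, beta is K x M.
sigma  <- function(v) 1 / (1 + exp(-v))
dsigma <- function(v) sigma(v) * (1 - sigma(v))   # sigma'(v) for the logistic

backprop_grads <- function(x, y, alpha0, alpha, beta0, beta) {
  a <- drop(alpha0 + alpha %*% x)      # pre-activations alpha_0m + alpha_m^T x
  z <- sigma(a)                        # hidden features z_mi
  f <- drop(beta0 + beta %*% z)        # outputs f_k(x_i) with g_k = identity
  delta <- -2 * (y - f)                # delta_ki
  s <- dsigma(a) * drop(t(beta) %*% delta)   # back-propagation equation (1)
  list(dbeta0 = delta,  dbeta  = outer(delta, z),  # dR_i/dbeta_km = delta_ki z_mi
       dalpha0 = s,     dalpha = outer(s, x))      # dR_i/dalpha_ml = s_mi x_il
}
```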

Back-propagation
A gradient descent update at the (r+1)st iteration has the form

β_km^(r+1) = β_km^(r) − γ_r Σ_{i=1}^n ∂R_i/∂β_km^(r),
α_ml^(r+1) = α_ml^(r) − γ_r Σ_{i=1}^n ∂R_i/∂α_ml^(r),

where γ_r is the learning rate.

Implement these updates with a two-pass algorithm. In the forward pass, the current weights are fixed and the predicted values f̂_k(x_i) are computed. In the backward pass, the errors δ_ki are computed and then back-propagated via (1) to give the errors s_mi. This two-pass procedure is what is known as back-propagation.
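A sketch of one such update in R, reusing the hypothetical backprop_grads helper above; w is an assumed list of weights, and the row-by-row loop mirrors the sum over i in the update formulas:

```r
# One gradient-descent pass over the data: forward pass, backward pass, update.
# w is a list with elements alpha0, alpha, beta0, beta; gamma is the learning
# rate gamma_r; X is n x p and Y is n x K.
gd_epoch <- function(X, Y, w, gamma) {
  grad <- list(dbeta0 = 0 * w$beta0, dbeta = 0 * w$beta,
               dalpha0 = 0 * w$alpha0, dalpha = 0 * w$alpha)
  for (i in seq_len(nrow(X))) {        # accumulate sum_i dR_i / dtheta
    g <- backprop_grads(X[i, ], Y[i, ], w$alpha0, w$alpha, w$beta0, w$beta)
    grad <- Map(`+`, grad, g[names(grad)])
  }
  # theta^(r+1) = theta^(r) - gamma_r * sum_i dR_i / dtheta^(r)
  w$beta0  <- w$beta0  - gamma * grad$dbeta0
  w$beta   <- w$beta   - gamma * grad$dbeta
  w$alpha0 <- w$alpha0 - gamma * grad$dalpha0
  w$alpha  <- w$alpha  - gamma * grad$dalpha
  w
}
```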

R Resources
Fitting a neural network in R: the neuralnet package (Link).
R-Session 11 - Statistical Learning - Neural Networks (Link).
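As a rough usage sketch of the neuralnet package on simulated regression data (argument names as I recall them from the package documentation, so treat them as assumptions and consult ?neuralnet):

```r
# install.packages("neuralnet")
library(neuralnet)

# Simulated regression data with two inputs
set.seed(1)
dat <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(200, sd = 0.1)

fit <- neuralnet(y ~ x1 + x2, data = dat,
                 hidden = 5,             # M = 5 units in a single hidden layer
                 linear.output = TRUE)   # identity output g for regression

head(compute(fit, dat[, c("x1", "x2")])$net.result)   # fitted values f(x_i)
```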

References
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd Edition. http://statweb.stanford.edu/~tibs/elemstatlearn/
Prof. Xiaogang Su's Data Mining and Statistical Learning I. https://sites.google.com/site/xgsu00/
Prof. Yaser Abu-Mostafa's Lecture (Link).