Neural Networks. Intro to AI Bert Huang Virginia Tech


Outline: biological inspiration for artificial neural networks; linear vs. nonlinear functions; learning with neural networks (backpropagation).

Image sources: https://en.wikipedia.org/wiki/neuron#/media/file:gfpneuron.png and https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg

Parameterizing $p(y \mid x)$: let $p(y \mid x) := f_w(x)$, where $f_w : \mathbb{R}^d \to [0, 1]$ and $f_w(x) := \frac{1}{1 + \exp(-w^\top x)}$.

Logistic Function: $\sigma(x) = \frac{1}{1 + \exp(-x)}$.
$\lim_{x \to \infty} \sigma(x) = \lim_{x \to \infty} \frac{1}{1 + \exp(-x)} = \frac{1}{1 + 0} = 1.0$; $\sigma(0) = \frac{1}{1 + \exp(0)} = \frac{1}{1 + 1} = 0.5$; $\lim_{x \to -\infty} \sigma(x) = \lim_{x \to -\infty} \frac{1}{1 + \exp(-x)} = 0.0$.

From Features to Probability: $f_w(x) := \frac{1}{1 + \exp(-w^\top x)}$.

Parameterizing $p(y \mid x)$: $p(y \mid x) := f_w(x)$, $f_w : \mathbb{R}^d \to [0, 1]$, $f_w(x) := \frac{1}{1 + \exp(-w^\top x)}$. (Diagram: inputs $x_1, \ldots, x_5$ connect to the output $y$ through weights $w_1, \ldots, w_5$.)
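
A minimal sketch of this logistic-regression parameterization in Python with NumPy; the particular weight and input values are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    # p(y | x) = sigmoid(w^T x), the logistic-regression parameterization
    return sigmoid(np.dot(w, x))

# illustrative values (not from the lecture)
w = np.array([0.5, -1.0, 0.25, 0.0, 2.0])
x = np.array([1.0, 0.5, -0.5, 3.0, 0.1])
print(predict_proba(w, x))
```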

Multi-Layered Perceptron (Diagram: inputs $x_1, \ldots, x_5$ feed a hidden unit $h_1$ through weights $w_1, \ldots, w_5$.)

Multi-Layered Perceptron (Diagram: the raw data $x_1, \ldots, x_5$ are pixel values; the learned representation $h_1, h_2$ captures shapes, shadows, and round regions; the prediction $y$ is whether the image shows a face.)

Multi-Layered Perceptron: $h_1 = \sigma(w_{11}^\top x)$, $h_2 = \sigma(w_{12}^\top x)$, $h = [h_1, h_2]^\top$, and $p(y \mid x) = \sigma(w_{21}^\top h) = \sigma\!\left(w_{21}^\top [\sigma(w_{11}^\top x), \sigma(w_{12}^\top x)]^\top\right)$.
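
A small sketch of the two-hidden-unit forward pass above, assuming sigmoid activations throughout; the weight vectors $w_{11}$, $w_{12}$, $w_{21}$ are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w11, w12, w21):
    # hidden units: each is a logistic unit applied to the raw input
    h1 = sigmoid(np.dot(w11, x))
    h2 = sigmoid(np.dot(w12, x))
    h = np.array([h1, h2])
    # output unit: a logistic unit applied to the hidden layer
    return sigmoid(np.dot(w21, h))

# illustrative parameters (not from the lecture)
x = np.array([1.0, 0.0, -1.0, 0.5, 2.0])
w11 = np.array([0.2, -0.4, 0.1, 0.0, 0.3])
w12 = np.array([-0.5, 0.6, 0.0, 0.2, -0.1])
w21 = np.array([1.5, -2.0])
print(mlp_forward(x, w11, w12, w21))
```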

Decision Surface: Logistic Regression

Decision Surface: 2-Layer, 2 Hidden Units

Decision Surface: 2-Layer, More Hidden Units (panels: 3 hidden units; 10 hidden units)

Decision Surface: More Layers, More Hidden Units (panels: 10 layers with 5 hidden units per layer; 4 layers with 10 hidden units each)

Training Neural Networks: minimize the average training error over the weights, $\min_W \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i, W), y_i) = L(W)$. Gradient Descent: $W_j \leftarrow W_j - \alpha \frac{\partial L}{\partial W_j}$.

Gradient Descent: $W_j \leftarrow W_j - \alpha \frac{\partial L}{\partial W_j}$. (Illustration: where the slope $L'(W_1)$ is very positive, take a big step left; where $L'(W_2)$ is almost zero, take a tiny step left; where $L'(W_3)$ is slightly negative, take a medium step right.)
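
A minimal sketch of this update rule, assuming a toy quadratic loss whose gradient is easy to write down (the loss and starting point are illustrative):

```python
import numpy as np

def loss(W):
    # illustrative quadratic loss with minimum at W = [1, -2]
    target = np.array([1.0, -2.0])
    return 0.5 * np.sum((W - target) ** 2)

def grad_loss(W):
    # gradient of the quadratic loss above
    target = np.array([1.0, -2.0])
    return W - target

W = np.array([5.0, 5.0])   # initial weights (illustrative)
alpha = 0.1                # step size

for step in range(100):
    W = W - alpha * grad_loss(W)   # W_j <- W_j - alpha * dL/dW_j

print(W, loss(W))
```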

Approximate Q-Learning: $\hat{Q}(s, a) := g(s, a, \theta) := \theta_1 f_1(s, a) + \theta_2 f_2(s, a) + \ldots + \theta_d f_d(s, a)$. Update: $\theta_i \leftarrow \theta_i + \alpha \left[ R(s) + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right] \frac{\partial g}{\partial \theta_i}$, and since $\frac{\partial g}{\partial \theta_i} = f_i(s, a)$, $\theta_i \leftarrow \theta_i + \alpha \left[ R(s) + \gamma \max_{a'} \hat{Q}(s', a') - \hat{Q}(s, a) \right] f_i(s, a)$.
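
A sketch of the approximate Q-learning update above; the feature vectors, reward, learning rate, and discount values are illustrative placeholders:

```python
import numpy as np

def q_value(theta, features):
    # Q_hat(s, a) = theta_1 f_1(s,a) + ... + theta_d f_d(s,a)
    return np.dot(theta, features)

def q_update(theta, features_sa, reward, next_features, alpha=0.1, gamma=0.9):
    # features_sa: feature vector f(s, a) for the current state-action pair
    # next_features: feature vectors f(s', a') for each action a' in the next state
    target = reward + gamma * max(q_value(theta, f) for f in next_features)
    td_error = target - q_value(theta, features_sa)
    # dg/dtheta_i = f_i(s, a), so each weight moves along its own feature
    return theta + alpha * td_error * features_sa

# illustrative values (not from the lecture)
theta = np.zeros(3)
theta = q_update(theta,
                 features_sa=np.array([1.0, 0.5, 0.0]),
                 reward=1.0,
                 next_features=[np.array([0.0, 1.0, 0.5]),
                                np.array([1.0, 0.0, 0.0])])
print(theta)
```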

Back Propagation: compute hidden unit activations (forward propagation); compute the gradient at the output layer (the error); propagate the error back one layer at a time; this is the chain rule via dynamic programming.

Chain Rule Review: for $f(g(x))$, $\frac{d\, f(g(x))}{d x} = \frac{d\, f(g(x))}{d\, g(x)} \cdot \frac{d\, g(x)}{d x}$. (Diagram: $x \to g(x) \to f(g(x))$.)

Chain Rule on a More Complex Function: for $h(f(a) + g(b))$, $\frac{d\, h(f(a)+g(b))}{d a} = \frac{d\, h(f(a)+g(b))}{d\, (f(a)+g(b))} \cdot \frac{d\, (f(a)+g(b))}{d a}$ and $\frac{d\, h(f(a)+g(b))}{d b} = \frac{d\, h(f(a)+g(b))}{d\, (f(a)+g(b))} \cdot \frac{d\, (f(a)+g(b))}{d b}$.
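
A small numerical check of this chain-rule decomposition, with illustrative choices $f(a) = a^2$, $g(b) = \sin(b)$, $h(u) = e^u$:

```python
import numpy as np

# illustrative functions (not from the lecture)
f = lambda a: a ** 2
g = lambda b: np.sin(b)
h = lambda u: np.exp(u)

a, b = 1.5, 0.3
u = f(a) + g(b)

# chain rule: dh/da = h'(f(a)+g(b)) * d(f(a)+g(b))/da = exp(u) * 2a
dh_da = np.exp(u) * 2 * a

# finite-difference check of the same derivative
eps = 1e-6
dh_da_numeric = (h(f(a + eps) + g(b)) - h(f(a - eps) + g(b))) / (2 * eps)

print(dh_da, dh_da_numeric)   # the two values should agree closely
```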

Back to Neural Networks: raw input $x$; weights $w_1$ give hidden layer $h_1 = f(x, w_1)$; $\ldots$; weights $w_{n-1}$ give hidden layer $h_{n-1} = f(h_{n-2}, w_{n-1})$; weights $w_n$ give the final output $y = f(h_{n-1}, w_n)$.

With a loss $L(y)$ on the output, the last-layer gradient follows from the chain rule: $\frac{dL}{dw_n} = \frac{dL}{dy} \cdot \frac{dy}{dw_n}$.

Intuition of Back Propagation: At each layer, calculate how much changing the layer's input changes the final output (the derivative of the final output w.r.t. the layer's input). Calculate this directly for the last layer. For preceding layers, reuse the calculation from the next layer and work backwards through the network. Then use that derivative to find how changing the weights affects the error of the final output.

FYI: Matrix Form and Matrix Gradient Recipe (You will not be tested on this matrix form in this course.)

Feed-forward propagation: $h_1 = s(W_1 x)$, $h_i = s(W_i h_{i-1})$, $\ldots$, $h_{m-1} = s(W_{m-1} h_{m-2})$, $f(x, W) = s(W_m h_{m-1})$, $J(W) = \ell(f(x, W))$.

Back propagation: $\delta_m = \ell'(f(x, W))$, $\nabla_{W_m} J = \delta_m h_{m-1}^\top$; $\delta_i = (W_{i+1}^\top \delta_{i+1}) \odot s'(W_i h_{i-1})$, $\nabla_{W_i} J = \delta_i h_{i-1}^\top$; $\nabla_{W_1} J = \delta_1 x^\top$.
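
A compact sketch in the spirit of this recipe, assuming the logistic function for $s$ and a squared-error loss on the network output (so the output delta also picks up a factor of $s'$); shapes and values are illustrative:

```python
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))      # activation s

def s_prime(z):
    return s(z) * (1.0 - s(z))           # s'(z) for the logistic function

def forward(x, Ws):
    # feed-forward propagation; returns pre-activations z_i and activations h_i
    zs, hs = [], [x]
    for W in Ws:
        z = W @ hs[-1]
        zs.append(z)
        hs.append(s(z))
    return zs, hs

def backward(x, y_true, Ws):
    # back propagation following the matrix gradient recipe (squared-error loss)
    zs, hs = forward(x, Ws)
    # output delta: loss derivative taken through the final activation
    delta = (hs[-1] - y_true) * s_prime(zs[-1])
    grads = [None] * len(Ws)
    for i in reversed(range(len(Ws))):
        grads[i] = np.outer(delta, hs[i])                    # grad_{W_i} J = delta_i h_{i-1}^T
        if i > 0:
            delta = (Ws[i].T @ delta) * s_prime(zs[i - 1])   # delta_i = (W_{i+1}^T delta_{i+1}) * s'(W_i h_{i-1})
    return grads

# illustrative shapes and values (not from the lecture)
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
x = rng.normal(size=4)
grads = backward(x, np.array([1.0]), Ws)
print([g.shape for g in grads])
```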

Other New Aspects of Deep Learning: GPU computation, differentiable programming, automatic differentiation, neural network structures.

Types of Neural Network Structures: feed-forward; recurrent neural networks (RNNs), good for analyzing sequences (text, time series); convolutional neural networks (convnets, CNNs), good for analyzing spatial data (images, videos).

Outline: biological inspiration for artificial neural networks; linear vs. nonlinear functions; learning with neural networks (backpropagation).