
CSE446: Neural Networks, Spring 2017. Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer.

Human Neurons: switching time ~0.001 second; number of neurons ~10^10; connections per neuron ~10^4 to 10^5; scene recognition time ~0.1 seconds. Number of cycles per scene recognition? ~100, so much parallel computation!

Perceptron as a Neural Network. This is one neuron: input edges x_1, ..., x_n, along with a bias term. The weighted sum is represented graphically; the sum is passed through an activation function g.
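As a minimal sketch (not from the slides; the weights and bias below are illustrative), this is what one such neuron computes in Python:

```python
import numpy as np

def neuron(x, w, b, g):
    """One neuron: weighted sum of inputs plus bias, passed through activation g."""
    return g(np.dot(w, x) + b)

# Classic perceptron activation: threshold at zero.
def step(z):
    return 1.0 if z >= 0 else 0.0

# Illustrative example with two inputs and hypothetical weights/bias.
x = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])
b = -0.5
print(neuron(x, w, b, step))  # 1.0, since 1.0*1 + 1.0*0 - 0.5 >= 0
```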

Sigmoid Neuron. [Plot of the sigmoid g(z) = 1/(1 + e^{-z}) for z from -6 to 6.] Just change g! Why would we want to do this? Notice the new output range [0,1]. What was it before? Look familiar?

Optimizing a neuron. We train to minimize the sum-squared error. The solution just depends on g', the derivative of the activation function!
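A hedged reconstruction of the computation the slide refers to (the slide's own notation is not visible in the transcription): with squared error over examples (x^(j), y^(j)) and prediction g(w · x), the gradient with respect to each weight w_i involves g':

```latex
\ell(w) = \tfrac{1}{2}\sum_j \bigl(y^{(j)} - g(w \cdot x^{(j)})\bigr)^2,
\qquad
\frac{\partial \ell}{\partial w_i}
  = -\sum_j \bigl(y^{(j)} - g(w \cdot x^{(j)})\bigr)\, g'(w \cdot x^{(j)})\, x_i^{(j)} .
```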

Re-deriving the perceptron update. For a specific, incorrectly classified example: w = w + y*x (our familiar update!)

Sigmoid units: have to differentiate g
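For the sigmoid, that derivative has a convenient closed form (a standard identity, stated here for reference):

```latex
g(z) = \frac{1}{1 + e^{-z}},
\qquad
g'(z) = g(z)\,\bigl(1 - g(z)\bigr).
```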

Aside: Comparison to logistic regression. P(Y | X) is represented by a sigmoid [plot of the sigmoid for inputs from -6 to 6]. Learning rule: MLE.

Perceptron, linear classification, Boolean functions: x_i ∈ {0,1}. Can it learn x_1 ∨ x_2? Yes: threshold -0.5 + x_1 + x_2. Can it learn x_1 ∧ x_2? Yes: -1.5 + x_1 + x_2. Can it learn any conjunction or disjunction? Yes: -0.5 + x_1 + ... + x_n for disjunction, (-n + 0.5) + x_1 + ... + x_n for conjunction. Can it learn majority? Yes: (-0.5·n) + x_1 + ... + x_n. What are we missing? The dreaded XOR!, etc.
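A quick sanity check of these threshold settings (the biases are from the slide; the helper function and its name are mine):

```python
from itertools import product

def fires(bias, xs):
    """Linear threshold unit over 0/1 inputs: fires iff bias + sum(xs) >= 0."""
    return int(bias + sum(xs) >= 0)

# OR of two inputs: bias -0.5.
assert all(fires(-0.5, (a, b)) == (a or b) for a, b in product((0, 1), repeat=2))
# AND of two inputs: bias -1.5.
assert all(fires(-1.5, (a, b)) == (a and b) for a, b in product((0, 1), repeat=2))
# Majority of n inputs: bias -0.5*n.
n = 5
assert all(fires(-0.5 * n, xs) == int(sum(xs) >= n / 2) for xs in product((0, 1), repeat=n))
print("OR, AND, and majority all check out")
```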

Going beyond linear classification: solving the XOR problem. y = x_1 XOR x_2 = (x_1 ∧ ¬x_2) ∨ (x_2 ∧ ¬x_1). Hidden units: v_1 = (x_1 ∧ ¬x_2), thresholded at -1.5 + 2x_1 - x_2; v_2 = (x_2 ∧ ¬x_1), thresholded at -1.5 + 2x_2 - x_1. Output: y = v_1 ∨ v_2, thresholded at -0.5 + v_1 + v_2. [Diagram: a 1-hidden-layer network with inputs x_1, x_2, hidden units v_1, v_2, and output y, with these weights and biases on the edges.]
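Wiring up exactly those weights verifies the construction (a short Python check; only the function names are mine):

```python
def step(z):
    return int(z >= 0)

def xor_net(x1, x2):
    v1 = step(-1.5 + 2 * x1 - x2)   # v1 = x1 AND NOT x2
    v2 = step(-1.5 + 2 * x2 - x1)   # v2 = x2 AND NOT x1
    return step(-0.5 + v1 + v2)     # y  = v1 OR v2

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_net(x1, x2) == (x1 ^ x2)
print("two-layer network computes XOR")
```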

Hidden layer. Single unit vs. 1-hidden layer (see the forms sketched below): no longer a convex function!
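The two expressions on this slide are not legible in the transcription; the standard forms for a single unit and a 1-hidden-layer network, assuming the usual notation, are:

```latex
\text{single unit: } \; \hat y = g\Bigl(w_0 + \sum_i w_i x_i\Bigr),
\qquad
\text{1-hidden layer: } \; \hat y = g\Bigl(w_0 + \sum_k w_k\, g\bigl(w_0^k + \textstyle\sum_i w_i^k x_i\bigr)\Bigr).
```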

Example data for NN with hidden layer

Learned weights for hidden layer

1-hidden layer: Forward propagation. Compute values left to right: 1. Inputs: x_1, ..., x_n. 2. Hidden: v_1, ..., v_k. 3. Output: y. [Diagram: network with inputs x_1, x_2, hidden units v_1, v_2, and output y.]
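A minimal forward pass for such a 1-hidden-layer sigmoid network, following these three steps (shapes, names, and the random weights are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward propagation, left to right: inputs -> hidden units -> output."""
    v = sigmoid(W1 @ x + b1)          # hidden values v_1, ..., v_k
    y = sigmoid(np.dot(w2, v) + b2)   # scalar output y
    return v, y

# Illustrative sizes: 2 inputs, 2 hidden units, 1 output.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
w2, b2 = rng.normal(size=2), rng.normal()
print(forward(x, W1, b1, w2, b2))
```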

Gradient descent for 1-hidden layer. Dropped w_0 to make the derivation simpler. The gradient for the last layer is the same as in the single-node case, but with the hidden nodes v as input!

Gradient descent for 1-hidden layer. Dropped w_0 to make the derivation simpler. For the hidden layer, the gradient has two parts: the normal update for a single neuron, and a recursive computation of the gradient coming from the output layer.
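A hedged sketch of that split, for the 1-hidden-layer network with output ŷ = g(Σ_k w_k v_k), hidden units v_k = g(Σ_i w_i^k x_i), and squared-error loss ℓ = ½(y − ŷ)²; the chain rule factors the hidden-layer gradient into the piece propagated back from the output and the usual single-neuron piece:

```latex
\frac{\partial \ell}{\partial w_i^k}
  = \underbrace{-(y - \hat y)\;
      g'\!\Bigl(\textstyle\sum_j w_j v_j\Bigr)\, w_k}_{\text{gradient from the output layer}}
    \;\times\;
  \underbrace{g'\!\Bigl(\textstyle\sum_j w_j^k x_j\Bigr)\, x_i}_{\text{single-neuron update at hidden node }k}
```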

Multilayer neural networks. Inference and learning: the forward pass goes left to right, through each hidden layer in turn; the gradient computation goes right to left, propagating the gradient for each node.

Forward propagation for prediction: a recursive algorithm. Start from the input layer; compute the output of node V_k from its parents U_1, U_2, ... (see the recursion below).
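The recursion itself did not survive the transcription; the standard form, presumably what the slide shows, is:

```latex
V_k = g\!\Bigl(w_0^{(k)} + \sum_{j} w_j^{(k)}\, U_j\Bigr),
```

where the sum runs over the parents U_j of node V_k and g is the activation function.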

Back-propagation learning: just gradient descent!!! A recursive algorithm for computing the gradient. For each example: perform forward propagation; then, starting from the output layer, compute the gradient at node V_k with parents U_1, U_2, ..., update the weights into V_k, and repeat (move to the preceding layer).

Back-propagation pseudocode.
Initialize all weights to small random numbers. Until convergence, do: for each training example (x, y):
1. Forward propagation: compute the node values V_k.
2. For each output unit o (with labeled output y): δ_o = V_o (1 - V_o)(y - V_o)
3. For each hidden unit h: δ_h = V_h (1 - V_h) Σ_{k ∈ outputs(h)} w_{h,k} δ_k
4. Update each network weight w_{i,j} from node i to node j: w_{i,j} = w_{i,j} + η δ_j x_{i,j}
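A runnable sketch of this pseudocode for a 1-hidden-layer sigmoid network with a single output (layer sizes, learning rate, and the XOR demo data are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_hidden=4, eta=0.5, epochs=10000, seed=0):
    """Backprop following the four steps above, one example at a time."""
    rng = np.random.default_rng(seed)
    # Initialize all weights to small random numbers.
    W1 = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # 1. Forward propagation: compute node values.
            v = sigmoid(W1 @ x + b1)
            o = sigmoid(np.dot(w2, v) + b2)
            # 2. Output unit: delta_o = V_o (1 - V_o)(y - V_o).
            delta_o = o * (1 - o) * (y - o)
            # 3. Hidden units: delta_h = V_h (1 - V_h) * w_{h,o} * delta_o.
            delta_h = v * (1 - v) * w2 * delta_o
            # 4. Update each weight: w_{i,j} <- w_{i,j} + eta * delta_j * x_{i,j}.
            w2 += eta * delta_o * v
            b2 += eta * delta_o
            W1 += eta * np.outer(delta_h, x)
            b1 += eta * delta_h
    return W1, b1, w2, b2

# Demo: try to learn XOR (outputs should move toward 0, 1, 1, 0;
# a net this small can stall in a local minimum for some seeds).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, w2, b2 = train(X, Y)
print([round(float(sigmoid(np.dot(w2, sigmoid(W1 @ x + b1)) + b2)), 2) for x in X])
```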

Convergence of backprop. The perceptron leads to a convex optimization, so gradient descent reaches a global minimum. Multilayer neural nets are not convex, so gradient descent can get stuck in local minima, and selecting the number of hidden units and layers is a fuzzy process. NNs have made a HUGE comeback in the last few years!!! Neural nets are back with a new name: deep belief networks. Huge error reductions when trained with lots of data on GPUs.

Overfitting in NNs. Are NNs likely to overfit? Yes, they can represent arbitrary functions!!! Avoiding overfitting: more training data, fewer hidden nodes / better topology, regularization, early stopping.

Object Image Recognition Models Slides from Jeff Dean at Google

What are these numbers? Number Detection Slides from Jeff Dean at Google

Acoustic Modeling for Speech Recognition. Close collaboration with the Google Speech team. Trained in <5 days on a cluster of 800 machines. 30% reduction in Word Error Rate for English! ("biggest single improvement in 20 years of speech research"). Launched in 2012 at the time of the Jelly Bean release of Android. Slides from Jeff Dean at Google

2012-era Convolutional Model for Object Recognition. Softmax to predict the object class; fully-connected layers; convolutional layers (same weights used at all spatial locations in a layer), stacked from the input up through Layer 7. Convolutional networks were developed by Yann LeCun (NYU). The basic architecture was developed by Krizhevsky, Sutskever & Hinton (all now at Google). Won the 2012 ImageNet challenge with a 16.4% top-5 error rate. Slides from Jeff Dean at Google

2014-era Model for Object Recognition. A module with 6 separate convolutional layers; 24 layers deep. Developed by a team of Google researchers. Won the 2014 ImageNet challenge with a 6.66% top-5 error rate. Slides from Jeff Dean at Google

Good Fine-grained Classification hibiscus dahlia Slides from Jeff Dean at Google

Good Generalization Both recognized as a meal Slides from Jeff Dean at Google

Sensible Errors snake dog Slides from Jeff Dean at Google

Works in practice for real users. Slides from Jeff Dean at Google

Object Detection

YOLO DEMO

What you need to know about neural networks: the perceptron and its relationship to general neurons; multilayer neural nets (representation, derivation of backprop, learning rule); overfitting.