Neural Networks. Single-layer neural network. CSE 446: Machine Learning. Emily Fox, University of Washington. March 10, 2017.


Neural Networks. Emily Fox, University of Washington. March 10, 2017. Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer).

Single-layer neural network

Perceptron as a neural network

[Diagram: inputs x[1], …, x[d] with weights w_0, w_1, …, w_d feeding a summation node Σ and an activation g]

$Score(x) = \sum_{j=0}^{d} w_j x[j]$, with $g = 1$ if $\sum_{j=0}^{d} w_j x[j] > 0$, and $0$ otherwise.

This is one neuron:
- Input edges x[1], …, x[d], along with intercept x[0] = 1
- Sum passed through an activation function g

Sigmoid neuron

[Plot: the sigmoid function, rising from 0 to 1 over roughly -6 to 6]

Just change g!
- Why would we want to do this?
- Notice the output range [0, 1]. What was it before?
- Look familiar?

$Score(x) = \sum_{j=0}^{d} w_j x[j]$, with $g = \frac{1}{1 + e^{-Score(x)}}$
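To make the single-neuron picture concrete, here is a minimal sketch (not from the slides) of one neuron computing Score(x) and passing it through either the step or the sigmoid activation; the example inputs and weights are arbitrary.

```python
import numpy as np

def score(w, x):
    # x includes the intercept feature x[0] = 1
    return float(w @ x)

def step(s):
    # perceptron activation: 1 if the score is positive, else 0
    return 1.0 if s > 0 else 0.0

def sigmoid(s):
    # sigmoid activation: output now lies in (0, 1) instead of {0, 1}
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([1.0, 0.3, -1.2])   # [x[0]=1, x[1], x[2]] (arbitrary example)
w = np.array([0.5, 2.0, 1.0])    # [w_0, w_1, w_2]       (arbitrary example)
print(step(score(w, x)), sigmoid(score(w, x)))
```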

Perceptron, linear classification, Boolean functions: x[j] ∈ {0, 1}

- Can learn x[1] OR x[2]? Yes: -0.5 + x[1] + x[2]
- Can learn x[1] AND x[2]? Yes: -1.5 + x[1] + x[2]
- Can learn any conjunction or disjunction? Yes:
  - Disjunction: -0.5 + x[1] + … + x[d]
  - Conjunction: (-d + 0.5) + x[1] + … + x[d]
- Can learn majority? Yes: (-0.5 d) + x[1] + … + x[d]
- What are we missing? The dreaded XOR!, etc.

(A quick check of these weight choices is sketched below.)

Introducing a hidden layer
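As a quick check of the weight choices above, here is a small sketch (assuming the step-threshold rule, output 1 when the score is positive) that evaluates the disjunction, conjunction, and majority perceptrons on every Boolean input; d = 3 is an arbitrary choice.

```python
import itertools
import numpy as np

def perceptron(w0, w, x):
    # step-threshold neuron: 1 if w0 + w . x > 0, else 0
    return int(w0 + np.dot(w, x) > 0)

d = 3  # arbitrary input dimension for the check
for bits in itertools.product([0, 1], repeat=d):
    x = np.array(bits)
    disjunction = perceptron(-0.5, np.ones(d), x)       # x[1] OR ... OR x[d]
    conjunction = perceptron(-d + 0.5, np.ones(d), x)   # x[1] AND ... AND x[d]
    majority    = perceptron(-0.5 * d, np.ones(d), x)   # more than half the inputs on
    print(bits, disjunction, conjunction, majority)
```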

What can't a simple linear classifier represent?

XOR, the counterexample to everything. Need non-linear features.
XOR = (x[1] AND NOT x[2]) OR (NOT x[1] AND x[2])

Solving the XOR problem: Going beyond linear classification by adding a layer

XOR = (x[1] AND NOT x[2]) OR (NOT x[1] AND x[2])

[Diagram: x[1], x[2] feed two hidden units v[1], v[2] (each with bias -0.5 and weights +1 and -1), which feed the output y (bias -0.5); every unit is thresholded to 0 or 1]

Solving the XOR problem: Going beyond linear classification by adding a layer

y = x[1] XOR x[2] = (x[1] AND NOT x[2]) OR (x[2] AND NOT x[1])

- v[1] = (x[1] AND NOT x[2]): threshold -0.5 + x[1] - x[2]
- v[2] = (x[2] AND NOT x[1]): threshold -0.5 + x[2] - x[1]
- y = v[1] OR v[2]: threshold -0.5 + v[1] + v[2]

Hidden layer

Single unit: $out(x) = g\big(w_0 + \sum_j w_j x[j]\big)$

1-hidden layer: $out(x) = g\Big(w_0 + \sum_k w_k\, g\big(w_0^k + \sum_j w_j^k x[j]\big)\Big)$

No longer a convex function!
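The hidden-layer construction above can be checked directly. A minimal sketch (not from the slides) wiring up the exact weights from this slide, with every unit thresholded to 0 or 1:

```python
def step(s):
    # threshold to 0 or 1
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    # hidden layer: v[1] = x[1] AND NOT x[2], v[2] = x[2] AND NOT x[1]
    v1 = step(-0.5 + x1 - x2)
    v2 = step(-0.5 + x2 - x1)
    # output layer: y = v[1] OR v[2]
    return step(-0.5 + v1 + v2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # reproduces the XOR truth table
```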

A general neural network

Layers and layers and layers of linear models and non-linear transformations.

[Diagram: inputs x[1], x[2] feed hidden units v[1], v[2], which feed the output y]

- Around for about 50 years; fell into disfavor in the 90s
- In the last few years, a big resurgence
  - Impressive accuracy on several benchmark problems
  - Powered by huge datasets, GPUs, & modeling/learning algorithm improvements

Learning neural networks with hidden layers

Recall: Optimizing a single-layer neuron

We train to minimize the sum of squared errors:
$\ell(w) = \frac{1}{2} \sum_i \big[y_i - g\big(w_0 + \sum_j w_j x_i[j]\big)\big]^2$

Taking gradients (chain rule: $\frac{\partial}{\partial x} f(g(x)) = f'(g(x))\, g'(x)$):
$\frac{\partial \ell}{\partial w_j} = -\sum_i \big[y_i - g\big(w_0 + \sum_j w_j x_i[j]\big)\big]\, x_i[j]\, g'\big(w_0 + \sum_j w_j x_i[j]\big)$

Solution just depends on $g'$: the derivative of the activation function!

Forward propagation

1-hidden layer: $out(x) = g\Big(w_0 + \sum_k w_k\, g\big(w_0^k + \sum_j w_j^k x[j]\big)\Big)$

For fixed weights, forming predictions is easy! Compute values left to right:
1. Inputs: x[1], …, x[d]
2. Hidden: v[1], …, v[d]
3. Output: y
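As an illustration of the single-neuron gradient above, here is a hedged sketch assuming a sigmoid g (so that $g'(s) = g(s)(1 - g(s))$); the data matrix X, targets y, and step size eta are placeholders, not from the slides.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def squared_error_gradient(w, X, y):
    """Gradient of (1/2) * sum_i (y_i - g(w . x_i))^2 for a sigmoid neuron.

    X: (n, d+1) array with x_i[0] = 1, w: (d+1,) weights, y: (n,) targets.
    """
    s = X @ w                    # scores w . x_i
    g = sigmoid(s)               # predictions g(w . x_i)
    gprime = g * (1.0 - g)       # g'(s) for the sigmoid
    residual = y - g             # (y_i - g(...))
    # d loss / d w_j = -sum_i (y_i - g(...)) * g'(...) * x_i[j]
    return -(X.T @ (residual * gprime))

# One gradient-descent step (eta is an arbitrary placeholder step size):
#   w = w - eta * squared_error_gradient(w, X, y)
```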

Gradient descent for 1-hidden layer: Output layer parameters

$\ell(w) = \frac{1}{2} \sum_i \big[y_i - out(x_i)\big]^2$   (dropped $w_0$ to make the derivation simpler)

$\frac{\partial \ell}{\partial w_k} = -\sum_{i=1}^{N} \big[y_i - out(x_i)\big]\, \frac{\partial\, out(x_i)}{\partial w_k}$

Gradient for the last layer is the same as the single-node case, but with the hidden nodes v as input!

Gradient descent for 1-hidden layer: Hidden layer parameters

$\ell(w) = \frac{1}{2} \sum_i \big[y_i - out(x_i)\big]^2$   (dropped $w_0$ to make the derivation simpler)

$\frac{\partial \ell}{\partial w_j^k} = -\sum_{i=1}^{N} \big[y_i - out(x_i)\big]\, \frac{\partial\, out(x_i)}{\partial w_j^k}$   (chain rule: $\frac{\partial}{\partial x} f(g(x)) = f'(g(x))\, g'(x)$)

For the hidden layer, two parts:
- Recursive computation of the gradient on the output layer
- Normal update for a single neuron

Multilayer neural networks: Inference and Learning

- Forward pass: left to right, each hidden layer in turn
- Gradient computation: right to left, propagating the gradient for each node

Forward propagation (Prediction)

Recursive algorithm; start from the input layer. Output of node v[k] with parents u[1], u[2], …:
$v[k] = g\Big(\sum_j w_j^k\, u[j]\Big)$

(A minimal layer-by-layer sketch is given below.)
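A minimal layer-by-layer forward pass matching the recursion above, assuming sigmoid activations and a list of weight matrices (one per layer); intercept terms and the example shapes are my own simplifications, not from the slides.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward_pass(weights, x):
    """Compute node values layer by layer, left to right.

    weights: list of arrays; weights[l][k, j] is the weight from node j in
    layer l to node k in layer l+1 (intercepts omitted for brevity).
    """
    values = x
    for W in weights:
        values = sigmoid(W @ values)   # v[k] = g(sum_j w_j^k u[j])
    return values

# Example with arbitrary shapes: 3 inputs -> 4 hidden -> 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
print(forward_pass(weights, np.array([1.0, 0.5, -0.5])))
```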

Back-propagation (Learning)

Just gradient descent!!! A recursive algorithm for computing the gradient. For each example:
- Perform forward propagation
- Start from the output layer
- Compute the gradient of node v[k] with parents u[1], u[2], …
- Update weight $w_j^k$
- Repeat (move to the preceding layer)

Convergence of backprop

- Perceptron leads to convex optimization: gradient descent reaches global minima
- Multilayer neural nets are not convex:
  - Gradient descent gets stuck in local minima
  - Selecting the number of hidden units and layers is a fuzzy process
  - NNs have made a HUGE comeback in the last few years!!!
- Neural nets are back with a new name!!!!
  - Deep belief networks
  - Huge error reduction when trained with lots of data on GPUs
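Putting the two gradients together, here is a hedged sketch of backprop for one example, assuming sigmoid activations, squared error, a single hidden layer, and no intercept terms (as in the simplified derivation above); the names W1 and W2 and the update comment are my own, not from the slides.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(W1, W2, x):
    v = sigmoid(W1 @ x)          # hidden values v[k] = g(sum_j W1[k, j] x[j])
    out = sigmoid(W2 @ v)        # output out(x) = g(sum_k W2[k] v[k])
    return v, out

def backprop(W1, W2, x, y):
    """Gradients of (1/2) * (y - out(x))^2 for one example."""
    v, out = forward(W1, W2, x)
    # output layer: same form as the single-neuron case, with v as the input
    delta_out = -(y - out) * out * (1.0 - out)
    grad_W2 = delta_out * v
    # hidden layer: propagate the output gradient back through W2,
    # then apply the normal single-neuron update at each hidden node
    delta_hidden = delta_out * W2 * v * (1.0 - v)
    grad_W1 = np.outer(delta_hidden, x)
    return grad_W1, grad_W2

# Gradient-descent update for one example (eta is a placeholder step size):
#   g1, g2 = backprop(W1, W2, x, y)
#   W1, W2 = W1 - eta * g1, W2 - eta * g2
```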

Overfitting in NNs

Are NNs likely to overfit?
- Yes, they can represent arbitrary functions!!!

Avoiding overfitting?
- More training data
- Fewer hidden nodes / better topology
- Regularization
- Early stopping

Neural networks can do cool things!

Object recognition (slides from Jeff Dean at Google)

Number detection (slides from Jeff Dean at Google)

Acoustic Modeling for Speech Recognition (slides from Jeff Dean at Google)

- Close collaboration with the Google Speech team
- Trained in <5 days on a cluster of 800 machines
- 30% reduction in Word Error Rate for English! ("biggest single improvement in 20 years of speech research")
- Launched in 2012 at the time of the Jellybean release of Android

2012-era Convolutional Model for Object Recognition (slides from Jeff Dean at Google)

[Diagram: Input → Layer 1 … Layer 7 → Softmax]
- Softmax to predict the object class
- Fully-connected layers
- Convolutional layers (same weights used at all spatial locations in a layer)
- Convolutional networks developed by Yann LeCun (NYU)
- Basic architecture developed by Krizhevsky, Sutskever & Hinton (all now at Google)
- Won the 2012 ImageNet challenge with a 16.4% top-5 error rate

2014-era Model for Object Recognition (slides from Jeff Dean at Google)

- Module with 6 separate convolutional layers; 24 layers deep
- Developed by a team of Google Researchers
- Won the 2014 ImageNet challenge with a 6.66% top-5 error rate

Good Fine-grained Classification (slides from Jeff Dean at Google)
[Images: hibiscus, dahlia]

Good Generalization (slides from Jeff Dean at Google)
[Images: both recognized as a meal]

Sensible Errors (slides from Jeff Dean at Google)
[Images: snake, dog]

Works in practice for real users. (slides from Jeff Dean at Google)
[Images: example queries from real users]

Object detection
[Image: detections with scores Person: 0.64, Dog: 0.30, Horse: 0.28]
Redmon et al. 2015, http://pjreddie.com/yolo/

Neural network summary

What you need to know about neural networks

Perceptron:
- Relationship to general neurons

Multilayer neural nets:
- Representation
- Derivation of backprop
- Learning rule

Overfitting

Course Wrap-Up. Emily Fox, University of Washington. March 10, 2017.

What you have learned this quarter

Learning is function approximation: point estimation, regression, overfitting, bias-variance tradeoff, ridge, LASSO, cross validation, stochastic gradient descent, coordinate descent, subgradient, logistic regression, decision trees, boosting, instance-based learning, perceptron, SVMs, kernel trick, dimensionality reduction, PCA, k-means, mixtures of Gaussians, EM, discriminative vs. generative learning, unsupervised vs. supervised learning, naïve Bayes, Bayes nets, neural networks.

BIG PICTURE: Improving the performance at some task through experience! Before you start any learning task, remember the fundamental questions:
- What is the learning problem?
- From what experience?
- What model?
- What loss function are you optimizing?
- With what optimization algorithm?
- Which learning algorithm?
- With what guarantees?
- How will you evaluate it?

Regression. Example: Predicting house prices
[Diagram: Data → ML Method (Regression) → Intelligence; plot of price ($) vs. house size, plus house features]

Classification. Example: Sentiment analysis
[Diagram: Data → ML Method (Classification) → Intelligence; review "Sushi was awesome, the food was awesome, but the service was awful."; all reviews split by Score(x) > 0 (awesome) vs. Score(x) < 0 (awful)]

Similarity/finding data. Example: Document retrieval
[Diagram: Data → ML Method (Nearest neighbor) → Intelligence]

Clustering. Example: Document structuring for retrieval
[Diagram: Data → ML Method (Clustering) → Intelligence; clusters: SPORTS, WORLD NEWS, ENTERTAINMENT, SCIENCE]

Embedding. Example: Embedding images to visualize data
[Diagram: Data → ML Method (PCA) → Intelligence]
Images with thousands or millions of pixels. Can we give each image a coordinate, such that similar images are near each other? [Saul & Roweis '03]

Deep Learning. Example: Visual product recommender
[Diagram: Data → ML Method (Deep Learning) → Intelligence; input images pass through Layer 1, Layer 2, …; output: nearest neighbors]

You have done a lot!!! And (hopefully) learned a lot!!!
- Implemented LASSO, logistic regression, perceptron, clustering
- Answered hard questions and proved many interesting results
- Completed (I am sure) an amazing ML project
- And did excellently on the final!

Thank You for the Hard Work!!!