Incremental Stochastic Gradient Descent

Incremental Stochastic Gradient Descent
Batch mode: gradient descent w ← w − η ∇E_D[w] over the entire data D, where E_D[w] = ½ Σ_d (t_d − o_d)².
Incremental mode: gradient descent w ← w − η ∇E_d[w] over individual training examples d, where E_d[w] = ½ (t_d − o_d)².
Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is made small enough.
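A minimal sketch of the two modes, assuming a single linear unit o_d = w · x_d and the squared error defined above (NumPy; names are illustrative):

import numpy as np

def batch_gradient_step(w, X, t, eta):
    # Batch mode: sum the error gradient over the entire data D, then update once.
    o = X @ w                          # linear-unit outputs for all examples
    grad = -(t - o) @ X                # gradient of E_D[w] = 1/2 * sum_d (t_d - o_d)^2
    return w - eta * grad

def incremental_gradient_pass(w, X, t, eta):
    # Incremental (stochastic) mode: update after each individual example d.
    for x_d, t_d in zip(X, t):
        o_d = w @ x_d
        grad_d = -(t_d - o_d) * x_d    # gradient of E_d[w] = 1/2 * (t_d - o_d)^2
        w = w - eta * grad_d
    return w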

Comparison: Perceptron and Gradient Descent Rule
The perceptron learning rule is guaranteed to succeed (perfectly classifying the training examples) if the training examples are linearly separable and the learning rate η is sufficiently small.
Linear unit training rules using gradient descent are guaranteed to converge to the hypothesis with minimum squared error, given a sufficiently small learning rate η, even when the training data contain noise and even when the training data are not separable by H.
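To make the contrast concrete, here is a hedged sketch of one update under each rule for a single unit (illustrative code; x is assumed to include a constant bias component):

import numpy as np

def perceptron_update(w, x, t, eta):
    # Perceptron rule: only changes w when the thresholded output is wrong.
    # Converges to a perfect separator when the data are linearly separable
    # and eta is small enough; may never settle otherwise.
    o = 1 if w @ x > 0 else 0
    return w + eta * (t - o) * x

def delta_rule_update(w, x, t, eta):
    # Delta (LMS) rule for a linear unit: moves w along the error gradient.
    # Converges (for small eta) toward the minimum-squared-error hypothesis
    # even when the data are noisy or not linearly separable.
    o = w @ x
    return w + eta * (t - o) * x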

Restaurant Problem: Will I wait for a table?
Alternate: whether there is a suitable alternative restaurant nearby
Bar: whether the restaurant has a comfortable bar area to wait in
Fri/Sat: true on Fridays and Saturdays
Hungry: whether we are hungry
Patrons: how many people are in the restaurant (None, Some, or Full)
Price: the restaurant's price range ($, $$, $$$)
Raining: whether it is raining outside
Reservation: whether we made a reservation
Type: the kind of restaurant (French, Italian, Thai, or Burger)
WaitEstimate: the wait estimated by the host (0-10 minutes, 10-30, 30-60, > 60)

Multilayer Network

A compromise function
Perceptron: output = 1 if Σ_{i=0}^{n} w_i x_i > 0, else 0
Linear: output = net = Σ_{i=0}^{n} w_i x_i
Sigmoid (Logistic): output = σ(net) = 1 / (1 + e^(−net))
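The three output functions written out as a small illustrative sketch (NumPy; x_0 = 1 is assumed to supply the bias term):

import numpy as np

def net(w, x):
    # Weighted sum net = sum_i w_i * x_i.
    return np.dot(w, x)

def perceptron_output(w, x):
    # Hard-threshold unit: 1 if net > 0, else 0.
    return 1 if net(w, x) > 0 else 0

def linear_output(w, x):
    # Linear unit: output is the raw weighted sum.
    return net(w, x)

def sigmoid_output(w, x):
    # Sigmoid (logistic) unit: smooth, differentiable compromise between the two.
    return 1.0 / (1.0 + np.exp(-net(w, x)))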

Learning in Multilayer Networks
Same method as for perceptrons: example inputs are presented to the network. If the network computes an output that matches the desired output, nothing is done. If there is an error, the weights are adjusted to reduce the error.
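A minimal sketch of one such error-driven adjustment for a tiny two-layer sigmoid network, assuming squared error (all names are illustrative; this anticipates the backpropagation rule developed next):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hidden, w_out, eta):
    # One forward/backward pass. If the output already matches the target,
    # the error terms are ~0 and the weights barely move; otherwise they are
    # nudged in the direction that reduces the error.
    h = sigmoid(W_hidden @ x)                         # hidden activations
    o = sigmoid(w_out @ h)                            # network output
    delta_o = (t - o) * o * (1 - o)                   # output error term
    delta_h = h * (1 - h) * (w_out * delta_o)         # error blamed on each hidden unit
    w_out = w_out + eta * delta_o * h                 # adjust hidden->output weights
    W_hidden = W_hidden + eta * np.outer(delta_h, x)  # adjust input->hidden weights
    return W_hidden, w_out, o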

BackPropagation Learning

Alternative Error Measures

Neural Network Model
[Figure: a small feed-forward network with inputs Age = 34, Gender = 2, Stage = 4; the independent variables pass through weighted connections to a hidden layer of Σ units and then, through a second set of weights, to the dependent variable: the output 0.6, the predicted probability of being alive.]

Getting an answer from a NN
[Figure: the same network, highlighting a subset of the weighted connections along the forward pass to the output 0.6.]

Getting an answer from a NN
[Figure: the same network, highlighting a different subset of the weighted connections along the forward pass to the output 0.6.]

Getting an answer from a NN
[Figure: the same network (here with Gender = 1), showing the full forward pass through all hidden units to the output 0.6, the probability of being alive.]

Minimizing the Error
[Figure: error surface over a weight w, showing the initial error at w_initial, the negative derivative driving a positive change in w, and the final error at a local minimum, w_trained.]

Representational Power (FFNN)
Boolean functions: 2 layers of units
Continuous functions: 2 layers of units (sigmoid then linear)
Arbitrary functions: 3 layers of units (sigmoids then linear)

Hypothesis Space and Inductive Bias

Hidden Layer Representations

Hidden Layer Representations

Overfitting

Neural Nets for Face Recognition

Learning Hidden Unit Weights

ALVINN
Drives 70 mph on a public highway. Input: a 30x32-pixel camera image. Architecture: 4 hidden units, each with 30x32 weights (one per input pixel), feeding 30 output units for steering.

Handwritten Character Recognition
Le Cun et al. (1989) implemented a neural network to read zip codes on hand-addressed envelopes, for sorting purposes. To identify the digits, it uses a 16x16 array of pixels as input (256 input nodes), 3 hidden layers, and a distributed output encoding with 10 output units, one for the likelihood of each digit 0-9.

Interpreting Satellite Imagery for Automated Weather Forecasting

Recurrent Neural Nets

Neural Network Language Models
Statistical language modeling: predict the probability of the next word in a sequence. For example, given "I was headed to Madrid, ...": P(next word = "Spain") = 0.5, P(next word = "but") = 0.2, etc. Used in speech recognition, machine translation, and (recently) information extraction.

Summary
Perceptrons (one-layer networks) are insufficiently expressive. Multi-layer networks are sufficiently expressive and can be trained by error back-propagation. Many applications, including speech, driving, handwritten character recognition, fraud detection, etc.

Local Search Algorithms
In many optimization problems, the path to the goal is irrelevant; the goal state itself is the solution. In such cases we can use local search algorithms: keep a single "current" state and try to improve it. Examples: hill climbing, simulated annealing, local beam search, stochastic beam search, genetic algorithms.
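As an illustration of the "keep a single current state and try to improve it" idea, here is a hedged sketch of hill climbing, assuming the problem supplies neighbors() and value() functions:

def hill_climbing(start, neighbors, value):
    # Keep a single current state; repeatedly move to the best neighbor.
    # Stops at a local maximum, which may not be the global one.
    current = start
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current          # no neighbor improves: local maximum
        current = best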

Genetic algorithms
A successor state is generated by combining two parent states. Start with k randomly generated states (the population). A state is represented as a string over a finite alphabet (often a string of 0s and 1s). An evaluation function (fitness function) assigns higher values to better states.

Genetic algorithms
Fitness function: number of non-attacking pairs of queens (min = 0, max = 8·7/2 = 28). A state with fitness 24 is selected with probability 24/(24+23+20+11) ≈ 31%, one with fitness 23 with probability ≈ 29%, etc.
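A small sketch of this fitness function, assuming a board is encoded so that board[c] gives the row of the queen in column c:

from itertools import combinations

def non_attacking_pairs(board):
    # Fitness for the 8-queens example: count pairs of queens that do not
    # attack each other (max = 8*7/2 = 28 for 8 queens).
    pairs = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(board), 2):
        same_row = r1 == r2
        same_diag = abs(r1 - r2) == abs(c1 - c2)
        if not (same_row or same_diag):
            pairs += 1
    return pairs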

Genetic algorithms

Genetic Algorithms Continued
1. Choose an initial population.
2. Evaluate the fitness of each individual in the population.
3. Repeat the following until a terminating condition is hit:
   a. Select the best-ranking individuals to reproduce.
   b. Breed using crossover and mutation.
   c. Evaluate the fitness of the offspring.
   d. Replace the worst-ranked part of the population with the offspring.
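A minimal sketch of that loop, assuming the problem supplies fitness, crossover, and mutate functions (all names illustrative):

import random

def genetic_algorithm(population, fitness, crossover, mutate,
                      generations=100, n_offspring=None):
    # Select fitter parents, breed with crossover + mutation, and replace
    # the worst-ranked part of the population with the offspring.
    n_offspring = n_offspring or max(1, len(population) // 2)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]     # select best-ranking states
        offspring = []
        for _ in range(n_offspring):
            a, b = random.sample(parents, 2)
            offspring.append(mutate(crossover(a, b)))
        population = ranked[:-n_offspring] + offspring  # replace worst-ranked part
    return max(population, key=fitness)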

How computers play games

Minimax: An Optimal Strategy

Minimax Algorithm: An Optimal Strategy
Choose the best move based on the resulting states' MINIMAX-VALUE:
MINIMAX-VALUE(n) =
  Utility(n), if n is a terminal state
  the maximum MINIMAX-VALUE over all successors of n, if it is MAX's turn at n
  the minimum MINIMAX-VALUE over all successors of n, if it is MIN's turn at n
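A hedged sketch of the same recursion, assuming the game supplies is_terminal, utility, and successors functions (names are illustrative):

def minimax_value(state, is_terminal, utility, successors, max_to_move):
    # Value of a state under optimal play, mirroring the pseudocode above.
    if is_terminal(state):
        return utility(state)
    values = [minimax_value(s, is_terminal, utility, successors, not max_to_move)
              for s in successors(state)]
    return max(values) if max_to_move else min(values)

def best_move(state, is_terminal, utility, successors):
    # MAX chooses the move leading to the successor with the highest minimax value.
    return max(successors(state),
               key=lambda s: minimax_value(s, is_terminal, utility, successors, False))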