Learning in State-Space Reinforcement Learning CIS 32
- Myra Stephens
- 5 years ago
Transcription
1 Learning in State-Space Reinforcement Learning CIS 32
2 Functionalia Syllabus updated: MIDTERM and REVIEW moved up one day. MIDTERM: everything through Evolutionary Agents. HW 2 out, DUE Sunday before the MIDTERM. EVENING TEA: next Monday, 5pm to 7pm, 0317 N. Today: Training TLUs; Recap Neural Networks; Learning a Heuristic; Search-Tree-Less Heuristic Learning; Reinforcement Learning
3 Training TLUs: Techniques
Technique | Gradient descent? | Output function
Error Correction | No | threshold function: $f = 1$ if $\sum_{i=1}^{n} x_i w_i \ge \theta$, $f = 0$ otherwise
Widrow-Hoff | Yes | $f = s = \sum_{i=1}^{n} x_i w_i$
Generalized Delta | Yes | $f(s) = \frac{1}{1 + e^{-s}}$
4 Weight Update Functions
Technique | Range of d (desired output) | Range of f (actual training output) | Weight update
Error Correction | 0 or 1 | 0 or 1 | $\Delta w_i = c\,(d - f)\,x_i$
Widrow-Hoff | -1 or 1 | (-inf, +inf) | $\Delta w_i = c\,(d - f)\,x_i$
Generalized Delta | 0 or 1 | [0, 1] (sigmoid) | $\Delta w_i = c\,(d - f)\,f\,(1 - f)\,x_i$
c is the learning rate parameter (a small positive fraction).
5 Error-Correction Technique When d = 1 and f = 0 the change is +c (and when d = 0 and f = 1 it is -c). Weights change in finite chunks. For small enough c, training terminates after a finite number of steps if the function is linearly separable. If the function is not linearly separable, training does not terminate but oscillates.
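The error-correction rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names, the fixed bias input of 1 (so that the threshold is carried by w[0]), the random seed, and the cap on rounds are all assumptions made for the example.

```python
import random

def train_error_correction(examples, c=0.1, max_rounds=1000, seed=0):
    """Train a TLU with the rule w_i += c * (d - f) * x_i."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # w[0] plays the role of the threshold; its input is fixed at 1 (assumption).
    w = [rng.uniform(-1, 1) for _ in range(n + 1)]
    for rounds in range(1, max_rounds + 1):
        changed = False
        for x, d in examples:
            xs = [1.0] + list(x)
            s = sum(wi * xi for wi, xi in zip(w, xs))
            f = 1 if s >= 0 else 0
            if f != d:
                w = [wi + c * (d - f) * xi for wi, xi in zip(w, xs)]
                changed = True
        if not changed:
            # A full round with no weight changes: training is done.
            return w, rounds
    return w, None  # did not settle within max_rounds

def tlu(w, x):
    """Output of the trained threshold unit for input x."""
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) >= 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, rounds = train_error_correction(AND)
```

Because AND is linearly separable, the loop stops once a whole round passes without a weight change.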
6 Example of Error Correction [Figure: TLUs for AND (W0 = 1.5, W1 = 1, W2 = 1), OR (W0 = 0.5, W1 = 1, W2 = 1), and NOT, next to a TLU whose weights W0, W1, W2 start out random.] Remember that the threshold becomes rolled into the weights. We will start with a random (can also be uniform) set of weights. Set our learning rate to 0.1.
7 Example of Error Correction [Figure: the same TLUs, shown with the training set (columns V, X1, X2, d).]
8-16 Train One Example at a Time [Table, filled in one step per slide; columns: V, X1, X2, d, w0, w1, w2, s, f, c, d-f, dw0, dw1, dw2, E.] The initial weights are random. $f = 1$ if $\sum_{i=1}^{n} x_i w_i \ge \theta$, $f = 0$ otherwise. c is the learning parameter (constant). E is the error.
17-20 After First Round, Second Round, Third Round, Fourth Round [Tables, one per round; columns: V, X1, X2, d, w0, w1, w2, s, f, c, d-f, dw0, dw1, dw2, E.] Fourth round: successfully completed a round with no changes. Done.
21 Gradient Descent in Weight Space Gradient of the error with respect to the TLU's weights (Wa, Wb). [Figure: error surface over axes Wa and Wb, with successive weight settings (wa0, wb0), (wa1, wb1), (wa2, wb2).]
22 Widrow-Hoff Technique f(s) = s. Changes to the weights come in variable chunks. d uses -1 to represent training examples of 0 (this pulls the zeros below 0). [Figure: threshold f(s), ranging between -1 and 1, compared with training f(s).] The process never terminates, but the differences in error will be minimized.
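A rough Python sketch of Widrow-Hoff training follows. It assumes the update dw_i = c * (d - f) * x_i with f = s, a bias input fixed at 1, zero initial weights, and a learning rate of 0.05; none of these values come from the slides. The loop runs for a fixed number of rounds because the procedure never terminates on its own.

```python
def train_widrow_hoff(examples, c=0.05, rounds=200):
    """Widrow-Hoff (LMS) training with a linear output f(s) = s."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)  # w[0] is the weight on the fixed bias input
    history = []         # summed squared error per round
    for _ in range(rounds):
        sq_err = 0.0
        for x, d in examples:
            xs = [1.0] + list(x)
            f = sum(wi * xi for wi, xi in zip(w, xs))  # f = s
            err = d - f
            sq_err += err * err
            w = [wi + c * err * xi for wi, xi in zip(w, xs)]
        history.append(sq_err)
    return w, history

def classify(w, x):
    """Threshold the linear output at 0 to recover a -1/+1 label."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s >= 0 else -1

# AND with -1 standing in for the desired output 0.
AND_PM = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
wh_weights, wh_history = train_widrow_hoff(AND_PM)
```

The per-round error in `wh_history` shrinks and then levels off rather than reaching zero, which is the "good enough" behaviour the slides describe.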
23 After First Round [Table; columns: V, X1, X2, d, w0, w1, w2, s, f, f, c, d-f, dw0, dw1, dw2, E.] f = s. Notice the wide range in error.
24 After 10 Rounds [Table; same columns.] Good enough.
25 Round 200-something [Table; same columns.] Still good enough. Converged and decreasing.
26 Generalized Delta Technique Steeper slope close to the threshold causes faster change near boundary Change to the weights in variable chunks. More fuzzy boundary. d uses 0 to represent training examples of 0 (instead of -1 for W-H). More modern threshold - used in multi-node networks.
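The generalized delta rule differs from Widrow-Hoff only in the sigmoid output and the extra slope factor f(1 - f). The sketch below assumes the update dw_i = c * (d - f) * f * (1 - f) * x_i, a bias input fixed at 1, zero initial weights, and a learning rate of 1.0; these are illustrative choices, not values from the lecture.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_generalized_delta(examples, c=1.0, rounds=5000):
    """Train one sigmoid unit with the generalized delta rule."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)  # w[0] is the weight on the fixed bias input
    for _ in range(rounds):
        for x, d in examples:
            xs = [1.0] + list(x)
            f = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            # f * (1 - f) is the sigmoid's slope: largest near the boundary.
            step = c * (d - f) * f * (1.0 - f)
            w = [wi + step * xi for wi, xi in zip(w, xs)]
    return w

def unit_output(w, x):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

AND_01 = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
gd_weights = train_generalized_delta(AND_01)
```

The outputs stay strictly between 0 and 1, so a final classification rounds them at 0.5.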
27 After First Round [Table; columns: V, X1, X2, d, w0, w1, w2, s, f, f, c, d-f, f(1-f), dw0, dw1, dw2, E.] Uses a larger learning rate. Notice the smaller range in error.
28 After 14 Rounds [Table; same columns.] Always ranges between -0.5 and 0.5.
29 Network Structures Two kinds of larger neural network structures: 1. Feed-forward networks: acyclic; contain hidden layers and inputs. 2. Recurrent networks: cyclic; dynamic systems with oscillations and chaotic behavior; can exhibit short-term memory.
30 Hidden Units [Figure: units 1 and 2 feed units 3 and 4 through weights W1,3, W1,4, W2,3, W2,4; units 3 and 4 feed unit 5 through weights W3,5 and W4,5.] The activation of Unit 5 is based on the weighted outputs of Units 3 and 4. Units 3 and 4 are the hidden units. Activation depends on the unit (it can use the sigmoid function).
31 Multi-layer Layers are usually fully connected. Numbers of nodes are typically set by hand.
32 Multilayer Feed Forward Layers are usually fully connected; numbers of nodes typically set by hand. Single Hidden Layer is Most Common. back-propagation
33 Larger hypothesis space Combine two opposite-facing threshold functions to make a ridge. Combine two perpendicular ridges to make a bump. Add bumps of various sizes and locations to fit any surface.
34 Hopfield Networks: recurrent networks that contain bidirectional connections (units are both inputs and outputs). A stimulus results in the network settling into the activation pattern that most closely resembles a training example. N units can reliably store roughly 0.138N training examples. Boltzmann Machines: like Hopfield networks, but they contain hidden units and their activation functions are stochastic (the probability that a unit outputs a 1 is a function of its total weighted input).
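The settling behaviour of a Hopfield network can be illustrated with a tiny sketch. It assumes +1/-1 unit states, Hebbian outer-product weights with no self-connections, and synchronous updates; the slides do not specify any of these, so treat them as one common textbook setup rather than the lecture's definition.

```python
def train_hopfield(patterns):
    """Hebbian weights: w[i][j] accumulates p[i] * p[j] / n, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, max_steps=5):
    """Update all units together until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new = [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
               for row in w]
        if new == state:
            break
        state = new
    return state

stored = [1, -1, 1, -1, 1, 1, -1, -1]
noisy = [-1, 1, 1, -1, 1, 1, -1, -1]  # first two units flipped
hop_w = train_hopfield([stored])
settled = recall(hop_w, noisy)
```

Starting from the corrupted stimulus, the network settles back into the stored pattern.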
35 Learning in State Space We return now to heuristics (evaluation functions), used both in Search and in Minimax Search. Having a good heuristic greatly improves an agent's performance (e.g. in A* search, and in evaluating leaf nodes in Adversarial Search). Good knowledge of the subject domain: good heuristics. No knowledge of the subject domain: learn the heuristic.
36 Levels of Reinforcement Learning Ordered from more to less knowledge about the problem domain:
1. Agent knows its actions, results, and costs; can build an explicit Search Tree to explore; has a clear short-term goal.
2. Agent does not have a model of its actions; can build an explicit Search Tree to explore; has a clear short-term goal.
3. Agent does have a model of its actions, but cannot build an explicit Search Tree to explore (too large); has a clear short-term goal state.
4. Agent knows its actions, results, and costs; cannot build an explicit Search Tree to explore (too large); does not have a clear short-term goal. Performance is based on reward, not goals.
37 Explicit Graph Heuristic Learning Just as we did with previous searches, the agent knows its actions, their results, and their costs, and has enough space to build an entire search tree. Set the heuristic function h(n) = 0 for all nodes, and do an A* search. Update h(n) once the node is expanded: $h(n_i) \leftarrow \min_{n_j \in S(n_i)} [h(n_j) + c(n_i, n_j)]$, where $S(n_i)$ is the set of all children of $n_i$. The agent knows the goal state: h(goal) = 0.
38 Explicit Graph Learning Performance What kind of search is this - when the agent searches for the first time?
39 Explicit Graph Learning Performance Uniform Cost Search (f = g + 0)
40 Explicit Graph Learning Performance Subsequent searches zoom in on the right solution faster and faster. This happens as the true h(n) values propagate back from the goal. [Figure: search graph with nodes labelled by their current h values.]
41 Explicit Graph Learning Performance Each run propagates the true cost of getting to the goal further back through the search. Eventually the minimal path can be read off the tree. [Figure: the same graph with updated h values.]
42 Explicit Graph Learning Performance The agent goes through a thought experiment, using a model of the state space. [Figure: the same graph with updated h values.]
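The effect of repeated searches can be sketched on a toy graph. The graph, its costs, and the function names below are made up for illustration, and the update is assumed to be the backup h(n) = min over children of (step cost + child's h), applied whenever a node is expanded. The first run starts with h = 0 everywhere and therefore behaves like uniform cost search.

```python
import heapq

# Hypothetical graph: node -> {child: step cost}. "G" is the goal.
GRAPH = {
    "S": {"A": 1, "B": 4},
    "A": {"C": 2, "B": 2},
    "B": {"G": 1},
    "C": {"G": 5},
    "G": {},
}

def astar_learning(graph, start, goal, h):
    """One A* run that also backs up h for every node it expands."""
    frontier = [(h[start], 0, start)]  # entries are (f, g, node)
    closed = set()
    expansions = 0
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g, expansions
        if node in closed:
            continue
        closed.add(node)
        expansions += 1
        children = graph[node]
        if children:
            # Learning step: back up the best child estimate.
            h[node] = min(cost + h[child] for child, cost in children.items())
        for child, cost in children.items():
            if child not in closed:
                heapq.heappush(frontier, (g + cost + h[child], g + cost, child))
    return None, expansions

h = {node: 0 for node in GRAPH}  # h(n) = 0 everywhere, h(goal) stays 0
runs = [astar_learning(GRAPH, "S", "G", h) for _ in range(3)]
```

On this graph the first run expands four nodes and the later runs three, while h("S") climbs from 0 to the true cost of 4 over the three runs.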
43 No Model of Action Heuristic Learning What if there is no clear model of actions for state transitions? Assuming the agent can build, name, and store previous states, it can learn heuristics in the real world. This can be perilous. Explore: a robot uses a grid to plan a route and moves randomly about the room. Exploit: it works out which runs about the room were the best, and at what times certain operations were useful.
44 Updating the heuristic value of states Start node: the agent knows the cost of an action after taking it. States are named and stored, and can be distinguished later. The heuristic function for a state is updated: $h(n_i) \leftarrow c(n_i, n_j) + h(n_j)$, where $h(n_i)$ is the heuristic value of the node the agent was just in, $c(n_i, n_j)$ is the cost of the transition (i.e. the action), and $h(n_j)$ is the heuristic value of the node transitioned to (initially 0 if not travelled to previously).
45 Choosing Actions Initially, actions are chosen randomly. After some exploring, states have h(n) values ascribed to them, and a model of the actions has been built: $\sigma(n_i, a)$ describes the state (i.e. node $n_j$) that is reached from node $n_i$ after carrying out action $a$. Actions are now chosen by $a = \arg\min_a [h(\sigma(n_i, a)) + c(n_i, \sigma(n_i, a))]$. Eventually the estimated minimum path to the goal is built up. Keeping some randomness allows for the discovery of possibly better paths to the goal.
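Learning without a model of actions can be sketched as follows. The environment, the action names, the exploration probability `epsilon`, and the seed are all hypothetical. The agent only discovers where an action leads and what it costs by trying it, records that in `model`, and applies the update h(state) = cost + h(next state) after every step.

```python
import random

# Hidden environment: state -> {action: (next_state, cost)}. The agent can
# list the actions available in a state but not their results (assumption).
ENV = {
    "S": {"a": ("A", 1), "b": ("B", 4)},
    "A": {"a": ("C", 2), "b": ("B", 2)},
    "B": {"a": ("G", 1)},
    "C": {"a": ("G", 5)},
}

def run_episode(h, model, rng, epsilon=0.2, start="S", goal="G"):
    """Walk from start to goal, learning h and the action model on the way."""
    state, total = start, 0
    while state != goal:
        actions = list(ENV[state])
        known = [a for a in actions if (state, a) in model]
        if not known or rng.random() < epsilon:
            action = rng.choice(actions)  # explore
        else:
            # Exploit: minimise cost + h(next) over actions tried so far.
            action = min(
                known,
                key=lambda a: model[(state, a)][1] + h.get(model[(state, a)][0], 0),
            )
        nxt, cost = ENV[state][action]       # act in the "real world"
        model[(state, action)] = (nxt, cost)  # remember what happened
        h[state] = cost + h.get(nxt, 0)       # update the state just left
        total += cost
        state = nxt
    return total

rng = random.Random(0)
h, model = {"G": 0}, {}
costs = [run_episode(h, model, rng) for _ in range(50)]
```

With `epsilon` set to 0 the agent can lock onto the first workable route it finds, which is why some randomness is kept.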
46 Learning without a Search Graph (or Node Table) More realistic problems are so large that it is not possible to store all the states/nodes and build the entire search graph. If we have a model of the actions, however, we can create an evaluation function and search with it. Assemble a heuristic function out of sub-functions that each describe some value of a state. For the 8-puzzle, a list of functions could be: W(n), the number of tiles out of place; P(n), the sum of the distances of each tile from its home; any other functions (usually relaxed heuristics).
47 Weighted Heuristic Function Write our heuristic function as a linear weighted combination: $h(n) = w_1 W(n) + w_2 P(n) + \dots$ All we have to do now is learn which weights are the best. One way to do that is to notice the difference in the heuristic value once we traverse from one node to another, taking the cost of that step into consideration.
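48 Updating the Heuristic $h(n_i) \leftarrow h(n_i) + \beta \left( \min_{n_j \in S(n_i)} [h(n_j) + c(n_i, n_j)] - h(n_i) \right)$, where $\beta$ is the learning rate and $S(n_i)$ is the set of successor nodes. We modify h(ni) by adding some proportion (controlled by $\beta$) of the difference between what we thought h(ni) was before expansion and what we think it is after. Once we know the change in h(ni), we adjust the weights in a way similar to the neural networks.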
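49 Rewritten: $h(n_i) \leftarrow (1 - \beta)\, h(n_i) + \beta \min_{n_j \in S(n_i)} [h(n_j) + c(n_i, n_j)]$ Temporal Learning: $\beta$ controls how fast the agent learns, i.e. how much weight we give to the new estimate of the heuristic. Effect of $\beta$: 0, no adjustment to h(ni); low, slow learning; high, erratic performance; 1, the old h(ni) is thrown away.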
50 Temporal Learning Called temporal difference learning because the update is based on the difference between estimates one time step apart. Note that this temporal difference approach can also work without a model of the effects of actions (with suitable modification).
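One temporal difference step for a weighted 8-puzzle heuristic might look like the sketch below. It assumes the two features W(n) and P(n), a unit step cost, and a learning rate named `beta`. The weight adjustment divides by the squared feature norm so that h(n_i) moves exactly the fraction beta of the way toward the target; that normalisation is a choice made for this example, not something the slides specify.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank

def misplaced(state):
    """W(n): number of tiles out of place."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def manhattan(state):
    """P(n): sum of each tile's grid distance from its home."""
    total = 0
    for i, t in enumerate(state):
        if t != 0:
            g = GOAL.index(t)
            total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total

def features(state):
    return (misplaced(state), manhattan(state))

def successors(state):
    """States reachable by sliding one tile into the blank."""
    b = state.index(0)
    r, c = divmod(b, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            s = list(state)
            k = nr * 3 + nc
            s[b], s[k] = s[k], s[b]
            result.append(tuple(s))
    return result

def h(weights, state):
    return sum(w * f for w, f in zip(weights, features(state)))

def td_update(weights, state, beta=0.1, step_cost=1):
    """Move h(state) a fraction beta toward min over successors of h + cost."""
    target = min(h(weights, s) + step_cost for s in successors(state))
    feats = features(state)
    norm = sum(f * f for f in feats)
    if norm == 0:  # goal state: every feature is zero, nothing to adjust
        return list(weights), target
    delta = beta * (target - h(weights, state))
    return [w + delta * f / norm for w, f in zip(weights, feats)], target

state = (1, 2, 3, 4, 5, 6, 0, 7, 8)
old_weights = [0.5, 0.5]
new_weights, target = td_update(old_weights, state)
```

After the call, h(new_weights, state) equals (1 - beta) times the old estimate plus beta times the target, matching the rewritten form of the update.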
51 Rewards not goals For many tasks agents don't have short-term goals, but instead accrue rewards over a period of time. Instead of a plan, we want a policy, which says how the agent should act over time. Typically this is expressed as what action should be carried out in a given state. Express the reward an agent gets as $r(n_i, a)$, which includes any special reward for being in state nj. We want an optimal policy, which maximizes the (discounted) reward at every node.
52 Finding the Optimum Policy One (non-ideal) solution is to search through all policies (randomly) until a good one is discovered. Instead, given a certain policy, one can calculate the value of a node: the reward an agent will get if it starts at that node and follows the policy. If the agent is at ni and follows the policy $\pi$ to nj, then it can expect this reward in the long term: $V^{\pi}(n_i) = r(n_i, \pi(n_i)) + \gamma V^{\pi}(n_j)$, where $\gamma$ is the discounting factor, which adds a little long-term goal.
53 Value Iteration The optimum policy $\pi^*$ then gives us the action that maximizes this reward: $V^{\pi^*}(n_i) = \max_a [r(n_i, a) + \gamma V^{\pi^*}(n_j)]$. If we knew what the values of the nodes were under $\pi^*$, then we could easily compute the optimal policy: $\pi^*(n_i) = \arg\max_a [r(n_i, a) + \gamma V^{\pi^*}(n_j)]$. The problem is that we don't know these values. But we can find them out using value iteration. We start by guessing (randomly is fine) an estimated value V(n) for each node.
54 Approximating the Estimated Values Then when we are at ni we pick the action to maximize $r(n_i, a) + \gamma V(n_j)$; that is the best thing to do given what we currently know. We then update V(ni) by $V(n_i) \leftarrow (1 - \beta)\, V(n_i) + \beta\, [r(n_i, a) + \gamma V(n_j)]$. Progressive iterations of this calculation make V(n) a closer and closer approximation to $V^{\pi^*}(n)$. Intuitively, this is because we replace the estimate with the actual reward we get for the next state (and the next state, and the next state).
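The value update can be sketched on a small deterministic graph. Everything concrete here is an assumption for illustration: the graph, rewards equal to negative step costs, a goal value fixed at 0, starting estimates of 0 rather than random guesses, and repeated sweeps over all nodes instead of a single agent wandering through them. `gamma` is the discount factor and `beta` the learning rate.

```python
# Hypothetical environment: node -> {action: (next_node, cost)}.
ENV = {
    "S": {"to_A": ("A", 1), "to_B": ("B", 4)},
    "A": {"to_C": ("C", 2), "to_B": ("B", 2)},
    "B": {"to_G": ("G", 1)},
    "C": {"to_G": ("G", 5)},
}
GOAL = "G"

def backed_up(V, node, action, gamma):
    """r(n_i, a) + gamma * V(n_j), with reward r = -cost (assumption)."""
    nxt, cost = ENV[node][action]
    return -cost + gamma * V[nxt]

def value_iteration(gamma=0.9, beta=0.5, sweeps=200):
    V = {node: 0.0 for node in ENV}  # initial guesses
    V[GOAL] = 0.0                    # nothing left to earn at the goal
    for _ in range(sweeps):
        for node in ENV:
            best = max(backed_up(V, node, a, gamma) for a in ENV[node])
            # Blend the old estimate with the new one-step estimate.
            V[node] = (1 - beta) * V[node] + beta * best
    return V

def greedy_policy(V, gamma=0.9):
    """Pick, at every node, the action with the best backed-up value."""
    return {n: max(ENV[n], key=lambda a: backed_up(V, n, a, gamma)) for n in ENV}

V = value_iteration()
policy = greedy_policy(V)
```

On this graph the estimates settle at V("S") = -3.61, and the greedy policy follows S, A, B, G.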
55 Summary This lecture has looked at a number of approaches to learning heuristic functions. We started assuming that the agent knew everything but the heuristic, and progressively relaxed assumptions. This created a battery of reinforcement learning methods that can be applied in a wide variety of situations. These models also tie learning and planning together very closely, and we will revisit them as planning models later in the course.
INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationFeedforward Neural Nets and Backpropagation
Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationName: UW CSE 473 Final Exam, Fall 2014
P1 P6 Instructions Please answer clearly and succinctly. If an explanation is requested, think carefully before writing. Points may be removed for rambling answers. If a question is unclear or ambiguous,
More informationCS Deep Reinforcement Learning HW2: Policy Gradients due September 19th 2018, 11:59 pm
CS294-112 Deep Reinforcement Learning HW2: Policy Gradients due September 19th 2018, 11:59 pm 1 Introduction The goal of this assignment is to experiment with policy gradient and its variants, including
More informationReinforcement Learning
Reinforcement Learning Cyber Rodent Project Some slides from: David Silver, Radford Neal CSC411: Machine Learning and Data Mining, Winter 2017 Michael Guerzhoy 1 Reinforcement Learning Supervised learning:
More informationCS230: Lecture 9 Deep Reinforcement Learning
CS230: Lecture 9 Deep Reinforcement Learning Kian Katanforoosh Menti code: 21 90 15 Today s outline I. Motivation II. Recycling is good: an introduction to RL III. Deep Q-Learning IV. Application of Deep
More informationMarkov Models and Reinforcement Learning. Stephen G. Ware CSCI 4525 / 5525
Markov Models and Reinforcement Learning Stephen G. Ware CSCI 4525 / 5525 Camera Vacuum World (CVW) 2 discrete rooms with cameras that detect dirt. A mobile robot with a vacuum. The goal is to ensure both
More informationArtificial Neural Networks. Q550: Models in Cognitive Science Lecture 5
Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5 "Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand
More informationQ-learning. Tambet Matiisen
Q-learning Tambet Matiisen (based on chapter 11.3 of online book Artificial Intelligence, foundations of computational agents by David Poole and Alan Mackworth) Stochastic gradient descent Experience
More informationNeural Networks biological neuron artificial neuron 1
Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input
More informationHopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296
Hopfield Networks and Boltzmann Machines Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks A Hopfield network is a neural network with a graph G = (U,C) that satisfies
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationCOGS Q250 Fall Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November.
COGS Q250 Fall 2012 Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November. For the first two questions of the homework you will need to understand the learning algorithm using the delta
More informationCMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING
More informationMarkov Decision Processes
Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour
More informationy(x n, w) t n 2. (1)
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,
More informationArtificial Intelligence
Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement
More informationCOMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out
More informationNeural Nets Supervised learning
6.034 Artificial Intelligence Big idea: Learning as acquiring a function on feature vectors Background Nearest Neighbors Identification Trees Neural Nets Neural Nets Supervised learning y s(z) w w 0 w
More information