An Adaptive Broom Balancer with Visual Inputs

An Adaptive Broom Balancer with Visual Inputs

Viral V. Tolat, Bernard Widrow
Stanford University

(This work was supported by the NASA Center for Aeronautics and Space Information Systems (CASIS) under grant NAGW 419.)

Abstract

An adaptive network with visual inputs has been trained to balance an inverted pendulum. Simulation results show that the network is capable of extracting the necessary state information from time sequences of crude visual images. A single linear adaptive threshold element (ADALINE) was adequate for this task. When tested by simulation, the performance achieved was sufficient to keep the pendulum from falling. The adaptive network's ability to generalize made this possible, since the training set encompassed only a fraction of all possible states.

1 Introduction

Balancing a broomstick in the palm of one's hand is a fairly complex task from a control point of view. However, for those of us with a modicum of coordination, it is a seemingly simple task. If asked how we do it, most of us would be hard pressed to come up with an accurate account. An analogous problem is that of balancing an inverted pendulum fixed to a nonstationary platform. If the task of balancing such a system were presented to a person in the form of a video game, that person would probably master the game in a short period of time. Of course, to master this game we have available to us the most sophisticated adaptive network around, the human brain.

In this paper we analyze the ability of a simple adaptive network to balance an inverted pendulum. There are several reasons why the inverted pendulum was chosen as a basis for adaptive network research. First, the inverted pendulum problem is a classical control problem which has been extensively studied and is well understood. The inverted pendulum problem is representative of many other

control problems, and therefore an understanding of how to use adaptive networks to balance the pendulum will allow us to solve other control problems. By using simple visual images of the inverted pendulum, we can examine pattern recognition and vision problems as well. Finally, the complexity of the inverted pendulum is significant enough to make the problem interesting while still being simple enough to make it computationally tractable.

2 Inverted Pendulum

Figure 1: Inverted pendulum. θ = pendulum angle, ω = pendulum angular velocity, x = cart position, v = cart velocity, F = reaction force.

In the inverted pendulum system illustrated in Figure 1, four variables describe the state of the system: the position of the cart (x), the velocity of the cart (v), the angle of the pendulum (θ), and the angular velocity of the pendulum (ω). The force required to stabilize the system at time t is

F_t = U · sgn(k1·x_t + k2·v_t + k3·θ_t + k4·ω_t),   (1)

where U is a positive constant representing the magnitude of the force to be applied to the system and the coefficients k1, k2, k3, k4 are derived from the physical characteristics of the pendulum system and optimal bang-bang control theory.

If the cart and pendulum velocities cannot be directly measured, they can be estimated from differences between the current and earlier position measurements. Equation 1 can then be rewritten as

F_t = U · sgn[(k1 + k2)·x_t − k2·x_{t−1} + (k3 + k4)·θ_t − k4·θ_{t−1}],   (2)

where x_{t−1} and θ_{t−1} represent the cart and pendulum positions at an earlier time, the coefficients k1, k2, k3, k4 are those of Equation 1, and the time between x_t and x_{t−1}, and between θ_t and θ_{t−1}, is some small fixed interval.
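As a concrete illustration, the finite-difference control law of Equation 2 can be sketched in a few lines of Python. The gain values below are hypothetical placeholders; the paper derives its coefficients from the pendulum's physical characteristics and optimal bang-bang control theory, which is not reproduced here.

```python
import numpy as np

def bang_bang_force(x_t, x_prev, theta_t, theta_prev, k, U=1.0):
    """Bang-bang control force of Equation 2: the cart and pendulum
    velocities are replaced by differences of successive position
    measurements, so only current and previous samples are needed.
    The fixed sampling interval is assumed absorbed into k2 and k4."""
    k1, k2, k3, k4 = k
    s = (k1 + k2) * x_t - k2 * x_prev + (k3 + k4) * theta_t - k4 * theta_prev
    return U * np.sign(s)

# Hypothetical gains, chosen only to exercise the formula.
F = bang_bang_force(x_t=0.10, x_prev=0.08, theta_t=0.05, theta_prev=0.02,
                    k=(1.0, 2.0, 10.0, 4.0))
```

A positive result pushes the cart one way and a negative result the other; the magnitude is always U, as bang-bang control requires.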

3 Previous Work

The use of an adaptive network to stabilize a mechanical inverted pendulum was first studied by Widrow and Smith [3,7,5]. They demonstrated that a network of one computing element, an ADALINE (Adaptive Linear Element), was capable of balancing an inverted pendulum if the ADALINE input consisted of the four state variables, each encoded with an N-bit linearly independent code, a so-called thermometer code. The force produced by the ADALINE approximated that called for by Equation 1. The network was trained using the Widrow-Hoff LMS algorithm, with the output of an optimal controller of the form in Equation 1 as the teacher.

Barto [2] has also studied the problem of balancing an inverted pendulum using an adaptive network. Barto used an input code and computing unit similar to those of Widrow and Smith, but for training he used a system based on reward/punishment learning. This learning algorithm was based on one devised by Widrow, Gupta, and Maitra [6], which they had applied to a system that learned to play blackjack.

4 Current Work

To model the inverted pendulum system, we replaced the mechanical cart and pendulum used by Widrow and Smith with a Macintosh computer simulation and display. The adaptive circuits were replaced by software simulation, and the encoded state-variable inputs were replaced with spatially quantized binary images of an inverted pendulum. Examples of the quantized images are shown in Figure 2. The images were 5 by 11 binary-pixel representations of the cart and pendulum; 65 different images were possible, with 5 different cart positions and 13 different pendulum angles. Pictures larger than the 5 by 11 pixel image were not used because they provided no additional information and added computational overhead. Smaller pictures did not provide the resolution needed.
The actual input to the network consisted of two 5 by 11 pixel images, one representing the present visual image and the other representing the visual image at a slightly earlier time. Both images were necessary because the network had to derive the cart and pendulum velocities as well as their positions. (The system could be designed to work with only one image by internally reusing the current image with the next image with the aid of delay circuits; the end result would be the same.) With these two images there were 4225 different possible input images (65 × 65), with 110 bits (5 × 11 × 2) each.
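A minimal sketch of this input construction follows. The 5-by-11 image size and the 110-bit total are from the text; the binary images here are random stand-ins rather than actual renderings of the cart and pendulum, and the ordering of the two halves is an assumption.

```python
import numpy as np

def input_vector(image_now, image_prev):
    """Concatenate the current and earlier 5-by-11 binary images into
    the 110-bit ADALINE input described in the text. Putting the
    current image first is an arbitrary convention; any fixed
    ordering would serve."""
    assert image_now.shape == (5, 11) and image_prev.shape == (5, 11)
    return np.concatenate([image_now.ravel(), image_prev.ravel()])

rng = np.random.default_rng(0)
img_now = rng.integers(0, 2, size=(5, 11))   # stand-in for the present image
img_prev = rng.integers(0, 2, size=(5, 11))  # stand-in for the earlier image
x = input_vector(img_now, img_prev)          # 110 bits = 5 x 11 x 2
```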

Figure 2: Two examples of Macintosh-generated pendulum and cart images. The pictures on the right are spatially quantized images of the pendulum and cart figures on the left.

For the network to balance the pendulum, in some sense it needed to implement Equation 2, estimating the pendulum and cart positions multiplied by the coefficients k1, k2, k3, and k4. Finding a set of weights to do this requires solving a system of M linear equations in N unknowns, where M is the number of images and N is the number of weights, or equivalently the number of bits in the input image. A solution that minimizes the mean-squared error can be found iteratively by using the Widrow-Hoff LMS algorithm. If an exact solution exists, the LMS algorithm will eventually converge to that solution. There are also various methods of calculating the least-squares solution directly [4,1].

The LMS algorithm was first simulated using all the possible images for training. The learning curve shown in Figure 3 is a plot of the number of presentations of the entire training set versus the percentage of classification errors. The error drops dramatically after just a few passes through the training sequence, then flattens out. After 1000 passes through the training sequence, the error was 3.4%. Had the simulation been allowed to continue, the error would eventually have reached 1.78%, which is the error obtained from a directly calculated least-squares solution. The weights for the solution after 1000 passes are illustrated in Figure 4.

Next, simulations were done to determine the network's performance when trained on a fraction of the total number of possible images. Training sets of various sizes were picked at random from the set of possible training images. Results were computed for four different training sets of each size and averaged. The final error after 300 passes, computed over all possible images, is plotted as a function of the training-set size in Figure 5.
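The iterative procedure described above can be sketched as follows. This is a toy stand-in, not the paper's simulation: the 110-bit inputs are random binary vectors, the teacher is a hypothetical linear discriminant standing in for the optimal controller of Equation 1, and the learning rate is an arbitrary small value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 200 random 110-bit input vectors, with +/-1 desired
# responses produced by a hypothetical linear teacher.
X = rng.integers(0, 2, size=(200, 110)).astype(float)
teacher = rng.normal(size=110)
d = np.sign(X @ teacher)

# Widrow-Hoff LMS: after each presentation, nudge the weights along the
# input in proportion to the error between desired and actual output.
w = np.zeros(110)
mu = 0.01  # small step size, chosen for stability
for _ in range(300):                      # passes through the training set
    for x_k, d_k in zip(X, d):
        y = w @ x_k                       # linear ADALINE output
        w += mu * (d_k - y) * x_k         # LMS update

err_lms = np.mean(np.sign(X @ w) != d)    # classification error rate

# The least-squares solution can also be computed directly, as the text notes.
w_ls = np.linalg.lstsq(X, d, rcond=None)[0]
err_ls = np.mean(np.sign(X @ w_ls) != d)
```

As in the paper's learning curve, the LMS classification error falls rapidly over the first few passes and settles toward that of the directly computed least-squares solution.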

Figure 3: Learning curve (percentage of classification errors versus number of presentations of the entire training sequence).

Figure 4: Weights for a network solution using the LMS algorithm. The top 5 rows correspond to the current visual image of the cart and pendulum, and the bottom 5 rows correspond to the image at an earlier time. The size of the squares denotes the magnitude of the weights; grey represents negative weights and black represents positive weights.

Figure 5: Generalization curve (final error over all possible images versus training-set size, as a percentage of all possible images).

When the adaptive network solutions were tested with the Macintosh computer simulation, the pendulum was successfully kept from falling, in spite of a small number of incorrect responses.

5 Future Work

Although control of the inverted pendulum was achieved by a single ADALINE with visual inputs, more complicated control problems will require larger adaptive networks with appropriate sensory inputs. The long-term objective of this research is to demonstrate that a human expert can impart learned skills to an adaptive network in complex problem environments for which it may not be possible to develop explicit decision rules. This objective leads toward a new form of man-machine interaction in which a person trains a machine to perform a task by having the machine look over the person's shoulder, observing the environment and the person's responses to it. After limited training, the machine should be able to perform the task independently of the person, responding correctly to almost all situations, even those not specifically trained on.

References

[1] PRO-MATLAB User's Guide. The MathWorks, Inc.
[2] Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), 1983.
[3] Fred W. Smith. Contactor Control by Adaptive Pattern-Recognition Techniques. Technical Report 6762-1, Stanford Electronics Lab, Stanford University, April 1964.
[4] Gilbert Strang. Linear Algebra and Its Applications. Harcourt Brace Jovanovich, Inc., second edition, 1980.
[5] B. Widrow. The original adaptive neural net broom-balancer. In International Symposium on Circuits and Systems, pages 351-357, IEEE, May 1987.
[6] B. Widrow, N. K. Gupta, and S. Maitra. Punish/reward: learning with a critic in adaptive threshold systems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3:455-465, 1973.
[7] B. Widrow and F. W. Smith. Pattern recognizing control systems. In Computer and Information Sciences (COINS) Symposium, 1963.