Systematic Selection of Parameters in the development of Feedforward Artificial Neural Network Models through Conventional and Intelligent Algorithms

THALES Project No. 65/3

Research Team: G.-C. Vosniakos, T. Giannakakis, A. Krimpenis, P. G. Benardos
National Technical University of Athens, School of Mechanical Engineering, Manufacturing Technology Division

1. Introduction

Feedforward artificial neural networks (ANNs) are currently used in a variety of applications (Figure 1) with great success. The reason behind this widespread adoption can be found in two very important abilities that they exhibit. ANNs can be trained to learn through examples (memorization ability) and can respond to cases that are similar but not identical to the ones that they have been trained with (generalization ability).

Figure 1. Applications of ANNs (image processing, control, robotics, pattern recognition, signal processing, finance, simulation, medicine)

The building block of a feedforward ANN is called a neuron, whose mathematical model is shown in Figure 2a. Each neuron receives input in the form of weighted signals (where p is the input signal matrix and W is the weight coefficient matrix), sums them along with a bias term and applies a function f, called the activation function (usually non-linear), to determine its own output signal, denoted by y. A typical feedforward ANN is composed of several such neurons, which are arranged in layers as depicted in Figure 2b. The mathematical notation used is also given.
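The single-neuron model described above, y = f(Wp + b), can be sketched in a few lines. This is only an illustration: the function name `neuron_output`, the use of NumPy and the numerical values of `p`, `W` and `b` are assumptions made for the example and do not come from the project.

```python
import numpy as np

def neuron_output(p, W, b, f=np.tanh):
    """Return y = f(W p + b) for one neuron.

    p: input signals, W: weight coefficients, b: bias term,
    f: activation function (hyperbolic tangent assumed by default).
    """
    return f(np.dot(W, p) + b)

# Illustrative values only (not taken from the paper).
p = np.array([0.5, -1.0, 2.0, 0.0])
W = np.array([0.1, 0.2, -0.3, 0.4])
b = 0.05
y = neuron_output(p, W, b)
```

The weighted sum here is 0.05 - 0.2 - 0.6 + 0.0 + 0.05 = -0.7, so the neuron's output is tanh(-0.7).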

Figure 2. Mathematical model of (a) a typical neuron, $y = f(Wp + b)$, and (b) a typical feedforward ANN

Notation:
$x_i$: output signal of the i-th neuron in the input layer
$k_j$: output signal of the j-th neuron in the hidden layer
$y$: ANN's output signal
$IW_{j,i}$: weight coefficient between the i-th input neuron and the j-th hidden neuron
$b_j$: bias of the j-th hidden neuron
$LW_{1,j}$: weight coefficient between the j-th hidden neuron and the output neuron
$b_y$: bias of the output neuron
tansig(x): hyperbolic tangent function
$i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$

The development of a feedforward ANN model involves several stages, from gathering the necessary data to creating a satisfactory model (Figure 3). The most important aspect of this process is the selection of certain parameters that are crucial for the model's performance, notably the number of hidden layers and neurons. Since there is no theoretical method to determine the appropriate architecture, a repetitive trial-and-error procedure is involved that is both time-consuming and uncertain in its results. The outcome is mainly based on the experience of the researcher regarding ANNs and the studied phenomenon.

Figure 3. Development of an ANN model (stages: data collection, data preprocessing, selection of training algorithm (TA), selection of TA parameters, selection of ANN parameters, ANN training, performance checking, satisfactory ANN model)

The present work attempts to deal with the above problem by developing a systematic way to select such parameters. The emphasis is placed on two phases of ANN model building, namely the initialization of the network's weights and the determination of the most suitable architecture.

2. Initialization of weight coefficients

Every training procedure starts by initializing the weight coefficients, i.e. by assigning values to them. The training's goal is to find the weight values that minimize the network's error function. Since the initial values of the weights define the starting point of the training algorithm on the error function, they affect both the training speed and the achieved training error. This depends on whether this point is close to the global minimum or located in an area with many local minima. The most common methods used to initialize the weights are either to randomly select values from a predefined value field (usually centered around zero) or to use a statistical distribution (usually the Gaussian or uniform distribution). Pre-processing of the training data (normalisation, scaling etc.) is also used in conjunction with these techniques.

2.1 The approach

The approach adopted is a combination of analytical and random calculation of weight values. Referring to Figure 2b, the network's output is calculated as:

$$ y = LW_{1,1} k_1 + LW_{1,2} k_2 + \ldots + LW_{1,m} k_m + b_y \tag{1} $$

The output of each hidden neuron is in turn given by the following equation:

$$ k_j = \mathrm{tansig}(IW_{j,1} x_1 + IW_{j,2} x_2 + \ldots + IW_{j,n} x_n + b_j) \tag{2} $$

If a multiple linear regression is performed on the training data, then the resulting analytical model would be:

$$ y = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_n x_n = \sum_{i=1}^{n} a_i x_i + a_0 \tag{3} $$

Comparing equations 2 and 3, it is concluded that the argument of the hyperbolic tangent function can be replaced by the analytical model of the multiple linear regression. This is accomplished if:

$$ b_j = a_0 \tag{4} $$

$$ IW_{j,i} = a_i \tag{5} $$

The remaining weights between the hidden and the output layer and the respective bias are initialized randomly, so that the starting point of the training algorithm is slightly different each time the training is repeated.
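The initialization scheme of equations (3) to (5) can be sketched as follows. This is a minimal illustration and not the project's implementation: it assumes NumPy, a single hidden layer and a single output. The function names `mlr_init` and `forward`, the synthetic data, and the range [-0.5, 0.5] used for the random hidden-to-output weights are all hypothetical choices made for the example.

```python
import numpy as np

def mlr_init(X, y, m, rng):
    """Initialise an n-m-1 tansig network from a multiple linear regression fit."""
    n_samples, n = X.shape
    # Least-squares fit of y = a0 + a1*x1 + ... + an*xn (equation 3).
    A = np.column_stack([np.ones(n_samples), X])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    a0, a = coeffs[0], coeffs[1:]
    IW = np.tile(a, (m, 1))   # IW[j, i] = a_i for every hidden neuron j (equation 5)
    b = np.full(m, a0)        # b_j = a_0 (equation 4)
    # Hidden-to-output weights and output bias are random; the range is an assumption.
    LW = rng.uniform(-0.5, 0.5, size=m)
    b_y = rng.uniform(-0.5, 0.5)
    return IW, b, LW, b_y

def forward(X, IW, b, LW, b_y):
    """Network output for each row of X."""
    K = np.tanh(X @ IW.T + b)  # hidden-layer outputs k_j (equation 2)
    return K @ LW + b_y        # output y (equation 1)

# Synthetic, exactly linear data used only to exercise the code.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 5))
true_a = np.array([0.3, -0.2, 0.1, 0.05, -0.4])
y = 0.1 + X @ true_a
IW, b, LW, b_y = mlr_init(X, y, m=3, rng=rng)
```

With this scheme all hidden neurons start with identical input weights and biases. Only the randomly drawn `LW` and `b_y` differ between repeated runs, which matches the statement above that the starting point changes slightly each time.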

2.2 Results

The approach was tested by comparing its results to the Nguyen-Widrow method. Data originating from a bar turning process were used to develop an ANN model, and the number of required epochs and the achieved training error (mean squared error) were examined. Three different architectures were investigated, namely 5x10x1, 5x6x1 and 5x3x1, and the training results are given in Table 1.

No. | 5x10x1 | 5x6x1 | 5x3x1 | Initialization type
1 | 466,43E-25 | 2325,5E-29 | 5000,26E-06 | N-W
2 | 734,54E-28 | 344 2,07E-28 | 5000,60E-06 | N-W
3 | 670,78E-29 | 0000 2,45E-07 | 675 2,69E-06 | N-W
4 | 68 2,24E-25 | 9377,95E-26 | 5000 6,90E-07 | N-W
5 | 753 8,30E-26 | 397 3,79E-24 | 5000,06E-06 | N-W
6 | 765 7,63E-28 | 0000 3,9E-08 | 5000 8,38E-07 | N-W
7 | 983 5,90E-24 | 2565,37E-26 | 62 8,0E-07 | N-W
8 | 255,E-27 | 262,92E-28 | 5000 7,69E-07 | N-W
9 | 256,27E-3 | 70 6,85E-28 | 5000 9,3E-07 | N-W
10 | 39 4,74E-24 | 0000 3,23E-07 | 5000,60E-06 | N-W
 | ,E-24 | 6,08E-08 | ,22E-06 |

No. | 5x10x1 | 5x6x1 | 5x3x1 | Initialization type
1 | 95 5,6E-3 | 235 4,97E-27 | 6393,03E-06 | MLR
2 | 686,72E-3 | 473,25E-30 | 6576,03E-06 | MLR
3 | 750,67E-28 | 2294 2,8E-29 | 650,03E-06 | MLR
4 | 328 8,47E-26 | 666 3,74E-30 | 0000 6,94E-07 | MLR
5 | 004 5,54E-3 | 957 3,73E-29 | 0000 8,28E-07 | MLR
6 | 985,86E-26 | 247,09E-30 | 6752,03E-06 | MLR
7 | 032 2,2E-26 | 926 6,32E-28 | 6990,03E-06 | MLR
8 | 899 2,52E-28 | 996 7,54E-3 | 0000 8,28E-07 | MLR
9 | 903,04E-28 | 926 2,50E-30 | 0000 8,28E-07 | MLR
10 | 860 3,7E-3 | 09 8,82E-29 | 7030,03E-06 | MLR
 | ,25E-26 | 5,76E-28 | 9,34E-07 |

Table 1. Initialization method results

It is observed that there is improvement in both of the examined parameters, which is proportionately higher to the complexity of the architecture.

3. Determination of ANN's architecture

An ANN's architecture is directly related to the complexity of the solution space that it represents. A network that is fairly simple might not be able to learn the interactions underlying the training data, while a very complex network will memorize them to such an extent that it will no longer be able to respond to unknown data. Obtaining the right architecture is the most crucial stage in the development of an ANN model and, given that there is no theory as to what this architecture is or how to obtain it, it is also one of the most difficult stages to perform. Current practice involves a trial-and-error approach, but there are many research efforts involving the use of evolutionary algorithms as well as constructive/deconstructive analytical techniques that try to address this problem.

3.1 The approach

If the described problem is viewed as a problem of multi-parametric optimization, then a genetic algorithm can be used. The aim is to find the appropriate architecture, i.e. the number of hidden layers and the number of neurons in each one of them, which results in an ANN model with good performance. In order to satisfy this, criteria that quantify the performance of the model are developed and are consequently integrated in the objective function to be minimized. These criteria are:

i. Training error criterion, $E_{training}$ (the training error), computed from the target value and the ANN's response for each training data vector, over the number of training data.

ii. Generalization error criterion, $E_{generalization}$ (the generalization error), computed from the target value and the ANN's predicted value for each testing data vector, over the number of testing data.

iii. Feedforward architecture criterion:

$$ FFAC = \begin{cases} 1, & \text{1 hidden layer and } m \le 10 \\ 1 + (m-10) \cdot 0.1, & \text{1 hidden layer and } m > 10 \\ 2, & \text{2 hidden layers and } m \le 10 \text{ and } n \le 10 \\ 2 + (m-10) \cdot 0.1, & \text{2 hidden layers and } m > 10 \text{ and } n \le 10 \\ 2 + (m-10) \cdot 0.1 + (n-10) \cdot 0.2, & \text{2 hidden layers and } m > 10 \text{ and } n > 10 \end{cases} $$

where $m$ and $n$ are the number of neurons in the 1st and 2nd hidden layer respectively.

iv. Training speed criterion, $trspeed$, which takes one of two values depending on whether the number of training epochs is below or above a threshold.

v. Solution space consistency criterion:

$$ solspc = 1 + x \cdot 0.33 + y $$

where $x$ is the number of test cases for which the absolute value of the relative error is in the interval [5, 25] and $y$ is the number of test cases for which the absolute value of the relative error is in the interval $(25, \infty)$.
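Two of the criteria above, the feedforward architecture criterion and the solution space consistency criterion, are simple enough to sketch directly. The following is an illustration under stated assumptions, not the project's code. The threshold of 10 neurons, the increments 0.1 and 0.2, the constant 1 and factor 0.33 in solspc, and the interval bounds 5 and 25 are read from a damaged source and should be treated as assumed values. Two further assumptions are that relative errors are given in percent, and that the case of two hidden layers with m ≤ 10 and n > 10, which the source does not list, is covered by clipping each excess at zero. How the five criteria are combined into the objective function is not specified in the source, so it is not reproduced here.

```python
def ffac(m, n=None):
    """Feedforward architecture criterion.

    m: neurons in the 1st hidden layer.
    n: neurons in the 2nd hidden layer, or None for a single hidden layer.
    Thresholds and increments are assumed readings of the source.
    """
    excess_m = max(0, m - 10)
    if n is None:
        return 1.0 + excess_m * 0.1
    excess_n = max(0, n - 10)
    # Clipping at zero reproduces the listed cases; it also covers
    # m <= 10 with n > 10, which the source leaves unspecified (assumption).
    return 2.0 + excess_m * 0.1 + excess_n * 0.2

def solspc(relative_errors):
    """Solution space consistency criterion for a list of relative errors.

    Errors are assumed to be expressed in percent.
    """
    x = sum(1 for e in relative_errors if 5 <= abs(e) <= 25)
    y = sum(1 for e in relative_errors if abs(e) > 25)
    return 1 + x * 0.33 + y

# Illustrative calls with made-up inputs.
one_layer = ffac(12)               # 1 + 2 * 0.1
two_layers = ffac(12, 15)          # 2 + 2 * 0.1 + 5 * 0.2
consistency = solspc([1.0, -10.0, 30.0])
```

Both functions grow with network size or with the number of poorly predicted test cases, which is consistent with their use as terms of an objective function to be minimized.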

3.2 Results

Using the same data as for the initialization method testing, the developed method was compared to the results of an experienced researcher that followed the trial-and-error approach. The model achieved by an experienced human analyst was 5x3x1. The best objective function value versus the number of generations is shown in Figure 4 and the number of neurons in each layer is given in Table 2.

Figure 4. Best objective function value history ($\log_{10}(f(x))$ versus generation; Best = 0.20897)

Table 2. Number of neurons in each hidden layer (columns: 1st hidden layer, 2nd hidden layer)
0 3 3 5 2 4 3 4 0 0 6 3 2 0 7 4 9 0 0 9 0

As can be seen from the above table, the developed methodology performs as well as a human expert, but at the same time it offers advantages such as no required experience, shorter development time and systematic selection of the network's parameters.

4. Conclusions

By using the described methodologies, the development of an ANN model is facilitated and, more importantly, it is carried out following a systematic procedure, rather than a repetitive trial-and-error procedure with uncertain results. In both fields (weight initialization and architecture determination), the results show an improvement over current practices. Furthermore, in the latter case, the focus is primarily on the generalization performance of the ANN and network size, which guarantee accurate and consistent model predictions.

Publications

1. Initialisation improvement in engineering feedforward ANN models, 13th European Symposium on Artificial Neural Networks, 27-29 April 2005, Bruges, Belgium.
2. Optimising feedforward artificial neural network architecture, Engineering Applications of Artificial Intelligence, submitted for publication.