f:\spss\14\14.doc 22/03/13

Neural Networks
  Introduction
  Multilayer Perceptron Neural Network Algorithm (MLP)
    Biological Neural Structure Presenting The Structural Elements Of Neurons (Dendrites, Axon) And Synapse
    Artificial Neuron Model With Inputs (x_0, x_1, ..., x_n), Weights (w_0, w_1, ..., w_n) And Output y
    Example Of A Neural Network With A Hidden Layer
  Data Set A
    Data Characteristics
    Summary Statistics
    Variable Information
    What Does It Look Like?
    SPSS Commands
    Neural Network With Optimal Architecture Obtained After The Algorithm Training Process
    Results Of The Fitting Of The Data
    Residuals Of The Results Of The Fitting Of The Data
    Neural Network Input Variables And Their Respective Relevance To The Fitting Process
    For additional analysis
    Syntax
  Data Set B
    Relevant Information
    Attribute Information
    What Does It Look Like?
    SPSS Commands
    For additional analysis
    Syntax
    Neural Network With Optimal Architecture Obtained After The Algorithm Training Process
    Predicted Pseudo-probability
    Neural Network Input Variables And Their Respective Relevance To The Fitting Process

Neural Networks

Introduction

Neural networks offer modelling procedures for nonlinear data, enabling you to discover more complex relationships in your data and thus to develop more accurate and effective predictive models. The introductory section of these notes is loosely based on two notes by A.T.S. Carneiro (http://www.ibm.com/developerworks/industry/library/ind-ibm-spss-modeler/index.html and http://www.ibm.com/developerworks/industry/library/ind-ibm-spss-modeler2/index.html).

Multilayer Perceptron Neural Network Algorithm (MLP)

The scientific literature presents many regression algorithms, several of which are implemented in SPSS Modeler. The multilayer perceptron neural network algorithm (MLP) was selected for these examples.

The MLP neural network algorithm is based on the functional principles of biological neural structures, as indicated in the figure. Neural computing research attempts to organise mathematical models in a way that resembles the structure and organisation of neurons in biological brains, in order to achieve similar processing abilities together with capacities inherent to biological brains, such as learning from examples, trial and error, and knowledge generalisation.

[Figure: Biological Neural Structure Presenting The Structural Elements Of Neurons (Dendrites, Axon) And Synapse]

Based on this analogy to biological neurons, the MLP neural network algorithm implements a neural network composed of layers of artificial neurons that are stimulated by input signals, which are transmitted through the network via synapses connecting neurons in different layers. The figure presents an artificial perceptron neuron model with n inputs {x_1, x_2, ..., x_n}, in which each input x_i has an associated synapse w_i, and an output y.
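The neuron model just described can be sketched in a few lines of Python. This is only an illustration, not the SPSS implementation: the function names, the example numbers, the convention of attaching the bias weight w_0 to a constant input x_0 = -1, and the tanh-type activation are assumptions made for the sketch.

```python
import math

def activation(a):
    """Sigmoid-type activation (1 - e^-a) / (1 + e^-a), written as tanh(a / 2)."""
    return math.tanh(a / 2.0)

def neuron_output(inputs, weights, bias_weight):
    """Output y of a single artificial neuron.

    Assumed convention for this sketch: the bias is a weight w_0
    attached to a constant extra input x_0 = -1.
    """
    x = [-1.0] + list(inputs)          # prepend the constant bias input
    w = [bias_weight] + list(weights)  # prepend the bias weight
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))
    return activation(weighted_sum)

# Hypothetical example values, chosen only to exercise the function.
y = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, 0.1], bias_weight=0.2)
print(y)
```

Each input is multiplied by its synapse weight, the products are summed, and the activation function squashes the sum into the interval (-1, 1).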

[Figure: Artificial Neuron Model With Inputs (x_0, x_1, ..., x_n), Weights (w_0, w_1, ..., w_n) And Output y]

There is also an additional neuron parameter, w_0, known as the bias, which can be interpreted as a synapse associated with an input x_0 = -1. The output of the neuron y is based on the product of the input vector x = (x_0, x_1, x_2, ..., x_n) and the vector w = (w_0, w_1, w_2, ..., w_n) composed of the synapses, including the bias (w_0):

$$x \cdot w = \sum_{i=0}^{n} x_i w_i$$

The neuron output is then obtained through the activation function, y = \varphi(x \cdot w), for which a hyperbolic tangent function (a sigmoid-type function) is usually adopted, defined for a generic value a by

$$\varphi(a) = \frac{1 - e^{-a}}{1 + e^{-a}}$$

which equals \tanh(a/2); however, it is convenient to use other activation functions in certain scenarios.

The artificial neuron model is feed forward, that is, connections are directed from the inputs (x_0, x_1, ..., x_n) to the output y of the neuron. The figure presents the layout of perceptron neurons in an MLP neural network, in which there are two neuron layers, one hidden and one output. In the neural network presented in the figure, each neuron of the hidden layer is connected to each neuron in the output layer. Therefore, the inputs of the output layer neurons correspond to the outputs of the hidden layer neurons.

The analyst who uses the artificial neural network algorithm must choose how many neurons to use in the hidden layer, considering the set of input data: with too few neurons in the hidden layer, the neural network is not able to generalise each class's data. However, too many neurons in the hidden layer prompt over-fitting, in which the neural network learns the training data exclusively and does not generalise what it has learned to the data classes.

[Figure: Example Of A Neural Network With A Hidden Layer]

The neural network is trained with a back-propagation algorithm, whose purpose is to adjust the values associated with the synapses so that the network maps an input space to an output space. The input vectors x are samples of the input space, and each input vector is associated with an output z, which can be represented by a vector z = (z_1, z_2, ..., z_n) and may be a scalar value or a symbolic value. Specifically, for symbolic values, one neuron in the output layer corresponds to each of the possible symbols associated with the input vector.

During training, a set of input data for which the associated output is known is first selected, and random values are assigned to each synapse in the neural network. The data are presented to the neural network and the output it supplies is compared with the actual output, generating an error value. The error value is then used to adjust the synapses in reverse order, from the output back to the inputs (back-propagated).
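The training cycle described above (random initial synapses, forward pass, error, backward adjustment, repeated until a stopping rule is met) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS algorithm: it uses plain per-sample gradient descent rather than scaled conjugate gradient, tanh hidden units with one linear output neuron, a bias input of -1, and a made-up linear target. The function name `train_mlp` and all parameter values are hypothetical.

```python
import math
import random

def train_mlp(samples, n_hidden=2, learning_rate=0.05,
              max_epochs=500, min_error=1e-3, seed=0):
    """Train a one-hidden-layer MLP by back-propagation.

    samples: list of (inputs, target) pairs.
    Returns (predict function, list of per-epoch summed errors).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Random initial synapse values; position 0 of each row is the bias weight.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = [-1.0] + list(x)  # bias input x_0 = -1
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
        hb = [-1.0] + h        # bias input for the output neuron
        y = sum(w * v for w, v in zip(w_out, hb))  # linear output neuron
        return xb, hb, y

    history = []
    for _ in range(max_epochs):
        total = 0.0
        for x, target in samples:
            xb, hb, y = forward(x)
            err = y - target
            total += 0.5 * err * err
            # Hidden-layer error terms, computed before w_out is changed.
            deltas = [err * w_out[j + 1] * (1.0 - hb[j + 1] ** 2)
                      for j in range(n_hidden)]
            # Adjust synapses from the output back towards the inputs.
            for j in range(n_hidden + 1):
                w_out[j] -= learning_rate * err * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= learning_rate * deltas[j] * xb[i]
        history.append(total)
        if total < min_error:  # stopping rule: minimum error reached
            break
    return (lambda x: forward(x)[2]), history

# Made-up training data: a simple linear target on a 3 x 3 grid.
grid = (-1.0, 0.0, 1.0)
samples = [([a, b], 0.8 * a - 0.4 * b) for a in grid for b in grid]
predict, history = train_mlp(samples)
print(history[0], history[-1])
```

The two stopping rules in the sketch (a fixed number of epochs and a minimum error) correspond to the kinds of interruption criterion mentioned in the text.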

The process of adjusting synapse values is repeated until a stopping criterion is met, for example, a fixed number of repetitions or a minimum error. Thus, in each repetition, the outputs provided by the neural network get closer to the actual output. The synapse value correction equations minimise the error between the output provided by the neural network and the actual output.

For conventional regression, the methodology consists of using neurons with a linear activation function and assigning one output neuron to each component of the output vector. Where the output is a single scalar value, the neural network is designed with a single neuron in the output layer.

Two examples are considered.

Data Set A

This data set was downloaded from http://archive.ics.uci.edu/ml/ or more directly http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength. It describes variables affecting concrete compressive strength.

Concrete is the most important material in civil engineering. The concrete compressive strength is a highly nonlinear function of age and ingredients. These ingredients include cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.

Original Owner and Donor
Prof. I-Cheng Yeh
Department of Information Management
Chung-Hua University, Hsin Chu, Taiwan 30067, R.O.C.

Data Characteristics

The actual concrete compressive strength (MPa) for a given mixture at a specific age (days) was determined in the laboratory. Data is in raw form (not scaled).

Summary Statistics

Number of instances (observations): 1030
Number of Attributes: 9
Attribute breakdown: 8 quantitative input variables, and 1 quantitative output variable

Missing Attribute Values: None

The file used is organised in nine columns, in which each line represents data collected from a concrete mixture analysed in a lab. The first seven columns give the concentration of each ingredient in the mixture, in kg per m3 of concrete; the following column gives the age of the concrete, in days; and the last column gives the compressive strength of the concrete, measured in MPa (megapascal, a unit of pressure).

Variable Information

Given are the variable name, variable type, the measurement unit and a brief description. Predicting the concrete compressive strength is the regression problem. The order of this listing corresponds to the order of the values along each row of the database.

Name                            Data Type      Measurement           Description
Cement                          quantitative   kg in a m3 mixture    Input Variable
Blast Furnace Slag              quantitative   kg in a m3 mixture    Input Variable
Fly Ash                         quantitative   kg in a m3 mixture    Input Variable
Water                           quantitative   kg in a m3 mixture    Input Variable
Superplasticizer                quantitative   kg in a m3 mixture    Input Variable
Coarse Aggregate                quantitative   kg in a m3 mixture    Input Variable
Fine Aggregate                  quantitative   kg in a m3 mixture    Input Variable
Age                             quantitative   Day (1~365)           Input Variable
Concrete compressive strength   quantitative   MPa                   Output Variable

All of these attributes are numerical variables whose values are expressed in the stated measurement unit; thus, the neural network used is designed to solve a regression problem in which the input space comprises the first eight columns of the file (cement concentration to age) and the output space corresponds to the ninth column of the file (concrete compressive strength).

Past Usage

1. I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
2. I-Cheng Yeh, "Modeling Concrete Strength with Augment-Neuron Networks," J. of Materials in Civil Engineering, ASCE, Vol. 10, No. 4, pp. 263-268 (1998).

3. I-Cheng Yeh, "Design of High Performance Concrete Mixture Using Neural Networks," J. of Computing in Civil Engineering, ASCE, Vol. 13, No. 1, pp. 36-42 (1999).
4. I-Cheng Yeh, "Prediction of Strength of Fly Ash and Slag Concrete By The Use of Artificial Neural Networks," Journal of the Chinese Institute of Civil and Hydraulic Engineering, Vol. 15, No. 4, pp. 659-663 (2003).
5. I-Cheng Yeh, "A mix Proportioning Methodology for Fly Ash and Slag Concrete Using Artificial Neural Networks," Chung Hua Journal of Science and Engineering, Vol. 1, No. 1, pp. 77-84 (2003).
6. I-Cheng Yeh, "Analysis of strength of concrete using design of experiments and neural networks," Journal of Materials in Civil Engineering, ASCE, Vol. 18, No. 4, pp. 597-604 (2006).

What Does It Look Like?

[Figure: matrix plot of Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age and Concrete compressive strength]

SPSS Commands

Note that the covariates have been standardised, so that none are numerically dominant.
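As a rough illustration of what standardising a covariate means, the sketch below rescales a column to mean 0 and standard deviation 1. The function name and the example values are made up for illustration (they are not taken from the data file), and the use of the sample standard deviation (n - 1 divisor) is an assumption of the sketch.

```python
import math

def standardise(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumption of this sketch: the sample standard deviation
    (n - 1 divisor) is used.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Illustrative values only: a quantity in kg per m3 and an age in days.
cement = [150.0, 300.0, 450.0, 500.0]
age = [3.0, 7.0, 28.0, 90.0]

cement_z = standardise(cement)
age_z = standardise(age)
print(cement_z)
print(age_z)
```

After rescaling, both columns are on the same scale, so a variable measured in hundreds of kilograms no longer outweighs one measured in days purely because of its units.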

As a first trial a small hidden layer is adopted; this may be increased in succeeding applications of the procedure.

The outputs are saved so that further analysis may be undertaken.

[Figure: Neural Network With Optimal Architecture Obtained After The Algorithm Training Process]

The figure presents a scatter chart with the actual concrete strength values on the horizontal axis and the values estimated by the neural network with two hidden neurons on the vertical axis.

[Figure: Results Of The Fitting Of The Data]

The identity line (diagonal) represents the ideal result, in which each concrete sample would have exactly the strength estimated by the neural network. In this case, the high concentration of points close to the identity line is visually evident and is confirmed by the correlation and error reported in the final table, evidencing the efficiency of the proposed methodology.

[Figure: Residuals Of The Results Of The Fitting Of The Data]

As previously mentioned, each neural network input element has an associated synapse, represented by a numerical value that controls the input; the higher the synapse value, the more relevant the input is to the result generated by the neural network. The figure presents a relevance chart for each input field used to obtain the fitting results; this information becomes available once the node corresponding to the generated model has been created.

[Figure: Neural Network Input Variables And Their Respective Relevance To The Fitting Process]

The metrics applied to the results are the linear correlation coefficient and the mean absolute error, each computed between two generic data sets (here, the actual and the predicted values). The correlation coefficient measures the strength of the linear association between the data sets, and can also be interpreted as a coefficient of similarity between them; the mean absolute error measures the inconsistencies between them.
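The two metrics can be computed directly from paired actual and predicted values. The sketch below uses the standard definitions of the Pearson correlation coefficient and the mean absolute error; the function names and the sample numbers are illustrative assumptions, not values from the analysis.

```python
import math

def pearson_correlation(xs, ys):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_absolute_error(xs, ys):
    """Average of the absolute differences between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Made-up actual and predicted strengths (MPa), for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r = pearson_correlation(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(r, mae)
```

A correlation close to 1 together with a small mean absolute error indicates that the predictions track the actual values closely, which is what the CORRELATIONS and DESCRIPTIVES steps of the syntax report.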

For additional analysis

Also

Syntax

SET PRINTBACK=ON.  /* Note 1 */
*Multilayer Perceptron Network.
MLP Concrete_compressive_strengthMPa_megapascals (MLEVEL=S) WITH Cement_kg_in_a_m3_mixture
    Blast_Furnace_Slag_kg_in_a_m3_mixture Fly_Ash_kg_in_a_m3_mixture Water_kg_in_a_m3_mixture
    Superplasticizer_kg_in_a_m3_mixture Coarse_Aggregate_kg_in_a_m3_mixture
    Fine_Aggregate_kg_in_a_m3_mixture Age_day
  /RESCALE COVARIATE=STANDARDIZED
  /PARTITION TRAINING=7 TESTING=3 HOLDOUT=0
  /ARCHITECTURE AUTOMATIC=YES (MINUNITS=1 MAXUNITS=2)  /* Note 2 */
  /CRITERIA TRAINING=BATCH OPTIMIZATION=SCALEDCONJUGATE LAMBDAINITIAL=0.0000005
    SIGMAINITIAL=0.00005 INTERVALCENTER=0 INTERVALOFFSET=0.5 MEMSIZE=1000
  /PRINT CPS NETWORKINFO SUMMARY CLASSIFICATION IMPORTANCE
  /PLOT NETWORK PREDICTED RESIDUAL
  /SAVE PREDVAL
  /STOPPINGRULES ERRORSTEPS= 1 (DATA=AUTO) TRAININGTIMER=ON (MAXTIME=15) MAXEPOCHS=AUTO
    ERRORCHANGE=1.0E-4 ERRORRATIO=0.001
  /MISSING USERMISSING=EXCLUDE.
* Note 3.
CORRELATIONS
  /VARIABLES=Concrete_compressive_strengthMPa_megapascals MLP_PredictedValue
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
COMPUTE abs=abs(concrete_compressive_strengthmpa_megapascals-mlp_predictedvalue).
EXECUTE.
DESCRIPTIVES VARIABLES=abs
  /STATISTICS=MEAN.
* Note 4.
DELETE VARIABLES MLP_PredictedValue abs.

Notes
1. This is equivalent to <edit> <option> <viewer>, dropping the list to Notes and selecting <hidden>; it turns off the Notes output.
2. The size of the hidden layer may be adjusted through MINUNITS=1 MAXUNITS=2, with MINUNITS < MAXUNITS.
3. This is the start of the analysis stage.
4. It is wise to tidy up any analysis variables prior to repeating the process.

Any solution will depend on the random number seed employed, so values will only be broadly similar to those presented here.

Size of Hidden Layer   Correlation   Mean Absolute Deviation   Sum of Squares Error
2                      0.91          5.3                       60.2
4                      0.93          4.7                       46.4
6                      0.94          4.4                       42.7

Analysing the results obtained by the neural network with two, four, and six hidden layer neurons, it is concluded that the best configuration is probably four hidden neurons, which gives a clearly higher correlation and lower error between actual and estimated values than two neurons, while six neurons bring only a marginal further improvement. The goal is to be parsimonious, combining a good fit with as few neurons as possible.

The second example is possibly closer to those you might encounter. The SPSS commands are identical and the secondary analysis simply reduces to a comparative table.

Data Set B

This data set was downloaded from http://archive.ics.uci.edu/ml/ or more directly http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic). This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.

1. O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.

2. William H. Wolberg and O.L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
   Attributes 2 through 10 have been used to represent instances. Each instance has one of 2 possible classes: benign or malignant.
   -- Size of data set: only 369 instances (at that point in time)
   -- Collected classification results: 1 trial only
   -- Two pairs of parallel hyperplanes were found to be consistent with 50% of the data
      -- Accuracy on remaining 50% of dataset: 93.5%
   -- Three pairs of parallel hyperplanes were found to be consistent with 67% of data
      -- Accuracy on remaining 33% of dataset: 95.9%
3. O. L. Mangasarian, R. Setiono, and W.H. Wolberg: "Pattern recognition via linear programming: Theory and application to medical diagnosis", in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.
4. K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, pp 23-34 (Gordon & Breach Science Publishers).
5. J. Zhang (1992). Selecting typical instances in instance-based learning. Proceedings of the Ninth International Machine Learning Conference, 1992, pp. 470-479. Aberdeen, Scotland: Morgan Kaufmann.
   Attributes 2 through 10 have been used to represent instances. Each instance has one of 2 possible classes: benign or malignant.
   -- Size of data set: only 369 instances (at that point in time)
   -- Applied 4 instance-based learning algorithms
   -- Collected classification results averaged over 10 trials
   -- Best accuracy result:
      -- 1-nearest neighbor: 93.7%
      -- trained on 200 instances, tested on the other 169
   -- Also of interest:
      -- Using only typical instances: 92.2% (storing only 23.1 instances)
      -- trained on 200 instances, tested on the other 169

Relevant Information

Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself:

Group 1: 367 instances (January 1989)
Group 2: 70 instances (October 1989)
Group 3: 31 instances (February 1990)
Group 4: 17 instances (April 1990)
Group 5: 48 instances (August 1990)
Group 6: 49 instances (Updated January 1991)
Group 7: 31 instances (June 1991)
Group 8: 86 instances (November 1991)
Total: 699 points (as of the donated database on 15 July 1992)

Note that the results summarized above refer to a dataset of size 369, while Group 1 has only 367 instances. This is because it originally contained 369 instances; 2 were removed.

Number of Instances: 699 (as of 15 July 1992)
Number of Attributes: 10 plus the class attribute

Attribute Information

#    Attribute                     Domain
1    Sample code number            id number
2    Clump Thickness               1-10
3    Uniformity of Cell Size       1-10
4    Uniformity of Cell Shape      1-10
5    Marginal Adhesion             1-10
6    Single Epithelial Cell Size   1-10
7    Bare Nuclei                   1-10
8    Bland Chromatin               1-10
9    Normal Nucleoli               1-10
10   Mitoses                       1-10
11   Class                         2 for benign, 4 for malignant

Missing attribute values: 16
There are 16 instances in Groups 1 to 6 that contain a single missing (i.e., unavailable) attribute value.

Class distribution
Benign: 458 (65.5%)
Malignant: 241 (34.5%)

What Does It Look Like?

[Figure: matrix plot of Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses and Class]

SPSS Commands

For additional analysis

Drag Class_... to columns and Predict to rows.

Syntax

SET PRINTBACK=ON.
*Multilayer Perceptron Network.
MLP Class_2_for_benign_4_for_malignant (MLEVEL=N) WITH Clump_Thickness_11
    Uniformity_of_Cell_Size_11 Uniformity_of_cell_shape_11 Marginal_Adhesion_11
    Single_Epithelial_Cell_Size_11 Bare_Nuclei_11 Bland_Chromatin_11 Normal_Nucleoli_11
    Mitoses_11
  /RESCALE COVARIATE=STANDARDIZED
  /PARTITION TRAINING=7 TESTING=3 HOLDOUT=0
  /ARCHITECTURE AUTOMATIC=YES (MINUNITS=1 MAXUNITS=2)
  /CRITERIA TRAINING=BATCH OPTIMIZATION=SCALEDCONJUGATE LAMBDAINITIAL=0.0000005
    SIGMAINITIAL=0.00005 INTERVALCENTER=0 INTERVALOFFSET=0.5 MEMSIZE=1000
  /PRINT CPS NETWORKINFO SUMMARY CLASSIFICATION IMPORTANCE
  /PLOT NETWORK PREDICTED
  /SAVE PREDVAL
  /STOPPINGRULES ERRORSTEPS= 1 (DATA=AUTO) TRAININGTIMER=ON (MAXTIME=15) MAXEPOCHS=AUTO
    ERRORCHANGE=1.0E-4 ERRORRATIO=0.001
  /MISSING USERMISSING=EXCLUDE.
* Custom Tables (Note 1).
CTABLES
  /VLABELS VARIABLES=MLP_PredictedValue Class_2_for_benign_4_for_malignant DISPLAY=LABEL
  /TABLE MLP_PredictedValue [C] BY Class_2_for_benign_4_for_malignant [C][COUNT F40.0]
  /CATEGORIES VARIABLES=MLP_PredictedValue Class_2_for_benign_4_for_malignant ORDER=A KEY=VALUE
    EMPTY=EXCLUDE.
DELETE VARIABLES MLP_PredictedValue.

Notes
1. For this example it is only necessary to prepare a comparative table of the outputs.
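The comparative table produced by the Custom Tables step is a two-way count of predicted class against actual class. A minimal sketch of the same tabulation is given below; the function name and the six example cases are invented for illustration and do not come from the data set.

```python
from collections import Counter

def cross_tabulate(predicted, actual):
    """Count the cases falling in each (predicted, actual) combination."""
    return Counter(zip(predicted, actual))

# Invented class labels (2 = benign, 4 = malignant), for illustration only.
predicted = [2, 2, 4, 4, 2, 4]
actual = [2, 2, 4, 2, 4, 4]

table = cross_tabulate(predicted, actual)

# Rows are predicted classes, columns are actual classes.
for p in (2, 4):
    print(p, [table[(p, a)] for a in (2, 4)])
# prints:
# 2 [2, 1]
# 4 [1, 2]
```

The diagonal cells hold the correctly classified cases; the off-diagonal cells hold the two kinds of misclassification.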

[Figure: Neural Network With Optimal Architecture Obtained After The Algorithm Training Process]

[Figure: Predicted Pseudo-probability]

The columns have been split for clarity. Effectively the plot should display only two columns (2 and 4).

[Figure: Neural Network Input Variables And Their Respective Relevance To The Fitting Process]

A brief summary of the analysis is shown in the table.

                            Range 1-2 selects 1   Range 1-4 selects 3   Range 1-6 selects 3
                            Class                 Class                 Class
Predicted Value (Count)     2       4             2       4             2       4
2                           431     6             435     3             430     5
4                           13      233           9       236           14      234

The random nature of the procedure employed explains the difference between the final two sets of columns, both of which select three hidden neurons. A parsimonious approach suggests that a single hidden neuron should suffice.

The decision as to which algorithm is most appropriate, in this case, falls to a medical expert, since only such a qualified professional is able to assess which error would do greater harm to patients. A benign tumour erroneously identified as malignant might cause the patient psychological harm, and most treatments for malignant tumours have severe side effects, while a malignant tumour erroneously identified as benign might delay treatment, costing the patient valuable time in his or her recovery.

Such an application aims to assist the medical diagnosis of cancer by identifying whether a patient's tumour is malignant or benign. Given its high average success rates, it could serve as an alternative basis for a complete diagnosis support system, which can be implemented in hospitals, medical clinics, or any other health care institutions, thus reducing the probability of incorrect diagnosis.
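To make the two kinds of error concrete, the sketch below derives the overall accuracy and the two misclassification counts from the summary table for the first two hidden-layer ranges. The function name and the dictionary keys are hypothetical; the counts are those reported in the table, read with predicted class in rows and actual class in columns.

```python
def error_summary(pred2_act2, pred2_act4, pred4_act2, pred4_act4):
    """Summarise a 2 x 2 table of predicted versus actual class counts.

    Class 2 is benign and class 4 is malignant.
    """
    total = pred2_act2 + pred2_act4 + pred4_act2 + pred4_act4
    return {
        "total": total,
        "accuracy": (pred2_act2 + pred4_act4) / total,
        # Malignant cases predicted as benign (treatment may be delayed).
        "missed_malignant": pred2_act4,
        # Benign cases predicted as malignant (needless distress and treatment).
        "false_alarms": pred4_act2,
    }

range_1_2 = error_summary(431, 6, 13, 233)  # one hidden neuron selected
range_1_4 = error_summary(435, 3, 9, 236)   # three hidden neurons selected

for name, summary in (("Range 1-2", range_1_2), ("Range 1-4", range_1_4)):
    print(name, summary)
```

Both configurations classify the 683 complete cases with an accuracy above 97%, so the choice between them rests on how the expert weighs missed malignant cases against false alarms.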