An Extreme Learning Machine (ELM) Predictor for Electric Arc Furnaces v-i Characteristics

Salam Ismaeel, Ali Miri, Alireza Sadeghian and Dharmendra Chourishi
Department of Computer Science, Ryerson University, Toronto, Canada

Abstract: This paper presents an Extreme Learning Machine (ELM) time series prediction strategy to estimate the current and voltage behaviour of an Electric Arc Furnace (EAF). The proposed ELM predictor is designed for both long- and short-term prediction of the v-i characteristics of an EAF. It is evaluated using the outputs of two real sensors, collected over different time periods at a rate of 2000 samples per second, and its performance is compared against Feed-Forward Neural Network (FFNN), Radial Basis Function (RBF) and Adaptive Neuro-Fuzzy Inference System (ANFIS) algorithms. Experimental results show that the proposed ELM predictor has superior speed and stability while obtaining error values similar to those of comparable techniques.

Index Terms: Extreme Learning Machine (ELM), neural networks, real-time prediction, Electric Arc Furnace (EAF)

I. INTRODUCTION

In today's highly inter-connected world, large volumes of data are continuously created by sources such as sensors, actuators, and data processors. These devices and resources enable real-time sensing, capturing, and collection of data from many connected systems. While the volume and velocity of the data collected, stored, and processed pose many computational challenges [1, 2], such data provide an invaluable tool in many different settings, including environmental monitoring and industrial, business and human-centric pervasive applications.

This paper focuses on modelling large amounts of sensor data collected from Electric Arc Furnaces (EAFs). EAFs are widely used in the steelmaking industry [3], and they are considered to be among the largest electrical energy consumers on power grids. The rising cost of energy has put pressure on the steel industry to improve its process control systems so as to conserve energy without sacrificing quality or equipment [4]. An accurate and reliable estimate of the v-i characteristics of an EAF can therefore provide an effective tool in the energy management of such devices.

The measured v-i characteristics are nonlinear, discrete, and time-variant, exhibit non-stationary stochastic behaviour, and have dependent variables. EAFs require significant involvement of human operators with expert operational knowledge in very noisy environments [3, 4, 5]. High-speed sensors are also essential to keep up with such highly dynamic environments. However, conventional techniques are typically incapable of storing or processing the information produced by such sensors [6], due to the exponential growth in the volume, variety and velocity of the data generated. Meanwhile, machine learning techniques, together with advances in available computational power, have come to play a vital role in big data analytics and knowledge discovery [7]. The development of fast and efficient algorithms for real-time processing and prediction of such data has attracted attention in recent years in both scientific research and engineering applications. Several machine learning methods have been used to predict time series data [8]. One of the most powerful, and one that has attracted considerable attention, is the Extreme Learning Machine (ELM), which offers extremely fast learning and good generalization performance [9, 10].
In this work, a simplified ELM architecture is proposed to estimate the next state of the voltage v and current i in the EAF described in [11]. The prediction depends only on the outputs of two high-speed sensors that provide real-time monitoring of the EAF's voltage and current.

The rest of the paper is organized as follows. Section II provides an introduction to the ELM algorithm. In Section III, we describe the proposed ELM algorithm for time-series prediction. Section IV presents a comparison of the proposed prediction system with other systems in the literature, followed by the conclusions in Section V.

II. EXTREME LEARNING MACHINES

ELMs are Single-hidden-Layer Feed-forward neural Networks (SLFNs), consisting of one input layer, one hidden layer and one output layer, as shown in Figure 1 [10]. In this architecture, $n$ is the number of inputs; $l$ denotes the number of hidden neurons; $m$ is the number of output neurons; $w_{ji}$ denotes the weight between the $i$-th neuron in the input layer and the $j$-th neuron in the hidden layer; $\beta_{jr}$ denotes the weight between the $j$-th neuron in the hidden layer and the $r$-th neuron in the output layer; and $b_j$ is the threshold of the $j$-th hidden neuron.

Figure 1: The ELM architecture

Let $X(k)$ and $Y(k)$ be the input and output vectors at sample $k$, respectively. Also, let $W$ be the input weight matrix ($l \times n$), $B$ be the output weight matrix ($l \times m$), and $b$ be the bias vector. That is,

$$X(k) = [x_1(k)\;\; x_2(k)\;\; \dots\;\; x_n(k)]^T \tag{1}$$

$$Y(k) = [y_1(k)\;\; y_2(k)\;\; \dots\;\; y_m(k)]^T \tag{2}$$

The weight matrices and the bias vector are

$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n}\\ w_{21} & w_{22} & \cdots & w_{2n}\\ \vdots & \vdots & & \vdots\\ w_{l1} & w_{l2} & \cdots & w_{ln} \end{bmatrix}_{l \times n} \tag{3}$$

$$B = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m}\\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m}\\ \vdots & \vdots & & \vdots\\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}_{l \times m} \tag{4}$$

$$b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_l \end{bmatrix}_{l \times 1} \tag{5}$$

Suppose there are $Q$ data samples, yielding input matrix $X$ and output matrix $Y$:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1Q}\\ x_{21} & x_{22} & \cdots & x_{2Q}\\ \vdots & \vdots & & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{nQ} \end{bmatrix}_{n \times Q} \tag{6}$$

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1Q}\\ y_{21} & y_{22} & \cdots & y_{2Q}\\ \vdots & \vdots & & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mQ} \end{bmatrix}_{m \times Q} \tag{7}$$

For a sigmoid activation function $g$, the output of the $i$-th hidden neuron for input $x_j$ is

$$g_i = g(w_i \cdot x_j + b_i) \tag{8}$$

where $w_i = [w_{i1}, w_{i2}, \dots, w_{in}]$ and $x_j = [x_{1j}, x_{2j}, \dots, x_{nj}]^T$, and the network output for the $j$-th sample is

$$y_j = \sum_{i=1}^{l} \beta_i\, g(w_i \cdot x_j + b_i) \tag{9}$$

where $\beta_i = [\beta_{i1}, \dots, \beta_{im}]$ is the $i$-th row of $B$. If the target matrix $T$ is given by

$$T = \begin{bmatrix} t_1 & t_2 & \cdots & t_Q \end{bmatrix}_{m \times Q} \tag{10}$$

where $t_j = [t_{1j}, \dots, t_{mj}]^T$ is $m \times 1$, then, using Equation 9, the $j$-th column of the target matrix $T$ can be written as

$$t_j = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(w_i \cdot x_j + b_i)\\ \sum_{i=1}^{l} \beta_{i2}\, g(w_i \cdot x_j + b_i)\\ \vdots\\ \sum_{i=1}^{l} \beta_{im}\, g(w_i \cdot x_j + b_i) \end{bmatrix}, \quad j = 1, \dots, Q \tag{11}$$

By separating $\beta$ from the activation function $g$ in Equation 11, we get

$$H\beta = T \tag{12}$$

where $\beta = [\beta_1, \beta_2, \dots, \beta_l]^T$ and $H$ is the activation matrix given by

$$H(w_1, \dots, w_l, b_1, \dots, b_l, x_1, \dots, x_Q) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & g(w_2 \cdot x_1 + b_2) & \cdots & g(w_l \cdot x_1 + b_l)\\ g(w_1 \cdot x_2 + b_1) & g(w_2 \cdot x_2 + b_2) & \cdots & g(w_l \cdot x_2 + b_l)\\ \vdots & \vdots & & \vdots\\ g(w_1 \cdot x_Q + b_1) & g(w_2 \cdot x_Q + b_2) & \cdots & g(w_l \cdot x_Q + b_l) \end{bmatrix} \tag{13}$$

$H$ is called the hidden-layer output matrix of the network. The $i$-th column of $H$ is the $i$-th hidden node's output vector with respect to inputs $x_1, x_2, \dots, x_Q$, and the $j$-th row of $H$ is the output vector of the hidden layer with respect to input $x_j$ [12].

Huang et al. [13, 14] showed that the input weights $w_j$ and biases $b_j$, $j = 1, 2, \dots, l$, associated with the hidden layer can be assigned random values and require no additional training. The output weights $\beta_i$ are then determined from the training instances by solving the objective function

$$\min_{\beta} \lVert H\beta - T \rVert \tag{14}$$

whose solution is [15]

$$\hat{\beta} = \left(\frac{I}{C} + H^T H\right)^{-1} H^T T = G\, H^T T \tag{15}$$

for $l < Q$, and

$$\hat{\beta} = H^T \left(\frac{I}{C} + H H^T\right)^{-1} T \tag{16}$$

for $l > Q$, where $I$ is the identity matrix and $C$ is a regularization parameter. $G$ is given by

$$G = \left(\frac{I}{C} + H^T H\right)^{-1} \tag{17}$$

The output $\hat{Y}(k)$ for new input data is calculated as

$$\hat{Y}(k) = h\hat{\beta} \tag{18}$$

where $h$ is the $H$ matrix evaluated at the new input.
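To make Equations 13-18 concrete, the following is a minimal NumPy sketch of regularized ELM training and prediction. It is an illustration under our own naming (elm_fit, elm_predict), not the authors' code; it uses the tanh (tan-sigmoid) activation adopted in the experiments of Section IV, and the l < Q solution of Equation 15.

```python
import numpy as np

def elm_fit(X, T, l=10, C=5000.0, seed=0):
    """X: (Q, n) inputs, T: (Q, m) targets; returns (W, b, beta_hat)."""
    rng = np.random.default_rng(seed)
    Q, n = X.shape
    W = rng.uniform(-1.0, 1.0, size=(l, n))  # random input weights, never tuned
    b = rng.uniform(-1.0, 1.0, size=l)       # random hidden biases, never tuned
    H = np.tanh(X @ W.T + b)                 # hidden-layer output matrix (Eq. 13)
    # Regularized least-squares output weights for l < Q (Eq. 15)
    beta_hat = np.linalg.solve(np.eye(l) / C + H.T @ H, H.T @ T)
    return W, b, beta_hat

def elm_predict(X_new, W, b, beta_hat):
    """Eq. 18: evaluate the hidden layer on new inputs and apply beta_hat."""
    return np.tanh(X_new @ W.T + b) @ beta_hat
```

Because the only trainable quantity is the output weight matrix, fitting reduces to a single l-by-l linear solve, which is the source of the speed advantage discussed below.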

Based on the ELM set of equations, one can note the following:
i) The input weights $W$ and biases $b$ can be randomly generated and used without tuning.
ii) The output weights $\beta$ can be computed using simple, fast operations.
iii) The only two parameters a user needs to select are the number of hidden neurons ($l$) and the regularization parameter ($C$); we discuss these parameters in more detail in the next section.

The main advantages of ELM over conventional gradient-based learning methods are [8, 16, 17]:
- It avoids problems such as stopping criteria, learning rate, learning epochs, and local minima, which classical methods cannot easily sidestep.
- It is a fast learning algorithm, because ELM learning is a one-pass procedure without re-iteration.
- It can obtain better generalization performance than the Back-Propagation (BP) algorithm in most cases.
- It is suitable for almost all nonlinear activation functions.

Some recent work has proposed major modifications to the original ELM architecture for prediction applications. Wang and Han [18] suggested a Multiple Kernel Extreme Learning Machine (MKELM) for multivariate time series prediction, which they then used to predict the Lorenz chaotic time series. Bian et al. [19] combined a Logistic Regression (LR) model with the ELM algorithm in order to improve the real-time performance of a flare forecasting method: the LR model maps three parameters of each active region onto four probabilities, and the ELM then maps the four probabilities onto a binary label representing the final output. The LR model improves not only the learning speed but also the accuracy of the flare prediction. Wang and Han [8] improved ELM performance by adding the Levenberg-Marquardt algorithm, in which a Hessian matrix and gradient vectors are computed iteratively, to implement on-line sequential prediction; their modification improved the simulation results on both artificial and real-world multivariate time series. Other modifications to the original ELM algorithm, developed by Huang in 2004 [13], which focus on improving some of the algorithm's shortcomings or on modifying its performance, include the Regularized ELM (RELM) and the Extreme Learning Machine with Kernel (KELM) [20], the Self-adaptive Extreme Learning Machine [17], the Multilayer Extreme Learning Machine (MLELM) [15], and the Deep-Structure ELM (DS-ELM) [16].

The main contribution of this work is to use the basic ELM structure to configure the proposed predictor for the v-i characteristics of an EAF. The resulting network consists of only 4 input neurons, 10 hidden neurons and one output neuron.

III. PROPOSED ELM PREDICTOR

Figure 2: A sample of v-i waveforms

Figure 2 shows a sample of EAF voltage and current waveforms. It can be observed that EAF imbalances create even and odd harmonics, inter-harmonics and voltage flicker. Voltage and current are highly dynamic during operation, and this dynamic behaviour makes it challenging to model the associated characteristics accurately [11]. The proposed ELM predictor can be used to predict either the voltage v(k+1) or the current i(k+1), based on the current and previous states of the voltage and the current.
For example, in all experiments presented in this paper, we use v(k), v(k-1), i(k) and i(k-1) to predict the value of v(k+1). Using the proposed ELM prediction architecture in Figure 3, one obtains a model with 4 inputs and one output. Algorithm 1 describes the steps of the proposed ELM predictor. The inputs and output are scaled to [-1, 1]. As shown in the next section, different values of the two selected parameters C and l affect the performance of the network; in our experiments, the best values for C and l were found to be 5000 and 10, respectively. The most expensive operation in constructing the model is the calculation of the inverse matrix term in Equation 15. For large dimensions, one can use specialized inverse-calculation algorithms, such as the generalized Moore-Penrose inverse used by Samet and Miri [21], to increase the speed of computation. A sketch of the input preparation follows.
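As a sketch of steps 1 and 2 of Algorithm 1 and of the input layout X(k) = [v(k) v(k-1) i(k) i(k-1)]^T, the helper below builds the lagged design matrix and the v(k+1) target from the two raw sensor streams. The min-max form of the [-1, 1] scaling is our assumption; the paper only states the target range.

```python
import numpy as np

def scale(x):
    """Min-max scale a 1-D signal onto [-1, 1] (steps 1-2 of Algorithm 1)."""
    lo, hi = float(x.min()), float(x.max())
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def make_lagged(v, i):
    """Rows X(k) = [v(k), v(k-1), i(k), i(k-1)], targets v(k+1)."""
    v, i = scale(np.asarray(v, float)), scale(np.asarray(i, float))
    X = np.column_stack([v[1:-1], v[:-2], i[1:-1], i[:-2]])
    y = v[2:].reshape(-1, 1)
    return X, y
```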

Figure 3: Proposed ELM architecture

Algorithm 1: Proposed ELM predictor
1: Normalize the input data sets v(k) and i(k), scaling them to [-1, 1].
2: Normalize the required output v(k+1) and/or i(k+1), scaling it to [-1, 1].
3: Choose the values of the regularization parameter C and the number of hidden neurons l.
4: Randomly initialize the weights w_i and bias values b_i.
5: Calculate the hidden-layer output matrix H(X, w, b) using Equation 13, where X(k) = [v(k) v(k-1) i(k) i(k-1)]^T.
6: Calculate the inverse matrix term G in Equation 17.
7: Calculate the output of the hidden layer for the new, unknown input, h(z, w, b), where z is the new input vector.
8: Calculate the predicted output of the ELM, v̂(k+1), using Equation 18:

$$\hat{v}(k+1) = h\, G\, H^T T \tag{19}$$

IV. EXPERIMENTAL RESULTS

Four data sets were used to evaluate the proposed predictor. Each set consisted of 5000 samples representing 2.5 seconds of measurements, where the sampling frequency was 1920 Hz. We used 80% of the available data, that is, 4000 samples of each set, to update the network parameters, and the last 20%, that is, 1000 samples, to test the behaviour of the networks.

Figure 4: MSE vs sampling rate

Figure 4 gives the relation between the step size, varied from 1 to 50, and the prediction mean square error (MSE) for the four data sets. Given that no large difference in MSE was observed, we increased the sampling step by a factor of 10 for all networks in order to reduce the number of calculations required in the training phase; in other words, 400 samples were used to represent the 4000 samples of real data. After adaptation, all networks were then used to predict the next 1000 samples.

In order to illustrate the effectiveness of the ELM predictor, the proposed algorithm is compared with Feed-Forward Neural Networks (FFNNs) using the Levenberg-Marquardt learning rule [22], Radial Basis Functions (RBFs) [19, 23] and Adaptive Neuro-Fuzzy Inference Systems (ANFISs) [24, 25, 26, 27]. The activation function for the FFNN and the ELM is the tan-sigmoid function, while Gaussian functions are used as the activation functions for the RBF and the ANFIS. The input weights are randomly generated from the uniform interval [-1, 1], and the inputs and output are normalized to [-1, 1].

Figure 5: MSE vs number of hidden neurons l

Figure 5 shows the relation between the number of hidden neurons l and the MSE for the proposed predictor. The relation between the regularization parameter C and the MSE is given in Figure 6 for the four data sets used in this study. As Figures 5 and 6 show, l = 10 and C = 5000 are the best values for the proposed predictor on the given data sets. To compare the behaviour of the ELM predictor with the other algorithms, we keep the number of hidden neurons constant (l = 10) for both the FFNN and the RBF, while fixing the same required minimum learning-phase error for both, with 1000 epochs as the maximum number of iterations. The ANFIS parameter settings, selected after several tests to overcome the over-fitting problem with minimum error, were: number of nodes = 55, total number of parameters = 104 (24 nonlinear and 80 linear), and number of fuzzy rules = 16.

Figure 7 shows the output of the four networks compared to the desired output. In terms of accuracy, all algorithms gave a good estimate for the specified data.
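Putting the pieces together, the following is a hedged sketch of the evaluation protocol just described: the first 4000 samples thinned by a factor of 10 for fitting, the last 1000 held out, and l = 10, C = 5000 as selected above. It reuses the illustrative elm_fit, elm_predict and make_lagged helpers from the earlier sketches; v_sensor and i_sensor are hypothetical names standing in for one of the 5000-sample data sets.

```python
import numpy as np

# v_sensor, i_sensor: hypothetical 5000-sample arrays for one data set
X, y = make_lagged(v_sensor, i_sensor)

# Every 10th sample of the first 4000 for fitting (400 samples),
# the final 1000 samples for testing, as in Section IV
X_fit, y_fit = X[:4000:10], y[:4000:10]
X_test, y_test = X[-1000:], y[-1000:]

W, b, beta = elm_fit(X_fit, y_fit, l=10, C=5000.0)
mse = float(np.mean((elm_predict(X_test, W, b, beta) - y_test) ** 2))
print(f"hold-out MSE: {mse:.6f}")
```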

The main feature of the proposed ELM predictor, however, is that it does not need a lengthy learning phase. This implies that the predicted value of every sample can be calculated from the previous 4000 samples: the parameters of the ELM predictor can always be recomputed on the fly, whereas in the other algorithms a training phase must be completed before new values can be predicted, making those algorithms less suitable for applications such as EAFs. This is because each EAF is unique in the randomness of its arc current, the grade of steel, the size of the furnace, and the type and quality of the scrap [28]; in other words, selecting a single fixed model for the predictor is impossible.

Table I: Algorithm performance, reporting speed (in seconds) and MSE for FFNN, RBF, ANFIS and ELM on data sets 1-4

Figure 6: MSE vs regularization parameter C

Figure 7: Desired and predicted outputs of data set 1

Table I compares all algorithms in terms of speed (in seconds) and mean square error (MSE) for four different sets of 5000 samples from the same sensor. From the simulation results, we note that the performance of the ELM predictor is very good compared to the other algorithms. Although the ANFIS or the RBF is sometimes more accurate, the ELM predictor is faster than all of the other algorithms, which is what high-speed applications demand.

V. CONCLUSION

This work explored the application of ELM to prediction for EAF devices. An ELM predictor was designed and implemented using four real v-i characteristic data sets of an EAF. In our proposed model, the ELM technique can use all previous knowledge to predict the next sensor output, while overcoming the learning-time problem. Our proposed ELM predictor compares well with other machine learning techniques, offering high-speed performance while producing minimal errors.

REFERENCES

[1] B. K. Tannahill and M. Jamshidi, "Big data analytic paradigms - from PCA to deep learning," in Proceedings of the 2014 Association for the Advancement of Artificial Intelligence Symposium (AAAI 2014), 2014.
[2] A. Zaslavsky, C. Perera, and D. Georgakopoulos, "Sensing as a service and big data," arXiv preprint.
[3] L. Livi, E. Maiorino, A. Rizzi, and A. Sadeghian, "On the long-term correlations and multifractal properties of electric arc furnace time series," arXiv preprint.
[4] F. Janabi-Sharifi and G. Jorjani, "An adaptive system for modelling and simulation of electrical arc furnaces," Control Engineering Practice, vol. 17, no. 10.
[5] G. Chang, M.-F. Shih, Y.-Y. Chen, and Y.-J. Liang, "A hybrid wavelet transform and neural-network-based approach for modelling dynamic voltage-current characteristics of electric arc furnace," IEEE Transactions on Power Delivery, vol. 29, no. 2, April.
[6] S. Suthaharan, "Big data classification: Problems and challenges in network intrusion prediction with machine learning," ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 4.
[7] X.-W. Chen and X. Lin, "Big data deep learning: Challenges and perspectives," IEEE Access, vol. 2.
[8] X. Wang and M. Han, "Improved extreme learning machine for multivariate time series online sequential prediction," Engineering Applications of Artificial Intelligence, vol. 40.
[9] X. Mo, Y. Wang, and X. Wu, "Hypoglycemia prediction using Extreme Learning Machine (ELM) and regularized ELM," in Proceedings of the 25th Chinese Control and Decision Conference (CCDC 2013), May 2013.
[10] S. Ismaeel, A. Miri, and D. Chourishi, "Using the Extreme Learning Machine (ELM) technique for heart disease diagnosis," in Proceedings of the 2015 IEEE Canada International Humanitarian Technology Conference (IHTC 2015), May 2015.
[11] A. Sadeghian and J. Lavers, "Dynamic reconstruction of nonlinear characteristic in electric arc furnaces using adaptive neuro-fuzzy rule-based networks," Applied Soft Computing, vol. 11, no. 1.
[12] G. Feng, G.-B. Huang, Q. Lin, and R. Gay, "Error minimized extreme learning machine with growth of hidden nodes and incremental learning," IEEE Transactions on Neural Networks, vol. 20, no. 8.
[13] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1-3, 2006 (selected papers from the 7th Brazilian Symposium on Neural Networks, SBRN '04).
[14] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme Learning Machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, April.
[15] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep Extreme Learning Machine and its application in EEG classification," Mathematical Problems in Engineering.
[16] M. M. Ma and B. He, "Study on deep structure of Extreme Learning Machine (DS-ELM) for datasets with noise," in Advanced Materials Research. Trans Tech Publications, 2014.
[17] G.-G. Wang, M. Lu, Y.-Q. Dong, and X.-J. Zhao, "Self-adaptive Extreme Learning Machine," Neural Computing and Applications, pp. 1-13.
[18] X. Wang and M. Han, "Multivariate time series prediction based on multiple kernel extreme learning machine," in Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN 2014), July 2014.
[19] R. Kruse, C. Borgelt, F. Klawonn, C. Moewes, M. Steinbrecher, and P. Held, "Radial basis function networks," in Computational Intelligence, ser. Texts in Computer Science. Springer London, 2013.
[20] G.-B. Huang, "An insight into Extreme Learning Machines: Random neurons, random features and kernels," Cognitive Computation, vol. 6, no. 3.
[21] S. Samet and A. Miri, "Privacy-preserving back-propagation and extreme learning machine algorithms," Data and Knowledge Engineering, vol. 79-80.
[22] D. Howard, M. Beale, and M. Hagan, Neural Network Toolbox 6: User's Guide. The MathWorks, Inc.
[23] A. Sadeghian and J. Lavers, "Application of radial basis function networks to model electric arc furnaces," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '99), vol. 6. IEEE, 1999.
[24] A. Sadeghian and J. Lavers, "Neuro-fuzzy predictors for the approximate prediction of v-i characteristic of electric arc furnaces," in Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS 2000), 2000.
[25] A. Sadeghian and J. Lavers, "Application of adaptive fuzzy logic systems to model electric arc furnaces," in Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society (NAFIPS '99). IEEE, 1999.
[26] A. F. Lutfy, A. M. Hassan, and S. A. Ismaeel, "Error estimation for an integrated GPS/INS system using adaptive neuro-fuzzy technique," Iraqi Journal of Computers, Communications, Control and Systems Engineering, vol. 9, no. 1.
[27] MathWorks, Fuzzy Logic Toolbox R2015a: User's Guide. The MathWorks, Inc.
[28] J. J. Snell, "Improved modeling and optimal control of an electric arc furnace," Master's thesis, University of Iowa, July 2010.
