Automatic Power Quality Monitoring with Recurrent Neural Network. Dong Chan Lee

Automatic Power Quality Monitoring with Recurrent Neural Network

by

Dong Chan Lee

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science and Engineering
Graduate Department of Electrical & Computer Engineering
University of Toronto

© Copyright 2016 by Dong Chan Lee

Abstract

Automatic Power Quality Monitoring with Recurrent Neural Network

Dong Chan Lee
Master of Applied Science and Engineering
Graduate Department of Electrical & Computer Engineering
University of Toronto
2016

The electric power grid constantly experiences disturbances that hinder the efficiency and reliability of the grid. This thesis is concerned with the development of an automatic power quality monitoring system that classifies power quality disturbances based on the voltage waveform. The classification process involves generating training data, extracting features, and classifying the data at every time step with a neural network. The feature extraction is implemented based on the short-time Fourier transform, wavelet transform, and S transform, and we present comparisons of their performance for this application. The extracted features are used as the inputs to the neural network, and the outputs are the classes that the waveform belongs to. We introduce the recurrent neural network as the classifier for the first time in this application. The recurrent neural network has the ability to memorize information in time-sequence data by passing information through its hidden units. We show that the recurrent neural network achieves better performance than a conventional feedforward neural network.

Acknowledgements

I would like to thank my adviser, Professor Deepa Kundur, for her encouragement and feedback throughout my graduate studies. I also would like to thank my family and friends, and especially my parents, for their support and dedication.

Contents

1 Introduction
   1 Motivation
   2 Contributions
   3 Overview

2 Power Quality Disturbance Data Generation
   1 Overview of Power Quality
      1.1 Standard Measurements
      1.2 Classification of power quality disturbances
   2 Characterization of Power Quality Disturbances
      2.1 RMS and Peak Measurement
   3 Monte Carlo simulation for data generation
   4 Literature Survey
   5 Real-time Classification

3 Transformation and Feature Extraction
   1 Background
      1.1 Fourier Transform
   2 Short-Time Fourier Transform
      2.1 Discrete Short-Time Fourier Transform and its implementation
   3 Wavelet Transform
      3.1 Discrete Wavelet Transform and its implementation
   4 S Transform

      4.1 Discrete S transform and its implementation
   5 Comparison of Transforms
      5.1 Common limitations of the feature
   6 Summary

4 Classification with Long Short-Term Memory
   1 Background
   2 Feedforward Neural Network
      2.1 Decision making with Softmax Function
      2.2 Training Neural Network
      2.3 Windowed Feedforward Neural Network
   3 Recurrent Neural Network
   4 Long Short-Term Memory
      4.1 Advantages of Recurrent Neural Network
   5 Summary

5 Results and Case Studies
   1 Data Generation and Training
   2 Results
      2.1 Comparisons of the transformations
      2.2 Effect of the size of window
      2.3 Distribution of Misclassification
      2.4 Effect of Noise
   3 Case Studies
   4 Limitations of the Proposed Power Quality Monitor
   5 Summary

6 Conclusion

Appendices

A Parameters for Monte Carlo Simulation

B Results

Bibliography

List of Tables

2.1 Classification of power quality disturbances and their characterization [1, 2]
3.1 Comparison of computational complexity of feature extraction algorithms
5.1 Comparison of accuracy of FNN and LSTM in percentage
5.2 Accuracy of LSTM with feature from DWT
5.3 Comparison of LSTM and FNN in data with noise
A.1 Parameters for Monte Carlo simulation in power quality data generation
A.2 Ratio of disturbance classes in the generated data
B.1 Confusion matrix for FNN from DWT features
B.2 Confusion matrix for FNN from STFT features
B.3 Confusion matrix for LSTM from STFT features
B.4 Confusion matrix for FNN from ST features
B.5 Confusion matrix for LSTM from ST features
B.6 Confusion matrix for LSTM from DWT features with noise
B.7 Confusion matrix for FNN from DWT features with noise

List of Figures

2.1 An example waveform of impulsive transient
2.2 An example waveform of oscillatory transient
2.3 An example waveform of interruption
2.4 An example waveform of voltage sag
2.5 An example waveform of voltage swell
2.6 An example waveform of DC offset
2.7 An example waveform of harmonics
2.8 An example waveform of notching
2.9 An example waveform of noise
2.10 An example waveform of voltage fluctuation
2.11 An example waveform of frequency variation
2.12 RMS and peak voltage measurement of each class of power quality disturbances
2.13 An example of generated data for the training
2.14 (a) Classification process of existing techniques [3] (b) Classification process of the proposed technique
3.1 Overall process of building an automatic power quality monitor
3.2 Contour diagram of STFT coefficients
3.3 Two-band analysis bank
3.4 Reconstructed signal using discrete wavelet transform at different levels
3.5 Contour diagram of ST coefficients
3.6 Relationships between wavelet transform, S transform and Fourier transform [4]
3.7 An example of waveform and extracted features based on different transforms

4.1 Feedforward neural network architecture
4.2 (a) A simplified single node recurrent neural network (b) Unrolled version of the network through time
4.3 Conventional approach for power quality disturbance classifier [3]
4.4 (a) Feedforward neural network architecture (b) Windowed feedforward neural network architecture (c) Long short-term memory architecture
5.1 Cross entropy of the training and testing data as the training progresses
5.2 Output of the automatic power quality monitoring system
5.3 Performance of LSTM with various window sizes of DWT
5.4 Performance of LSTM with various window sizes of STFT
5.5 Performance of wFNN with various window sizes
5.6 Performance of LSTM with various sampling frequencies
5.7 Performance of LSTM with various output frequencies
5.8 Overall distribution of misclassification
5.9 Distribution of misclassification for individual power quality disturbances
5.10 Case study of interruption
5.11 Case study of oscillatory transient
5.12 Case study of voltage sag
5.13 Case study of harmonics
6.1 Future work

Chapter 1

Introduction

1 Motivation

The electric power grid provides a convenient and affordable way to deliver energy to sustain our society. Technologies ranging from personal electronics to manufacturing plants use electrical energy delivered by the electric grid. In seeking to maintain the reliability of the grid, engineers realized that connecting multiple generators and consumers mitigates the uncertainties in supply and demand. Fortunately, the invention of transformers enabled high voltage transmission systems that significantly reduced the power loss over long transmission lines. Interconnections between regions expanded with the high voltage transmission system, and the electric power grid today is the largest human-made machine.

The conventional power system structure provides most of the electricity from large power plants, such as nuclear or gas-fired generators, that are often distant from the customers for safety reasons. This system is centrally monitored and controlled by the transmission system operator.

Today there is a strong demand for innovating the conventional design of the electric grid. Greenhouse gas emissions from fossil fuel power plants are one of the major causes of climate change, and fossil fuel power plants are being replaced by alternative renewable energy sources. Wind and solar energy are two promising energy sources that can be widely and safely deployed. With the rapid advancement of technology in wind turbines and solar cells, their costs are expected to be competitive with conventional energy sources in the near future. Wind turbines and solar cells are considered distributed generators because they have a smaller capacity than

conventional synchronous generators. The integration of distributed generators is leading to major shifts in the structure and operation of the electric power grid. Since these distributed generators have a small capacity, they are scattered near the consumers in a low voltage system to avoid long distance delivery. Both generators and consumers need to be managed in a smaller, local grid, and the low to medium voltage grid such as the microgrid is one of the most researched topics today [5].

The distributed generators are often connected to the grid through an electronic interface with a nonlinear control, such as the MPPT algorithm in photovoltaics. The electronic components introduce disturbances to the grid, which result in issues with the power quality of the grid. In addition, the intermittency of renewables increases variation and disturbance in the supply and operating condition. The increased uncertainty and disturbances manifest as impurities in the sinusoidal waveform of the voltage and current. The impurities in the waveform are referred to as power quality disturbances. Concerns with power quality are not new; these issues have always existed in power systems since the system is always subject to disturbances. However, the increasing penetration of renewables is closely related to the issues in power quality, and its importance is continuously growing [6, 7].

The operation of the grid has to accommodate the increasing number of distributed generators. The measurements from distributed generators introduce a very large amount of data, and assessment by a human operator becomes expensive and unreliable. While the changes in power system structure and operation create new challenges, recent advances in smart grid technology offer promising solutions to mitigate these rising issues with its metering and communication technology.
2 Contributions

This thesis proposes a power quality monitor equipped with machine learning techniques to improve the operator's observability. The automated power quality monitor classifies the type of phenomenon recorded, so the system operator can easily detect and analyze the issues that the grid is faced with. Traditionally, operators provided only diagnostic monitoring of power quality: technicians are dispatched only when customers complain repeatedly or after

Chapter 1. Introduction 3 the damage. The automated monitoring technology enables the preventive monitoring of power quality since the data can be analyzed before customers report problems. To address the issues regarding power quality, reliability and efficiency can be greatly enhanced with the proactive system rather than the reactive system. Specifically, the goal of this thesis is to increase the accuracy of the power quality monitor, and we make several modifications from existing techniques. The specific contributions of this thesis are as follows. 1. The technique presented in this thesis eliminates the need of a pre-segmentation algorithm, which is required in existing techniques. Current techniques assume the monitor is given a nicely segmented and fixed size window of disturbance in the waveform. We eliminate this assumption and apply the classification at every time step giving a more accurate localization of the disturbance. 2. Feature extraction algorithms based on different transforms are studied under a standard classification algorithm. This thesis describes the algorithms and shows an experimental evaluation and comparison of the transforms. 3. The effectiveness of recurrent neural network is studied and compared with conventional feedforward neural network. We demonstrate that we can achieve a lower error rate and localization of the event by passing information through the hidden network. 3 Overview Chapter 2 provides background in current practice for power quality monitoring as well as the state of the art techniques. We describe the data generation process to create training data for the neural network. In Chapter 3, we go over the existing transforms and feature extraction based on short-time Fourier transform, wavelet transform, and S transform. In addition, the comparison of the transforms will be discussed with their relationship to each other. 
Chapter 4 presents the classification of power quality disturbances with the recurrent neural network, which is implemented with Long Short-Term Memory. The description of how this

classifier is implemented and how it can be trained is provided in this chapter. This is a standard method in machine learning, but some of its core techniques, such as the softmax output layer and the training methods, had not previously been introduced for power quality disturbance classification. In Chapter 5, we present our results and case studies. The accuracy of the Long Short-Term Memory network is compared with that of the feedforward neural network. In addition, we examine different window sizes of the transform to find the optimal parameter settings. We also present the limitations of the technique due to its over-fitting to the generated data. We conclude the thesis in Chapter 6 with a summary of the contents as well as future directions for this research.

Chapter 2

Power Quality Disturbance Data Generation

In this chapter, we provide an overview of power quality disturbances with their mathematical descriptions. The definition of each power quality disturbance is given with its causes and an example waveform. This chapter defines the target classes and describes how each class of disturbance can be generated based on its known magnitude and spectral content. Later we will see how the generated data can be used to build a classification system that uses machine learning to automatically recognize patterns in the data.

1 Overview of Power Quality

The term power quality is defined in [1] as any power problem manifested in voltage, current, or frequency deviations that results in failure or misoperation of customer equipment. Power quality encompasses a broad range of concerns, and it is difficult to develop a cohesive solution in general. The IEEE Recommended Practice for Monitoring Electric Power Quality (IEEE Std 1159-2009 [2]) defines the terminology and definitions of phenomena that are adopted in this thesis. While the importance of power quality has increased throughout the past decades, power quality analysts struggle with processing massive volumes of measurements [11]. Current practice in the industry commonly relies on root mean square (RMS) and total harmonic distortion (THD) measurements of the voltage and current waveforms.

1.1 Standard Measurements

Root mean square voltage/current

The root mean square of a voltage or current waveform, $v[\tau]$, is defined as

$$V_{\mathrm{rms}}(t) = \sqrt{\frac{1}{T}\sum_{\tau=t-T}^{t} v[\tau]^2} \qquad (2.1)$$

where $T = 1/f_f$ is the period of the waveform's fundamental frequency $f_f$, which is 60 Hz for North American power systems.

Peak voltage/current

The peak voltage and current identify the maximum and minimum of the waveform over one period and are defined as

$$V_{\max}(t) = \max_{\tau \in [t-T,\,t]} v(\tau), \qquad V_{\min}(t) = \min_{\tau \in [t-T,\,t]} v(\tau) \qquad (2.2)$$

The RMS and peak voltage of the waveform are related: for a pure sinusoidal wave, $V_{\max}(t) = -V_{\min}(t) = \sqrt{2}\,V_{\mathrm{rms}}(t)$ for every $t$. Having both the maximum and minimum of the waveform can be useful for detecting a dc component in the waveform.

Total harmonic distortion

Total harmonic distortion (THD) estimates the overall distortion of the wave from the fundamental and is defined as

$$\mathrm{THD} = \frac{\sqrt{\sum_{n=2}^{N} V_n^2}}{V_1} \qquad (2.3)$$

where $V_n$ is the magnitude of the $n$th harmonic and $V_1$ is the magnitude of the fundamental frequency. The maximum harmonic, $N$, is 7 to 13 depending on the application. The IEEE Recommended Practice and Requirements for Harmonic Control in Electric Power Systems [12] specifies the current practice on how this measurement is utilized.

Although total harmonic distortion has been the most widely used measurement for detecting waveform distortions, it has a clear limitation in characterizing different phenomena. $V_n$ is the short-time Fourier transform coefficient at the $n$th harmonic frequency, and the total harmonic distortion sums up all the coefficients at the non-fundamental frequencies. Total harmonic distortion simply reduces the dimension of the coefficients so that waveform disturbances are easy to detect, but it does not have the capacity to sufficiently represent and characterize the signal content. We can generalize the function in Equation 2.3 and use the raw information $V_1, \ldots, V_N$ to extract much more information using machine learning techniques. Moreover, time localization of the phenomena is limited because the Fourier transform gives only the frequency representation of the signal. This point will be elaborated in the next chapter.

1.2 Classification of power quality disturbances

The classification of power quality disturbances is based on IEEE Std 1159-2009 [2]. Table 2.1 shows the categories of power quality disturbances, which can be found in both [1, 2]. Having consistent definitions of classes is important in preserving and extending our knowledge of the phenomena. Classifying a power quality disturbance directs the engineers toward the solution of its fundamental issue: each class of disturbance often has common causes as well as common solutions.

The power quality disturbances can be broadly classified into steady state and transient disturbances. The steady state disturbances include waveform distortions and voltage imbalances. The transient disturbances include impulsive and oscillatory transients as well as short duration voltage magnitude variations. While our power quality monitor has the capability to distinguish any type of disturbance, the classes of disturbance can be selected as a subset of the disturbances defined in the standard [2]. For example, the engineers may be interested only in steady state disturbances so that some control scheme can be implemented.
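As a concrete illustration of the standard measurements above, both quantities can be computed directly from waveform samples. The sketch below (the function and parameter names are our own, not from the thesis) implements the sliding RMS of Equation 2.1 and an FFT-based THD per Equation 2.3 with NumPy:

```python
import numpy as np

def sliding_rms(v, T):
    """Sliding RMS over the last T samples (one fundamental period), Eq. (2.1)."""
    return np.sqrt(np.convolve(v ** 2, np.ones(T) / T, mode="valid"))

def thd(v, fs, f0=60.0, N=13):
    """THD of a window, Eq. (2.3): sqrt(sum of V_n^2 for n = 2..N) / V_1.
    V_n is read from the FFT bin closest to the n-th harmonic of f0."""
    spectrum = np.abs(np.fft.rfft(v)) / len(v)
    freqs = np.fft.rfftfreq(len(v), d=1.0 / fs)
    V = [spectrum[np.argmin(np.abs(freqs - n * f0))] for n in range(1, N + 1)]
    return np.sqrt(sum(Vn ** 2 for Vn in V[1:])) / V[0]
```

For a clean 60 Hz sine sampled at 10 kHz, `sliding_rms` stays near $1/\sqrt{2} \approx 0.707$ per unit, and adding a 10 % third harmonic yields a THD near 0.1.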
By including various phenomena such as transient disturbances, the proposed monitor can reduce the false-positive identification of the desired classes for control. If standard measurements such as total harmonic distortion were used, the controller might react to temporary transient disturbances, reducing the efficiency of the controller. Each phenomenon has a common characterization in terms of its spectral content and magnitude, and we currently have the knowledge to recreate them. This gives us the ability

to create example data of power quality disturbances. With a large amount of data, we can employ state-of-the-art machine learning techniques to build an automatic classifier.

Table 2.1: Classification of power quality disturbances and their characterization [1, 2]

    Categories                    Spectral Content    Typical Duration    Typical Magnitude
    Transients
      Impulsive                   5 ns - 0.1 ms rise  1 ns - 1 ms plus
      Oscillatory                 5 kHz - 0.5 MHz     5 us - 50 ms        0 - 8 pu
    Short Duration Variations
      Interruptions                                   0.5 cycle - 1 min   < 0.1 pu
      Sags                                            0.5 cycle - 1 min   0.1 - 0.9 pu
      Swells                                          0.5 cycle - 1 min   1.1 - 1.8 pu
    Long Duration Variations
      Interruptions                                   > 1 min             < 0.1 pu
      Under-Voltages                                  > 1 min             0.8 - 0.9 pu
      Over-Voltages                                   > 1 min             1.1 - 1.8 pu
    Voltage Imbalances                                steady state        0.5 - 2 %
    Waveform Distortions
      DC offset                                       steady state        0 - 0.1 %
      Harmonics                   0 - 9 kHz           steady state        0 - 20 %
      Interharmonics              0 - 9 kHz           steady state        0 - 2 %
      Notching                                        steady state
      Noise                                           steady state        0 - 1 %
    Voltage Fluctuations          < 25 Hz             intermittent        0.1 - 7 %
    Power Frequency Variations                        < 10 s              ± 0.10 Hz

2 Characterization of Power Quality Disturbances

In this section, we present the complete list of power quality disturbances that are subject to classification in this thesis. We explain each phenomenon and present its numerical model and an example of the disturbance waveform. Although we give brief descriptions of the causes of the phenomena, readers should consult references such as [1, 13] to gain a deeper understanding of the subject. References such as [14, 15] also contain information on how synthetic disturbance data can be generated.

After presenting the examples of power quality disturbances, we show the limitation of the

standard measurements in the classification of the disturbances. Throughout this thesis, we will focus on the voltage waveform, since voltage is the variable that is more strictly monitored and regulated; grid-connected equipment is generally designed for a range of currents but a fixed value of the voltage. However, the approach can be generalized to current by simply removing the classes that are only applicable to voltage, such as voltage sag and swell.

Before we present the disturbances, we first define the normal sinusoidal voltage as

$$v_{\mathrm{normal}}(t) = \sin(2\pi f t) + \mu(t) \qquad (2.4)$$

where $f$ is the fundamental operating frequency and $\mu(t) \sim N(0, \sigma^2)$ with $\sigma \in [0, 0.01]$. The fundamental frequency used throughout the thesis is 60 Hz, which is standard in North America, and the voltage is in per unit. The noise term $\mu(t)$ is added as a regularization term to avoid over-fitting towards a perfect sinusoidal waveform. Since the power grid may not have a perfect sinusoidal waveform, adding the noise acts as a generalization towards realistic voltage waveforms. The example waveforms are generated with a sampling frequency of 10 kHz.
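A minimal NumPy sketch of Equation 2.4 is shown below. The constants and the function name are our own; the thesis draws $\sigma$ from $[0, 0.01]$ per example, which here corresponds to choosing a different `sigma` for each generated waveform.

```python
import numpy as np

FS = 10_000   # sampling frequency of the example waveforms (10 kHz)
F0 = 60.0     # fundamental frequency in North America (Hz)

def v_normal(duration, sigma=0.005, seed=None):
    """Nominal per-unit voltage, Eq. (2.4): sin(2*pi*f*t) + mu(t),
    with mu(t) ~ N(0, sigma^2) acting as regularizing noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(round(duration * FS)) / FS
    return t, np.sin(2 * np.pi * F0 * t) + rng.normal(0.0, sigma, t.size)
```

The per-unit convention means the RMS of the clean waveform is $1/\sqrt{2}$, which the noise perturbs only slightly.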

Impulsive Transients

Impulsive transients are momentary, nearly instantaneous changes in the system state that do not change the fundamental frequency. They are unidirectional (either positive or negative) and can be characterized by the rise time, decay time, and peak value. The most common cause of impulsive transients is lightning, and an impulsive transient can turn into an oscillatory transient if it excites the natural frequency of the circuit [2]. An impulsive transient can be synthesized by

v(t) = v_normal(t) + β exp(c (t − t_start)/(t_end − t_start)),    t ∈ [t_start, t_end]    (2.5)

where β ∈ ±[0.1, 0.8] is the peak magnitude and c = log(ε/β) is the fall time constant. ε is the threshold below which the disturbance can be neglected, and is set to 0.001 in this thesis. Since the rise time is between 5 ns and 0.1 ms, the rise delay is essentially negligible for sampling frequencies up to 10 kHz. An example of an impulsive transient with 0.2 peak magnitude and 1 ms duration is shown in Figure 2.1. The impulsive transient is repeated for illustration only.

Figure 2.1: An example waveform of impulsive transient

Oscillatory Transients

An oscillatory transient rapidly changes polarity and can be described by its spectral content, duration, and magnitude. Back-to-back capacitor energization results in oscillatory current transients. Power electronic devices can produce voltage transients due to commutation and RLC snubber circuits, and cable switching can also result in oscillatory voltage transients [2]. This phenomenon can be synthesized by

v(t) = v_normal(t) + β exp(c (t − t_start)/(t_end − t_start)) sin(2πf_h t),    t ∈ [t_start, t_end]    (2.6)

where β ∈ ±[0.1, 0.8], c = log(ε/β), and f_h ∈ [500, 5000] are the peak magnitude, fall time constant, and frequency of the transient component of the waveform respectively.

Figure 2.2: An example waveform of oscillatory transient

Interruptions

When an interruption occurs, the supply voltage or load current decreases to less than 0.1 p.u. for a period of time. Common causes of interruptions are power system faults, equipment failures, and control malfunctions. An interruption due to a fault can be cleared by instantaneous reclosing [2]; if reclosing fails, the interruption may become permanent. Figure 2.3 shows a momentary interruption, and the waveform can be generated by

v(t) = α v_normal(t),    t ∈ [t_start, t_end]    (2.7)

where α ∈ [0, 0.1] is the magnitude of the waveform. Intuitively, the RMS or peak measurements of Equations 2.1 and 2.2 would be ideal features for classifying this disturbance.

Figure 2.3: An example waveform of interruption
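Looking back at the transient models, Equations 2.5 and 2.6 can be sketched together in NumPy. The helper name is ours and the parameter values are arbitrary picks from the stated ranges:

```python
import numpy as np

# Sketch of Equations 2.5-2.6: a decaying exponential (impulsive) or a
# decaying sinusoid (oscillatory) added over [t_start, t_end]. The decay
# constant c = log(eps/beta) makes the added term fall to eps at t_end.
def add_transient(t, v, t_start, t_end, beta=0.5, f_h=None, eps=1e-3):
    c = np.log(eps / abs(beta))
    mask = (t >= t_start) & (t <= t_end)
    tau = (t[mask] - t_start) / (t_end - t_start)
    term = beta * np.exp(c * tau)
    if f_h is not None:                      # oscillatory variant (Eq. 2.6)
        term = term * np.sin(2 * np.pi * f_h * t[mask])
    out = v.copy()
    out[mask] += term
    return out

fs = 10_000
t = np.arange(2000) / fs
v = np.sin(2 * np.pi * 60 * t)
v_imp = add_transient(t, v, 0.05, 0.051, beta=0.2)           # impulsive, 1 ms
v_osc = add_transient(t, v, 0.05, 0.10, beta=0.5, f_h=1000)  # oscillatory
```

Outside the disturbance window the waveform is untouched, which is what makes the events localizable in time.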

Voltage Sag

A voltage sag is a decrease in RMS voltage or current to between 0.1 and 0.9 p.u. with a duration from 0.5 cycle to 1 minute. It is also referred to as a voltage dip. The most common cause of voltage sags is system faults, but the energization of heavy loads or the starting of large motors can also cause them [2]. As with interruptions, the RMS voltage is an obvious indicator for classifying voltage sags. A voltage sag waveform can be generated by simply changing the voltage magnitude,

v(t) = α v_normal(t),    t ∈ [t_start, t_end]    (2.8)

where α ∈ [0.1, 0.9] sets the voltage magnitude.

Figure 2.4: An example waveform of voltage sag

If a voltage sag lasts more than one minute, it is classified as an under-voltage. A common cause of under-voltage is a load switching on or a capacitor bank switching off [2]. In this thesis, voltage sag and under-voltage form one class because they exhibit the same characteristics; they can be separated, if needed, with an additional post-processing step that measures the duration of the event.

Voltage Swell

A voltage swell is an increase of the voltage magnitude to between 1.1 and 1.8 p.u. Similar to voltage sags, swells are caused by system fault conditions. They can also be caused by switching off a large load or energizing a large capacitor bank [2]. The voltage swell waveform can be generated by

v(t) = α v_normal(t),    t ∈ [t_start, t_end]    (2.9)

where α ∈ [1.1, 1.8] sets the magnitude of the waveform. If a voltage swell lasts longer than one minute, it is classified as an over-voltage. Common causes are load switching, such as switching off a large load, and incorrect settings of transformer taps.

Figure 2.5: An example waveform of voltage swell

DC Offset

A DC offset occurs when there is a dc voltage or current in the system. Causes of DC offset include geomagnetic disturbances and the effect of half-wave rectification. Direct current in the network can cause electrolytic erosion of grounding electrodes and other connectors [2]. A DC offset waveform can be generated by simply adding a bias to the normal waveform,

v(t) = v_normal(t) + γ(t),    t ∈ [t_start, t_end]    (2.10)

where γ(t) ∈ ±[0.001, 0.01]. Intuitively, tracking both the minimum and maximum voltage as in Equation 2.2 can provide a good feature to identify this disturbance.

Figure 2.6: An example waveform of DC offset

Harmonics/Interharmonics

Harmonics are sinusoidal voltages or currents with frequencies that are integer multiples of the fundamental frequency (60 Hz). Interharmonics are voltages or currents with frequencies

that are non-integer multiples of the operating frequency. Harmonic distortion usually originates from the nonlinear characteristics of devices and loads [2], and it distorts the waveform away from the fundamental sinusoid. Harmonics and interharmonics can be synthesized by

v(t) = v_normal(t) + β sin(2πf_h t),    t ∈ [t_start, t_end]    (2.11)

Figure 2.7: An example waveform of harmonics

where β ∈ [0.1, 0.2] and f_h ∈ [180, 900] are the magnitude and frequency of the harmonic respectively. Since the phenomenon is periodic with a frequency greater than the operating frequency, it is hard to detect with the RMS voltage. Harmonics and interharmonics form a single class in the automatic classification because distinguishing them is difficult for the classifier: deciding whether a frequency is an integer multiple of the fundamental amounts to testing membership in a discrete set, so the features required by the classifier would have to resolve a large range of frequencies very finely. Therefore we merge these classes into one; further separation can be made by an additional layer of classification.

Notching

Notching is a periodic voltage disturbance caused by the normal operation of power electronic devices. When current is commutated from one phase to another, there is a momentary short circuit between the two phases, resulting in a notch [2]. The notching waveform can be generated by

v(t) = v_normal(t) + Σ_i β exp(c (t − t_start,i)/(t_end,i − t_start,i)),    t ∈ [t_start, t_end]    (2.12)

where β ∈ [0.25, 0.5], c = log(ε/β), and t_start,i+1 − t_start,i is constant for all i, making the notching periodic.

Figure 2.8: An example waveform of notching

Noise

Noise is an unwanted electrical signal with spectral content below 200 kHz. Power electronic devices, control circuits, arcing equipment, and switching power supplies are common sources of noise [2]. Noise can be added to the waveform by

v(t) = v_normal(t) + µ(t),    t ∈ [t_start, t_end]    (2.13)

where µ(t) ∼ N(0, σ²) and σ ∈ [0.05, 0.1].

Figure 2.9: An example waveform of noise

Voltage Fluctuations

A voltage fluctuation is a variation in the voltage magnitude and is also referred to as voltage flicker. An arc furnace is one of the most common causes of flicker [2]. Solar panels can also produce flicker when the irradiation conditions change and the voltage is adjusted by Maximum Power Point Tracking (MPPT) algorithms. Flicker can be reproduced by introducing a low frequency waveform,

v(t) = v_normal(t) + β sin(2πf_f t),    t ∈ [t_start, t_end]    (2.14)

where β ∈ [0.05, 0.1] and f_f ∈ [10, 25] are the magnitude and frequency of the flicker. Flicker appears as a fluctuation in the RMS voltage, but in instantaneous measurements it may look like a continuous switching between voltage sag and swell.

Figure 2.10: An example waveform of voltage fluctuation

Power Frequency Variations

A frequency variation occurs when the fundamental frequency of the power system deviates significantly from its nominal value. Frequency variations normally occur due to faults on the bulk power transmission system or when a large block of load or a large generator goes offline [2]. The power frequency variation can be synthesized by

v(t) = sin(2π(60 + f_f)t),    t ∈ [t_start, t_end]    (2.15)

where f_f ∈ ±[0.05, 0.1].

Figure 2.11: An example waveform of frequency variation
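A minimal sketch of the flicker and frequency-variation models (Equations 2.14 and 2.15); the 15 Hz flicker frequency and the +0.1 Hz deviation are arbitrary picks from the stated ranges:

```python
import numpy as np

# Sketch of Equations 2.14 (flicker) and 2.15 (frequency variation).
fs = 10_000
t = np.arange(2 * fs) / fs                                  # 2 s of data
flicker = np.sin(2 * np.pi * 60 * t) + 0.08 * np.sin(2 * np.pi * 15 * t)
freq_var = np.sin(2 * np.pi * (60 + 0.1) * t)

# a one-cycle trailing RMS of the flicker signal oscillates around 1/sqrt(2),
# which is how flicker shows up in the RMS trace described above
n = fs // 60
rms = np.sqrt(np.convolve(flicker ** 2, np.ones(n) / n, mode='valid'))
```

The oscillation of the RMS trace, rather than its level, is what distinguishes flicker from a sag or swell.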

Sag/Swell with Harmonics

The power grid can also experience combinations of the disturbances listed above. Two combinations considered in this thesis are voltage sag with harmonics and voltage swell with harmonics. These disturbances can be synthesized by

v(t) = α v_normal(t) + β sin(2πf_h t),    t ∈ [t_start, t_end]    (2.16)

where α ∈ [0.1, 0.9] ∪ [1.1, 1.8], f_h ∈ [180, 900], and β ∈ [0.1, 0.2] are the fundamental magnitude, harmonic frequency, and harmonic magnitude respectively. Combinations of disturbances are rarer than individual disturbances, and thus only these two were considered. However, if a system regularly experiences certain combinations, those classes can be added to the list. The machine learning approach makes this expansion easy because the modification of the monitoring system is automated.

Voltage imbalance was not considered because the system is designed to take a single phase input. Building a classifier with three phase input is more complex than having three separate single phase classifiers: the feature vector for three phase input is three times as large, and the neural network would require a much larger capacity to process the tripled input features. For a three phase system, the dq0 frame could be effective in classifying severely unbalanced disturbances.

Although this section gave the complete list of power quality disturbances in the IEEE standard [2], it may not cover every phenomenon that can occur in the grid. One of the advantages of establishing an automatic data generation process is that we can easily expand the definitions by adding a class with its mathematical description. This approach preserves the existing definitions while expanding the record and understanding of power quality disturbances.
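To illustrate Equation 2.16, the following sketch synthesizes a voltage sag with a third harmonic and computes the two standard indicators; both react, which is what makes the combined class identifiable. The parameter values are arbitrary picks and the noise term is omitted for clarity:

```python
import numpy as np

# Equation 2.16 with alpha = 0.5 (sag) and a 180 Hz harmonic of
# magnitude 0.1. One second of data gives 1 Hz FFT bins, so the 60 Hz
# fundamental and its harmonics land exactly on bins 60, 120, 180, ...
fs = 10_000
t = np.arange(fs) / fs
v = 0.5 * np.sin(2 * np.pi * 60 * t) + 0.1 * np.sin(2 * np.pi * 180 * t)

rms = float(np.sqrt(np.mean(v ** 2)))                  # drops below 0.9 pu
V = np.abs(np.fft.fft(v))[:fs // 2]
thd = float(np.sqrt(np.sum(V[120::60] ** 2)) / V[60])  # harmonics / fundamental
```

Here the RMS alone would label the event a sag and the THD alone a harmonic distortion; only the pair identifies the combined class.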
2.1 RMS and Peak Measurement

In this section, we apply the RMS and peak voltage measurements that were presented in Section 1.1. Figure 2.12 shows the RMS and peak measurement of each example waveform. While the RMS measurement is generally good for classifying events related to the voltage magnitude, such as voltage sag and swell, it is unable to retrieve spectral information. Since this is limited

information retrieved from the waveform, we need an additional layer of feature extraction that captures more of the waveform's structure. This layer will be presented in the next chapter.

Figure 2.12: RMS and peak voltage measurement of each class of power quality disturbances
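A one-cycle sliding RMS of the kind plotted in Figure 2.12 can be computed efficiently with a cumulative sum; the sag parameters below are illustrative:

```python
import numpy as np

# One-cycle trailing RMS (cf. the measurements of Figure 2.12),
# computed in O(1) per sample with a cumulative sum of v^2.
def sliding_rms(v, fs=10_000, f=60.0):
    n = int(round(fs / f))                       # samples in one cycle
    csum = np.cumsum(np.concatenate(([0.0], v ** 2)))
    return np.sqrt((csum[n:] - csum[:-n]) / n)

fs = 10_000
t = np.arange(2000) / fs
v = np.sin(2 * np.pi * 60 * t)
v[(t >= 0.05) & (t < 0.1)] *= 0.5                # synthetic 0.5 pu sag
rms = sliding_rms(v, fs)
```

The sag is plainly visible in the RMS trace, while a harmonic of the same energy would barely move it; this is exactly the limitation discussed above.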

3 Monte Carlo Simulation for Data Generation

The characterization developed in the previous sections is used to generate data for training the neural network. The data are generated by Monte Carlo simulation from uniform distributions with the ranges given in Table A.1. A general form of the disturbance model can be described by

v(t) = α sin(2π(60 + f_f)t) + Σ_i β_i exp(c (t − t_start)/(t_end − t_start)) cos(2πf_h(t − t_start)) + µ(t) + γ(t)    (2.17)

for t ∈ [t_start, t_end], µ(t) ∼ N(0, σ²), and c = log(ε/β) where ε = 0.01. Since we have a full characterization of the power quality disturbances, this removes the need for a manual labeling process. The pseudocode for the data generation is given in Algorithm 1, with some notation adopted from MATLAB. We let N be the length of the data we want to generate, and the function randi(a, b) draws a random integer between a and b. We denote the type of disturbance by i and insert a normal waveform (i = 1) between each disturbance, since the probability of a disturbance occurring right after another disturbance is very low. An example of data generated by the proposed method is presented in Figure 2.13.

Algorithm 1 Data generation with Monte Carlo simulation
1: t_start ← 0
2: i ← 1
3: while t_start < N do
4:   if i == 1 then
5:     i ← randi(2, 14)
6:   else
7:     i ← 1
8:   end if
9:   α, f, β, f_h, σ, γ, duration ← sampled within the ranges given in Table A.1 for event i
10:  t_end ← t_start + duration
11:  v(t_start : t_end) ← Equation 2.17
12:  label(t_start : t_end) ← i
13:  t_start ← t_end
14: end while
15: Output v

Since the duration of each disturbance is different, there is an issue of fairness in the generated data between the disturbances. For example, if the majority of the data is voltage swell, then

Figure 2.13: An example of generated data for the training

the bias of the neural network towards voltage swell would be very high. Therefore, all of the disturbances were sampled with approximately equal probability in terms of their total duration. The total duration was chosen as the quantity to balance because the training of the neural network minimizes an objective function that is distributed equally over time. The numbers of occurrences of the impulsive and oscillatory transients were therefore higher than those of the other classes, by about 15 and 5 times respectively, which results in an almost equal ratio of total duration for each disturbance except the normal waveform. The normal waveform makes up about 10 times the rest of the data; this weakly represents the realistic grid, which is usually in the normal condition. Table A.2 shows both the ratio of classes in terms of total duration and the number of occurrences. In the next section, we present a literature survey collecting the history of work that has attempted automatic monitoring of power quality.

4 Literature Survey

The goal of automatic classification of power quality disturbances has a long history, and some of the works date back to the mid to late 1990s. The automatic classification involves two major steps: feature extraction and classification. Feature extraction algorithms come from signal processing techniques such as the short-time Fourier transform, wavelet transform, and S transform. Classification algorithms come from machine learning techniques such as neural networks and support vector machines. In the literature, we will see that many papers are different combinations of a feature extraction and a classification algorithm. One of the earliest papers to address this problem with a neural network was by

Ghosh et al. [16]. The wavelet transform was noticed to be effective in detecting disturbances [17], and the first classification system combining the wavelet transform and a neural network followed [18]. Later, Gaouda et al. [19] found that, by Parseval's theorem, the energy of the discrete wavelet transform coefficients is a much better feature for classification. The wavelet transform is still one of the dominant feature extraction algorithms in this application [20, 21, 22, 23]; the variation is in the classification algorithm, including multiple neural networks with a voting-based decision scheme [20], a neural structure [21], a probabilistic neural network [22], and a self-organizing learning array [23]. While these classification algorithms often show great results, all of the existing techniques avoid the issue of windowing the sampled waveform by relying on a segmentation algorithm. The segmentation algorithm divides the data sequence into normal and disturbance parts by adding an algorithmic layer such as the triggering method [3]. Figure 2.14 shows the comparison between existing and proposed algorithms, where we remove the segmentation layer and produce the output as a function of time. The decision-making layer is also eliminated by the softmax output layer, which is part of the neural network.

Figure 2.14: (a) Classification process of existing techniques [3] (b) Classification process of the proposed technique

The S transform is another transform used for feature extraction [24, 25, 26, 27, 28, 14]. The classification algorithms considered with it include the feedforward neural network [24, 27], probabilistic neural network [24, 25], modular neural network [26], and decision tree [14]. Ray et al. [28] studied a system with distributed generation and renewables and compared the discrete wavelet transform and the S transform.

Other approaches, such as the Hilbert transform [29], the neural tree [29], and combinations of multiple algorithms [30], have been studied as well. The existing literature is summarized in thorough surveys in [31, 3], and recent papers [15, 32, 33] propose similar ideas for power quality classification. With recent advancements in neural networks [34, 35], the automatic power quality monitoring system can be greatly improved and simplified by removing many layers of the process. We propose a new recurrent neural network, built from deep neural network architectures, to perform power quality disturbance classification at every time step.

5 Real-time Classification

Removing the segmentation layer of the existing techniques enables real-time implementation of power quality disturbance classification. The real-time monitoring system can be implemented without an extensive computing hardware upgrade, since the computing requirement of the proposed system is not significantly greater than that of existing capabilities. The implementation of total harmonic distortion requires computing the magnitudes of integer multiples of the fundamental frequency with the Fourier transform; in a real-time implementation, those magnitudes are the coefficients of the short-time Fourier transform. The only computation left is the forward propagation through time in the Long Short-Term Memory network, which takes much less time than the short-time Fourier transform. The most expensive step in the online classification is the feature extraction, which uses a signal transform. Since the existing infrastructure can handle the signal transforms, the additional classification layer can be implemented without much upgrade. In the future, more primitive signal transforms such as the dq0 transformation could be considered to significantly reduce the computational requirement of the monitoring system.
Currently, standard measurements such as RMS voltage and THD are constantly monitored to detect abnormalities in the grid. These values are often used to initiate control actions when the measurements exceed the thresholds defined by standards [2, 12]. The proposed power quality monitoring technique extends this capability to report multiple classes of disturbances. The identified class can be used to initiate different control

actions, such as activating a Static Synchronous Compensator (STATCOM), reconfiguring the distribution network, and determining the status of the capacitor banks deployed throughout the grid. There are a number of potential applications in increasing the fault tolerance of the grid, setting control parameters for STATCOMs, and operating microgrids [8, 10, 28]. In the next chapter, we explore the options for feature extraction and give an overview and comparison of the short-time Fourier transform, wavelet transform, and S transform.

Chapter 3 Transformation and Feature Extraction

1 Background

In order to assess power quality issues, engineers have to carefully, and often tediously, observe the system states. The most commonly monitored variables are the voltage magnitude, frequency, and total harmonic distortion, as mentioned in the previous chapter. These can be considered features extracted from the voltage and current waveforms. While the standard measurements are usually quite effective in detecting whether the system is in a normal or abnormal condition, these features are not sufficient to distinguish different disturbances. In order to extract more information from the waveforms, this chapter investigates the short-time Fourier transform, the wavelet transform, and the S transform. The overall process of classification is presented in Figure 3.1. The first step is to generate labeled data with the Monte Carlo simulation, after which we apply the transform and feature extraction. The extracted feature data are used to train the neural network that classifies the input features; the next chapter discusses the classifier. In Figure 3.1, the upper part of the graph is the process that is done offline before the monitor is deployed. The training of the classifier is the most time-consuming step, but it only has to be done once to set the parameters of the classifier. The bottom part of the graph shows the implemented system, which only requires the feature extraction of

Chapter 3. Transformation and Feature Extraction 25

the data and the classification.

Figure 3.1: Overall process of building an automatic power quality monitor

In this chapter, we discuss the feature extraction process using signal transforms. We first go over the Fourier transform, which extracts the spectral information of the waveform.

1.1 Fourier Transform

The Fourier transform breaks a signal down into sinusoids at different frequencies, transforming our view of the signal from the time domain to the frequency domain. It uses the complex exponential as the basis function. The formal definition of the Fourier transform of a continuous signal v(t) is

V(f) = ∫_{−∞}^{+∞} v(t) e^{−j2πft} dt.    (3.1)

The discrete Fourier transform of a discrete signal v[n] is defined as

V[k] = Σ_{n=0}^{N−1} v[n] e^{−j(2π/N)kn}    (3.2)

where N is the size of the signal. For implementation, the fast Fourier transform (FFT) computes the transform efficiently with a divide and conquer method: the algorithmic complexity of the naive implementation is O(N²), while that of the FFT is O(N log N). The limitation of the Fourier transform for power quality monitoring is its inability to localize events. The sense of time is completely lost in the Fourier transform, and therefore it needs to be modified to be applicable for localizing power quality events.
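As a quick numerical check of Equation 3.2, NumPy's FFT applied to one second of a 60 Hz signal with a 180 Hz harmonic peaks at exactly those frequency bins (the signal is an illustrative choice):

```python
import numpy as np

# DFT of Equation 3.2 via FFT: 1 s sampled at 10 kHz gives 1 Hz bins,
# so integer-frequency components land exactly on their bins with
# magnitude N * amplitude / 2.
fs = 10_000
t = np.arange(fs) / fs
v = np.sin(2 * np.pi * 60 * t) + 0.2 * np.sin(2 * np.pi * 180 * t)
V = np.abs(np.fft.fft(v))
peak_hz = int(np.argmax(V[:fs // 2]))   # dominant positive-frequency bin
```

Note that the spectrum says nothing about *when* the components occur, which is precisely the localization problem discussed above.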

2 Short-Time Fourier Transform

In order to overcome this limitation of the Fourier transform, a window is applied to the signal for localization. The short-time Fourier transform (STFT) applies the Fourier transform to only a fixed section of the signal at a time. By taking the Fourier transform over a window at a specified position, time becomes part of the representation through the location of the window. The STFT is defined formally as

STFT_x(τ, f) = ∫_{−∞}^{+∞} v(t) w(t − τ) e^{−j2πft} dt    (3.3)

where w(t) is a windowing function such as the rectangular, Hann, or Hamming window.

2.1 Discrete Short-Time Fourier Transform and its Implementation

Suppose the sampling frequency of the waveform is f_i and the required sampling frequency of the output is f_o. We define the input-to-output sampling ratio as g = f_i / f_o, and we only consider integer ratios, g ∈ Z. Since a window function of size Q has the property w[n] = 0 for every n ∈ (−∞, 0) ∪ [Q, ∞), the discrete STFT can be written as

STFT_x[m, k] = Σ_{n=m}^{m+Q−1} v[n] w[n − m] e^{−j(2π/N)kn},    (3.4)

with the Hann window

w[n] = 0.5 (1 − cos(2πn / (Q − 1))).    (3.5)

In Figure 3.2, we present the short-time Fourier transform of the example signals given in the previous chapter. A window size of one cycle was used at every time step. The figure demonstrates the presence of other frequency components, as shown for the oscillatory transient and harmonic disturbances. However, the fixed window size limits the ability of the STFT to distinguish high-frequency short-term events such as impulsive transients and notches, while a shorter time window would suffer from higher uncertainty in the coefficients. This is the limitation of the short-time Fourier transform caused by its fixed-size window.

Figure 3.2: Contour diagram of STFT coefficients

The magnitudes of the STFT coefficients are selected as features. We take the spectrogram definition of the short-time Fourier transform, which is the squared magnitude of the short-time Fourier transform coefficients. In addition, we concatenate the peak voltages of Equation 2.2 to complete the feature vector x,

x = [ |STFT_x|², V_min, V_max ].    (3.6)

The complete algorithm for feature extraction with the short-time Fourier transform is given in Algorithm 2. In addition to the concatenation, we normalize the features: the normalization sets the mean and variance of each feature to 0 and 1 respectively. This step helps the convergence of the neural network training, and the normalization constants are obtained during the offline training step. When the classifier is deployed after

training the neural network, the same constants that were used for training are loaded to ensure the computation of the feature data is consistent between training and testing.

Algorithm 2 Feature extraction from short-time Fourier transform
1: for m from 1 to L/g do
2:   vw ← 0
3:   for n from m to m + Q − 1 do
4:     w[n] ← 0.5(1 − cos((2πn)/(Q − 1)))
5:     vw[n] ← v[n] w[n − m]
6:   end for
7:   STFT[m, :] ← FFT(vw)
8:   V_max[m], V_min[m] ← Equation 2.2
9:   x[m, :] ← [ |STFT_x[m, :]|², V_max[m], V_min[m] ]
10:  if online then
11:    load(x_mean, x_var)
12:    x[m, :] ← (x[m, :] − x_mean) / √x_var
13:  end if
14: end for
15: if offline then
16:   for k from 1 to end do
17:     x_mean[k] ← mean(x[:, k])
18:     x_var[k] ← var(x[:, k])
19:   end for
20:   x[m, :] ← (x[m, :] − x_mean) / √x_var ∀m
21:   save(x_mean, x_var)
22: end if
23: Output x

While the short-time Fourier transform gives a spectral analysis of the waveform, it has limitations due to its fixed-size window: choosing the window size is a trade-off between classifying high-frequency and low-frequency disturbances. For example, a small window is advantageous for detecting short-duration events such as impulsive transients, but it is at a disadvantage for detecting flickers.

3 Wavelet Transform

In order to overcome this limitation of the short-time Fourier transform, the wavelet transform has been developed. The wavelet transform introduces a scale s, which changes the size of the

window. The continuous wavelet transform is defined as follows:

CWT_ψ v(τ, s) = ∫_{−∞}^{+∞} v(t) ψ*_{τ,s}(t) dt = (1/√|s|) ∫_{−∞}^{+∞} v(t) ψ*((t − τ)/s) dt,    (3.7)

where τ and s are the translation and scale parameters, and ψ_{τ,s}(t) = (1/√|s|) ψ((t − τ)/s) with ψ the mother wavelet.

3.1 Discrete Wavelet Transform and its Implementation

A detailed derivation of the discrete wavelet transform will not be given in this thesis, since it can be found in many standard books such as [36]. The discrete wavelet transform samples the scale and position on a dyadic grid. The wavelet system satisfies the multiresolution conditions, under which wavelets at a higher resolution can span the wavelets at lower resolutions. Moreover, the lower resolution coefficients can be efficiently computed from the higher resolution coefficients by a filter bank. For the implementation of the discrete wavelet transform, a single level of wavelet analysis produces the coefficients

c_j(k) = Σ_m h(m − 2k) c_{j+1}(m),    (3.8)
d_j(k) = Σ_m g(m − 2k) c_{j+1}(m),    (3.9)

where c_j are called the approximation coefficients and d_j the detail coefficients. Equations 3.8 and 3.9 can be implemented by FIR filtering followed by down-sampling by two, as shown in Figure 3.3. In the filter bank, h(−n) is a lowpass filter outputting the approximation coefficients c_j, and g(−n) is a highpass filter outputting the detail coefficients d_j.

Figure 3.3: Two-band analysis bank

The decomposed detail coefficients are used to reconstruct the signal, which can be computed

by

f_j(t) = Σ_k d_j(k) 2^{j/2} ψ(2^j t − k)    (3.10)

where ψ is the wavelet and j is the reconstruction level. In this thesis, the Daubechies wavelet (db5) is used to decompose the signal into 5 levels; the Daubechies wavelets are among the most widely used wavelet systems. Figure 3.4 shows the reconstructed details of the power quality disturbances at 5 levels using Equation 3.10. Unlike the short-time Fourier transform, where frequencies are sampled, the signal is decomposed into different levels. In order to make the features classifiable, the energy of a windowed signal is used as the feature. We present the algorithm for computing features based on the wavelet decomposition in Algorithm 3. In the algorithm, H is an initial buffer length used to obtain both the peak voltage and the energy of the decomposed waveform. As in Algorithm 2, we normalize the data to make gradient descent work better when training the classifier.

Figure 3.4: Reconstructed signal using discrete wavelet transform at different levels
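The analysis bank of Equations 3.8 and 3.9 can be sketched with the Haar filter pair (the thesis uses db5; Haar keeps the example short). A step edge in the input concentrates its energy in a single detail coefficient, which is exactly the localization property the wavelet features exploit:

```python
import numpy as np

# Single-level analysis bank of Equations 3.8-3.9 with Haar filters:
# FIR filtering followed by downsampling by two (Figure 3.3).
def haar_analysis(c):
    h = np.array([1.0, 1.0]) / np.sqrt(2)    # lowpass  -> approximation
    g = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass -> detail
    approx = np.convolve(c, h)[1::2]
    detail = np.convolve(c, g)[1::2]
    return approx, detail

# a step edge at sample 63 shows up as one large detail coefficient
c = np.concatenate([np.zeros(63), np.ones(65)])
a, d = haar_analysis(c)
```

Applying the same split recursively to the approximation coefficients yields the multi-level decomposition used in Algorithm 3.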

Algorithm 3 Feature extraction from the discrete wavelet transform
1: c_5 ← v
2: for j from 4 to 1 do
3:   c_j, d_j ← Equations 3.8, 3.9 with c_{j+1}
4:   f[:, j] ← Equation 3.10 with d_j
5: end for
6: H ← max(Q, f_s/60)
7: for m from H + 1 to L do
8:   g[m, :] ← Σ_{n=m−Q}^{m} f[n, :]²
9:   V_max[m], V_min[m] ← Equation 2.2
10:  x[m, :] ← [ g[m, :], V_max[m], V_min[m] ]
11:  if online then
12:    load(x_mean, x_var)
13:    x[m, :] ← (x[m, :] − x_mean) / √x_var
14:  end if
15: end for
16: if offline then
17:   for k from 1 to end do
18:     x_mean[k] ← mean(x[:, k])
19:     x_var[k] ← var(x[:, k])
20:   end for
21:   x[m, :] ← (x[m, :] − x_mean) / √x_var ∀m
22:   save(x_mean, x_var)
23: end if
24: Output x

4 S Transform

The S transform was first proposed by Stockwell in [37]. The S transform has fixed modulating sinusoids with respect to the time axis, while a Gaussian window is dilated and translated as in the wavelet transform. It therefore maintains a relationship with both the wavelet transform and the short-time Fourier transform, and the fact that it retains a direct relationship with the Fourier transform gives a good characterization. The continuous S transform is defined as follows:

ST_x(τ, f) = ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)² f²/2} e^{−j2πft} dt.    (3.11)

The continuous S transform can also be written as

ST_x(τ, f) = ∫_{−∞}^{+∞} X(α + f) e^{−2π²α²/f²} e^{j2πατ} dα,    f ≠ 0    (3.12)

where X(f) is the Fourier transform of x(t). We can show that Equation 3.12 is equivalent to Equation 3.11.
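Before the derivation, note that the frequency-domain form of Equation 3.12 is what makes an FFT-based implementation possible; a small sketch of it (anticipating the discrete form, Equation 3.16, given below) localizes a test tone at the right voice. The signal and sizes are illustrative choices:

```python
import numpy as np

# FFT-based S transform following Equation 3.12 (discretized as
# Equation 3.16): for each voice k, shift the spectrum, apply the
# Gaussian in alpha, and inverse-transform. Overall cost is O(N^2).
def s_transform(x):
    N = x.size
    X = np.fft.fft(x)
    Xs = np.concatenate([X, X])              # X[(p + k) mod N] via slicing
    p = np.fft.fftfreq(N) * N                # signed alpha index
    ST = np.zeros((N // 2, N), dtype=complex)
    for k in range(1, N // 2):               # k = 0 (DC) is left out
        gauss = np.exp(-2 * np.pi ** 2 * p ** 2 / k ** 2)
        ST[k] = np.fft.ifft(Xs[k:k + N] * gauss) * N
    return ST

fs = 512
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t)               # energy should sit at 50 Hz
ST = s_transform(x)
voice = int(np.abs(ST).sum(axis=1).argmax())
```

Each row of `ST` is a frequency voice resolved over time, which is what the contour plots of Figure 3.5 display.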

ST_x(τ, f) = ∫_{−∞}^{+∞} X(α + f) e^{−2π²α²/f²} e^{j2πατ} dα
          = ∫_{−∞}^{+∞} [ ∫_{−∞}^{+∞} x(t) e^{−j2π(α+f)t} dt ] e^{−2π²α²/f²} e^{j2πατ} dα
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x(t) e^{−j2π(α+f)t} e^{−2π²α²/f²} e^{j2πατ} dt dα
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x(t) e^{−j2πft} e^{−2π²α²/f²} e^{j2πα(τ−t)} dα dt
          = ∫_{−∞}^{+∞} [ ∫_{−∞}^{+∞} e^{−2π²α²/f² + j2πα(τ−t)} dα ] x(t) e^{−j2πft} dt.    (3.13)

The integral of a Gaussian function in general form can be written as

∫_{−∞}^{+∞} e^{−ax² + bx + c} dx = √(π/a) e^{b²/(4a) + c},    (3.14)

which can be used for the inner integral of Equation 3.13. Hence

ST_x(τ, f) = ∫_{−∞}^{+∞} [ √(πf²/(2π²)) e^{−4π²(τ−t)²f²/(8π²)} ] x(t) e^{−j2πft} dt
          = ∫_{−∞}^{+∞} [ (|f|/√(2π)) e^{−(τ−t)²f²/2} ] x(t) e^{−j2πft} dt
          = ∫_{−∞}^{+∞} x(t) (|f|/√(2π)) e^{−(τ−t)²f²/2} e^{−j2πft} dt,    (3.15)

which is equivalent to Equation 3.11.

4.1 Discrete S Transform and its Implementation

Using Equation 3.12, we can utilize the fast Fourier transform to increase the computational efficiency. The S transform can be written in discrete time as

ST_x[m, k] = Σ_{p=0}^{N−1} X[p + k] e^{−2π²p²/k²} e^{j2πpm/N}.    (3.16)

Part of the implementation can use the fast Fourier transform and the inverse fast Fourier transform. However, the multiplication by the Gaussian window e^{−2π²p²/k²} has to be done for every p and k, and thus the algorithmic complexity is O(N²). Contour plots of the S transform are shown in Figure 3.5. When this is compared to the contour plot of the short-time Fourier transform in

Chapter 3. Transformation and Feature Extraction 33 Figure 3.5: Contour diagram of ST coefficients

Chapter 3. Transformation and Feature Extraction 34

Figure 3.2, we can see a clearer and more localized representation of the signal. In particular, the high-frequency disturbances such as the impulsive transient and the notch are accurately captured. Algorithm 4 shows the implementation of the S transform and feature extraction. Since the number of discrete samples is large, we downsample the output by a factor of d. In the implementation, the window size Q was set to 12 cycles of the fundamental frequency, and the frequency sampling was done at every 240 Hz.

Algorithm 4 Implementation of Feature Extraction with the Discrete Time S Transform
1: for i from 1 to 2N/Q do
2:   v̂ ← v[i·Q/2 + 1 : i·Q/2 + Q]
3:   V̂ ← FFT(v̂)
4:   for k from 1 to l + 1 do
5:     for p from 1 to Q do
6:       B[p] ← V̂[p + k] e^{-2\pi^2 p^2 / k^2}
7:     end for
8:     D[:, k] ← IFFT(B)
9:   end for
10:  ST_x[i·Q/2 + Q/4 + 1 : i·Q/2 + 3Q/4] ← D[Q/4 + 1 : 3Q/4]
11: end for
12: x ← ST_x[:, 1 : d : K]
13: if online then
14:   load(x_mean, x_var)
15:   x[m, :] ← (x[m, :] − x_mean)/\sqrt{x_var}  ∀m
16: end if
17: if offline then
18:   for k from 1 to end do
19:     x_mean[k] ← mean(x[:, k])
20:     x_var[k] ← var(x[:, k])
21:   end for
22:   x[m, :] ← (x[m, :] − x_mean)/\sqrt{x_var}  ∀m
23:   save(x_mean, x_var)
24: end if

5 Comparison of Transforms

In this section, we give both theoretical and experimental comparisons of the transforms. At this stage, the extracted features are too ambiguous to compare their performances directly; their performance will be evaluated after applying a standard classifier in Chapter 5. The theoretical

Chapter 3. Transformation and Feature Extraction 35

comparison was given by Ventosa et al. [4], and this section will briefly go over it. The comparison is done on the continuous versions of the transforms to establish straightforward mathematical relationships. First, we compare the S transform and the Fourier transform. The Fourier transform can be recovered from the S transform by

\int_{-\infty}^{+\infty} ST_x(\tau, f)\, d\tau = X(f).   (3.17)

This can be seen by direct substitution and equation 3.14:

\int_{-\infty}^{+\infty} ST_x(\tau, f)\, d\tau = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}\, e^{-j2\pi f t}\, dt\, d\tau
   = \int_{-\infty}^{+\infty} \left[ \frac{|f|}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-(\tau - t)^2 f^2 / 2}\, d\tau \right] x(t)\, e^{-j2\pi f t}\, dt
   = \int_{-\infty}^{+\infty} \left[ \frac{|f|}{\sqrt{2\pi}} \cdot \frac{\sqrt{2\pi}}{|f|} \right] x(t)\, e^{-j2\pi f t}\, dt
   = \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi f t}\, dt = X(f).   (3.18)

Therefore, integrating the S transform over time gives the Fourier representation of the signal. The short-time Fourier transform is a windowed version of the Fourier transform, and within the window the same relation holds. Next, the relationship between the continuous wavelet transform and the S transform is shown. Factoring the phase at τ out of equation 3.11,

ST_x(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}\, e^{-j2\pi f t}\, dt
   = e^{-j2\pi f\tau} \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-(\tau - t)^2 f^2 / 2}\, e^{-j2\pi f(t - \tau)}\, dt.   (3.19)

We give the definition of the Morlet wavelet,

\psi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, e^{-j2\pi t},   (3.20)

and substitute s = 1/f to replace the frequency with scale. Then,

Chapter 3. Transformation and Feature Extraction 36

ST_x(\tau, f) = e^{-j2\pi\tau/s}\, \frac{1}{s} \int_{-\infty}^{+\infty} x(t)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t - \tau}{s}\right)^2}\, e^{-j2\pi\left(\frac{t - \tau}{s}\right)}\, dt
   = e^{-j2\pi\tau/s}\, \frac{1}{s} \int_{-\infty}^{+\infty} x(t)\, \psi\!\left(\frac{t - \tau}{s}\right) dt
   = \frac{e^{-j2\pi\tau/s}}{\sqrt{s}}\, CWT_x^{\psi}(\tau, s).   (3.21)

We can now relate the S transform to the continuous wavelet transform through the phase factor e^{-j2\pi\tau/s}. Figure 3.6 summarizes the theoretical comparison. Although the continuous transforms show the direct relationships between the transforms, the discrete implementation gives the wavelet transform an advantage over the short-time Fourier transform and the S transform due to its efficiency.

Figure 3.6: Relationships between wavelet transform, S transform and Fourier transform [4]

Although the S transform may appear to be a good feature, the S transform and the short-time Fourier transform share a disadvantage compared to the discrete wavelet transform. Both introduce a redundant representation if there are too many frequency sampling points [38]. A redundant representation means a larger feature input for the classifier and more computation. If the number of sampling points is too low, the features may not convey sufficient information for classification. While the S transform shows favorable characteristics in Figure 3.5, we will see in Chapter 5 that its classification accuracy is not as good as expected. This issue may come from insufficient frequency sampling points. Currently, there is no consensus on how to effectively select a limited number of sampling points in frequency, and this needs further investigation. In Figure 3.7, we show an example of features that were obtained based on Algorithms 2, 3 and

Chapter 3. Transformation and Feature Extraction 37

4. These are the features that will be sent to the classifier for both offline training and online classification.

Figure 3.7: An example of a waveform and extracted features based on different transforms

The algorithmic complexities of the proposed transforms are also important for implementation, and they are given in Table 3.1. While the short-time Fourier transform computes the features in O(Q log Q) per window of size Q with the fast Fourier transform, the S transform requires O(Q^2) per window. The discrete wavelet transform is the most efficient, with the implementation described in the previous section. While the transforms have different advantages and disadvantages in terms of implementation efficiency and characterization, there is a fundamental limitation in extracting features. The features from the transforms are based on the frequencies of the signal, and they are subject to the Heisenberg uncertainty principle.

Chapter 3. Transformation and Feature Extraction 38

Table 3.1: Comparison of computational complexity of feature extraction algorithms

  Transform                      Algorithmic complexity
  short-time Fourier transform   O(N Q log Q)
  discrete wavelet transform     O(N)
  S transform                    O(N Q^2)

5.1 Common limitations of the features

The Heisenberg uncertainty principle states the following. Given the variables

m_t = \int t\, |x(t)|^2\, dt,
\sigma_t = \left[ \int (t - m_t)^2\, |x(t)|^2\, dt \right]^{1/2},
m_f = \int f\, |X(f)|^2\, df,
\sigma_f = \left[ \int (f - m_f)^2\, |X(f)|^2\, df \right]^{1/2},   (3.22)

where m_t and σ_t are the average and uncertainty in time, and m_f and σ_f are the average and uncertainty in frequency, the Heisenberg uncertainty principle states that

\sigma_t\, \sigma_f \geq \frac{1}{4\pi}.   (3.23)

This means there is a fundamental trade-off between the localization of a feature and its uncertainty. As we try to extract a more accurate frequency feature, localization is lost; as we obtain more accurate localization, there is more uncertainty in the frequency feature. The wavelet transform and the S transform try to address this issue by using windows of various sizes depending on the frequency to be captured. While we might expect the short-time Fourier transform to perform significantly worse than the other transforms, we will see that if the classifier is powerful enough, the impact is not as significant as expected. The implication of the Heisenberg uncertainty principle is most important in the transition state, the moment at which a disturbance starts or ends. Classification during the transition operates on features with high uncertainty. As a result, we expect the majority of misclassifications to occur in the transition state, and we will see this in Chapter 5.
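The Gaussian pulse attains the uncertainty bound with equality, which makes a quick numerical sanity check of equations 3.22 and 3.23 possible. The sketch below (illustrative names, assuming frequency measured in Hz so the bound is 1/(4π); it equals 1/2 when angular frequency is used) computes the two spreads on a discrete grid:

```python
import numpy as np

# Gaussian pulse x(t) = exp(-t^2/2): the minimum-uncertainty signal
n, dt = 4096, 40.0 / 4096
t = (np.arange(n) - n // 2) * dt
x = np.exp(-t**2 / 2)

def spread(axis, power):
    # mean and standard deviation of a normalized |.|^2 distribution (eq. 3.22)
    w = power / power.sum()
    m = (axis * w).sum()
    return m, np.sqrt((((axis - m) ** 2) * w).sum())

_, sigma_t = spread(t, np.abs(x) ** 2)
f = np.fft.fftfreq(n, dt)                            # frequency axis in Hz
_, sigma_f = spread(f, np.abs(np.fft.fft(x)) ** 2)   # |X(f)|^2; phase is irrelevant

uncertainty_product = sigma_t * sigma_f              # ~ 1/(4*pi) for the Gaussian
```

Any non-Gaussian window can only do worse, which is the trade-off the text describes: narrowing the analysis window shrinks σ_t but forces σ_f to grow.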

Chapter 3. Transformation and Feature Extraction 39 6 Summary In this chapter, we went over the short-time Fourier transform, wavelet transform and S transform to extract features that will be used for the classifier in the next chapter. We presented the relationships and comparisons between the transforms and discussed pros and cons of each transform based on its characterization and efficiency in implementation.

Chapter 4

Classification with Long Short-Term Memory

1 Background

The previous chapter discussed how different transforms can refine the waveform into features appropriate for classification. In this chapter, we describe the classifier that outputs the types of power quality disturbance. In general, classifiers are associated with parameters that implicitly or explicitly define thresholds. Explicit parameters are set by engineers and can be used when the disturbance is directly related to the parameter. For example, we could define a voltage swell as the period during which the voltage magnitude is greater than 1.1 p.u. The voltage magnitude would then be an explicit parameter classifying voltage sag and swell with a threshold. However, classification with explicit parameters usually involves a tree-structured classifier. The tree structure becomes very complex and difficult to tune as the number of classes increases. In addition, modifying the classification becomes challenging and may require much engineering time to reconfigure the tree structure. The alternative is to use implicit parameters, and an example of this approach is the neural network. The neural network is one of the most successful techniques for training a classification system to find patterns in training data. We will use supervised learning to determine the

Chapter 4. Classification with Long Short-Term Memory 41

implicit parameters of the classifier, which are the weights and biases of the neural network. The implicit parameters are determined systematically, and there is no need for manual work to modify the classifier and reconfigure the parameters. The neural network was invented in the 1950s, inspired by how the brain works. A neural network distributes the pattern matching across many nodes, which correspond to the neurons in the brain. Recently, it has made great achievements in numerous fields including speech recognition, object recognition, and natural language processing [34, 39, 40]. While there are other approaches such as the support vector machine and the neural tree, the deep neural network has shown the most recent success across many fields. In the following sections, different architectures of the neural network are reviewed.

2 Feedforward Neural Network

The feedforward neural network (FNN) is a basic neural network architecture that has multiple hidden layers, with edges directed from each layer to the layer above. The input enters at the bottom layer, and the computation proceeds layer by layer from bottom to top, where the top layer is the output. An example of a 3-layer feedforward neural network is shown in Figure 4.1.

Figure 4.1: Feedforward neural network architecture

The neural network is a mapping from the input x ∈ R^{M×T} to the output y ∈ R^{N×T}. The true label is denoted by ŷ ∈ R^{N×T}. Here M is the number of features obtained from the feature extraction algorithm, N is the number of classes, and T is the number of data points, i.e., the number of time steps. An individual training example, x^{(t)}, is the t-th column of x. The output y_i^{(t)} ∈ {0, 1} indicates which class the data belongs to. It also satisfies the condition \sum_i y_i^{(t)} = 1.

Chapter 4. Classification with Long Short-Term Memory 42

An FNN with l hidden layers has parameters consisting of l + 1 weight matrices W = (W_0, ..., W_l) and biases b = (b_1, ..., b_{l+1}). We denote the parameters by θ = [W, b]. Given that the k-th layer has m_k units, W_k ∈ R^{m_k × m_{k−1}}, b_k ∈ R^{m_k}, and m_0 = M. The output y = [y^{(1)} ... y^{(T)}] can be computed from the input x using forward propagation with the following algorithm.

Algorithm 5 Forward Propagation for FNN
1: z_0 ← x^{(t)}
2: for i from 1 to l do
3:   g_i ← W_{i−1} z_{i−1} + b_i
4:   z_i ← σ(g_i)
5: end for
6: Output y^{(t)} ← P(y^{(t)} | z_l)

The function σ is an activation function, and P(y^{(t)} | z) is the softmax function, which will be explained later. The activation could be the sigmoid function, the hyperbolic tangent function, or the rectified linear function; the sigmoid function is used in this thesis. The last hidden layer is connected to the output by the softmax function, which produces a probability distribution.

The architecture of the neural network also allows an efficient algorithm for the derivative of an objective function with respect to its parameters. Using the chain rule, the computation is done in the reverse direction of the forward propagation, so it is called backward propagation. Backward propagation requires the output of each node, z_i, which can be computed by running forward propagation first. The following algorithm shows the implementation of backward propagation.

Algorithm 6 Backward Propagation for FNN
1: dz_l ← dL(y, ŷ)/dz_l
2: for i from l to 1 do
3:   dg_i ← σ′(g_i) ⊙ dz_i
4:   dz_{i−1} ← W_{i−1}^T dg_i
5:   db_i ← dg_i
6:   dW_{i−1} ← dg_i z_{i−1}^T
7: end for
8: Output ∇_θ L ← [dW_0, ..., dW_l, db_1, ..., db_{l+1}]

We will define the objective (loss) function L(y, ŷ) in a later section. Backward propagation will be used to set the parameters θ.
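Algorithms 5 and 6 can be sketched directly in numpy. The sketch below assumes sigmoid hidden layers with a softmax output and cross-entropy loss, and verifies the analytic gradient of Algorithm 6 against a finite difference; all names are illustrative, not the thesis code.

```python
import numpy as np

sigmoid = lambda g: 1.0 / (1.0 + np.exp(-g))

def forward(x, Ws, bs, Wout):
    """Algorithm 5: z_0 = x, z_i = sigmoid(W_{i-1} z_{i-1} + b_i), softmax on top."""
    zs = [x]
    for W, b in zip(Ws, bs):
        zs.append(sigmoid(W @ zs[-1] + b))
    logits = Wout @ zs[-1]
    e = np.exp(logits - logits.max())
    return zs, e / e.sum()                      # hidden activations, class probabilities

def backward(zs, p, y, Ws, Wout):
    """Algorithm 6: the chain rule applied in reverse, for cross-entropy loss."""
    dWout = np.outer(p - y, zs[-1])             # softmax + cross-entropy gradient
    dz = Wout.T @ (p - y)
    dWs, dbs = [], []
    for i in reversed(range(len(Ws))):
        dg = zs[i + 1] * (1 - zs[i + 1]) * dz   # sigma'(g) expressed via z = sigma(g)
        dbs.insert(0, dg)
        dWs.insert(0, np.outer(dg, zs[i]))
        dz = Ws[i].T @ dg
    return dWs, dbs, dWout

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 4)), rng.normal(size=(5, 5))]
bs = [rng.normal(size=5), rng.normal(size=5)]
Wout = rng.normal(size=(3, 5))
x = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0])                   # one-hot label, class 1

zs, p = forward(x, Ws, bs, Wout)
dWs, dbs, dWout = backward(zs, p, y, Ws, Wout)

# Finite-difference check of one entry of the analytic gradient
eps = 1e-6
Ws[0][0, 0] += eps; lp = -np.log(forward(x, Ws, bs, Wout)[1][1])
Ws[0][0, 0] -= 2 * eps; lm = -np.log(forward(x, Ws, bs, Wout)[1][1])
Ws[0][0, 0] += eps
numeric = (lp - lm) / (2 * eps)
```

The agreement between `numeric` and the corresponding analytic entry is the standard way to validate a hand-written backward pass before training.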

Chapter 4. Classification with Long Short-Term Memory 43

2.1 Decision Making with the Softmax Function

Existing algorithms for power quality monitoring were often equipped with an additional layer of algorithms to interpret the output of the neural network [3]. In this thesis, we use the softmax function, which is currently the most widely used technique for multi-class classification. It only changes the output layer of the neural network and is much simpler to implement than an additional interpretation layer. The softmax function is defined as

P(y = j | z) = \frac{\exp(w_j^T z)}{\sum_k \exp(w_k^T z)}.   (4.1)

Since P(y = j | z) ∈ (0, 1) and \sum_j P(y = j | z) = 1, it satisfies the conditions for a probability distribution. The compact notation P(y | z) = [P(y = 1 | z) ... P(y = 14 | z)]^T is used, where N = 14 is the number of disturbance classes in our definition. To determine which class the signal belongs to, the one with maximum probability is chosen. We can construct the output prediction by

ȳ_i = 1 if i = arg max_j P(y = j | x), and ȳ_i = 0 otherwise,   (4.2)

where we assign the class based on the maximum probability. Based on the output prediction, the error rate can be defined as

E = \frac{1}{T} \sum_{t=1}^{T} 1[ŷ^{(t)} ≠ ȳ^{(t)}],   (4.3)

where the indicator 1[ŷ^{(t)} ≠ ȳ^{(t)}] equals 1 if the prediction does not match the actual label, and 0 otherwise. This is summed over all the training cases and divided by the total number of training data.
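Equations 4.1–4.3 map directly to a few lines of numpy. The sketch below uses a numerically stable softmax (subtracting the maximum logit before exponentiating, which does not change the result) and illustrative names; columns are time steps as in the thesis notation.

```python
import numpy as np

def softmax(logits):
    # equation 4.1 applied column-wise; shifting by the max avoids overflow
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def predict(P):
    # equation 4.2: one-hot prediction at the maximum-probability class
    Y = np.zeros_like(P)
    Y[P.argmax(axis=0), np.arange(P.shape[1])] = 1.0
    return Y

def error_rate(y_true, y_pred):
    # equation 4.3: fraction of time steps whose predicted class is wrong
    return np.mean(y_true.argmax(axis=0) != y_pred.argmax(axis=0))

logits = np.array([[2.0, 0.0],
                   [1.0, 3.0],
                   [0.0, 0.0]])        # 3 classes, 2 time steps
P = softmax(logits)
Y = predict(P)
labels = np.array([[1.0, 0.0],
                   [0.0, 0.0],
                   [0.0, 1.0]])        # true classes: 0 at t=0, 2 at t=1
E = error_rate(labels, Y)
```

In this toy example the classifier is right at t = 0 and wrong at t = 1, so the error rate is 0.5 and the accuracy (1 − E) is 0.5.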

Chapter 4. Classification with Long Short Term Memory 44 to its label ŷ by adjusting the parameter, θ = [W, b]. We can formulate this as an optimization problem where we first define the negative log probability of the target class, L(y, ŷ) = 1 T ŷ (t) log(y (t) ), (4.5) t where only the log probability of the right class is selected and summed. This is the cross entropy between the actual output and the prediction. Then, θ is the argument that minimizes the cross entropy function, θ = arg min θ L(y, ŷ) = arg min L(f(x, θ), ŷ). (4.6) θ In order to find this θ, we use the gradient descent method with the learning rate α, θ k θ k 1 + α k θ L (4.7) where θ L is the gradient of the loss function L with respect to θ, and α k is the learning rate at step k. The gradient, θ L, can be calculated efficiently with the backpropgation presented in Algorithm 6. In addition, we decay the learning rate to yield better performance, α k = α 0 η (k/k) (4.8) where α 0 is the initial learning rate, η is the decay rate, and K is the decay steps. In this thesis, α 0 = 0.01, η = 0.9 and K = 2000 were used. While the gradient descent method is straightforward, there are much better methods for this optimization. Adagrad [41], RMSProp [42] and ADAM [43] are some of the popular algorithms for training neural network. For recurrent neural network, there are Hessian free Newton s method as well [44]. In this thesis, ADAM optimizer was used, and the training was done with the mini-batch of size 128.

Chapter 4. Classification with Long Short-Term Memory 45

2.3 Windowed Feedforward Neural Network

While the feedforward neural network is a basic architecture, it has a limitation for time series data such as power quality disturbance classification. In voltage and current waveforms, each data point is part of a sequence. The FNN has access to only the instantaneous data and is unable to retrieve information from neighbouring data in the sequence. The Heisenberg uncertainty principle states the fundamental limitation on the certainty of localization and characterization. We introduce the windowed feedforward neural network (wFNN) in this thesis as an attempt to address the FNN's restriction to instantaneous features. The feedforward architecture in equation 4.4 can be reformulated to include past data,

y^{(t)} = f(x^{(t−w)}, ..., x^{(t)}, θ),   (4.9)

where w + 1 is the size of the window. The improvement is that the network has access to up to w previous data points in the sequence. However, this approach still has disadvantages.

1. The windowed network has no way to access data prior to the window.
2. The length of the input grows proportionally to the window size, which may require more capacity and training time for the neural network.
3. If the window is too large, the monitoring system requires more memory and computational power.

The window size must balance the benefit of accessing more information against the complexity and dilution of the feature. In Chapter 5, we will see that this approach is not very effective in improving the accuracy of the FNN: increasing the window size increases the size of the input features, and the benefit of the window is not realized with a fixed-capacity FNN.
In the next section, we introduce the recurrent neural network, which elegantly addresses the problems stated above for the feedforward and windowed feedforward neural networks.

Chapter 4. Classification with Long Short-Term Memory 46

3 Recurrent Neural Network

A recurrent neural network (RNN) is a neural network that has recurrence in its structure, i.e., the network contains a directed cycle. Figure 4.2 (a) shows a simple single-node neural network with a directed cycle at the node h. This cycle allows information to flow within the node, enabling the network to maintain information over time.

Figure 4.2: (a) A simplified single node recurrent neural network (b) Unrolled version of the network through time

The recurrent neural network shares similarities with the dynamical systems studied in many engineering applications, including power systems. Unrolling the recurrent neural network, as in Figure 4.2, makes the similarity more apparent: assuming linear node activations, Figure 4.2 (b) is exactly the representation of a linear dynamical system. Non-steady-state power system analysis is already very familiar with this type of model. Both a discrete time-invariant dynamical system and a single-neuron RNN can be written as

h[t] = e(h[t−1], x[t], θ),
y[t] = g(h[t]),   (4.10)

where h, x, and θ are the state, input and system parameters respectively. The function e is the activation function of the RNN, and the function g is the output function. The computation of the recurrent neural network goes forward in time, and it handles continuously sampled data very elegantly. The forward propagation through time in Algorithm 7 describes how the computation is carried out in this architecture. The pre-activation value u^{(t)} is a linear combination of the input unit and the hidden state

Chapter 4. Classification with Long Short-Term Memory 47

Algorithm 7 Forward Propagation Through Time for RNN
1: for t from 1 to T do
2:   u^{(t)} ← W_{hx} x^{(t)} + W_{hh} h^{(t−1)} + b_h
3:   h^{(t)} ← e(u^{(t)})
4:   o^{(t)} ← W_{oh} h^{(t)} + b_o
5:   z^{(t)} ← g(o^{(t)})
6:   ŷ^{(t)} ← P(y^{(t)} | z^{(t)})
7: end for
8: Output ŷ

from the previous time step, t − 1, plus the bias b_h. The pre-activation value for the output, o^{(t)}, is a linear function of the hidden state h^{(t)}. Similar to the feedforward neural network, finding the derivative of the network with respect to the parameters θ = [W, b] can be implemented efficiently using the chain rule. The backward propagation through time algorithm treats the unrolled recurrent neural network as one large neural network and goes backward in time to compute the gradient. The implementation of the algorithm can be found in [45, 44]. The recurrent neural network can be trained in the same way as the feedforward neural network, with the gradient computed by backpropagation through time. However, training a general recurrent neural network is much more difficult than training an FNN due to the vanishing gradient problem [46]. To overcome this issue, we use gating of the recurrent unit with Long Short-Term Memory.
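As a concrete reference point, the forward pass of Algorithm 7 can be sketched in numpy with tanh as the activation e and a softmax as the output function g (all names are illustrative). The example below perturbs only the input at t = 0 and observes the output at t = 1, demonstrating the recurrence that the FNN lacks.

```python
import numpy as np

def rnn_forward(X, Whx, Whh, bh, Woh, bo):
    """Algorithm 7: h_t = tanh(Whx x_t + Whh h_{t-1} + bh), softmax output per step."""
    h = np.zeros(Whh.shape[0])
    Ys = []
    for x_t in X.T:                          # columns of X are time steps
        h = np.tanh(Whx @ x_t + Whh @ h + bh)
        o = Woh @ h + bo
        e = np.exp(o - o.max())
        Ys.append(e / e.sum())
    return np.array(Ys).T                    # class probabilities, one column per step

rng = np.random.default_rng(1)
Whx, Whh = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
bh = rng.normal(size=3)
Woh, bo = rng.normal(size=(2, 3)), rng.normal(size=2)

Xa = np.array([[1.0, 0.5],
               [0.0, 0.5]])                  # 2 features, 2 time steps
Xb = Xa.copy(); Xb[:, 0] = [-1.0, 2.0]       # perturb only the first time step
Ya = rnn_forward(Xa, Whx, Whh, bh, Woh, bo)
Yb = rnn_forward(Xb, Whx, Whh, bh, Woh, bo)
```

Although the inputs at t = 1 are identical, the outputs at t = 1 differ, because the hidden state carries the earlier input forward through W_{hh}.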

Chapter 4. Classification with Long Short-Term Memory 48

4 Long Short-Term Memory

Long Short-Term Memory (LSTM) is a special type of RNN architecture that avoids the vanishing gradient problem by utilizing memory units. LSTM was first proposed by Hochreiter and Schmidhuber in 1997 [47]. Gating units control the flow of information through time, which gives the LSTM its ability to memorize important features. LSTM has been applied successfully to speech and handwritten text recognition tasks and to partially observable Markov decision processes [39, 40, 48], which are similar to the power quality classification problem in that they involve recognizing characteristics of a continuous signal. Figure 4.3 shows the architecture of the LSTM unit.

Figure 4.3: Architecture of the LSTM unit

At time step t, the LSTM computes the following:

a_t = \tanh(W_{hh} h_{t−1} + W_{hx} x_t + b_a)   (4.11a)
i_t = \mathrm{sigmoid}(W_{ih} h_{t−1} + W_{ix} x_t + b_i)   (4.11b)
f_t = \mathrm{sigmoid}(W_{fh} h_{t−1} + W_{fx} x_t + b_f)   (4.11c)
o_t = \mathrm{sigmoid}(W_{oh} h_{t−1} + W_{ox} x_t + b_o)   (4.11d)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ a_t   (4.11e)
h_t = o_t ⊙ \tanh(c_t)   (4.11f)
y_t = h_t   (4.11g)

where ⊙ denotes elementwise multiplication, i_t, f_t and o_t denote the input, forget, and output gates respectively, and c_t is the memory unit.
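Equations 4.11a–4.11g for a single time step translate directly into numpy. With all weights and biases set to zero, each gate outputs sigmoid(0) = 0.5 and the candidate a_t is zero, so one step simply halves the memory cell, which gives a convenient deterministic check (the parameter dictionary and its key names are illustrative):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    a = np.tanh(P['Whh'] @ h_prev + P['Whx'] @ x_t + P['ba'])   # candidate (4.11a)
    i = sigmoid(P['Wih'] @ h_prev + P['Wix'] @ x_t + P['bi'])   # input gate (4.11b)
    f = sigmoid(P['Wfh'] @ h_prev + P['Wfx'] @ x_t + P['bf'])   # forget gate (4.11c)
    o = sigmoid(P['Woh'] @ h_prev + P['Wox'] @ x_t + P['bo'])   # output gate (4.11d)
    c = f * c_prev + i * a        # memory update (4.11e), elementwise products
    h = o * np.tanh(c)            # hidden state (4.11f); the output y_t = h_t (4.11g)
    return h, c

n, m = 3, 2                       # hidden size, input size
P = {k: np.zeros((n, n)) for k in ('Whh', 'Wih', 'Wfh', 'Woh')}
P.update({k: np.zeros((n, m)) for k in ('Whx', 'Wix', 'Wfx', 'Wox')})
P.update({k: np.zeros(n) for k in ('ba', 'bi', 'bf', 'bo')})

h, c = lstm_step(np.ones(m), np.zeros(n), np.ones(n), P)
```

The forget gate f_t multiplying c_{t−1} is what lets the cell retain, or rapidly discard, evidence of a disturbance across many time steps.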

Chapter 4. Classification with Long Short-Term Memory 49

Figure 4.4: (a) Feedforward neural network architecture (b) Windowed feedforward neural network architecture (c) Long short-term memory architecture

4.1 Advantages of the Recurrent Neural Network

As mentioned in the discussion of the limitations of the feedforward neural network, the recurrent neural network addresses the issue of limited information sharing along the sequence. To illustrate this point, Figure 4.4 shows the configurations of the feedforward neural network, the windowed feedforward neural network, and the LSTM. The figure shows that the feedforward neural network has no communication along the sequence and the windowed feedforward neural network has access only to a fixed set of neighbours. The long short-term memory is the most elegant form: the system has access to information from all previous times.

For power quality monitoring, this gives an advantage in classification because many disturbances last for a long time. For example, if harmonics were present in the waveform at the previous step, the chance that they are continuing is high. The LSTM incorporates this information in making the classification decision. The temporal information allows the algorithm to build confidence over time and to hold on to information such as the frequencies present in the signal. It also reduces the false positive rate through the forget gate, which rapidly drains the confidence in a classification once the disturbance is no longer observed.

5 Summary

In this chapter, we proposed the recurrent neural network as the new classifier for power quality disturbance classification. We presented the limitations of the feedforward neural network and discussed how the recurrent neural network passes information through time. In particular, we use Long Short-Term Memory, which employs gating of the input, output and memory cell to prevent the gradient from vanishing or exploding during training. In the next chapter, we will show and evaluate its performance compared to the feedforward neural network and its windowed version.

Chapter 5

Results and Case Studies

In this chapter, we combine the feature extraction algorithms and classifiers to build the automatic power quality monitoring system. We test the performance of the system with feature extraction based on the different transforms. We also compare the performance of the LSTM with the traditional FNN.

1 Data Generation and Training

Power quality disturbance data was generated with a sampling frequency of 3840 Hz, and the length of the data was 10 million time steps, which corresponds to approximately 43 minutes. The effect of changing the sampling frequency is presented in a later section. Data generation and feature extraction were implemented in MATLAB, and the classifier was implemented with TensorFlow, developed by Google [49]. TensorFlow is an open source software library designed for machine learning research. The algorithms and parameters follow the descriptions in Chapter 4. Since training data can be generated in any quantity, the neural network is free from over-fitting within the generated data. However, in the case studies we will observe that the monitoring system over-fits to the generated data, and there can be cases where the classification does not work well. This is a fundamental issue with a neural network approach trained on generated data for power quality monitoring. Figure 5.1 shows an example of how the cross entropy drops as the training progresses.

Chapter 5. Results and Case Studies 51

Figure 5.1: Cross entropy of the training and testing data as the training progresses

The cross entropy of the training and testing data were very close to each other, indicating that the data set was large enough to avoid over-fitting within the generated data.

2 Results

Figure 5.2 shows an example of output from the classifier. The LSTM was used with features from the different transforms. The shaded area shows the output of the softmax layer, which can be interpreted as the probability of the colored disturbance. The square boxes are the true labels, where the output is either one or zero for each class. While the cross entropy was the objective of the minimization, we only report accuracy throughout this chapter. The cross entropy includes the confidence of the result, but the accuracy is what matters in the end and gives a more straightforward interpretation. The accuracy of the monitoring system is defined as (1 − E), where E is the error rate defined in equation 4.3.

2.1 Comparisons of the transformations

We first show the comparison of accuracy between different combinations of feature extraction and classifier in Table 5.1. The feedforward neural network was built with 3 hidden layers and 8 hidden units in each layer. The result shows that the LSTM increases the performance by 2.95%, 3.40% and 3.86% for the STFT, DWT, and ST respectively. It shows that the discrete

Chapter 5. Results and Case Studies 52 Figure 5.2: Output of the automatic power quality monitoring system

Chapter 5. Results and Case Studies 53

wavelet transform works the best with the LSTM in the overall result, although the short-time Fourier transform is quite comparable. The performance of the S transform was not as good as expected, since frequency variation was never correctly classified. This is likely due to insufficient sampling in frequency and will require further investigation.

Table 5.1: Comparison of accuracy of FNN and LSTM in percentage

  Feature                      STFT            DWT             ST
  Classifier                 FNN    LSTM    FNN    LSTM    FNN    LSTM
  (i) normal                94.44  95.15   91.88  95.01   91.02  92.33
  (ii) interruption         88.23  88.86   89.00  90.22   88.24  89.33
  (iii) sag                 88.31  89.54   91.52  89.78   87.04  87.60
  (iv) swell                87.78  85.97   90.20  88.90   84.68  81.28
  (v) impulsive             20.66  62.87   82.29  86.54   17.63  50.90
  (vi) oscillatory          74.14  79.65   76.59  82.68   74.49  85.55
  (vii) dc offset           90.78  90.94   93.54  91.70   83.16  84.38
  (viii) harmonics          82.08  81.37   90.67  93.82   89.54  80.65
  (ix) notch                75.79  83.43   82.07  83.37   59.49  85.12
  (x) flicker               86.61  89.04   74.49  87.06   68.58  70.26
  (xi) noise                89.98  93.53   94.04  96.26   88.48  98.09
  (xii) frequency variation 82.58  91.33   67.71  84.10    0.00   0.00
  (xiii) sag and harmonics  86.89  82.39   86.09  80.63   78.83  86.21
  (xiv) swell and harmonics 81.80  82.91   86.10  86.50   85.95  83.70
  overall                   87.04  89.61   88.05  91.04   79.66  82.74

Since the short-time Fourier transform and the S transform have disadvantages in computational complexity and frequency sampling resolution, we will focus on the discrete wavelet transform based features for testing the effect of the sampling frequency and the output frequency. Table 5.2 is the confusion matrix for the discrete wavelet transform with the recurrent neural network; the values are in percentage. It shows that the highest confusion was between sag & harmonics and plain harmonics at 12.51%, followed by swell & harmonics and harmonics at 6.28%. The impulsive transient was often confused with the notch, at 3.76%.
These are, in fact, cases that can be difficult even for a human to distinguish, especially when the sag or swell is not significant.

Chapter 5. Results and Case Studies 54

Table 5.2: Accuracy of LSTM with features from DWT (rows: true class, columns: predicted class, in percentage)

       (i)   (ii)  (iii) (iv)  (v)   (vi)  (vii) (viii) (ix) (x)   (xi)  (xii) (xiii) (xiv)
(i)    94.07 0.06  0.05  0.16  1.51  0.50  0.88  0.18  0.39  0.45  0.39  1.23  0.03  0.10
(ii)    5.41 89.90 2.15  0     2.03  0.03  0.01  0.01  0.25  0     0.03  0.07  0.12  0
(iii)   6.59 1.33  90.65 0     0.57  0.08  0.01  0.01  0.49  0     0.03  0.18  0.06  0
(iv)    5.95 0     0     90.54 0.69  0.05  0     0     0.59  2.07  0.01  0.02  0     0.09
(v)     6.13 0     0     0.01  89.00 0.79  0.01  0     3.76  0.09  0.15  0     0     0.06
(vi)   12.84 0.08  0.22  0.25  0.45  81.45 0.41  1.21  0.54  0.57  1.29  0.17  0.11  0.42
(vii)   7.34 0     0     0     0     0     92.66 0     0     0     0     0     0     0
(viii)  5.45 0     0     0     0     0.12  0     93.57 0.01  0     0     0     0.75  0.09
(ix)   16.03 0     0     0.07  1.36  0.13  0.03  0.01  81.93 0.03  0.30  0.11  0     0
(x)     8.21 0     0     4.26  2.20  0.12  0     0.01  0.29  84.63 0.01  0.02  0     0.25
(xi)    1.09 0     0     0     0.21  1.08  0     0     1.25  0     96.36 0     0     0
(xii)  17.63 0.02  0.29  0     2.11  0.42  0.01  0.25  0.11  0     0.04  79.05 0.08  0
(xiii)  5.59 0.32  0     0     0.86  0     0     12.51 0.28  0     0.01  0     80.42 0
(xiv)   5.11 0     0     0.03  0.77  0.09  0     6.28  0.51  0.14  0.01  0     0     87.07

2.2 Effect of the size of window

In this section, we adjust the window sizes of the short-time Fourier transform and the discrete wavelet transform to find the optimal window size. The default sampling frequency was 3840 Hz, and the default window sizes for the STFT and DWT were 1 cycle and 0.5 cycles of the fundamental frequency respectively.

Window in Discrete Wavelet Transform

Figure 5.3 shows the effect of varying the size of the energy window in the features from the discrete wavelet transform. Some notable classes are highlighted in the legend. If the window size is too small or too large, the accuracy of the impulsive transient class suffers the most. Frequency variation and notch also suffer from a small window because the uncertainty in determining periodicity increases.

Figure 5.3: Performance of LSTM with various window sizes of DWT

Chapter 5. Results and Case Studies 55

The ideal window size was determined to be about half a cycle of the fundamental frequency.

Window in Short-Time Fourier Transform

The window size of the short-time Fourier transform was varied as shown in Figure 5.4. Impulsive transients are difficult to distinguish when the window size is too small: with a small window, the fundamental and other frequencies can be significantly impacted by an impulsive transient. The performance improves slightly with a window size of 3 cycles, but it drops significantly again when the window is too large. A large window loses the localization of the impulsive transient, and the result reflects this intuition. Accuracy for voltage sag and swell also decreased as the window size increased. This clearly illustrates the limitation of the short-time Fourier transform, which has only a fixed window size.

Figure 5.4: Performance of LSTM with various window sizes of STFT

Window in Windowed Feedforward Neural Network

In this section, the window size of the wFNN was varied. Although it was expected that a larger window would increase the accuracy, the results did not show this to be true. The neural network used for the wFNN was fixed at 3 layers with 8 hidden units in each layer. Increasing the window size of the wFNN increased the number of input units, and it

Chapter 5. Results and Case Studies 56 rather degraded the performance of the classifier. Figure 5.5: Performance of wfnn with various window sizes Sampling frequency In this section, the sampling frequency was varied in logarithmic scale as shown in Figure 5.6. Due to aliasing, we expect the accuracy to drop as the sampling frequency goes down. As the figure shows, the high-frequency disturbances such as impulsive, oscillatory and notch were significantly affected if the sampling frequency was too low. Figure 5.6: Performance of LSTM with various sampling frequencies
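The window-size and sampling-frequency effects discussed above both follow from the fixed time–frequency resolution of the STFT. The following is a minimal sketch, not the thesis's exact implementation: the function name, the Hann taper, and the use of non-overlapping windows are our own assumptions; only the 3840 Hz rate and the window size expressed in cycles of the 60 Hz fundamental come from the text.

```python
import numpy as np

def stft_magnitudes(x, fs=3840.0, f0=60.0, window_cycles=1.0):
    """Magnitude spectra over non-overlapping windows of `window_cycles`
    fundamental cycles. A longer window sharpens frequency resolution but
    smears short events (e.g. impulsive transients) across the window."""
    win = int(round(fs / f0 * window_cycles))     # samples per window
    n = len(x) // win
    segs = np.reshape(np.asarray(x[:n * win], dtype=float), (n, win))
    segs = segs * np.hanning(win)                 # taper to reduce leakage
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    return np.abs(np.fft.rfft(segs, axis=1)), freqs
```

The same sketch also exposes the aliasing effect: if `fs` is lowered below twice a disturbance's frequency, that component folds onto a low-frequency bin (for example, a 900 Hz oscillation sampled at 960 Hz appears at 60 Hz), which is consistent with the accuracy drop for high-frequency disturbances in Figure 5.6.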

Output frequency

The required output frequency may not need to be as high as the sampling frequency. The output frequency was varied in this section to see whether it has an effect similar to that of the sampling frequency. The output labels were obtained by downsampling so that the same input data could be used in testing: the input frequency was fixed at 3840 Hz, and the input-to-output frequency ratio was varied. The result shows that the performance of the recurrent neural network is nearly constant across input-to-output frequency ratios. The consistent performance in this case study indicates that the training set was large enough to avoid over-fitting, and that the learning rate was decreased slowly enough over a sufficient duration.

Figure 5.7: Performance of LSTM with various output frequencies

2.3 Distribution of Misclassification

In this section, we investigate when misclassifications occur. For every misclassification, we plot the number of time steps between it and the transition time, where the transition time is the time at which a disturbance starts or ends. From the Heisenberg uncertainty principle, we expect most misclassifications to occur near the transition time. Figure 5.8 plots the distribution of all misclassifications by their distance from the transition time. It shows that the mode of the distribution is at the transition time, and that 81.61% of the misclassifications occur within one cycle of the transition time.
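The distance-to-transition statistic described above can be computed directly from the ground-truth and predicted label sequences. A minimal sketch (function and variable names are illustrative, not from the thesis):

```python
import numpy as np

def steps_from_transition(true_labels, pred_labels):
    """For each misclassified time step, return the distance (in steps)
    to the nearest label transition in the ground truth."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    # Indices where the true label changes (a disturbance starts or ends)
    transitions = np.flatnonzero(np.diff(true_labels) != 0) + 1
    wrong = np.flatnonzero(true_labels != pred_labels)
    if transitions.size == 0:
        return np.full(wrong.size, np.inf)
    # Distance from every misclassified step to its nearest transition
    return np.min(np.abs(wrong[:, None] - transitions[None, :]), axis=1)
```

A histogram of the returned distances is exactly the kind of distribution plotted in Figures 5.8 and 5.9.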

Figure 5.8: Overall distribution of misclassification

Figure 5.9: Distribution of misclassification for individual power quality disturbances

In Figure 5.9, we show the distribution for each individual disturbance. DC offset, notching, flicker, and the sag/swell classes combined with harmonics were the disturbances with misclassifications away from the transition time. This section shows that the majority of our misclassifications are associated with a fundamental limitation of power quality disturbance classification.

2.4 Effect of Noise

In this section, we present the effect of noise on classification accuracy. We test both the feedforward neural network and the LSTM and show the results in Table 5.3. The signal-to-noise ratio (SNR) of the waveform was 37.1 dB. The feature extraction was based on the discrete wavelet transform with a half-cycle window size and a 3840 Hz sampling frequency. The results show that the classification accuracy for dc offset, flicker, and sag with harmonics drops significantly. The SNR with respect to those disturbances is low, and as a result they become difficult to classify in general. The results also show that the performance of the LSTM is consistently higher than that of the feedforward neural network under noise.

Table 5.3: Comparison of LSTM and FNN on data with noise

Class                      FNN            FNN          LSTM           LSTM
                           Without Noise  With Noise   Without Noise  With Noise
(i) normal                 91.88          91.41        95.01          93.31
(ii) interruption          89.00          90.15        90.22          89.28
(iii) sag                  91.52          87.24        89.78          90.50
(iv) swell                 90.20          91.08        88.90          91.26
(v) impulsive              82.29          82.86        86.54          88.25
(vi) oscillatory           76.59          74.24        82.68          79.75
(vii) dc offset            93.54          19.68        91.70          19.96
(viii) harmonics           90.67          93.32        93.82          92.82
(ix) notch                 82.07          82.31        83.37          81.75
(x) flicker                74.49          28.46        87.06          39.49
(xi) noise                 94.04          92.60        96.26          95.29
(xii) frequency variation  67.71          56.08        84.10          54.35
(xiii) sag and harmonics   86.09          0.00         80.63          3.43
(xiv) swell and harmonics  86.10          90.19        86.50          84.10
overall                    88.05          78.77        91.04          80.08
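The thesis specifies the noise level only as an SNR of 37.1 dB; a standard way to realize this is to add white Gaussian noise scaled to the appropriate power. A sketch of that procedure (the function name and the Gaussian noise model are our assumptions):

```python
import numpy as np

def add_noise(signal, snr_db, seed=None):
    """Add white Gaussian noise so the result has the requested SNR (dB)."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)               # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # power for target SNR
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)
```

Note that the SNR is defined relative to the whole waveform, which is dominated by the fundamental; small-magnitude disturbances such as dc offset and flicker therefore have a much lower effective SNR, consistent with their sharp accuracy drop in Table 5.3.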

3 Case Studies

In this section, we present case studies with data from electrical simulations in SimPowerSystems/MATLAB. These data are a more realistic representation of power quality disturbances, and we evaluate the performance and discuss the limitations of the proposed automatic monitoring system.

Interruption

The interruption was generated by opening the breaker of the system at 0.04 seconds and reclosing it at 0.12 seconds. This was one of the more straightforward classification tasks. There is a high misclassification rate in the first cycle after the transition time, and this is due to the definition of the peak voltage in Equation 2.2: to obtain the peak voltage, the monitor must wait until a new peak is reached.
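Equation 2.2 is not reproduced in this section, but the one-cycle delay it causes can be illustrated with a simple estimator that tracks the peak over the most recent fundamental cycle (our own sketch, not the thesis's exact definition):

```python
import numpy as np

def rolling_peak(v, fs=3840.0, f0=60.0):
    """Peak voltage estimated from the most recent fundamental cycle.
    After a disturbance begins, the estimate only settles once a full
    new cycle has been observed, which is why the first cycle after a
    transition is prone to misclassification."""
    n = int(round(fs / f0))                       # samples per cycle (64 here)
    v = np.abs(np.asarray(v, dtype=float))
    out = np.empty_like(v)
    for i in range(len(v)):
        out[i] = v[max(0, i - n + 1):i + 1].max() # max over trailing cycle
    return out
```

For an interruption, the pre-fault peak stays inside the trailing window for up to one cycle after the breaker opens, so the estimated magnitude lags the true waveform during exactly the interval where the case study shows misclassification.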

Figure 5.10: Case study of interruption

Oscillatory Transient

In the second case study, the oscillatory transient was generated by a brief disconnection of an RL load. Figure 5.11 shows the waveform as well as the outputs from the RNN and the FNN. As the oscillation fades away, the FNN loses confidence in the oscillatory transient class, whereas the RNN remains confident that the disturbance is an oscillatory transient until the end.

Figure 5.11: Case study of oscillatory transient
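For testing a classifier outside the SimPowerSystems environment, an oscillatory transient of this kind is often approximated as an exponentially decaying sinusoid superimposed on the fundamental. This is a sketch with illustrative parameter values (frequency, amplitude, and decay constant are our assumptions, not those of the simulated RL-load switching event):

```python
import numpy as np

def oscillatory_transient(fs=3840.0, f0=60.0, f_osc=900.0, t_start=0.05,
                          amp=0.8, tau=0.01, duration=0.2):
    """Fundamental-frequency sine with a decaying oscillatory transient
    superimposed from t_start onward (all parameters illustrative)."""
    t = np.arange(int(round(fs * duration))) / fs
    v = np.sin(2 * np.pi * f0 * t)
    mask = t >= t_start
    dt = t[mask] - t_start
    v[mask] += amp * np.exp(-dt / tau) * np.sin(2 * np.pi * f_osc * dt)
    return t, v
```

The decay constant `tau` controls how quickly the transient fades, which is exactly the regime where Figure 5.11 shows the FNN's confidence dropping while the RNN's persists.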

Voltage Sag

The voltage sag was generated by a three-phase fault on a 230 kV line connected to a synchronous machine. The fault occurs at 0.3 seconds and is cleared at 0.5 seconds. The system goes through a voltage sag during the fault and a voltage swell after the clearance. The classification result identifies the voltage sag as well as the voltage swell afterward.

Figure 5.12: Case study of voltage sag
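The sag and swell labels in this case study correspond to magnitude bands in the style of IEEE Std 1159. A minimal sketch of that labeling rule (the 0.1/0.9/1.1 per-unit thresholds are the standard's magnitude bands, not necessarily the exact thresholds used by the classifier in this thesis):

```python
def classify_rms(v_rms_pu):
    """Label an RMS voltage magnitude (per unit) using IEEE 1159-style
    magnitude bands: interruption < 0.1, sag 0.1-0.9, normal 0.9-1.1,
    swell > 1.1 pu. Duration criteria are omitted in this sketch."""
    if v_rms_pu < 0.1:
        return "interruption"
    if v_rms_pu < 0.9:
        return "sag"
    if v_rms_pu <= 1.1:
        return "normal"
    return "swell"
```

Applying such a rule to the RMS trajectory of Figure 5.12 would yield the same sag-then-swell sequence that the classifier identifies.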

Harmonics

Harmonics were generated by a three-phase AC system connected to a rectifier with a constant DC load. Although the proposed method performed well in the previous studies, this case study shows a limitation of the proposed power quality monitoring system. The harmonics occur between 0.4 and 0.6 seconds with multiple harmonic orders combined, yet the classifier labels the state as normal. This indicates that there is over-fitting to the generated data, and there could be cases where the classifier fails to generalize towards any

Figure 5.13: Case study of harmonics