Identification of False Data Injection Attacks with Considering the Impact of Wind Generation and Topology Reconfigurations


Mostafa Mohammadpourfard, Student Member, IEEE, Ashkan Sami, Member, IEEE, and Yang Weng, Member, IEEE

Abstract: Robust control of the power grid relies on accurate state estimation. Recent studies show that state estimators are vulnerable to false data injection attacks, which can pass the bad data detection mechanisms. A major limitation of the existing countermeasures is that they are designed for a specific system configuration and therefore cope poorly with the integration of renewable energy sources and with topology changes. In this paper, an unsupervised anomaly detection algorithm is developed to detect cyberattacks in power systems that are affected by sustainable energy sources or system reconfigurations. The proposed method first analyzes recent historical data and detects the state vectors that deviate from the normal trend, since they may indicate attacks. Then, among the recent state records, the state vectors most similar to this suspicious vector are determined. These vectors are combined to form a new vector in which the attacked states have markedly smaller or larger values than the normal ones. Finally, different outlier detection algorithms are applied to the new vector, and the attacked states are determined through a majority vote among those algorithms. The proposed method is tested on the IEEE 14 bus system with different attack scenarios, including attacks on the system after a contingency and after the integration of wind farms. Results demonstrate the robustness of our method in dealing with these changes.

Index Terms: false data injection, wind farm, topology changes, state estimation, outlier detection, robustness

I. INTRODUCTION

State estimation (SE) is one of the key functional modules in the power grid energy management systems (EMS) [1].
SE calculates the best possible estimates of the physical quantities in the presence of noise and derives the optimal estimate of the system states [2]. SE plays an important role in the stability of the smart grid because the system state information it provides is used in other EMS functions, such as contingency analysis, optimal power flow and transmission stability analysis [3] [4]. In recent years, however, cyber-attacks have emerged as threats to the integrity of SE and the secure operation of power systems [5] [6]. As reported in [7], well-planned cyber-attacks on the power grid can have catastrophic impacts and trigger cascading events, leading to system blackout. One means of orchestrating such an attack is false data injection (FDI) [8]. In this attack, the adversary changes the states of the grid by altering the readings of multiple sensors and Phasor Measurement Units (PMUs) to mislead the energy management system at the control center [9] [10] [11]. This attack can circumvent traditional Bad Data Detection (BDD) in power system state estimation and inject arbitrary errors into the state estimates without being detected. Wrong estimates can mislead the grid operators into making decisions that endanger the security of the power system [5].

M. Mohammadpourfard and A. Sami are with the Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University (m.mohammadpourfard@shirazu.ac.ir; sami@shirazu.ac.ir). Y. Weng is with the Group of Electric Power and Energy Systems, School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA (yang.weng@asu.edu).

A. Related Works

Several methods have been developed to combat false data injection attacks.
These countermeasures can be classified into two main groups:
1) Protection-based approaches [5] [1] [12] [13]
2) Detection-based approaches [3] [9] [14] [15]
The first type tries to alleviate FDI attacks by identifying critical meter measurements and securing them through encryption, tamper-proof communication systems, etc. The second type relies on anomaly detection techniques to recognize maliciously altered measurements. These methods detect anomalies by analyzing the distribution of historical data. For instance, [3] proposed to use the Kullback-Leibler distance (KLD) to detect FDI attacks. Injecting false data into the system leads to a larger KLD than under normal operation, which allows the attacks to be detected. To this end, a KLD threshold is set using historical data, and every sample is then compared against this threshold. Proper selection of the threshold affects the detection accuracy. Reference [14] proposed to use supervised learning methods such as the Perceptron, k-nearest Neighbor and Support Vector Machines to predict new observations using a set of training data. Reference [9] proposed to learn the structure of the grid utilizing the conditional covariance test CMIT. Under normal circumstances, the Markov graph of the bus phase angles is consistent with the power grid graph. Hence, a discrepancy between the calculated Markov graph and the learned structure can reveal an attack.

A major drawback: Almost all methods developed within these two categories are designed for a specific system configuration [16]. In other words, they do not consider the impact of introducing renewable energy resources into the existing system or of topology reconfigurations. However, this assumption will no longer hold in the future grid, where frequent topological changes and intermittent generation can lead to remarkable state variation in grid operations [17].
An approach which is designed for a specific system configuration might not be able to detect an attack or might locate it wrongly under a different system configuration. This is because the underlying data distribution and network

topology will change after system reconfiguration or the integration of sustainable energy sources into the power grid. For example, the authors of [16] emphasized the issue of variable system configurations and proposed a method that detects FDI attacks while considering system reconfigurations. It utilizes redundant measurements to obtain a statistical characterization of the difference between state estimation using supervisory control and data acquisition (SCADA) data and forecast-based predictions. However, it relies on assumptions that might not be practical in the real world: 1) PMU data is assumed to be completely secure. To this end, NASPINet [18], an isolated network with various cyber security mechanisms, has been suggested to withstand tampering with PMU data in transit. However, NASPINet must at least connect to other networks to send PMU data to the Phasor Data Concentrator (PDC). Moreover, most of the security mechanisms mentioned for NASPINet have not been implemented yet. 2) Topology processing is performed using only SCADA data. Therefore, the adversary can launch an attack by manipulating SCADA data, hiding contingencies or generating fake contingencies to mislead the topology processor.

B. Summary of Contributions

In this paper, we propose a novel unsupervised anomaly detection method to address the limitations of existing methods in handling system reconfigurations and the integration of renewable energy sources. Attacked system state vectors behave differently from their recent historical data [3]. To find such state vectors, the F-test [19] is applied to each sample to verify the equality of its variance against that of the recent system state vectors. If the F-test rejects the hypothesis of equal variance, we mark the sample as suspected. Then we compute the vector difference between the current vector and the average of the most similar recent vectors.
Afterwards, different outlier detection algorithms are applied to the new vector, and the attacked states are determined through a majority voting scheme over those algorithms. This implies that attack detection and localization are performed simultaneously. For simplicity, the DC SE model is used in this paper, although the proposed method can be easily adapted to AC SE because only the state vectors are used to detect FDI attacks. Section V discusses the applicability of the proposed method to AC SE.

Our proposed method:
- Detects FDI attacks using just the state vectors, that is, using a minimal set of features.
- Does not require supervision or an attack model.
- Can detect attacks after topology reconfigurations and the integration of distributed generation.
- May detect other types of data integrity attacks, since it is an outlier detection method.

Paper Outline: The remainder of the paper is organized as follows. Section II presents a brief overview of the FDI attack model. Section III explains the employed algorithms and then discusses how we leverage these algorithms to detect and localize attacks. Section IV illustrates the simulation results on the IEEE 14 bus system. Section V discusses the applicability of the proposed method to AC SE. Possible threats to validity are listed in Section VI and the conclusion is drawn in Section VII.

II. FALSE DATA INJECTION ATTACKS MODEL

The aim of DC state estimation is to estimate the state variables $x = [\theta_1, \dots, \theta_n]^T$ using the $m$ ($m > n$) observed measurements $z = [z_1, \dots, z_m]^T$ from remote meters. In the DC power flow model, the phase angles of all buses are considered as state variables, since the model assumes that all bus voltage magnitudes are equal to one [20]. Generally speaking, DC state estimation is formalized as

$z = Hx + e$ (1)

where $H$ is a constant Jacobian matrix derived from the physical structure of the grid [20] and $e = [e_1, \dots, e_m]^T$ is the vector of measurement errors, which follows a Gaussian distribution with zero mean. We treat bus 1 as the reference bus ($\theta_1 = 0$). Hence, the state vector becomes $x = [\theta_2, \dots, \theta_n]^T$. Based on the weighted least squares (WLS) approach, the state estimates are obtained from

$\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z$ (2)

where $R$ is the error covariance matrix. In traditional BDD, a common approach to detect the presence of bad measurements is to use the 2-norm of the measurement residual. Specifically, the residual norm is compared with a prescribed threshold $\tau$, and the state estimates are regarded as valid only if the 2-norm of the measurement residual is less than that threshold:

$r = z - H\hat{x}$ (3)

$\|r\|_2 < \tau$ (4)

However, this validity check is defeated by a newly introduced attack called false data injection [8]. An FDI attack can pass the traditional BDD test only if the attacker has knowledge of the system structure ($H$) and can manipulate multiple measurements at the same time. In this attack, the adversary falsifies the state estimation results by injecting a nonzero attack vector $a = [a_1, \dots, a_m]^T$ into the original measurement vector,

$z_a = z + a$ (5)

such that this vector satisfies

$a = Hc$ (6)

where $c$ is an arbitrary error vector that is maliciously injected into the system states. Using the manipulated measurement vector $z_a$ results in the false estimates $\hat{x}_{bad} = \hat{x} + c$, where $\hat{x}$ is the estimate of $x$ obtained from the original measurements. The measurement residual then satisfies

$\|z_a - H\hat{x}_{bad}\|_2 = \|z + a - H(\hat{x} + c)\|_2 = \|z - H\hat{x} + (a - Hc)\|_2 = \|z - H\hat{x}\|_2 = \|r\|_2$ (7)

As shown in (7), the FDI attack does not change the measurement residual, which is identical to the normal one. Hence, traditional BDD methods cannot detect FDI attacks.

III. PROPOSED METHODOLOGY

The overview of the proposed method is illustrated in Fig. 1. As can be seen from this figure, several algorithms are used in the proposed method (step 2 and step 4). This section first describes these algorithms briefly and then presents the proposed approach.
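As a numerical aside on the attack model of Section II, the following minimal sketch checks that an attack vector of the form given in (6) leaves the residual of (3) unchanged while shifting the WLS estimate by the injected error. It is an illustration under stated assumptions, not the paper's simulation: the Jacobian `H` is a random stand-in rather than the IEEE 14 bus matrix, the sizes `m` and `n` are hypothetical, and the error covariance is assumed to be a scaled identity.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 8, 3                       # hypothetical sizes: m measurements, n states
H = rng.normal(size=(m, n))       # stand-in Jacobian (assumption, not a real grid)
sigma = 0.02                      # measurement noise standard deviation
R_inv = np.eye(m) / sigma**2      # inverse of the assumed error covariance matrix


def wls_estimate(z):
    """Weighted least squares state estimate, as in (2)."""
    return np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)


def residual_norm(z):
    """2-norm of the measurement residual, as in (3)."""
    return np.linalg.norm(z - H @ wls_estimate(z))


x_true = rng.normal(size=n)
z = H @ x_true + rng.normal(scale=sigma, size=m)   # measurement model (1)

c = np.array([0.1, 0.0, 0.0])     # error injected into the first state
a = H @ c                         # structured attack vector, a = Hc as in (6)
z_fdi = z + a                     # manipulated measurements, as in (5)

b = np.zeros(m)
b[0] = 1.0                        # unstructured tampering of a single meter
z_naive = z + b

r_clean = residual_norm(z)
r_fdi = residual_norm(z_fdi)      # equals r_clean up to rounding, as in (7)
r_naive = residual_norm(z_naive)  # grows, so a residual test can notice it
x_shift = wls_estimate(z_fdi) - wls_estimate(z)   # equals c up to rounding
```

The comparison with the unstructured vector `b` is included only to show why the structure $a = Hc$ matters: a residual-based test reacts to `b` but not to `a`.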

[Fig. 1 flowchart, recovered text]
Step 1: Min-max normalization of the original dataset X, where X(i, 1:13) denotes the i-th state vector.
Step 2 (using F-test): Verify the null hypothesis that the current state vector and the average of its recent state vectors have the same variance: F-test(mean(X(i-288:i-1, 1:13)), X(i, 1:13)). If the p-value is less than or equal to the significance level (different variances), go to Step 3; otherwise, verify the next sample.
Step 3: Find the state vectors most similar to the current state vector, using cosine similarity(X(j, 1:13), X(i, 1:13)) for j = i-288, ..., i-1 and sorting the similarities, then subtract the average of those vectors from the current vector: NewVector = X(i, 1:13) - mean(n most similar state vectors).
Step 4 (using FCM, IQR and MAD): Apply the outlier detection methods to the new vector and detect attacks through a majority voting scheme. If at least two of the three algorithms agree, recognize the detected states as attacked ones; then verify the next sample. In the figure's example, IQR returns [3], MAD returns [3] and FCM returns [2, 3], so the state indexed 3 is detected as attacked because the majority of methods give the same output.
Fig. 1. Overview of the proposed FDI attack detection and localization method

A. Employed Algorithms

1) F-test for Equality of Two Variances: In statistics, the F-test is used to test the null hypothesis that the variances of two independent populations are equal [19]. The F statistic is the ratio of the two sample variances:

$F = \frac{s_1^2}{s_2^2} = \frac{(\text{Standard Deviation}(X_1))^2}{(\text{Standard Deviation}(X_2))^2}$ (8)

where $s_1$ and $s_2$ are the standard deviations of the two samples $X_1$ and $X_2$ (each sample is a system state vector). If the p-value of the test (see footnote 1) is less than or equal to the significance level, the null hypothesis is rejected. The significance level is usually set to 5%, 1% or 0.1% [21]. Lower values give more confident decisions about the null hypothesis [22]. Thus, 0.1% is selected as the significance level in this paper.
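The variance test described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes a two-sided p-value computed from the F distribution in SciPy, omits the min-max normalization of Step 1, and uses synthetic state vectors (`base`, `history`, `normal`, `attacked` are hypothetical names and values). The tampered value is deliberately exaggerated so that the effect is visible with only 13 states.

```python
import numpy as np
from scipy import stats


def f_test_equal_variance(x, y):
    """Two-sided F-test of the null hypothesis var(x) == var(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)   # ratio of sample variances
    dfn, dfd = x.size - 1, y.size - 1
    cdf = stats.f.cdf(f_stat, dfn, dfd)
    p_value = 2.0 * min(cdf, 1.0 - cdf)              # two-sided p-value
    return f_stat, p_value


def is_suspicious(history, current, alpha=0.001):
    """Flag the current vector if its variance differs from the recent trend."""
    trend = np.mean(history, axis=0)                 # average of recent vectors
    _, p_value = f_test_equal_variance(trend, current)
    return p_value <= alpha                          # 0.1% significance level


rng = np.random.default_rng(1)
base = np.linspace(-0.30, -0.05, 13)                 # synthetic phase angles
history = base + rng.normal(scale=0.005, size=(288, 13))   # past 24 hours
normal = base + rng.normal(scale=0.005, size=13)

attacked = normal.copy()
attacked[4] = 1.0                                    # exaggerated tampering

flag_normal = is_suspicious(history, normal)
flag_attacked = is_suspicious(history, attacked)
```

With only 13 entries per vector, the test has limited power, which is one reason the example uses such a large injected value.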
Footnote 1: The p-value, or probability value, is a continuous measure of the strength of evidence against a null hypothesis. Very small values indicate strong evidence to reject the null hypothesis, and large values give reasonable evidence to support it.

2) Fuzzy c-means (FCM) clustering: The goal of clustering analysis is to divide a data set into several clusters such that all objects in the same cluster are more similar to each other than to those in another cluster. FCM is a clustering method in which each data point can belong to several clusters with different degrees, defined through membership grades [23]. This means that a membership grade is assigned to each data point for each cluster. In the FCM algorithm, the following cost function is minimized iteratively (the index $i$ starts from 2 in our problem, since we treat bus 1 as the reference bus and the index of $\theta$ starts from 2):

$J = \sum_{i} \sum_{j=1}^{C} \mu_{ij}^{m} \, \|x_i - c_j\|^2$ (9)

where the outer sum runs over the $N$ data points, $C$ is the number of clusters (equal to two in our problem, since we have attacked and normal state variables), $x_i$ is the i-th data point, and $c_j$ is the center of the j-th cluster, obtained by:

$c_j = \frac{\sum_{i} \mu_{ij}^{m} x_i}{\sum_{i} \mu_{ij}^{m}}$ (10)

where $\mu_{ij}$ is the membership degree of the i-th data point in the j-th cluster, calculated as follows:

$\mu_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\|x_i - c_j\|}{\|x_i - c_k\|} \right)^{2/(m-1)}}$ (11)

where $m$ is the fuzziness index. A larger $m$ leads to smaller membership grades and fuzzier clusters. When there is no domain knowledge, $m$ is usually set to 2. The FCM algorithm starts with a predefined number of clusters and randomly initialized cluster membership values. The cost function is minimized iteratively with respect to $c$, and as the number of iterations increases, the cluster centers converge to their optimum values. Each data point is assigned to the cluster in which its membership degree is largest.

3) Box plot rule: The box plot rule is a statistical technique that has been used to detect anomalies in different applications [24].
This rule defines the limits beyond which any data point is regarded as an anomaly. A box plot describes the behavior of the data using summary attributes such as the lower quartile $Q_1$ (the middle value between the smallest number and the median of the state vector), the median, and the upper quartile $Q_3$ (the middle value between the median and the highest value of the state vector). The quantity $Q_3 - Q_1$ is called the interquartile range (IQR). This method defines two rules to detect outliers:

$Q_1 - 1.5\,\mathrm{IQR} < x < Q_3 + 1.5\,\mathrm{IQR}$ (12)

$Q_1 - 3\,\mathrm{IQR} < x < Q_3 + 3\,\mathrm{IQR}$ (13)

Any observation outside the intervals $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$ and $[Q_1 - 3\,\mathrm{IQR},\ Q_3 + 3\,\mathrm{IQR}]$ is considered an outlier under the respective rule. In particular, rule (12) is only weakly conservative and raises many false positives, whereas rule (13) is considered very conservative [25]. The interval $[Q_1 - 3\,\mathrm{IQR},\ Q_3 + 3\,\mathrm{IQR}]$ is used in this paper.

4) Median Absolute Deviation (MAD) method: The MAD method is one of the simple, classical outlier detection methods. Unlike the standard deviation, which is strongly affected by outliers, the MAD remains resilient even when a few observations skew the distribution of the data. Note that an attacked state variable behaves as an outlier and skews the distribution of the data, but this does not affect the MAD method, since it is robust against outliers. The MAD is the median of the absolute deviations from the sample median and is defined as follows [26]:

$\mathrm{MAD} = M(|x_i - M(X)|)$ (14)

where $M(X)$ is the median of the sample, $X$ is the set of $n$ original observations and $x_i$ is the i-th observation of the sample. The rules of this method are as follows:

$M(X) - 2\,\mathrm{MAD} < x_i < M(X) + 2\,\mathrm{MAD}$ (15)

$M(X) - 3\,\mathrm{MAD} < x_i < M(X) + 3\,\mathrm{MAD}$ (16)

Any point beyond the intervals $M(X) \pm 2\,\mathrm{MAD}$ and $M(X) \pm 3\,\mathrm{MAD}$ is treated as an outlier. More specifically, any point outside the interval of (15) is considered a mild outlier and any point beyond the interval of (16) is a strong outlier [26]. In this paper, we use the strong outlier detection rule because the mild rule raises relatively many false positives.

B. Proposed Method

Step 1: Normalization: As illustrated in Fig. 1, we first use min-max normalization [27] to transform all samples onto the same scale.

Step 2: Filtering suspicious state vectors: When false data are injected into the system, the probability distribution of the state vectors differs from that of normal ones [3]. The F-test is used to filter out the suspected system state vectors.
To this end, the state vectors of the past 24 hours (288 samples, since the sampling interval is five minutes) are considered and their average is calculated. This average is a good representation of the trend of the recent historical data. The variance of this average is compared against the variance of the current state vector using the F-test. We assume that if the current state vector is an attacked one, the F-test will reject the null hypothesis that the two vectors have the same variance. If the null hypothesis is not rejected, the sample is considered normal; otherwise, it is considered a suspected sample and is further investigated in the next step.

Step 3: Constructing a New State Vector: In this step, the cosine similarity [27] between the current state vector and each state vector of the past 24 hours is computed. Afterwards, the n state vectors most similar to the current vector are determined. The number n is set to 10% of the number of samples in the past 24 hours (29 samples here). The average of these most similar samples is subtracted from the current state vector. If a voltage angle has been modified maliciously, it will have a significantly larger or smaller value after this differencing, because the attacked states lie farther from the average of their most similar recent values than the normal system states do. Hence, the attacked states will be inconsistent with the normal ones. In other words, instead of detecting and localizing FDI by looking at a single time slot, we use the state vectors of multiple (i.e., n) time slots to filter out the attacked system states. Experiments show that values of n that are too large or too small perform poorly across different configurations. To illustrate the impact of this step, we simulate a 10% decremental attack on a subset of the state variables, with an injection pattern of the form (0.1, ..., 0.1, 0, ..., 0). Fig. 2.a shows the distribution of the falsified system state vector before applying the third step.
Clearly, if an outlier detection algorithm is applied to this vector without the third step, a normal system state can be flagged as an attacked state variable. Now, a new state vector is constructed using the third step. Fig. 2.b shows the distribution of this vector. Unlike in the original falsified state vector, the attacked state variables behave differently from the normal ones after the third step of the proposed method has been applied. Finally, this newly constructed vector is given as input to the different outlier detection methods.

[Fig. 2 panels: (a) System state vector before applying the proposed method; (b) System state vector after applying the proposed method]
Fig. 2. Functionality of the third step of the proposed algorithm

Step 4: Majority Voting Scheme over Different Outlier Detection Methods: To detect the inconsistent data points in the newly constructed state vector, the vector is inspected from three different perspectives using three well-known outlier detection algorithms.

Method selection: To choose the methods, we treated outlier detection as a supervised learning task. To this end, we selected different methods [24] and turned each algorithm into a feature using its internal parameters. Afterwards, the information gain of each feature was calculated. Finally, the top-ranked features (methods) were selected: FCM, IQR and MAD.

An outlier is a point that stands apart from the other points and deviates from the normal trend. This means that an attacked state variable behaves as an outlier and is inconsistent with the other states. Detection and localization are then done by majority voting. If a state variable is detected as an outlier by the majority of these three algorithms, that system state is considered an attacked one.
In other words, if at least two of FCM, IQR and MAD algorithms detect a data point as an outlier in a system state vector, that data point will be recognized as an attacked state.
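Steps 3 and 4 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' code: quartiles are taken with NumPy's default percentile interpolation, the one-dimensional two-cluster FCM is initialized deterministically at the minimum and maximum value instead of with random memberships, the FCM vote flags the members of the smaller cluster, and the data (`base`, `history`, `current`) are synthetic.

```python
import numpy as np


def build_new_vector(history, current, frac=0.10):
    """Step 3: subtract the mean of the most cosine-similar recent vectors."""
    sims = history @ current / (
        np.linalg.norm(history, axis=1) * np.linalg.norm(current))
    n = max(1, round(frac * len(history)))        # 29 for 288 samples
    top = np.argsort(sims)[-n:]                   # indices of the n most similar
    return current - history[top].mean(axis=0)


def iqr_outliers(v, k=3.0):
    """Box plot rule with the 3*IQR fences of (13)."""
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - k * iqr) | (v > q3 + k * iqr)


def mad_outliers(v, k=3.0):
    """Strong-outlier MAD rule of (16)."""
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    return np.abs(v - med) > k * mad


def fcm_outliers(v, m=2.0, iters=100):
    """Two-cluster fuzzy c-means on scalars; flags the smaller cluster."""
    centers = np.array([v.min(), v.max()])        # deterministic init (assumption)
    for _ in range(iters):
        d = np.abs(v[:, None] - centers[None, :]) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))          # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)         # membership update, as in (11)
        centers = (u**m * v[:, None]).sum(axis=0) / (u**m).sum(axis=0)  # (10)
    labels = u.argmax(axis=1)
    minority = np.argmin(np.bincount(labels, minlength=2))
    return labels == minority


def detect_attacked_states(history, current):
    """Step 4: majority vote (at least two of three) over FCM, IQR and MAD."""
    v = build_new_vector(history, current)
    votes = (iqr_outliers(v).astype(int) + mad_outliers(v).astype(int)
             + fcm_outliers(v).astype(int))
    return votes >= 2


rng = np.random.default_rng(2)
base = np.linspace(-0.30, -0.05, 13)              # synthetic phase angles
history = base + rng.normal(scale=0.0005, size=(288, 13))
current = base + rng.normal(scale=0.0005, size=13)
current[4] *= 0.9                                 # 10% decremental attack on one state

attacked = detect_attacked_states(history, current)
flagged_indices = np.flatnonzero(attacked)
```

Because two-cluster FCM always produces a split, it tends to flag something even on clean data; in this sketch, as in the voting scheme described above, a state is reported only when at least one of the other two rules agrees.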

IV. NUMERICAL RESULTS

To verify the performance of our proposed method, four experiments are designed and simulated. FDI attacks on the state variables without any system reconfiguration are simulated in Case I. Case II and Case III evaluate the robustness of our detection algorithm to topology changes, namely a line outage and a bus outage. Finally, Case IV shows how the proposed method handles the integration of renewable energy sources into the power grid. To show the effectiveness of the proposed method, we compare it with different classification algorithms. The mean of the measurement residuals of each case study is presented in Appendix A.

A. Test System

The test system for validating the performance of the proposed methodology is the IEEE 14 bus system, shown in Fig. 3. Matpower [28] is used to run the simulations. To simulate the power system behavior in a more realistic manner, the load data used in the simulations are adopted from the New York Independent System Operator (NYISO) [29]. Online load profiles are available for 11 regions, recorded every five minutes. This means there are about 288 values for each day. Therefore, 10% of the samples from the past 24 hours is 29 (n = 29). The results of the proposed method with different values of n are presented in Appendix B. The load data used is between January 1, 2016 and January 7. To generate data for SE, we first link each load bus of the test system with one region of NYISO using the mapping shown in Table I and then fit the normalized load data into the case file. Subsequently, a power flow is run to obtain the true measurement sets. To mimic the effect of random errors, Gaussian noise with zero mean and a standard deviation of 0.02 is added to the measurements. After orchestrating FDI attacks (Section II), the measurement data is given as input to the SE. The measurement placement is plotted in Fig. 3.
Note that conducting FDI attacks when the measurement data contain gross errors is discussed in Appendix C.

B. Simulating False Data Injection Attacks

To orchestrate an FDI attack and manipulate a certain state variable, the adversary must inject false data into all the measurements that depend on that state variable. This implies that once the attacked state is detected, the manipulated measurements are known. We orchestrate attacks on each system state individually and on two pairs of states. For each targeted state, two injection amounts are simulated: 90% and 110% of the true value. Therefore, in each attack, a system state variable is decreased or increased by 10 percent of its original value. For example, 110% means that the state variable manipulated by the adversary is 10% larger than the original value.

C. Evaluation Metrics

F-measure and detection rate, defined in Table II, are used to evaluate the performance of the proposed method. In Table II, a true positive (TP) is a correctly detected and localized attack. A false positive (FP) is an incorrectly detected and localized attack. A true negative (TN) is a correctly rejected normal sample. A false negative (FN) is an incorrectly rejected sample (an undetected attack). An F-measure value of 1.0 indicates the best performance [27].

[Fig. 3 residue. Legend: flow measurement, injection measurement, voltage measurement, generators (G), synchronous condensers (C), three-winding transformer equivalent. Annotations: wind farm generation = 18 MW; maximum real power demand at bus 3: 94 MW; maximum real power demand at bus 4: 48 MW; P = 62 MW.]
Fig. 3. IEEE 14 bus test system

TABLE I NYISO LOAD DATA CHARACTERISTICS
(the Range and Mean values and the absolute SD values were lost in transcription; only the percentage in the SD column survives)
Region | Bus | Range (MW) | Mean (MW) | SD (MW)
CAPITL | Bus 2 | [ ] | | (14.29%)
CENTRL | Bus 3 | [ ] | | (13.10%)
DUNWOD | Bus 4 | [ ] | | (14.46%)
GENESE | Bus 5 | [ ] | | (14.18%)
HUD VL | Bus 6 | [ ] | | (15.40%)
LONGIL | Bus 9 | [ ] | | (16.15%)
MHK VL | Bus 10 | [ ] | | (16.22%)
MILLWD | Bus 11 | [ ] | | (14.12%)
N.Y.C. | Bus 12 | [ ] | | (15.32%)
NORTH | Bus 13 | [ ] | | (8.09%)
WEST | Bus 14 | [ ] | | (11.01%)

TABLE II EVALUATION METRICS
Metric | Formula
Recall | TP / (TP + FN)
Precision | TP / (TP + FP)
F-measure | 2 x Precision x Recall / (Precision + Recall)
Detection rate | The number of attacks detected and localized by the method divided by the total number of attacks in the dataset.

D. Case Study I: Detecting and Localizing Attacks without System Reconfiguration

In this case, it is supposed that the system works well until the seventh day. This means that we have normal data for six days, and the measurements of the last day are completely replaced by attacked ones. Therefore, there are 288 attack samples for each attack scenario. For comparison, the classification algorithms support vector machine (SVM) [27], k-nearest neighbor (KNN) [27] and C4.5 [27] are tested on this dataset. A polynomial kernel is used for SVM, and k = 1 with the Euclidean distance is used for KNN, since these settings provide the best results. The models were built using tenfold cross validation. Table III provides the test results for this case study. Each row of this table corresponds to one attacked state variable. KNN and SVM were able to detect all attack samples correctly. Table IV presents the average F-measure calculated from Table III.
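The metrics of Table II reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and the counts in the example are made up, not taken from the paper's results.

```python
def evaluation_metrics(tp, fp, fn):
    """Recall, precision and F-measure from TP, FP and FN counts (Table II)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall == 0.0:
        return recall, precision, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return recall, precision, f_measure


def detection_rate(detected_attacks, total_attacks):
    """Share of attacks in the dataset that were detected and localized."""
    return detected_attacks / total_attacks


# Hypothetical counts for one scenario with 288 attack samples.
recall, precision, f_measure = evaluation_metrics(tp=270, fp=10, fn=18)
rate = detection_rate(detected_attacks=270, total_attacks=288)
```

Guarding the divisions keeps the functions usable for scenarios in which a method detects nothing, which occurs for the pre-built classifiers after a reconfiguration.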

TABLE III SUMMARY OF TEST RESULTS FOR FDI ATTACKS CASE I
[Table body not recoverable from the transcription. Surviving headers: four column groups, each with the injection levels Org*0.90 and Org*1.10; one row per attacked state variable, plus two rows for attacked pairs of states.]

TABLE IV EVALUATION OF METHODS CASE I
[Column headers lost in transcription.]
KNN | 1 | 100% | 0 | 0%
SVM | 1 | 100% | 0 | 0%
C4.5 | [values lost]
Proposed Method | [values lost]

E. Case Study II: Detecting and Localizing Attacks after Topology Changes (Line Outage)

In this case, to evaluate the robustness of the proposed method in handling topology changes, false data are injected into the system after a contingency. Contingencies can occur for many reasons, such as planned maintenance, cyberattacks, etc. [30]. An outage of line 4-5 is simulated as the contingency. This line is depicted in red in Fig. 3. For this simulation, it is assumed that the system works normally until 10 AM on the seventh day. The line outage occurs at this moment, and the system operates under that contingency until the end of the day (midnight). The adversary manipulates the measurements from 6 PM to midnight and replaces them with attacked ones. Therefore, we have 72 attack samples for each state variable under that contingency. Table V presents the results of the proposed method and the results of testing the pre-built classification models on this dataset. As the results show, the pre-built models cannot maintain their performance and fail to predict samples correctly after the system reconfiguration. This is because the distribution of the data changes after a topology reconfiguration, so the old observations become irrelevant to the new ones. Therefore, in such a dynamic environment, the prediction models are not able to classify the samples correctly. Table VI shows the average F-measure corresponding to Table V.
TABLE V SUMMARY OF TEST RESULTS FOR FDI ATTACKS CASE II
[Table body not recoverable from the transcription. Surviving headers: four column groups, each with the injection levels Org*0.90 and Org*1.10; one row per attacked state variable, plus two rows for attacked pairs of states. Many of the surviving cells read "0 0%".]

TABLE VI EVALUATION OF METHODS CASE II
[Values for KNN, SVM, C4.5 and the Proposed Method not recoverable from the transcription.]

F. Case Study III: Detecting and Localizing Attacks after Topology Changes (Bus Outage)

In this case study, to analyze the robustness of the proposed method in dealing with bus outages, false data are injected into the system after a bus outage (as a contingency). To this end, it is supposed that the system works normally until day 7. On the seventh day, all lines connected to bus 13 (6-13; 12-13; 13-14) are removed, which isolates bus 13. These lines are depicted in blue in Fig. 3. The system operates in this condition until 6 PM, after which the bus is reconnected to the system, and the measurements between 9 AM and 4 PM are manipulated by the attacker. Therefore, there are 84 attack samples for each state variable under that contingency. Table VII presents the results for this case study. Table VIII provides the values of the evaluation metrics for Table VII.

TABLE VII SUMMARY OF TEST RESULTS FOR FDI ATTACKS CASE III
[Table body not recoverable from the transcription.]

G. Case Study IV: Detecting and Localizing Attacks after Integration of Distributed Generations

The increased energy demand necessitates the integration of sustainable energy resources into the existing electric networks [31]. The Smart Grid incorporates renewable energy sources (RES) as well as conventional sources of electricity. However, the drastic increase in the integration of distributed generation affects the data distribution in the power system. In this case study, we evaluate the robustness of the proposed method in dealing with the integration of RES into the network by injecting false data after that integration. To this end, it is supposed that the system works normally until day 4. On the fourth day, two wind farms are added to the system, each providing at most 18 MW of the load of bus 3 and bus 4, as shown in Fig. 3.
TABLE VIII EVALUATION OF METHODS CASE III
[Values for KNN, SVM, C4.5 and the Proposed Method not recoverable from the transcription.]

TABLE IX SUMMARY OF TEST RESULTS FOR FDI ATTACKS CASE IV
[Table body not recoverable from the transcription. Surviving headers: four column groups, each with the injection levels Org*0.90 and Org*1.10.]

TABLE X EVALUATION OF METHODS CASE IV
[Rows: KNN, SVM, C, Proposed Method; values not recoverable.]

Fig. 4. Summary of the numerical results: a) average detection rate over case studies; b) average F-measure over case studies.

Each wind farm includes 18 turbines. All wind turbines in the test system are of the same type, with a rated power of 1 MW (1000 kW), a cut-in speed of 4 m/s, a rated speed of 14 m/s and a cut-out speed of 25 m/s, as in [32]. Real hourly wind speeds were taken from the Wunderground weather forecast website for Niagara Falls, NY [33]. The output power of a wind turbine is calculated by (17):

$$P_w(v) = \begin{cases} 0, & 0 \le v < v_{ci} \\ P_r \dfrac{v - v_{ci}}{v_r - v_{ci}}, & v_{ci} \le v < v_r \\ P_r, & v_r \le v < v_{co} \\ 0, & v \ge v_{co} \end{cases} \tag{17}$$

where $v$ is the forecasted wind speed; $P_r$ is the rated power of the wind turbine; $v_{ci}$ is the cut-in speed; $v_r$ is the rated speed; and $v_{co}$ is the cut-out speed of the wind turbine.

It is assumed that the adversary attacks all measurements of the seventh day; that is, the attack is orchestrated three days after the wind farms are installed. Therefore, there are 288 attack samples for each attack scenario. Table IX presents the results for this case study, and Table X provides the corresponding evaluation metrics. Fig. 4 presents a condensed view of the numerical results. As the figure shows, in contrast to the classification methods, the proposed method is robust to changes and achieves high accuracy in detecting attacks in different situations. Results of the proposed method for different values of n are presented in Appendix B.

V. APPLICABILITY OF THE PROPOSED METHOD FOR AC SE

FDI attacks can be orchestrated against both DC and AC SE. Reference [4] extends DC false data injection to AC false data injection.
Unlike DC SE, in AC SE the measurements depend nonlinearly on the system state variables:

$$\mathbf{z} = h(\mathbf{x}) + \mathbf{e} \tag{18}$$

where $h(\cdot)$ is a nonlinear vector function derived from the network topology that relates the vector of state variables $\mathbf{x}$ (voltage magnitudes and angles) to the vector of measured values $\mathbf{z}$ (active and reactive power flows, active and reactive power injections, voltage magnitudes and angles), and $\mathbf{e}$ is the vector of measurement errors. While in DC SE the adversary does not need to know the value of the system state, this is no longer the case in the AC analysis: the attacker needs to know the values of the state variables. The attack remains hidden if the vector of attack injections in the measurements, $\mathbf{a}$, satisfies (19), where $\hat{\mathbf{x}}$ is the estimated state and $\mathbf{c}$ is the change introduced into it. Further details of the FDI attack on AC SE can be found in [3, 4].

$$\mathbf{a} = h(\hat{\mathbf{x}} + \mathbf{c}) - h(\hat{\mathbf{x}}) \tag{19}$$

The proposed algorithm is a general anomaly detection method and can be readily adapted to AC SE because it uses only the state vector to detect attacks; that is, it uses a minimal set of features. More specifically, the detection algorithm receives only a numerical vector and inspects it for malicious manipulation. From this perspective, it makes no difference whether the method works on the DC or the AC case, since the voltage magnitudes also form a numerical vector. Nevertheless, to show that the proposed method is applicable to AC SE, FDI attacks are orchestrated in the AC case without system reconfiguration.

A. AC SE Case Study: Detecting and Localizing Attacks without System Reconfiguration

For this case, it is assumed that false data are added to the measurements from 12 PM to 4 PM on the third day. Therefore, there are 48 samples for each attack scenario. Table XI and Table XII summarize the test results for this

case study. As in Section IV, the classification models were built using 10-fold cross validation. The results show that the proposed method is applicable to AC SE. Moreover, considering the results on AC SE together with the numerical results on DC SE, we expect that the proposed unsupervised method can also handle system reconfigurations and integration of RES in AC SE. It is also clear that SVM was not able to predict the attack samples correctly. Unlike phase angles, the voltage magnitudes of the buses are close to one another, at approximately 1. Therefore, attacks do not produce differences large enough for SVM to find a meaningful margin that separates the classes.

VI. THREATS TO VALIDITY

Two possible threats to the validity of this study are discussed here. The first concerns the result of the F-test. Analyzing the variance is one of the steps the proposed method uses to filter suspicious state vectors, since injecting false data into the system changes the probability distribution of the state vectors. However, this test may disrupt the proposed method if it wrongly accepts or rejects the null hypothesis. Two scenarios can be discussed:

1) The F-test supports the null hypothesis that the two vectors have the same variance for an attacked state vector. When the F-test supports the null hypothesis, the sample is considered normal and is not investigated further (steps 3 and 4). The simulations use the online load profile from NYISO, whose characteristics are shown in Table I. Given this and the numerical results (Fig. 4), which show that the proposed method achieves high accuracy in the different case studies, the proposed method can be suitable for the real world.
However, the attacker may launch attacks with low injection amounts, producing jumps in the data that do not significantly change the variance. In that case, the F-test might conclude that the two vectors have the same variance and treat the attacked vector as a normal one, which increases the false negative (FN) rate. To evaluate this threat, we simulated injection amounts of 95% and 105% of the true value (the state variable manipulated by the adversary is 5% smaller/larger than the original value) in Appendix D. We observed that the F-test still rejects the null hypothesis for the attacked vectors, but when the manipulated state variable is 95%/105% of its original value, slightly more samples go undetected by both the proposed method and the supervised algorithms. This is because 95%/105% is closer to the original value, so the attacks have a smaller impact on the state variables and measurements. Nevertheless, the experimental results show the superiority of the proposed method over the supervised methods at these low injection amounts. Fig. 5 presents a condensed view of the numerical results. Note that lower significance levels give more confident decisions in accepting/rejecting the null hypothesis [22]. Therefore, we used a very small value (0.1%), which provides robust results in the different case studies.

TABLE XI SUMMARY OF TEST RESULTS FOR FDI ATTACKS AC SE
[Columns alternate between the injection amounts Org*0.90 and Org*1.10; values not recoverable.]

TABLE XII EVALUATION OF METHODS AC SE
[Rows: KNN, SVM, C, Proposed Method. Surviving entries for KNN: 1, 100%, 0, 0%; other values not recoverable.]
In addition, normalization improves the accuracy of the proposed method and makes all features contribute equally.

2) The F-test rejects the null hypothesis for normal state vectors. More specifically, natural jumps in the data can inflate the variance in the absence of attacks, causing some normal vectors to be treated as suspicious. Such natural jumps already exist in the data: the distribution of the NYISO load data used for each bus is shown in Appendix E. Moreover, these jumps increase when the power system is affected by sustainable energy sources or system reconfigurations, as in the case studies simulated in Section IV. However, treating a normal state vector as suspicious does not mean that the proposed method raises a false alarm, since the variance analysis is only one step of the method and the suspected samples are analyzed further in the subsequent steps. If, after the fourth step, none of the state variables is recognized as an outlier, the vector is considered normal. Thus, steps 3 and 4 of the proposed method also help reduce the false positive (FP) rate. Indeed, the numerical results show that the proposed method has a reasonable FP rate in the different case studies (Tables IV, VI, VIII and X).
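The variance screening discussed in the two scenarios above can be illustrated with a two-sample F-test. The following is a minimal sketch, not the authors' implementation: it assumes a two-sided test on sample variances at the 0.1% significance level mentioned in the text, evaluates the F distribution through the regularized incomplete beta function using only the standard library, and uses made-up vectors chosen to exercise both outcomes. All function names are illustrative.

```python
import math
from statistics import variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_p_value(sample_a, sample_b):
    """Two-sided p-value for H0: both samples have equal variance.

    Assumes both samples have at least two points and nonzero variance.
    """
    d1, d2 = len(sample_a) - 1, len(sample_b) - 1
    f = variance(sample_a) / variance(sample_b)
    cdf = betainc(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def is_suspicious(current, reference, alpha=0.001):
    """Flag the current vector when H0 is rejected at level alpha."""
    return f_test_p_value(current, reference) < alpha


# Made-up vectors: 'reference' stands for the average of recent history.
reference = [0.1 * i for i in range(14)]
inflated = [10.0 * v for v in reference]  # variance 100 times larger
print(is_suspicious(reference, reference), is_suspicious(inflated, reference))
```

A vector flagged here is only a candidate: as described above, the later steps decide whether any individual state is actually reported as attacked.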

Fig. 5. Summary of the test results for injection amounts 95% and 105%: a) average detection rate over case studies; b) average F-measure over case studies.

The second threat arises when the adversary has been injecting false data into the system for a long time, so that no clean data are available when the method is deployed. This threat also applies to other detection-based methods. However, we believe that this is rarely practical in the real world.

VII. CONCLUSION

False data injection attacks can cause inaccurate state estimates and wrong monitoring decisions. Existing defense and mitigation methods are designed for a specific system configuration and therefore might not detect attacks in different configurations. In this paper, we discussed how to combat FDI attacks in power systems that are affected by renewable energy sources or system reconfigurations. In particular, we presented a novel unsupervised FDI attack detection and localization method that can detect attacks after topology reconfigurations and the integration of renewable energy resources into the power grid. When false data are injected into the system, the distribution of the state vector differs from the normal one, which allows the attack to be detected. We first filtered out the suspected samples by testing, with an F-test, the equality of variance between the current state vector and the average of recent historical data. The suspected samples, i.e., those that reject the null hypothesis of the test, were investigated further. The difference between the suspected vector and the average of the most similar system state vectors was calculated. This new vector was analyzed by three different outlier detection algorithms (FCM, IQR and MAD) combined through majority voting: if at least two of the three algorithms recognize a state as an outlier, that state is detected as an attacked state. This means that detection and localization of attacks are done simultaneously.
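The majority-voting step summarized above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the IQR and MAD detectors use conventional thresholds (1.5 times the IQR and a modified z-score of 3.5), and a plain z-score detector is used as a stand-in for the FCM-based detector, which is not reproduced here. The difference vector is made up, and all names are illustrative.

```python
from statistics import mean, median, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, v in enumerate(values) if v < low or v > high}


def mad_outliers(values, threshold=3.5):
    """Indices whose modified z-score (based on the MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()
    return {i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold}


def zscore_outliers(values, threshold=3.0):
    """Stand-in for the FCM detector (assumption): plain z-score rule."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


def attacked_states(values, min_votes=2):
    """Indices flagged by at least min_votes of the three detectors."""
    votes = [0] * len(values)
    for detect in (iqr_outliers, mad_outliers, zscore_outliers):
        for i in detect(values):
            votes[i] += 1
    return [i for i, n in enumerate(votes) if n >= min_votes]


# Made-up difference vector for 14 states; index 12 carries the injection.
example_diff = [0.01, -0.02, 0.0, 0.015, -0.01, 0.005, -0.005,
                0.02, -0.015, 0.01, 0.0, -0.01, 0.5, 0.005]
print(attacked_states(example_diff))  # prints [12]
```

Because the vote returns state indices rather than a single alarm, one pass both detects the attack and identifies which states were attacked.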
Numerical results showed the robustness of the proposed method against changes in the system such as a contingency and the integration of wind farms.

APPENDIX A

Table XIII shows the mean residual values of the attacked measurements and of the corresponding normal ones. Although false data are injected into the system, the residuals are the same. Hence, the FDI attacks bypass the BDD.

TABLE XIII BAD DATA DETECTION RESULTS
[Columns: Value, Case I, Case II, Case III, Case IV. Rows: Mean of normal residual, Mean of attack residual; values not recoverable.]

APPENDIX B

Tables XIV, XV, XVI and XVII report the performance of the proposed method with n = 14 and n = 58, which are 5% and 20% of the previous 24 hours of data, respectively. As the tables show, our method achieves high accuracy with different values of n (for injection amounts 90% and 110%).

TABLE XIV EVALUATION OF THE PROPOSED METHOD CASE I
[Columns: Value of n, F-Measure, Detection Rate, False Positive; three values of n; entries not recoverable.]

TABLE XV EVALUATION OF THE PROPOSED METHOD CASE II
[Columns: Value of n, F-Measure, Detection Rate, False Positive; three values of n; entries not recoverable.]

TABLE XVI EVALUATION OF THE PROPOSED METHOD CASE III
[Columns: Value of n, F-Measure, Detection Rate, False Positive; three values of n; entries not recoverable.]

TABLE XVII EVALUATION OF THE PROPOSED METHOD CASE IV
[Columns: Value of n, F-Measure, Detection Rate, False Positive; three values of n; entries not recoverable.]
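The residual comparison in Appendix A reflects a general property of the linear (DC) measurement model: if the injected vector has the form a = Hc, the least-squares residual is unchanged. The following is a minimal sketch of that property, assuming an unweighted least-squares estimator and a made-up 4-measurement, 2-state system; the matrix H, the measurements and all function names are illustrative and are not taken from the study.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def solve_2x2(a, b):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def estimate(H, z):
    """Unweighted least-squares state estimate for a two-state system."""
    Ht = transpose(H)
    gram = [matvec(Ht, col) for col in Ht]  # H^T H (symmetric)
    return solve_2x2(gram, matvec(Ht, z))


def residual_norm(H, z):
    """Euclidean norm of z - H x_hat, the quantity checked by the BDD."""
    x_hat = estimate(H, z)
    return sum((zi - hi) ** 2 for zi, hi in zip(z, matvec(H, x_hat))) ** 0.5


# Made-up system: four measurements of two states.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
z = [1.02, 0.49, 0.52, 1.48]

c = [0.1, -0.05]                      # bias the attacker wants in the state
stealthy = [zi + ai for zi, ai in zip(z, matvec(H, c))]   # a = H c
naive = [z[0] + 0.1] + z[1:]          # tampering with a single measurement

print(residual_norm(H, z), residual_norm(H, stealthy), residual_norm(H, naive))
```

The first two norms agree up to rounding, while tampering with a single measurement visibly increases the residual, which is why residual-based bad data detection alone does not reveal a well-constructed attack.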

APPENDIX C

Almost all methods developed against FDI attacks assume that large measurement errors can be detected using the residual analysis of the traditional bad data detection method. However, [34] shows cases in which this assumption is violated. To this end, the authors proposed an index called the Undetectability Index (UI). Gross errors in measurements with high UIs are very difficult to detect with methods based on residual analysis. It is important to highlight that bad data in the 3 to 30 standard deviations range that go undetected by the largest normalized residual test can lead to significant errors in the estimated states [34]. Tables XVIII to XXIII present the results when false data are injected into measurements with gross errors. More specifically, in the following simulations, measurements P:2 (active power injection measurement at bus 2), P:12-13 (active power flow measurement from bus 12 to bus 13) and Q:2-3 (reactive power flow measurement from bus 2 to bus 3) contain a gross error of 6 standard deviations, with the standard deviation given by $\sigma = (pr \cdot |z^{lf}|)/3$, where $z^{lf}$ is the value obtained from a load flow solution and $pr$ is the meter precision, equal to 5%. For these cases, it is assumed that false data are added to the measurements from 12 PM to 4 PM on the third day. Therefore, there are 48 samples for each attack scenario.
TABLE XVIII SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON P:2 - PHASE ANGLES
[Rows: KNN, SVM, C, Proposed Method; values not recoverable.]

TABLE XIX SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON P:2 - VOLTAGE MAGNITUDES
[Rows: KNN, SVM, C, Proposed Method. Surviving entries for KNN: 1, 100%, 0, 0%; other values not recoverable.]

TABLE XX SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON P:12-13 - PHASE ANGLES
[Rows: KNN, SVM, C, Proposed Method; values not recoverable.]

TABLE XXI SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON P:12-13 - VOLTAGE MAGNITUDES
[Rows: KNN, SVM, C, Proposed Method. Surviving entries for KNN: 1, 100%, 0, 0%; other values not recoverable.]

TABLE XXII SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON Q:2-3 - PHASE ANGLES
[Rows: KNN, SVM, C, Proposed Method; values not recoverable.]

TABLE XXIII SUMMARY OF DETECTION RESULTS FOR FDI ATTACKS ON MEASUREMENTS WITH GROSS ERROR ON Q:2-3 - VOLTAGE MAGNITUDES
[Rows: KNN, SVM, C, Proposed Method; values not recoverable.]

APPENDIX D

Tables XXIV to XXXI present the results for the designed case studies with injection amounts 95% and 105%.

TABLE XXIV SUMMARY OF TEST RESULTS FOR FDI ATTACKS CASE I
[Columns alternate between the injection amounts Org*0.95 and Org*1.05; values not recoverable.]

APPENDIX E

Figure 6 shows the distribution of the NYISO load values used for each zone. The characteristics of each zone are also shown in Table I. As is clear, the standard deviations (SD) exceed 10% in most of the buses.


Modeling disruption and dynamic response of water networks. Sifat Ferdousi August 19, 2016

Modeling disruption and dynamic response of water networks. Sifat Ferdousi August 19, 2016 Modeling disruption and dynamic response of water networks Sifat Ferdousi August 19, 2016 Threat to water networks The main threats to water infrastructure systems can be classified in three different

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

A Search Method for Obtaining Initial Guesses for Smart Grid State Estimation

A Search Method for Obtaining Initial Guesses for Smart Grid State Estimation 1 A Search Method for Obtaining Initial Guesses for Smart Grid State Estimation Yang Weng, Student Member, IEEE, Rohit Negi, Member, IEEE, and Marija D. Ilić, Fellow, IEEE Abstract AC power system state

More information

Completion Time of Fuzzy GERT-type Networks with Loops

Completion Time of Fuzzy GERT-type Networks with Loops Completion Time of Fuzzy GERT-type Networks with Loops Sina Ghaffari 1, Seyed Saeid Hashemin 2 1 Department of Industrial Engineering, Ardabil Branch, Islamic Azad university, Ardabil, Iran 2 Department

More information

Detection and Identification of Data Attacks in Power System

Detection and Identification of Data Attacks in Power System 2012 American Control Conference Fairmont Queen Elizabeth, Montréal, Canada June 27-June 29, 2012 Detection and Identification of Data Attacks in Power System Kin Cheong Sou, Henrik Sandberg and Karl Henrik

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Predictive analysis on Multivariate, Time Series datasets using Shapelets 1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,

More information

Smart Home Health Analytics Information Systems University of Maryland Baltimore County

Smart Home Health Analytics Information Systems University of Maryland Baltimore County Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification

More information

Discovery Through Situational Awareness

Discovery Through Situational Awareness Discovery Through Situational Awareness BRETT AMIDAN JIM FOLLUM NICK BETZSOLD TIM YIN (UNIVERSITY OF WYOMING) SHIKHAR PANDEY (WASHINGTON STATE UNIVERSITY) Pacific Northwest National Laboratory February

More information

DUE to their complexity and magnitude, modern infrastructure

DUE to their complexity and magnitude, modern infrastructure ACCEPTED TO IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING 1 REACT to Cyber Attacks on Power Grids Saleh Soltan, Member, IEEE, Mihalis Yannakakis, and Gil Zussman, Senior Member, IEEE Abstract Motivated

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest

More information

Chapter 1 - Lecture 3 Measures of Location

Chapter 1 - Lecture 3 Measures of Location Chapter 1 - Lecture 3 of Location August 31st, 2009 Chapter 1 - Lecture 3 of Location General Types of measures Median Skewness Chapter 1 - Lecture 3 of Location Outline General Types of measures What

More information

Unsupervised Learning Methods

Unsupervised Learning Methods Structural Health Monitoring Using Statistical Pattern Recognition Unsupervised Learning Methods Keith Worden and Graeme Manson Presented by Keith Worden The Structural Health Monitoring Process 1. Operational

More information

THE electric power system is a complex cyber-physical

THE electric power system is a complex cyber-physical Implication of Unobservable State-and-topology Cyber-physical Attacks Jiazi Zhang, Student Member, IEEE, Lalitha Sankar, Senior Member, IEEE arxiv:509.00520v [cs.sy] Sep 205 Abstract This paper studies

More information

6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review. Wednesday, March 18, 15 6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization

More information

Data Mining Based Anomaly Detection In PMU Measurements And Event Detection

Data Mining Based Anomaly Detection In PMU Measurements And Event Detection Data Mining Based Anomaly Detection In PMU Measurements And Event Detection P. Banerjee, S. Pandey, M. Zhou, A. Srivastava, Y. Wu Smart Grid Demonstration and Research Investigation Lab (SGDRIL) Energy

More information

Chart types and when to use them

Chart types and when to use them APPENDIX A Chart types and when to use them Pie chart Figure illustration of pie chart 2.3 % 4.5 % Browser Usage for April 2012 18.3 % 38.3 % Internet Explorer Firefox Chrome Safari Opera 35.8 % Pie chart

More information

Generalized Injection Shift Factors and Application to Estimation of Power Flow Transients

Generalized Injection Shift Factors and Application to Estimation of Power Flow Transients Generalized Injection Shift Factors and Application to Estimation of Power Flow Transients Yu Christine Chen, Alejandro D. Domínguez-García, and Peter W. Sauer Department of Electrical and Computer Engineering

More information

Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection

Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection Downloaded from orbit.dtu.dk on: Oct, 218 Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection Schlechtingen, Meik; Santos, Ilmar

More information

Multi-Plant Photovoltaic Energy Forecasting Challenge: Second place solution

Multi-Plant Photovoltaic Energy Forecasting Challenge: Second place solution Multi-Plant Photovoltaic Energy Forecasting Challenge: Second place solution Clément Gautrais 1, Yann Dauxais 1, and Maël Guilleme 2 1 University of Rennes 1/Inria Rennes clement.gautrais@irisa.fr 2 Energiency/University

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

Learning Methods for Linear Detectors

Learning Methods for Linear Detectors Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2

More information

Anomaly (outlier) detection. Huiping Cao, Anomaly 1

Anomaly (outlier) detection. Huiping Cao, Anomaly 1 Anomaly (outlier) detection Huiping Cao, Anomaly 1 Outline General concepts What are outliers Types of outliers Causes of anomalies Challenges of outlier detection Outlier detection approaches Huiping

More information

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS

BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS BAYESIAN CLASSIFICATION OF HIGH DIMENSIONAL DATA WITH GAUSSIAN PROCESS USING DIFFERENT KERNELS Oloyede I. Department of Statistics, University of Ilorin, Ilorin, Nigeria Corresponding Author: Oloyede I.,

More information

Historical Data-Driven State Estimation for Electric Power Systems

Historical Data-Driven State Estimation for Electric Power Systems 1 Historical Data-Driven State Estimation for Electric Power Systems Yang Weng, Student Member, IEEE, Rohit Negi, Member, IEEE, Marija D. Ilić, Fellow, IEEE I. ABSTRACT This paper is motivated by major

More information

Reliability of Bulk Power Systems (cont d)

Reliability of Bulk Power Systems (cont d) Reliability of Bulk Power Systems (cont d) Important requirements of a reliable electric power service Voltage and frequency must be held within close tolerances Synchronous generators must be kept running

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Optimal PMU Placement

Optimal PMU Placement Optimal PMU Placement S. A. Soman Department of Electrical Engineering Indian Institute of Technology Bombay Dec 2, 2011 PMU Numerical relays as PMU System Observability Control Center Architecture WAMS

More information

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance Dhaka Univ. J. Sci. 61(1): 81-85, 2013 (January) An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance A. H. Sajib, A. Z. M. Shafiullah 1 and A. H. Sumon Department of Statistics,

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

CÁTEDRA ENDESA DE LA UNIVERSIDAD DE SEVILLA

CÁTEDRA ENDESA DE LA UNIVERSIDAD DE SEVILLA Detection of System Disturbances Using Sparsely Placed Phasor Measurements Ali Abur Department of Electrical and Computer Engineering Northeastern University, Boston abur@ece.neu.edu CÁTEDRA ENDESA DE

More information

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1

CptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1 CptS 570 Machine Learning School of EECS Washington State University CptS 570 - Machine Learning 1 IEEE Expert, October 1996 CptS 570 - Machine Learning 2 Given sample S from all possible examples D Learner

More information

Dynamic Attacks on Power Systems Economic Dispatch

Dynamic Attacks on Power Systems Economic Dispatch Dynamic Attacks on Power Systems Economic Dispatch Jinsub Kim School of Electrical Engineering and Computer Science Oregon State University, Corvallis, OR 9733 Email: {insub.kim}@oregonstate.edu Lang Tong

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

MEASUREMENTS that are telemetered to the control

MEASUREMENTS that are telemetered to the control 2006 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 4, NOVEMBER 2004 Auto Tuning of Measurement Weights in WLS State Estimation Shan Zhong, Student Member, IEEE, and Ali Abur, Fellow, IEEE Abstract This

More information

A Data-driven Approach for Remaining Useful Life Prediction of Critical Components

A Data-driven Approach for Remaining Useful Life Prediction of Critical Components GT S3 : Sûreté, Surveillance, Supervision Meeting GdR Modélisation, Analyse et Conduite des Systèmes Dynamiques (MACS) January 28 th, 2014 A Data-driven Approach for Remaining Useful Life Prediction of

More information

Smart Grid State Estimation by Weighted Least Square Estimation

Smart Grid State Estimation by Weighted Least Square Estimation International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 8958, Volume-5, Issue-6, August 2016 Smart Grid State Estimation by Weighted Least Square Estimation Nithin V G, Libish T

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

Metric-based classifiers. Nuno Vasconcelos UCSD

Metric-based classifiers. Nuno Vasconcelos UCSD Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major

More information

A Posteriori Corrections to Classification Methods.

A Posteriori Corrections to Classification Methods. A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk

More information