Blackout probabilistic risk assessment and thermal effects: impacts of changes in generation

Pierre Henneaux, Student Member, IEEE, Pierre-Etienne Labeau, and Jean-Claude Maun, Member, IEEE

Abstract: Renewable energy integration and deregulation imply that the electric grid will be operated near its limits in the future, and that the variability of cross-border flows will increase. Therefore, it is becoming more and more crucial to study the impact of these changes on the risk of cascading failures leading to blackout. In this paper, we highlight important factors leading to blackouts, review methodologies that were developed to simulate cascading failure mechanisms, and study specifically the impact of thermal effects on the risk of blackout for several changes in generation (variations in cross-border flows, wind farm penetration, shut-down of power plants). This is studied by applying to a test system the first level of a dynamic probabilistic blackout risk assessment developed previously. We show that taking thermal effects into account in cascading failures is important not only to obtain a good estimation of the risk of blackout in different grid configurations, but also to determine whether a specific change in generation has a positive or a negative impact on the blackout risk.

Index Terms: Blackout, Power system reliability, Power system security, Monte Carlo methods, Risk analysis

I. INTRODUCTION

NOWADAYS, power systems are changing substantially all over the world, due to two main factors. First, a transition from fossil and nuclear energy sources to renewable sources (wind and photovoltaic) for electricity generation entails a transition from dispatchable generation to non-dispatchable generation. Moreover, a large penetration of renewable sources increases the variability of cross-border flows.
Secondly, the liberalization of the electricity market also increases cross-border flows, owing to the market rules and to the search for the economic optimum by electricity producers. The impact of these changes on the reliability of the grid has been studied for several years, mainly through quasi-static reliability methods: the different states (or configurations) of the system are analyzed in a static way with an Optimal Power Flow (OPF) to evaluate load shedding (complete enumeration of the states, selective enumeration, non-sequential Monte Carlo (MC) simulation or sequential MC simulation) [1], [2]. However, these methods cannot deal with the specific problems of cascading outages leading to blackouts. Previous blackouts showed that, despite their low frequency of occurrence, their contribution to mean reliability indices is very important. For example, in Italy, the Average Interruption Time (AIT) and the Energy Not Supplied (ENS) were, on average over the period 2007-2011, 0.73 min/year and 460 MWh/year, respectively [3]. The targets are at most 1.00 min/year and 550 MWh/year. As a comparison, the 2003 blackout in Italy caused an AIT of several hundred minutes (50% of the load was re-supplied after 6.5 hours and 99% after 15 hours) and an estimated ENS of 177 GWh [4]. Therefore, it is also crucial in reliability studies to study the impact of actual changes in generation on the risk of blackout. This paper aims to demonstrate how a relevant approach to blackout Probabilistic Risk Assessment (PRA) makes it possible to conduct such studies.

P. Henneaux, P.E. Labeau and J.C. Maun are with the École polytechnique de Bruxelles, Université Libre de Bruxelles, Brussels, Belgium. P. Henneaux is F.R.S.-FNRS Research Fellow (Aspirant F.R.S.-FNRS) and corresponding author (e-mail: pierre.henneaux@ulb.ac.be).
Particular attention will be paid to the impact on the blackout risk of cross-border power flows, of the installed wind power, of the permanent shut-down of power plants and of power plant maintenance. The paper is organized as follows. First, Section II recalls the main mechanisms likely to lead to a blackout. Then, Section III reviews and compares different methodologies developed to analyze cascading failures. We then apply a methodology taking thermal dependencies into account to a test case in order to study these impacts, and we compare the results with those of a similar methodology neglecting these thermal dependencies. Section IV presents the test case and Section V analyzes the results. Finally, Section VI concludes.

II. BLACKOUTS

The blackout state is defined by the European Network of Transmission System Operators for Electricity as the interruption of electricity generation, transmission, distribution and consumption processes, when operation of the transmission system or a part thereof is terminated. Blackout state is always qualified as wide [5]. A large variety of mechanisms are involved in cascading outages: common mode failures and hidden failures which cause several initial outages, high static loads after power flow redistribution which can lead to additional thermal failures, absent or wrong operator actions due to a lack of situational awareness, static currents or apparent impedances triggering relays, voltage, small-disturbance, transient and frequency instabilities, etc. A blackout is due to a cascading failure, following the occurrence of an initiating event (e.g. a line fault or the loss of a power plant). However, the N-1 security rule applied by Transmission System Operators (TSOs) is a rule according to which elements remaining in operation after the fault of a transmission system element must be capable of accommodating the new operational situation without violating operational security limits [5].
Therefore, a single contingency should not entail a fast collapse of the electrical grid: at least one more contingency is necessary. Obviously, a second event, independent of the first one, can occur before any corrective action. But, as the mean time between two independent failures is long (some hours to some days) compared to the operators' characteristic times (tens of minutes to some hours), the probability of such a succession of independent events is usually very low. Some blackouts can be due to multiple initiating events whose occurrence bypasses the N-1 security rule. For example, earthquakes, storms and tower failures can cause the simultaneous (or quasi-simultaneous) loss of several elements. For instance, the blackout which occurred in November 2009 in Brazil and Paraguay was due to heavy rains and strong winds which caused short circuits in power transformers, leading to the loss of the Itaipu hydroelectric power plant.

Fig. 1. Event tree after an initiating event. Adapted from [6].

We showed in [6] that additional contingencies can be due to thermal effects. Following the occurrence of a first event, the reconfiguration of the power flows in the grid can increase the temperatures of overhead lines, underground cables and transformers (with thermal time constants ranging from tens of minutes to some hours). When the temperature of a line increases, its sag also increases, possibly leading to a short circuit between the line and the vegetation. When the temperature of a cable or a transformer increases, the dielectric strength decreases, possibly leading to a dielectric breakdown. If another element undergoes a thermal failure, the thermal effect on other elements will be reinforced, possibly leading to a cascade. The most famous example of such a slow thermal cascade is the 2003 blackout in the Northeastern area of the United States and in the Southeastern area of Canada, where about 20 high voltage overhead lines sagged low enough to come into contact with something below the line between 3 PM and 4 PM [7]. An operator action can also trigger a collapse, as was the case in the major system perturbation in November 2006 in Europe: based on an incorrect state estimation, a busbar coupling caused a line tripping.
As explained in [6], the typical development of a blackout can then be split into three phases (two for the cascading failure itself), as illustrated in Figure 1. Following the occurrence of an initial perturbation, two possibilities arise. If this perturbation causes the simultaneous loss of several elements, the N-1 rule is bypassed and the system can become electrically unstable. A fast collapse of the electrical grid can then start. But, in several blackouts, thanks to the N-1 rule, the grid stays electrically stable after the initiating event. A competition then starts between operators' corrective actions and additional failures, either due to thermal effects or independent. This phase is called the slow cascade, because it displays characteristic times between successive events ranging from tens of seconds to hours. The occurrence of additional events during this phase can trigger an electrical instability (violation of protection setpoints, angular instability, etc.). Then a second phase called the fast cascade occurs, ruled by electrical transients and displaying characteristic times between successive events ranging from milliseconds to tens of seconds. This phase is too fast to allow operators to take corrective actions and is characterized by a rapid succession of electrical events (additional failures, protection actions, etc.) whose occurrence order and timing are driven by the power system's dynamic evolution in the course of this transient. After this fast cascade, the electrical grid reaches a stable state: a possible collapse of the power system in some zones, or a major load shedding. The last phase is then the recovery period.

Fig. 2. Rate of line and generator trips during the fast cascade of the 2003 US/Canada blackout. From [7].

We should note that events occurring during the slow and the fast cascade can be very different in number and in type. During the slow cascade, only a small number of elements are lost, typically from two to about twenty.
There were 4 elements lost before the fast cascade in the major disturbance which occurred in the Western Interconnection (United States) on August 10, 1996. There were 22 elements lost before the fast cascade in the 2003 US/Canada blackout. The slow cascade was very short during the 2003 Sweden/Denmark blackout: 3 elements were lost in 5 minutes, the last two simultaneously (double busbar fault due to a disconnector damage). Only 2 elements were lost during the slow cascade of the blackout which occurred in September 2003 in Italy. The fast cascade was triggered by 3 events in the major system perturbation in November 2006 in Europe. In contrast, as the overall network collapses during a fast cascade leading to a blackout, several hundred elements can be lost. This is illustrated in Figure 2 for the 2003 US/Canada blackout (the fast cascade began at 16:06). During the slow cascade, elements trip mainly because of failures. During the fast cascade, on the contrary, elements trip mainly because electrical variables cross their protection setpoints, without any additional fault.

III. CASCADING FAILURES ANALYSIS

A. Introduction

Most power system reliability studies are quasi-static and are based on the analysis of situations: for each run, an initial set of contingencies is sampled and the consequence analysis is based on an OPF. These studies cannot deal with cascades of events because they do not account for the constraints imposed on the grid elements in the course of a transient. Various methodologies have been developed to simulate cascading failure mechanisms. In this section, we review and analyze some of them. An additional review, analysis and classification can be found in [8].

B. Methodologies

1) OPA model: Carreras et al. proposed in [9] a model able to take into account outages due to overloads and the reconfiguration of the power flows in the grid after the loss of one or several elements. For different load patterns, the steady state of the grid is computed through a DC OPF. An overloaded line then has a probability p of suffering an outage. If an outage occurs, the steady state of the grid is recomputed through a DC OPF, and so on. This model can give the probability distribution of the total load shed.
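The overload-then-trip loop described for the OPA model can be illustrated with a minimal sketch. It is deliberately simplified and is not the model of [9]: it uses a plain DC power flow instead of a DC OPF (so there is no redispatch or load shedding), it stops as soon as the network splits into islands, and the 3-bus network, line limits and function names are all hypothetical.

```python
import numpy as np

def is_connected(n_bus, lines):
    """Return True if every bus is reachable from bus 0 through the given lines."""
    adj = {k: set() for k in range(n_bus)}
    for i, j, _ in lines:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n_bus

def dc_flows(n_bus, lines, injections, slack=0):
    """DC power flow: solve B*theta = P with theta[slack] = 0; return flow i->j per line."""
    b = np.zeros((n_bus, n_bus))
    for i, j, x in lines:
        b[i, i] += 1.0 / x
        b[j, j] += 1.0 / x
        b[i, j] -= 1.0 / x
        b[j, i] -= 1.0 / x
    keep = [k for k in range(n_bus) if k != slack]
    theta = np.zeros(n_bus)
    theta[keep] = np.linalg.solve(b[np.ix_(keep, keep)], injections[keep])
    return np.array([(theta[i] - theta[j]) / x for i, j, x in lines])

def simulate_cascade(n_bus, lines, limits, injections, p_trip, rng, initial_outage):
    """Trip each overloaded line with probability p_trip until no new outage occurs."""
    in_service = [k for k in range(len(lines)) if k != initial_outage]
    tripped = [initial_outage]
    while True:
        active = [lines[k] for k in in_service]
        if not is_connected(n_bus, active):
            break  # islanding: this toy model stops here (assumption)
        flows = dc_flows(n_bus, active, injections)
        new = [k for k, f in zip(in_service, flows)
               if abs(f) > limits[k] and rng.random() < p_trip]
        if not new:
            break
        tripped += new
        in_service = [k for k in in_service if k not in new]
    return tripped

# Hypothetical 3-bus triangle: (from bus, to bus, reactance in p.u.)
LINES = [(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.1)]
LIMITS = [0.8, 0.8, 0.8]              # line ratings in p.u. (assumed)
INJ = np.array([1.0, 0.0, -1.0])      # generation at bus 0, load at bus 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(simulate_cascade(3, LINES, LIMITS, INJ, 1.0, rng, initial_outage=2))
```

Repeating `simulate_cascade` over many sampled load patterns and initial outages, and recording the load lost in each run, is how such a model would produce a distribution of outcomes; the sketch only shows the inner loop.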
Failures of lines caused by heating (tree flashover) are included explicitly in the OPA model in [10] through the time evolution of the line temperature, of its sag and of the vegetation height: a line is tripped when the sag and the vegetation height are such that a tree flashover occurs.

2) TRELSS model: One of the best-known industrial reliability analysis programs is the Transmission Reliability Evaluation of Large-Scale Systems (TRELSS) software developed for the EPRI [11]. The simulation approach included in this software package was developed to deal with cascading outages. After an initiating event, the steady state is computed by solving load-flow (LF) equations. Then, if a load bus voltage, a generator voltage or a circuit power flow is outside the limits, the corresponding component is tripped, the steady state is recomputed, and so on. If no threshold is exceeded, the procedure is restarted with the next initiating event.

3) Manchester model: An AC power blackout model was developed at the University of Manchester during the early 2000s in order to represent a range of interactions occurring in cascading failures leading to a blackout (cascade and sympathetic tripping of transmission lines, generator instability, load shedding, post-contingency redispatch of active and reactive resources, etc.) [12]. The analysis is based on an MC simulation. For each MC run, an initial disturbance is created by simulating random outages of system components. If some of the faults which cause these outages induce transient instability of one or more generators, the latter are disconnected. The impact of hidden failures in the protection system, which cause intact equipment to be unnecessarily disconnected following a fault on a neighboring component (sympathetic tripping), is then simulated: the tripping or not of each component whose vulnerability region contains the original faulted element is sampled.
After restoring the generation-load balance if necessary, the new steady state of the system is calculated by solving LF equations. In case of voltage instability, the iterations do not converge and the model assumes that operators shed specific load blocks to arrest this voltage collapse. This load shedding is repeated until the load flow iterations converge. The tripping of overloaded lines is then sampled. As there is a competition between the operators' attempts to eliminate the overload and the thermal transient, the probability that the operator is unable to eliminate the overload before the protection operates can be modeled as a function of the overload. If there is a trip, the LF equations are solved (with further load shedding if needed), and so on. Once there are no more contingencies, the ENS is computed. Several extensions and variations of the Manchester model exist, based on three main modifications. First, if the frequency deviation is outside a specific range after the restoration of the generation-load balance, the simulation algorithm considers that the system has collapsed. Secondly, in order to model operators' corrective actions, an OPF can be used to minimize the load shedding when the steady state of the system is calculated, instead of shedding specific load blocks. Thirdly, a DC LF and/or a DC OPF can be used to accelerate calculations, according to the importance given to voltage stability.

4) Stochastic model: The first attempt to model in detail the competition between operators' corrective actions and additional failures seems to appear in [13].
Random line failures are modeled through constant failure rates. Overloaded-line failures occur when the temperature (computed through a time-dependent equation) reaches the equilibrium temperature corresponding to a power flow equal to the line rating for reference weather conditions. Line restoration is modeled (constant repair rate after a constant time delay), as is a simple model of the utility response. The same model is used to cover the whole cascading failure.

5) Three-level dynamic PRA model: Based on the analysis of previous blackouts and on the typical blackout development, we proposed in [6] a methodology for blackout PRA based on dynamic PRA. The main idea of dynamic PRA is to describe the electrical grid not only by discrete states, but also by a set of process variables (temperatures, currents, voltages, etc.). This makes it possible to consider the mutual interaction between system states and process variables. The PRA is decomposed into three levels. Level-I is the assessment of the slow cascade: it starts with an initiating event and ends when the system becomes electrically unstable. In this level, we have to take into account the competition between operators' corrective actions and additional failures, either due to thermal effects or independent. Level-II is the assessment of the fast cascade: it starts when the system becomes electrically unstable and finishes when the system reaches an electrically stable state (blackout state or operational state with load shedding). In this level, we have to take into account the competition between protections and relays, according to the actual evolution of the electrical variables, the protection setpoints and possible failures. Level-III is the assessment of the restoration. Therefore, level-I reveals the vulnerability paths of an electrical grid (successive failures that could trigger an electrical instability), level-II gives the magnitude of possible blackouts (in terms of loss of supplied power) and level-III gives the consequences (in terms of ENS).

We developed in [6] mainly level-I and we proposed a simulation algorithm for the slow cascade, given in Figure 3. Initial conditions and the initiating event(s) are sampled. After the restoration of the generation-load balance if necessary, the electrical steady state is computed and the electrical stability is evaluated. If the system becomes electrically unstable, the current MC run is dangerous, the slow cascade simulation is stopped and the fast cascade simulation should be started. Otherwise, the thermal stability is evaluated, through the competition between operators' corrective actions and thermal transients. If a new failure occurs (system thermally unstable), the simulation continues with this new contingency: the electrical steady state is computed, the electrical stability is assessed, and so on. Otherwise, the slow cascade simulation is stopped and the MC run is labeled as safe (non-dangerous). Details about thermal failure models can be found in Appendix A.

C. Discussion

All models presented are probabilistic simulations, but with different levels of detail and different mechanisms taken into account. The OPA model can be viewed as the simplest model, where the line loadings are computed at each step through a DC OPF. The TRELSS model also considers the tripping of overloaded lines, but the steady state is computed at each step through a classical power flow, and loads/generators whose voltage is outside limits are disconnected. The Manchester model is able to deal with many phenomena occurring in cascading failures leading to blackouts. In particular, the tripping of overloaded lines can be done in a probabilistic way, in order to model the competition between operators' corrective actions and the thermal transient. The aim of the dynamic PRA is to propose a general framework in order to consider the mechanisms of cascades in a realistic way. In particular, the analysis of the cascading failure itself is separated into two levels, because the phenomena are very different in each of them. Indeed, only one subset¹ of the mechanisms involved in blackouts needs to be considered during the slow cascade and only one different subset² during the fast cascade, while keeping a satisfactory modeling of cascading failures.

¹ Common mode failures (storm, earthquake, etc.), hidden failures, thermal failures, operators' actions.
² Relays which trip elements when electrical variables cross setpoints, voltage collapse, angular instability (small-disturbance and/or transient), frequency instability.

Fig. 3. Simulation flowchart of dynamic PRA model, level-I. From [6].

However, even if these mechanisms can be split into two disjoint sets, several challenges must be overcome for each level in such a dynamic PRA approach. For level-I, one of the biggest challenges is to model operators' actions in a realistic way. Simple but robust criteria must be defined for the transition between level-I and level-II.
For level-II, the main challenge is the development of a fast dynamic electrical simulation algorithm taking into account significant protection actions and their possible failures. Moreover, this approach has the disadvantage of requiring numerous parameters. For the level-I analysis, not only the electrical parameters are needed, but also thermal and mechanical properties. Even if electrical data are well known and available for each real network in the corresponding TSO's databases, thermal and mechanical data could be more difficult to estimate, as they are not gathered in a database. In particular, the statistical distribution of the vegetation height below overhead lines must be estimated with satisfactory accuracy. A good estimation of the weather conditions endured by each line is also needed. The problem of bulk parameter estimation for methodologies using a probability that an overloaded line trips before corrective actions are carried out is also important but quite different. Indeed, such a probability is difficult to estimate because it depends on the actual corrective scheme to eliminate overloads and on the competition between operators and the thermal transient, but it is a key point for estimating the frequencies of dangerous scenarios. Nedic showed in [14] that the parameters used in the function giving the tripping probability as a function of load can significantly change the risk and the ranking of scenarios. If such a function relies on a small number of parameters³, the latter can be estimated through statistics of observed blackouts by data fitting. This method should give a good estimation, but in general it needs data from several networks (to reach a satisfactory statistical precision) and it neglects the specificities of different networks. Moreover, correlations between the load, wind and solar generation and the cooling of the lines (influenced by weather conditions) can strongly influence the risk of blackout. It can be crucial to consider these correlations in order to answer specific questions, for example when a unit should undergo maintenance to limit the risk of blackout (see further), and this cannot be done on the basis of this probability.

The biggest limitations of models based on a one-level PRA are due to the confusion between the two phases of a cascading failure leading to a blackout. Indeed, either the system is electrically stable and the operators can react to the situation, or the system is electrically unstable and the operators cannot do anything before the fast cascade ends, so the two cannot be mixed. In particular, performing a consequence analysis through an OPF when the system becomes electrically unstable does not seem to reflect the behavior of past blackouts. Consequently, even if the dynamic PRA method requires more parameters than others and a large amount of computation, and even if it is not yet a complete and finished methodology, it proposes a general framework which should allow a realistic probabilistic assessment of dangerous cascading scenarios leading to blackout. In the last part of this paper, we use a simplified level-I of this modeling in order to study the impact of generation changes on the risk of blackout.

IV. TEST CASE

A. Blackout test system

To apply the proposed methodology, we need a complete test system with electrical parameters for each machine and electrical, geometrical and thermal properties for each link. Initial states also have to be N-1 secure to respect the TSOs' actual requirements. To obtain such a blackout test system, we adapted a classical test system, which is a reduced-order equivalent of the interconnected New England Test System (NETS) and New York Power System (NYPS) with three other neighboring regions, in order to have a 69-bus test system. The purpose of the adaptation is not only to have all the parameters needed, but also to have coherency between these properties (geometrical and material properties determine electrical and thermal properties). Each power plant in NETS and NYPS has two identical units, except power plant 1, which has only one unit. We also implemented two wind farms of 150 MW and seven wind farms of 200 MW. The modified network is shown in Figure 4 and is available on http://homepages.ulb.ac.be/ phenneau/.

Fig. 4. Blackout Test System with two critical areas.

B. Weather conditions

In a modern power system, weather conditions are important for two main reasons: wind generation depends on the wind speed, and the temperatures of overhead lines depend on the wind speed and angle, the air temperature, etc. Parameters describing the random variables for wind speeds and temperatures are computed on the basis of data provided by the KNMI. The ambient temperature is modeled by a normal random variable whose parameters depend on the season and the hour. The mean ambient temperature and the mean wind speed as a function of the hour for the four seasons are shown in Figures 5 and 6, respectively.
To consider the correlation between production in different wind farms, the joint normal distribution method is used to sample wind speeds [15], so that the rank correlations are kept. The main idea of this algorithm is to sample dependent standard normal variables and to apply a transformation to get samples distributed according to the desired marginals. The sampling of dependent wind speeds is then done in four steps: the sampling of independent normal variables, the transformation of this sample into a sample of dependent variables through the correlation matrix, the transformation of the previous sample into a sample following uniform distributions (through the standard normal cdf), and finally the transformation into a sample following the desired marginals (through the desired inverse cdfs). To extrapolate the wind speed to the height of the wind turbines, we use the logarithmic wind profile.

³ For example, a tripping probability equal to p1 when the current is less than its nominal value and equal to p2 otherwise, or with an exponential increase for currents greater than the nominal value, etc.

Fig. 5. Mean ambient temperature.
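The four sampling steps and the height extrapolation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Weibull marginals, their shape and scale values, the roughness length `z0` and all function names are hypothetical, since the source does not specify the marginal distributions or the profile parameters.

```python
import math
import numpy as np

# Standard normal cdf, vectorized over arrays (uses only the standard library erf).
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def weibull_ppf(u, shape, scale):
    """Inverse cdf of a Weibull distribution (assumed marginal for wind speed)."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

def sample_wind_speeds(corr, shapes, scales, n_samples, rng):
    """Sample dependent wind speeds with the joint normal (Gaussian copula) method."""
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)                       # corr = chol @ chol.T
    z = rng.standard_normal((n_samples, corr.shape[0]))   # step 1: independent normals
    y = z @ chol.T                                        # step 2: dependent normals
    u = np.clip(_norm_cdf(y), 1e-12, 1.0 - 1e-12)         # step 3: uniform marginals
    cols = [weibull_ppf(u[:, k], shapes[k], scales[k])    # step 4: desired marginals
            for k in range(corr.shape[0])]
    return np.column_stack(cols)

def log_wind_profile(v_ref, h_ref, h_hub, z0):
    """Extrapolate a wind speed from height h_ref to h_hub (z0: roughness length, m)."""
    return v_ref * math.log(h_hub / z0) / math.log(h_ref / z0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corr = [[1.0, 0.8], [0.8, 1.0]]                       # hypothetical correlation matrix
    v = sample_wind_speeds(corr, [2.0, 2.0], [8.0, 9.0], 5, rng)
    print(v)
    print(log_wind_profile(5.0, 10.0, 80.0, 0.03))
```

Because steps 3 and 4 are monotonic transformations applied column by column, the rank correlation of the dependent normal sample is carried over unchanged to the wind speeds, which is the property the text relies on.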

TABLE I
CROSS-BORDER POWER FLOWS

Power flow (MW)    Mean    Min    Max    Std
NETS → NYPS         140   -140    525    117
NYPS → 3            -13   -150    182     69
NYPS → 5             57   -105    360    110
3 → 4               229    117    382     51
4 → 5                93    -13    180     31

Fig. 6. Mean wind speed.

C. Main modeling assumptions

A unique vegetation height is used for all lines and all MC runs, such that the probability of having a short circuit with the ground in all normal situations (no contingency) is 10^-5. Only voltage and frequency instabilities are considered for the electrical instability of the system. We consider the system thermally stable if there is no new contingency during 60 minutes.

V. RESULTS

Results obtained with the dynamic PRA level-I method are each time compared to a so-called independent method, based on the same simulation scheme but neglecting thermal effects. This means neglecting the impact that a failure has on the failure rates of other elements (failures are sampled independently of the previous events, on the basis of average failure rates). Results are expressed in terms of frequency of dangerous scenarios. As previously stated, a so-called dangerous scenario is a scenario leading to an electrical instability and thus possibly leading to a major system disturbance or a blackout, depending on the fast cascade. The frequency is the inverse of the expected time between two dangerous scenarios: if the frequency is 10^-2 /year, a dangerous scenario is expected every 100 years on average. If scenarios are grouped according to the sequence of occurring events, several different situations (different timings, different load/generation patterns, different climatic conditions) are behind each sequence. A large variety of losses of supplied power could be revealed by a level-II analysis for the same sequence of events identified during the level-I analysis, depending on the load/generation pattern. Moreover, a level-I scenario whose frequency is lower than another one could induce a bigger loss of supplied power and thus a greater risk. Consequently, level-I results give a first indication of vulnerability paths, but the level-II analysis is important to estimate the risk induced by each vulnerability path.

The computing time required per MC run is approximately 4.7 ms for this network. Since we use an analog MC simulation to estimate rare events, a huge number of MC runs is necessary to reach a satisfactory precision. This poor efficiency is the current limiting factor for an application to real-size grids. To improve efficiency, special biasing techniques to force the occurrence of cascading failures should be developed and applied, as explained in [16]. This work is in progress.

TABLE II
MOST FREQUENT DANGEROUS SCENARIOS (BASE CASE)

Initiating                               Level-I PRA    Ind. method
event        Event 1  Event 2  Event 3   freq. (/yr)    freq. (/yr)
58-59        57-58    62-65    65-66     2.2E-03        <1E-05
65-66        57-58    59-60              2.3E-04        <1E-05
58-59        57-58    65-66              1.8E-04        <1E-05
59-60        57-58    62-65    65-66     1.5E-04        <1E-05
38-46        46-49                       1.1E-04        1.4E-04
23-24        68-24                       1.4E-04        1.4E-04
49-18        46-69                       1.2E-04        1.3E-04
46-69        49-18                       1.1E-04        1.3E-04

A. Base case

In this subsection, we give detailed results obtained by applying our methodology to the base case (100 million MC runs), in order to identify the vulnerabilities of the blackout test system. Similar results were also presented in [6], [17], but with a slightly different vegetation height. Mean cross-border power flows between the five areas in normal conditions are given in Table I. The variability in load and wind generation induces variability in power flows.

1) Dangerous scenarios: The estimated total frequency of dangerous scenarios is 8.1 × 10^-3 /year (with a good precision: the relative standard deviation is σ = 1.2%) by the dynamic PRA level-I method and 4.6 × 10^-3 /year (σ = 1.7%) by the independent method.
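To see why an analog MC simulation needs so many runs to reach such precisions, the relative standard deviation of a rare-event estimate can be related to the number of dangerous runs observed. The sketch below assumes a plain binomial estimator (each run is independently dangerous with probability p); the paper's estimator may weight runs differently, so the figures are orders of magnitude only, and the function names are hypothetical.

```python
import math

def relative_std_error(n_runs, n_hits):
    """Relative standard deviation of the estimate p = n_hits / n_runs (binomial)."""
    p = n_hits / n_runs
    return math.sqrt((1.0 - p) / (n_runs * p))

def hits_needed(target_rel_error):
    """Approximate number of rare events to observe, using sigma ~ 1 / sqrt(n_hits)."""
    return math.ceil(1.0 / target_rel_error ** 2)

if __name__ == "__main__":
    # Under these assumptions, sigma = 1.2% requires about 7000 dangerous runs,
    # however many safe runs have to be simulated to observe them.
    k = hits_needed(0.012)
    print(k, relative_std_error(100_000_000, k))
```

For a rare event the precision depends almost only on the number of dangerous runs observed, not on the total number of runs, which is why biasing techniques that force cascades to occur can improve efficiency.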
Table II lists dangerous scenarios whose events correspond to the trip of the line i j and compares their frequencies ranked according to both dynamic PRA level-i estimation and the independent estimation (σ is between 2 and 10% for dynamic PRA level-i). The distribution of lines outages for the dynamic PRA level-i method is shown in Figure 7. All cascading failures have at least 2 events during the slow cascade (N 1 security rule). Most cascades comprise 2 to 4 events. It can be surprising to have short cascades of few events. However, the aim of the level-i analysis is to model the slow cascade: a level- I scenario indicates how the system can reach an electrical instable state. These numbers of events are in concordance with those observed in recorded slow cascades (see Section II). 2) Criticality of links: The Vesely-Fussel factor of diagnosis VF i is the probability that element i is failed, knowing that the scenario is dangerous. Table III compares Vesely-Fussel

factors of links, ranked according to the dynamic PRA level-I estimation. The Lambert factor of critical importance $C_i$ is the probability that element i causes the electrical instability, given that the scenario is dangerous. Table IV compares the Lambert factors of links, ranked according to the dynamic PRA level-I estimation.

Fig. 7. Distribution of line outages during dynamic PRA level-I cascades.

TABLE III
VESELY-FUSSELL FACTORS
| Link | $VF_i$ (%) | Independent $VF_i$ (%) |
|---|---|---|
| 57-58 | 42.5 | <1.0 |
| 65-66 | 42.3 | <1.0 |
| 58-59 | 37.0 | 4.5 |
| 62-65 | 34.5 | <1.0 |
| 59-60 | 11.8 | 4.6 |

TABLE IV
LAMBERT FACTORS
| Link | $C_i$ (%) | Independent $C_i$ (%) |
|---|---|---|
| 65-66 | 32.3 | <1.0 |
| 59-60 | 4.5 | 1.9 |
| 17-43 | 4.1 | 6.6 |

3) Importance of weather conditions: The total frequency of dangerous scenarios as a function of the ambient temperature and as a function of the mean wind speed is shown in Figures 8 and 9, respectively.

Fig. 8. Total frequency of dangerous scenarios as a function of the ambient temperature.

Fig. 9. Total frequency of dangerous scenarios as a function of the mean wind speed.

4) Discussion: The results given by the dynamic PRA level-I simulation and by the independent simulation are very different. Indeed, thermal effects influence not only the frequencies of the most dangerous scenarios but also their ranking and the critical lines. In particular, the dynamic PRA level-I simulation reveals a critical area, located approximately between buses 2, 3 and 56, as shown in Figure 4. The importance of thermal effects is higher in summer, when the temperature is high and the mean wind speed low, even though the load is lower than in winter, as Table V shows.

TABLE V
SEASONS IMPORTANCE
| Season | Level-I PRA freq. (/year) | Ind. method freq. (/year) | Mean load (% of annual peak) |
|---|---|---|---|
| Winter | 5.8E-03 | 5.1E-03 | 67.11 |
| Spring | 5.2E-02 | 4.4E-03 | 55.52 |
| Summer | 1.6E-02 | 4.4E-03 | 63.70 |
| Fall | 5.0E-03 | 4.2E-03 | 56.35 |

B. Cross-border power flows

The total frequency of dangerous scenarios for different mean cross-border power flows from NETS to NYPS is shown in Figure 10 (40 million MC runs for each point, which means that σ is less than 2% for each point). Cross-border power flows are modified by decreasing the generation in one zone (the power of each unit is decreased proportionally to its peak power in the base case) and increasing it in the other zone, in the same way and by the same amount. The independent method does not reveal any sensitivity of the frequency to cross-border power flows. In contrast, the dynamic PRA level-I method reveals an important sensitivity when the power flows from NETS to NYPS, but not when it flows from NYPS to NETS. This can be explained by two factors. First, the superposition of local power flows and cross-border power flows induces an asymmetrical behavior: a flow from NYPS to NETS opposes the local flows and, consequently, the frequency of dangerous cascading scenarios in this region decreases when the flow from NYPS to NETS increases (until the two are equal and opposite). On the contrary, when the mean power flow from NETS to NYPS grows, the frequency of dangerous scenarios in the critical region revealed by the base case increases, as shown in Table VI. Secondly, a second critical area appears between buses 30, 33 and 36 (NYPS), as the power flow from
NYPS to NETS increases, leading to the emergence of new dangerous scenarios, as shown in Table VII and in Figure 4. As the two effects balance each other, the total frequency of dangerous scenarios is nearly constant.

TABLE VI
MOST FREQUENT DANGEROUS SCENARIOS FOR A MEAN POWER FLOW FROM NETS TO NYPS OF 398 MW
| E0 | E1 | E2 | E3 | Freq. (/yr) |
|---|---|---|---|---|
| 58-59 | 57-58 | 62-65 | 65-66 | 5.2E-03 |
| 58-59 | 57-58 | 65-66 | | 8.1E-04 |
| 59-60 | 57-58 | 62-65 | 65-66 | 7.7E-04 |
| 65-66 | 57-58 | 59-60 | | 7.6E-04 |

TABLE VII
MOST FREQUENT DANGEROUS SCENARIOS FOR A MEAN POWER FLOW FROM NYPS TO NETS OF 195 MW
| E0 | E1 | E2 | E3 | E5 | Freq. (/yr) |
|---|---|---|---|---|---|
| 58-59 | 57-58 | 62-65 | 65-66 | | 6.5E-04 |
| 32-33 | 32-33 | 30-32 | 61-36 | 61-36 | 6.2E-04 |

Fig. 10. Influence of cross-border power flows on the total frequency of dangerous scenarios.

C. Installed wind power

The total frequency of dangerous scenarios for different installed wind powers is shown in Figure 11 (40 million MC runs for each point). Above 2000 MW, the frequencies of dangerous scenarios increase slightly. The absolute variation rate is nearly the same for both methods. These increases are not due to specific scenarios, but to a global increase across all scenarios.

Fig. 11. Influence of total installed wind power on the total frequency of dangerous scenarios.

D. Definitive shut-down

We study here the definitive shutdown of all units of the same power plant. The lost power can be compensated either in the same area, or by the other developed area (NETS or NYPS), or by all areas. Figures 12 and 13 give the relative increase of the total frequency of dangerous scenarios (40 million MC runs for each power plant). Surprisingly, in the dynamic PRA level-I approach, the shutdown of power plant 2 decreases this frequency. This can be explained by the congestion revealed by the base case in the critical area between buses 2, 3 and 56: the shutdown of this power plant decreases the production and consequently reduces thermal cascading failures. This is confirmed by Table VIII: the most frequent dangerous scenarios, which were in the critical area in the base case, disappear from the top of the ranking. On the contrary, the shutdown of power plants 5, 7, 8, 10 and 12 significantly increases the frequency when the lost power is compensated by the NETS. Indeed, the frequency of dangerous scenarios in the critical area increases in these cases, as shown in Table IX. To summarize Figure 12, in the dynamic level-I approach, situations that increase the congestion in the critical area lead to a significant rise of the total frequency of dangerous scenarios, and vice versa. In the independent approach, variations can be significant when the lost power is compensated by only one area, but they do not follow the same behavior as in the dynamic level-I approach.

Fig. 12. Impact of the definitive shutdown of some power plants in the NETS (2, 5, 7, 8) and in the NYPS (10, 12) on the total frequency of dangerous scenarios - Dynamic PRA level-I.

E. Maintenance

This subsection studies the impact of power plant maintenance on the risk of blackout. Figures 14 and 15 give the relative increase of the total frequency of dangerous scenarios for a 4-week maintenance of one unit in each NETS power plant, according to the season in which the power plant is maintained (40 million MC runs for each power plant). The lost power is compensated in the same area. In the dynamic
PRA level-I approach, for the majority of them, maintenance in the summer significantly increases the frequency of dangerous scenarios, and maintenance in another season increases it slightly. However, the maintenance of a unit of power plant 2 or 3 decreases this frequency. As previously, this can be explained by the congestion in the critical area between buses 2, 3 and 56: the maintenance decreases the production and consequently reduces thermal cascading failures. In the independent approach, variations are not significant.

Fig. 13. Impact of the definitive shutdown of some power plants in the NETS (2, 5, 7, 8) and in the NYPS (10, 12) on the total frequency of dangerous scenarios - Independent method.

Fig. 14. Impact of a four-week maintenance of one unit for each NETS power plant on the total frequency of dangerous scenarios - Dynamic PRA level-I.

Fig. 15. Impact of a four-week maintenance of one unit for each NETS power plant on the total frequency of dangerous scenarios - Independent method.

TABLE VIII
MOST FREQUENT DANGEROUS SCENARIOS AFTER THE SHUTDOWN OF THE POWER PLANT 2 (NETS) WHEN THE LOST POWER IS COMPENSATED BY ALL AREAS
| E0 | E1 | Freq. (/yr) |
|---|---|---|
| 23-24 | 68-24 | 1.6E-04 |
| 28-29 | 26-28 | 1.4E-04 |
| 39-44 | 39-45 | 1.4E-04 |
| 48-40 | 41-40 | 1.4E-04 |
| 53-52 | 37-52 | 1.4E-04 |

TABLE IX
MOST FREQUENT DANGEROUS SCENARIOS AFTER THE SHUTDOWN OF THE POWER PLANT 12 (NYPS) WHEN THE LOST POWER IS COMPENSATED BY THE NETS
| E0 | E1 | E2 | E3 | Freq. (/yr) |
|---|---|---|---|---|
| 58-59 | 57-58 | 62-65 | 65-66 | 6.1E-03 |
| 59-60 | 57-58 | 62-65 | 65-66 | 1.1E-03 |
| 58-59 | 57-58 | 65-66 | | 1.1E-03 |
| 65-66 | 57-58 | 59-60 | | 7.7E-04 |

VI. CONCLUSIONS

Several methodologies have been developed to simulate cascading failure mechanisms, but they generally use a single model for the two phases that occur in a typical cascading outage leading to a blackout. A two-level dynamic PRA method for the cascade itself is proposed as a general framework to account for the mechanisms of cascades in these two different phases. However, several challenges remain to be solved in such an approach. We applied in this paper a simplified dynamic PRA level-I methodology to a test system in order to study the impact of several changes in the electric grid, such as variations of cross-border power flows, wind generation penetration, maintenance and shutdown of power plants. Even if the modeling adopted for the vegetation height does not perfectly reflect reality, we showed that thermal effects can play an important role in cascading failures during the first phase (the slow cascade). It was well known that only dependencies between events could explain the statistical properties of the blackouts that occur regularly, and we showed in this paper an approach to simulating and estimating the impact of dependent thermal failures. A comparison with a methodology neglecting these thermal failures showed that taking thermal effects into account changes not only the overall frequency of level-I dangerous scenarios, but also their ranking according to their respective frequencies. Moreover, conclusions on the impact of changes in the electric grid on the risk of blackout can differ strongly between an independent analysis and a dynamic PRA level-I analysis.

APPENDIX A
THERMAL MODELS FOR DYNAMIC PRA LEVEL-I

This Appendix briefly explains the thermal models used in dynamic level-I PRA. More details can be found in [6].

A. Thermal failure models

1) Overhead lines: If $S(T)$ is the sag of the line at temperature $T$, $h_L$ is the height of the suspension point, $h_V$ is the height of vegetation, $E_D$ is the breakdown electric field of ambient air and $V$ is the phase-to-ground voltage of the line, the trip of the line occurs when

$$S(T) = h_L - h_V - \frac{V}{E_D}.$$

2) Underground cables: The main failure mode for cables is dielectric breakdown. For extruded-insulation cables, the dielectric strength decreases temporarily when the temperature increases. For slow temperature transients, the failure rate at time $t$ and temperature $T$ can be derived as

$$\lambda(t,T) = \lambda_{ref}[\tau(t)] \left( \frac{E_0(T_{ref})}{E_0(T)} \right)^{b} \frac{d\tau}{dt}(t,T),$$

where $\lambda_{ref}[\tau(t)]$ is the failure rate at the reference temperature $T_{ref}$ and the age $\tau(t)$, $b$ is a constant depending on material and cable dimensions, and $E_0$ is the stress leading to a nominal breakdown probability.

3) Power transformers: The dielectric strength decreases temporarily when the temperature increases, due to thermally induced gas bubbles. If the dielectric stress and the transformer withstand voltage are normal random variables, respectively $V_S \sim N(\mu_S, \sigma_S)$ and $V_R \sim N(\mu_R(T), \sigma_R)$, the dielectric failure rate of an overloaded transformer can be calculated by

$$\lambda_d(T) = \lambda_d(T_{ref}) \, \frac{1 - \phi\left[ \dfrac{\mu_R(T) - \mu_S}{\sqrt{\sigma_S^2 + \sigma_R^2}} \right]}{1 - \phi\left[ \dfrac{\mu_R(T_{ref}) - \mu_S}{\sqrt{\sigma_S^2 + \sigma_R^2}} \right]},$$

where $T$ is the hottest spot temperature, $T_{ref}$ is the hottest spot temperature in reference conditions, and $\phi(x)$ is the normal cumulative distribution function.

B. Temperature evolution

1) Overhead lines: The evolution of the line temperature is governed by a heat balance involving convective and radiative heat losses, solar heat gain and Joule losses:

$$q_c + q_r + m c_p \frac{dT}{dt} = q_s + I^2 r(T),$$

where $q_c$ is the convective heat loss rate, $q_r$ the radiation heat loss rate, $m c_p$ the total heat capacity of the conductor, $q_s$ the heat gain rate from the sun, $r(T)$ the AC resistance of the conductor at temperature $T$, and $I$ the current.
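As an illustration, the overhead-line heat balance and the sag-based trip criterion can be combined in a few lines of code. The following is only a minimal sketch under stated assumptions: the linear convection term, the Stefan-Boltzmann radiation term, the linear resistance and sag laws, the explicit Euler integration, and every numerical value are hypothetical choices made for the example. They are not the models or data used in this paper or in [6].

```python
import math
from dataclasses import dataclass

STEFAN_BOLTZMANN = 5.670374419e-8  # W/(m^2 K^4)


@dataclass
class Line:
    """Per-unit-length line data. All values are illustrative assumptions."""
    m_cp: float = 1300.0      # total heat capacity of conductor, J/(m K)
    h_conv: float = 1.5       # assumed linear convection coefficient, W/(m K)
    emissivity: float = 0.5   # assumed surface emissivity
    diameter: float = 0.028   # conductor diameter, m
    q_s: float = 10.0         # heat gain rate from sun, W/m
    r_ref: float = 7.0e-5     # AC resistance at t_ref, ohm/m
    alpha: float = 0.004      # assumed resistance temperature coefficient, 1/K
    t_ref: float = 20.0       # reference temperature, degC
    sag_ref: float = 8.0      # assumed sag at t_ref, m
    sag_slope: float = 0.03   # assumed linear sag growth, m/K
    h_l: float = 20.0         # height of the suspension point, m
    h_v: float = 9.0          # height of vegetation, m
    voltage: float = 130.0e3  # phase-to-ground voltage, V
    e_d: float = 3.0e6        # breakdown electric field of air, V/m


def resistance(line: Line, temp: float) -> float:
    """r(T): AC resistance, assumed linear in temperature."""
    return line.r_ref * (1.0 + line.alpha * (temp - line.t_ref))


def temperature_rate(line: Line, temp: float, current: float, t_amb: float) -> float:
    """dT/dt from q_c + q_r + m c_p dT/dt = q_s + I^2 r(T)."""
    q_c = line.h_conv * (temp - t_amb)
    surface = math.pi * line.diameter  # radiating surface per metre of line
    q_r = line.emissivity * STEFAN_BOLTZMANN * surface * (
        (temp + 273.15) ** 4 - (t_amb + 273.15) ** 4
    )
    joule = current ** 2 * resistance(line, temp)
    return (line.q_s + joule - q_c - q_r) / line.m_cp


def sag(line: Line, temp: float) -> float:
    """S(T): sag, assumed linear in temperature for this sketch."""
    return line.sag_ref + line.sag_slope * (temp - line.t_ref)


def trips(line: Line, temp: float) -> bool:
    """Trip once S(T) reaches h_L - h_V - V / E_D."""
    return sag(line, temp) >= line.h_l - line.h_v - line.voltage / line.e_d


def time_to_trip(line, current, t_amb, t_init, dt=1.0, horizon=4 * 3600.0):
    """Explicit Euler integration; returns the trip time in s, or None."""
    temp, t = t_init, 0.0
    while t < horizon:
        if trips(line, temp):
            return t
        temp += dt * temperature_rate(line, temp, current, t_amb)
        t += dt
    return None


if __name__ == "__main__":
    for amps in (800.0, 2000.0, 2500.0):
        print(amps, time_to_trip(Line(), amps, t_amb=30.0, t_init=40.0))
```

With these hypothetical parameters, a moderate current settles at a steady-state temperature below the trip threshold, whereas a heavy overload reaches the threshold after a delay that shortens as the current grows. This delay is what makes the trip of an overloaded line depend on the earlier events of the cascade and on the weather conditions.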
2) Underground cables: A two-layer thermal model with a mean dielectric temperature is used. It leads to a set of two first-order differential equations depending on thermal capacities, thermal resistances and Joule losses.

3) Power transformers: The hottest spot temperature is the sum of the ambient temperature, the top-oil temperature rise over ambient temperature and the hottest-spot conductor rise over top-oil temperature. The ultimate steady-state top-oil temperature rise $T_{u,\infty}$ over ambient temperature $T_a$ is computed as

$$T_{u,\infty} = T_{u,rl} \left( \frac{K^2 R + 1}{R + 1} \right)^{n},$$

where $T_{u,rl}$ is the transformer top-oil temperature rise over ambient temperature at rated load, $K$ the ratio of load to rated load, $R$ the ratio of loss at rated load to no-load loss, and $n$ the exponential power of loss versus top-oil temperature rise. The evolution of the top-oil temperature rise $T_u(t)$ is governed by

$$\tau_u \frac{dT_u}{dt}(t) = T_{u,\infty} - T_u(t),$$

where $\tau_u$ is the oil's thermal time constant. The hottest spot temperature rise above oil temperature rise $T_h$ is given by

$$T_h = T_{h,rl} K^{2m},$$

where $T_{h,rl}$ is the hottest-spot conductor rise over top-oil temperature at rated load and $m$ the exponential power of winding loss versus winding gradient.

REFERENCES

[1] R. Billinton and R. N. Allan, Reliability evaluation of power systems. Plenum Press, 1996.
[2] R. Billinton and D. Huang, "Incorporating wind power in generating capacity reliability evaluation using different models," IEEE Transactions on Power Systems, vol. 26, no. 4, pp. 2509-2517, 2011.
[3] Terna, "Annual reports 2007-2011," http://www.terna.it/.
[4] Investigation Committee of UCTE, "Final report of the investigation committee on the 28 September 2003 blackout in Italy," UCTE, Tech. Rep., 2004.
[5] ENTSO-E, "Draft Network Code for Operational Security," 2012.
[6] P. Henneaux, P.-E. Labeau, and J.-C. Maun, "A level-1 probabilistic risk assessment to blackout hazard in transmission power systems," Reliability Engineering & System Safety, vol. 102, no. 0, pp. 41-52, 2012.
[7] U.S.-Canada Power System Outage Task Force, "Final report on the August 14, 2003 blackout in the United States and Canada," Tech. Rep., 2004.
[8] M. Vaiman, K. Bell, Y. Chen, B. Chowdhury, I. Dobson, P. Hines, M. Papic, S. Miller, and P. Zhang, "Risk assessment of cascading outages: Methodologies and challenges," IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 631-641, 2012.
[9] B. Carreras, V. Lynch, I. Dobson, and D. Newman, "Critical points and transitions in an electric power transmission model for cascading failure blackouts," Chaos, vol. 12, no. 4, pp. 985-994, 2002.
[10] J. Qi, S. Mei, and F. Liu, "Blackout model considering slow process," IEEE Transactions on Power Systems, vol. PP, no. 99, pp. 1-1, 2013.
[11] EPRI, TRELSS Application Manual: For Cascading Failure, Reliability and Deterministic Analysis, 2003, Palo Alto, CA, 1002637.
[12] M. A. Rios, D. S. Kirschen, D. Jayaweera, D. P. Nedic, and R. N. Allan, "Value of security: Modeling time-dependent phenomena and weather conditions," IEEE Transactions on Power Systems, vol. 17, no. 3, pp. 543-548, 2002.
[13] M. Anghel, K. Werley, and A. Motter, "Stochastic model for power grid dynamics," in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS '07). Washington, DC, USA: IEEE Computer Society, 2007, p. 113.
[14] D. P. Nedic, "Simulation of large system disturbances," Ph.D. dissertation, University of Manchester Institute for Science and Technology, 2003.
[15] G. Papaefthymiou, "Integration of stochastic generation in power systems," Ph.D. dissertation, Technische Universiteit Delft, 2007.
[16] P. Labeau and E. Zio, "Procedures of Monte Carlo transport simulation for applications in system engineering," Reliability Engineering & System Safety, vol. 77, no. 3, pp. 217-228, 2002.
[17] P. Henneaux, P.-E. Labeau, and J.-C. Maun, "Blackout PRA based identification of critical initial conditions and contingencies," in 12th International Conference on Probabilistic Methods Applied to Power Systems (PMAPS 2012), Istanbul, Turkey, Jun. 2012, pp. 50-54.

Pierre Henneaux received his M.Sc. degree in Physics Engineering in 2009 from the Université Libre de Bruxelles (ULB). He is pursuing his Ph.D. degree at ULB as an F.R.S.-FNRS Research Fellow. His research fields are the reliability of electrical grids and blackout risk analysis.

Pierre-Etienne Labeau graduated as a Physics Engineer in 1992 and obtained his PhD in Applied Sciences in 1996, both from ULB. He is currently professor in reliability engineering and nuclear engineering. His research interests are PRA methodology, and system reliability and maintenance modeling.

Jean-Claude Maun received the M.Sc. in Mechanical and Electrical Engineering in 1976 and the PhD degree in Applied Sciences in 1981, both from ULB. He is currently full professor in electrical engineering. His research includes all aspects of power system protection, power quality monitoring and event analysis, as well as the dynamics and control of synchronous machines.