Cause Analysis. Focusing on Misoperations. Ed Ruck, Rick Hackman WECC Misoperations Workshop June 5-6, 2018


Why conduct a Root Cause Analysis?

What is a Misoperation?

Informal definition:
- System does not activate when it should
  o Example: A line experiences a phase-to-ground fault halfway between the two terminals, but the relay does not detect the fault.
- System takes action when it should not
  o Example: Buckhead-Chatooga trips high-speed for a fault on the Chatooga-Peachtree line.

Can be associated with:
- Protection System
- Control System
- Remedial Action Scheme (RAS)

Not limited to what is reportable through NERC tools such as MIDAS.

Selection of methods (cont'd)

Method: Task Analysis
When to use: Use whenever the problem appears to be the result of steps taken in a task (just about all the time).
Advantages: Shows the steps which should have been taken.
Disadvantages: Requires personnel and (possibly) equipment time to be performed correctly and completely.
Remarks: Should be conducted as both a Cognitive Task Analysis (what was the person thinking while conducting the task) and a Contextual Task Analysis (what was going on while the task was being done).

Method: Events and Causal Factor Analysis
When to use: Use for multi-faceted problems with long or complex causal factor chain.
Advantages: Provides visual display of analysis process. Identifies probable contributors to condition.
Disadvantages: Time-consuming and requires familiarity with process to be effective.
Remarks: Requires a broad perspective of the event to identify unrelated problems. Helps to identify where deviations occurred from acceptable methods.

Method: Change Analysis
When to use: Use when cause is obscure. Especially useful in evaluating equipment failures.
Advantages: Simple 6-step process.
Disadvantages: Limited value because of the danger of accepting wrong "obvious" answer.
Remarks: A singular problem technique that can be used in support of a larger investigation. All root causes may not be identified.

Method: Barrier Analysis
When to use: Used to identify barrier and equipment failures, and procedural or administrative problems.
Advantages: Provides systematic approach.
Disadvantages: Requires familiarity with process to be effective.
Remarks: This process is based on the MORT Hazard/Target concept.

Selection of methods (cont'd)

Method: MORT/Mini-MORT
When to use: Used when there is a shortage of experts to ask the right questions and whenever the problem is a recurring one. Helpful in solving programmatic problems.
Advantages: Can be used with limited prior training. Provides a list of questions for specific control and management factors.
Disadvantages: May only identify area of cause, not specific causes.
Remarks: If this process fails to identify problem areas, seek additional help or use cause-and-effect analysis.

Method: Human Performance Evaluations (HPE)
When to use: Use whenever people have been identified as being involved in the problem cause.
Advantages: Thorough analysis.
Disadvantages: None if process is closely followed.
Remarks: Requires HPE training.

Method: Kepner-Tregoe
When to use: Use for major concerns where all aspects need thorough analysis.
Advantages: Highly structured approach focuses on all aspects of the occurrence and problem resolution.
Disadvantages: More comprehensive than may be needed.
Remarks: Requires Kepner-Tregoe training.

Method: Fault Tree Analysis
When to use: Normally used for equipment-related problems.
Advantages: Provides a visual display of causal relationships.
Disadvantages: Does not work well when human actions are inserted as a cause.
Remarks: Uses Boolean algebra symbology to show how the causes may combine for an effect.

Method: Cause and Effect Charting (e.g., RealityCharting)
When to use: Useful for any type of problem.
Advantages: Visual display showing cause sequence. Provides a direct approach to reach causes of primary effect(s). May be used with barrier/change analysis. Focus is on best solution generation.
Disadvantages: May not provide entire background to understand a complex problem. Requires experience/knowledge to ask all the right questions.
Remarks: Requires knowledge of the Apollo Root Cause Analysis techniques. Apollo RealityCharting software may be used as a tool to aid problem resolution.
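The Fault Tree Analysis entry notes that the method uses Boolean algebra to show how causes combine into an effect. The following is a minimal Python sketch of that idea, not a tool from the presentation. The `Gate` class, the `evaluate` function, and the event names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    """A fault-tree gate: 'AND' needs every input true, 'OR' needs any one."""
    kind: str
    inputs: List[Union["Gate", str]]


def evaluate(node: Union[Gate, str], basic_events: Dict[str, bool]) -> bool:
    """Return True if the event represented by `node` occurs."""
    if isinstance(node, str):
        # A string names a basic event whose state is given directly.
        return basic_events[node]
    results = [evaluate(child, basic_events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree (illustrative names): an unwanted trip occurs if the relay
# is faulty, OR if a setting overreaches AND an external fault is present.
unwanted_trip = Gate("OR", [
    "faulty_relay",
    Gate("AND", ["setting_overreaches", "external_fault"]),
])

state = {
    "faulty_relay": False,
    "setting_overreaches": True,
    "external_fault": True,
}
print("Unwanted trip:", evaluate(unwanted_trip, state))
```

Changing `external_fault` to `False` makes the top event false, which shows why an overreaching setting can stay hidden until a fault occurs on a neighboring line.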

Something happens

Brief Description of Event:
At 0645 an initiating fault on the 500 kV B-G #1 line resulted in:
- The 500 kV B-G #1 line locking out at both ends
- The 500 kV C-B #1 and #2 lines being open ended
- Tripping of Generation C Units 1, 2, 3 and 4 on overspeed
Note: The B-G #1 line was patrolled to investigate and determine the cause of the initiating fault, and no cause was found. The line was successfully returned to service.

Identify contributing causes of the event to the extent known:
An initiating fault on the 500 kV B-G #1 line resulted in potential relay misoperations that opened the CB #5 and CB #6 breakers at Station C. The Station C CB #5 and #6 relays have been taken out of service and are being tested to confirm whether or not they misoperated.

Identify any Protection System misoperations to the extent known:
Potential relay misoperation of the CB #5 and CB #6 breakers at Station C.

Identify any GADS, DADS, TADS, or Protection System misoperations reports that will be submitted:
GADS and TADS will be reported. Any misoperations identified will be reported.

Narrative:
A single line-to-ground fault on 500 kV line B-G resulted in the line locking out after an unsuccessful reclose attempt at the Station B end. Coincident with the initial trip, Station C breaker #5 tripped. Coincident with the reclose at Station B, the Station C #6 breaker tripped. Upon losing both lines from Generating Station C, all four units tripped on overspeed.

If a one-line diagram is included, please provide an explanation:
Single lines showing Stations C, B and G.

[One-line diagram: Initial line-up of Generating Station C, Switching Station B, and Switching Station G]

[One-line diagram: 06:45:00.000, Initial fault]

What should happen next?
- If fault is temporary and clears: ?
- If fault is permanent (does not clear):

[One-line diagram: 06:45:00.000, Initial fault]

[One-line diagram: 06:45:00.005. Is this what you expected?]

[One-line diagram: 06:45:00.300, Testing fault (reclose at Switching Station B)]

[One-line diagram: 06:45:00.305, Fault continues]

[One-line diagram: 06:45:00.305. Is this what you expected?]

So what happened?
1. Initial fault occurred
2. CB5 tripped
3. CB6 tripped
4. Loss of 4 generators
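One way to lay out the sequence above is a sorted sequence-of-events list. The sketch below is a hypothetical Python illustration. The timestamps come from the diagram slides, but pairing the CB5 trip with 06:45:00.005 and the CB6 trip with 06:45:00.305 is an assumed reading of the narrative ("coincident with the initial trip" and "coincident with the reclose"), not a recorded value.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"

# (timestamp, location, description). The pairing of the breaker trips with
# the .005 and .305 timestamps is an assumption made for illustration.
records = [
    ("06:45:00.305", "Station C", "CB6 trips"),
    ("06:45:00.000", "B-G #1 line", "Initial fault"),
    ("06:45:00.300", "Station B", "Reclose attempt"),
    ("06:45:00.005", "Station C", "CB5 trips"),
]


def build_timeline(rows):
    """Sort records by time and attach each offset in ms from the first event."""
    parsed = sorted((datetime.strptime(t, FMT), where, what) for t, where, what in rows)
    t0 = parsed[0][0]
    one_ms = timedelta(milliseconds=1)
    return [((t - t0) // one_ms, where, what) for t, where, what in parsed]


timeline = build_timeline(records)
for offset_ms, where, what in timeline:
    print(f"+{offset_ms:4d} ms  {where:12s} {what}")
```

Listing offsets from the first event makes it easy to see which operations are expected (the line trip and reclose) and which need an explanation (the two Station C breaker trips).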

Problem Statement
What: Loss of generation
When: 0/0/2017 at 06:45
Where: Generating Plant C; ABC Utilities
Significance: Loss of 2000 MW of generation

How to visualize what is known

Where do you go next?

What are possible causes of these CB trips? (Brainstorming)

Breaker problems or actions
- Operator action from EMS
- Operator action locally
- Mechanism failure and release
- Animal intrusion
- Trip coil malfunction
- Improper latching
- Mechanical agitation of breaker cabinet

Relay problems or actions
- Proper relay operation
  o Correct relay action due to faulted condition on bus, transformer, or line
- Incorrect relay operation
  o Faulty relay
  o Incorrect relay signal (no system condition exists); induced signal
  o Relay settings issue
  o Wiring issues
  o Mechanical agitation of EM relays

Protection System problems or actions
- Relay settings issue (over-reach scenario)

SF6 Breaker Failure Modes & Mechanisms

What could you do to prove or disprove any of these potential cause(s)? (Brainstorming)
- Breaker problems or actions
- Relay problems or actions
- Protection System problems

What was learned?

After initial review of the events, it was determined that the ground instantaneous relays at the C end of the 500 kV #1 and #2 lines misoperated for the fault on the 500 kV #1 line between B and G. More extensive analysis revealed that the cause of the SLYP/SLCN misoperations was a setting issue. A review of history shows the relay settings were correct when established but, as the system changed over time, had not been kept correct for the changing system conditions.
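The finding above, a setting that was correct when issued but became wrong as the system changed, can be illustrated with a simple periodic check. This is a hedged Python sketch, not the utility's actual review method. The function name `overreaches`, the 1.25 security margin, and all current values are hypothetical assumptions.

```python
def overreaches(pickup_amps: float, max_external_fault_amps: float,
                margin: float = 1.25) -> bool:
    """True if an instantaneous element could operate for an external fault.

    The element is treated as secure only when its pickup exceeds the largest
    relay current for any fault outside the protected line, multiplied by an
    assumed security margin.
    """
    return pickup_amps < margin * max_external_fault_amps


# Hypothetical study results: relay ground current (A primary) for the worst
# external fault, as calculated when the setting was issued and again after
# later system changes. The pickup itself never changed.
pickup = 2400.0
studies = {
    "as originally set": 1800.0,
    "after system changes": 2600.0,
}

for label, fault_amps in studies.items():
    status = "OVERREACH" if overreaches(pickup, fault_amps) else "ok"
    print(f"{label}: {status}")
```

Re-running a check of this kind whenever the short-circuit model changes is one way to catch a setting that has drifted out of coordination before a fault exposes it.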

So what happened and why?
1. Initial fault occurred
   - Unknown, after patrol of line with nothing found????
2. CB5 tripped
   - Misoperation of instantaneous ground overcurrent, due to incorrect settings
   - Settings were not updated as changes to system occurred
3. CB6 tripped
   - Misoperation of instantaneous ground overcurrent, due to incorrect settings
   - Settings were not updated as changes to system occurred
4. Loss of 4 generators
   - Overspeed trip (loss of outlet path) as designed

Visual representation of final results

Final results drill-down

Food for thought
- Why should we use Root Cause Analysis?
- Why not use Apparent Cause Analysis like the Why Staircase?
- What is the goal of an Apparent Cause Analysis?
- What is the goal of a Root Cause Analysis?
- What prevents your company from performing Root Cause Analysis?

DINNER!

Apparent Cause Analysis (ACA)
Example of a Why Staircase for a loss of cooling water:

Lost cooling water to equipment
Why? Pump bearing failed
Why? Inadequate lubrication
Why? Not enough oil in reservoir
Why? Operator did not replenish oil to the correct level
Why? Sight glass installed upside down
Why? Mechanic put in a new sight glass during overhaul, without a drawing in the work package
Why? Document control backlog

Apparent Cause Analysis (ACA)
Example of a Why Staircase for a leak that occurred during International Space Station (ISS) EVA 22:

Water observed in helmet at end of EVA 22
Why? Water from drink bag leaked
Why? Bite valve not reclosing correctly
Solution? Fix: replace bite valve and water bag

ISS EVA 23 Information
- Another (larger) water leak occurred: 1 to 1.5 liters of water leaked into the helmet during EVA 23
- Could have been a fatal accident
- Inorganic materials blocked the drum holes in the space suit water separator, resulting in water spilling into the vent loop

ISS EVA 23 Root Causes for Water Leak

- Program emphasis was to maximize crew time on orbit for utilization.
  The strong emphasis on utilization was leading team members to feel that requesting on-orbit time for anything non-science related was likely to be denied, and therefore they tended to assume their next course of action could not include on-orbit time.

- ISS Community perception was that drink bags leak.
  When presented with the suggestion that the crew member's drink bag leaked out the large amount of water that was found in EV2's helmet after EVA 22, no one in the EVA community challenged this determination and investigated further.

- Flight Control Team's perception of the anomaly report process as being resource intensive made them reluctant to invoke it.
  Several ground team members were concerned that if the assumed drink bag anomaly experienced at the end of EVA 22 were to be investigated further, it would likely lead to a long, intensive process that would interfere with necessary work needed to prepare for the upcoming EVA 23, and that this issue would likely not uncover anything significant enough to justify the resources which would have to be spent.

ISS EVA 23 Root Causes (cont'd)

- No one applied knowledge of the physics of water behavior in zero-g to water coming from the PLSS vent loop.
  The ground teams did not properly understand how the physics of water behavior inside the complex environment of the EMU helmet would manifest itself.

- The occurrence of minor amounts of water in the helmet was normalized.
  Despite the fact that water carryover into the helmet presented a known hazard, the ground teams grew to accept this as normal EMU behavior.

Reliability: Drifting to Failure*

[Figure: reliability (Hi to Lo) plotted against time. Labels:
- Expectations: desired approach to work (as imagined); shown as Stated Expectations
- Normal Practices: work as actually performed (allowed by mgmt!); shown as Normal Practice
- Drift
- Error
- Real Margin for Error
- Hidden hazards, threats, unusual conditions, & system weaknesses
- Latent Error: inconspicuous and seemingly harmless buildup of hidden error and organizational weaknesses]

* Adapted from Muschara Error Management Consulting, LLC

Additional info: ISS EVA 23 RCA

Additional reading:
https://spaceflight.nasa.gov/outreach/significantincidentseva/assets/suit_water_intrusion_mishap_investigation_report.pdf
https://sma.nasa.gov/docs/default-source/safety-messages/safetymessage-2014-04-07-isseva23suitwaterintrusion-vits.pdf?sfvrsn=6

In another place, another time

Brief description of event:
On June 23, 2017, the Station A 38kV breaker B breaker failure protection operated when Capacitor Bank C was closed. The breaker failure operation cleared the 38kV bus by opening breakers A, C, D, E and F.

[One-line diagram: Initial line-up (23 June 2017). Station A bus with breakers A, B, C, D, E, F and Cap C]

[One-line diagram: Shunt Capacitor C closed in, for voltage control]

[One-line diagram: Breaker Failure, B]

[One-line diagram: Bus cleared]

What could cause this?? Discussion

Final Report: will this happen again?

Brief description of event:
On June 23, 2017, the Station A 38kV breaker B breaker failure protection operated when Capacitor Bank C was closed. The breaker failure operation cleared the 38kV bus by opening breakers A, C, D, E and F.

Identify contributing causes of the event to extent known:
The cause remains unknown.

Identify any protection system misoperations to the extent known:
Station A breaker B breaker failure protection

Narrative:
When shunt capacitor C was closed in, for voltage control, breaker B breaker failure operated, de-energizing the Station A 38kV bus. After exhaustive testing, no cause was found and everything was put back in service.

Final Outcome for THIS event
- Nothing found
- Put back into service

In same place, another time

Brief description of event:
On August 2, 2017, the Station A 38kV breaker B breaker failure protection operated when Capacitor Bank C was closed. The breaker failure operation cleared the 38kV bus by opening breakers A, C, D, E and F.

Identify contributing causes of the event to extent known:
The cause remains unknown.

Identify any protection system misoperations to the extent known:
Station A breaker B breaker failure protection

Narrative:
When shunt capacitor C was closed in, for voltage control, breaker B breaker failure operated, de-energizing the Station A 38kV bus. After exhaustive testing, no cause was found and everything was put back in service.

[One-line diagram: Initial line-up (2 August 2017). Station A bus with breakers A, B, C, D, E, F and Cap C]

[One-line diagram: Shunt Capacitor C closed in, for voltage control]

[One-line diagram: Breaker Failure, B]

[One-line diagram: Bus cleared]

What could cause this??

Findings:
- The breaker failure initiate contact was partially conducting at times. This contact is a hybrid solid-state contact that would build up voltage and initiate breaker failure randomly.
- A 10k ohm resistor was connected from the contact to negative to pull down this stray voltage buildup and stop it from initiating breaker failure randomly (LL2030703).
- The manufacturer changed the design to incorporate the same fix being implemented in the field.
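The effect of the pull-down resistor can be illustrated with a voltage divider. The sketch below is a simplified Python model, not the analysis from the investigation. It treats the partially conducting contact as a leakage resistance from station-battery positive to the breaker failure initiate input. The 125 V supply, the leakage resistance, the input resistance, and the 10 kΩ resistor value are all assumptions chosen for illustration.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel (ohms)."""
    return r1 * r2 / (r1 + r2)


def input_voltage(v_dc: float, r_leak: float, r_load: float) -> float:
    """Voltage across the input for a divider of r_leak over r_load (volts)."""
    return v_dc * r_load / (r_leak + r_load)


# Hypothetical values, for illustration only.
V_DC = 125.0        # assumed station battery voltage
R_LEAK = 200e3      # assumed leakage resistance of the partially conducting contact
R_INPUT = 300e3     # assumed resistance of the breaker failure initiate input
R_PULLDOWN = 10e3   # pull-down resistor from the contact to negative

v_without = input_voltage(V_DC, R_LEAK, R_INPUT)
v_with = input_voltage(V_DC, R_LEAK, parallel(R_INPUT, R_PULLDOWN))

print(f"Input voltage without pull-down: {v_without:.1f} V")
print(f"Input voltage with pull-down:    {v_with:.1f} V")
```

Under these assumed values, a high-impedance input sees most of the battery voltage through the leakage path, while the pull-down resistor holds the input to a few volts. Whether a given voltage asserts the input depends on the pickup threshold of the actual device, which the source does not state.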

In same place, another time

Brief description of event:
On August 2, 2017, the Station A 38kV breaker B breaker failure protection operated when Capacitor Bank C was closed. The breaker failure operation cleared the 38kV bus by opening breakers A, C, D, E and F.

Identify contributing causes of the event to extent known:
Leakage current through a solid-state contact erroneously triggered the breaker B breaker failure.

Identify any protection system misoperations to the extent known:
Station A breaker B breaker failure protection

Narrative:
When shunt capacitor C was closed in, for voltage control, breaker B breaker failure operated, de-energizing the Station A 38kV bus. After extensive testing, it was determined that the breaker failure initiate contact from the SEL 32 relay was partially conducting at times. This contact is a hybrid solid-state contact that would build up voltage and initiate breaker failure randomly. A 10k ohm resistor was connected from the contact to negative to pull down this stray voltage buildup and stop it from initiating breaker failure randomly.

Questions??? and Answers!!!

Ed Ruck, Senior Reliability Engineer, RRM: 302-376-569 office, 847-62-0487 cell, Ed.Ruck@nerc.net
Jule Tate, Associate Director, Event Analysis, RRM: 404-446-2572 office, 609-65-775 cell, Jule.Tate@nerc.net
Andy Slone, Senior Reliability Engineer, RRM: 404-446-979 office, 404-99-8303 cell, Andrew.Slone@nerc.net
Rich Bauer, Associate Director, Event Analysis, RRM: 404-446-9738 office, 404-357-9843 cell, Rich.Bauer@nerc.net
Wei Qiu, Senior Reliability Engineer, RRM: 404-446-962 office, 404-532-5246 cell, Wei.Qiu@nerc.net
Rick Hackman, Senior Reliability Advisor, RRM: 404-446-9764 office, 404-576-5960 cell, Richard.Hackman@nerc.net


Dinner Discussion

Why should we use Root Cause Analysis?
- We want to understand what happened and prevent it from happening again.

Why not use Apparent Cause Analysis like the Why Staircase?
- Apparent cause analysis only fixes the one issue.
- We may not have identified the correct issue.
- Are we confident that what we found won't happen elsewhere?

What is the goal of an Apparent Cause Analysis?
- To fix the problem you have right now.

What is the goal of a Root Cause Analysis?
- To find effective corrective actions that will prevent the problem from happening again, are within our control, and do not cause unexpected problems.

What prevents your company from performing Root Cause Analysis?
- There is a perception that RCA is costly and requires extensive training of all parties involved.