A strategic approach to asset and alarm management will reduce alarm fatigue, mitigate risk and support more effective response
On April 20, 2010, an explosion on the Deepwater Horizon offshore drilling rig (owned by Transocean and operated by BP) killed 11 crewmen. The resulting fire was inextinguishable, and the rig sank to the ocean floor two days later, leading to the largest marine oil spill in history.
During the U.S. Coast Guard and Bureau of Energy Management joint investigation into the explosion [1, 2], Mike Williams, chief electronics technician for Transocean, was asked: “Did you at any time hear any alarm that would indicate a general muster?” His response: “Never.”
As chief electronics technician, Williams was responsible for “maintaining the fire and gas systems and any and all electronic signaling devices throughout the rig.” His testimony revealed that the alarm system had been “inhibited,” a condition in which the sensors remain active but cannot trigger an alarm. It also revealed that the alarm system had been bypassed for more than a year because “they did not want people to wake up at 3:00 in the morning due to false alarms” [3]. This unfortunate decision resulted in the deaths of 11 people and injuries to 17 others.
Asset oversight
Sensors, alerts and alarms are critical to the safety, reliability and operations of manufacturing plants, but they deliver value only when they are trusted and when the right action is taken at the right time, as was so tragically demonstrated on the Deepwater Horizon. Organizations that adopt a strategic asset management approach in their operations can apply that framework to reduce alarm fatigue, mitigate risk and support effective response.
In the past, only the most critical systems were monitored because sensors had to be hardwired from the asset to the control room. At $100 per foot for wiring, it was economically prohibitive to monitor all essential assets. With the advent of inexpensive wireless sensors and industrial internet of things (IIoT) technologies, manufacturers are looking to reduce safety risks and improve operations by implementing online condition-based monitoring throughout their facilities.
As with most digital transformation initiatives, manufacturers struggle with where to start. Their struggle is understandable. According to McKinsey, more than 70% of all digital transformation initiatives do not yield the intended results [4]. Industry best practices suggest starting small and identifying projects that will yield positive results. Historically, this has meant installing sensors on assets where monitoring mitigates risk and improves reliability. One obvious example is equipment that operates in hazardous environments and poses a substantial safety risk, such as cooling tower fans.
As noted above, sensors and alarms are only valuable when personnel acknowledge the alerts and take the proper corrective action. On Deepwater Horizon, Williams acknowledged that he became “immune” to the constant barrage of alarms. This is a common problem in many distributed control system (DCS) control rooms (Figure 1). During the design and implementation of a DCS, engineers incorporate alerts and alarms for various equipment based on multiple variables. This approach often results in multiple alarms being triggered at once, with no indication of which alarm is most critical or what its root cause is. To minimize “nuisance” alarms, system designers can perform alarm rationalization, which evaluates each alarm to determine its relevance, its importance and its impact on the process and on the operator’s effectiveness.
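The rationalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard method: the 0-to-3 scoring scale, the priority thresholds, the `has_response` flag and the alarm tags are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Alarm:
    tag: str            # hypothetical alarm tag
    relevance: int      # 0-3: does it indicate a real abnormal condition?
    importance: int     # 0-3: how serious is it if ignored?
    impact: int         # 0-3: how strongly does it affect the process?
    has_response: bool  # assumed flag: is there a defined operator action?

def rationalize(alarms):
    """Split alarms into kept (tag, priority) pairs and removal candidates."""
    kept, removed = [], []
    for a in alarms:
        # An irrelevant alarm, or one with no defined response, is a nuisance.
        if a.relevance == 0 or not a.has_response:
            removed.append(a.tag)
            continue
        total = a.relevance + a.importance + a.impact
        priority = "high" if total >= 7 else "medium" if total >= 4 else "low"
        kept.append((a.tag, priority))
    return kept, removed

alarms = [
    Alarm("GAS-101 high", 3, 3, 3, True),
    Alarm("PUMP-7 vibration", 2, 2, 1, True),
    Alarm("DOOR-3 open", 0, 1, 0, False),
]
kept, removed = rationalize(alarms)
```

In this sketch, the gas alarm is kept at high priority, the vibration alarm at medium priority, and the door alarm is flagged for removal because it is neither relevant nor tied to an operator action.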
Identify critical assets
Underlying the alarm rationalization process is the assumption that all assets have been prioritized with regard to importance, criticality and impact on the process. This is typically done through criticality analysis, but unfortunately, many companies assume they already know their critical systems and assets and see no need to perform one. In many cases, they do know about 80% of what is critical, but it’s the other 20% that will end up causing the most problems.
In the case of Deepwater Horizon, the internal BP investigation found [1, 2] that “the design of the gas detection system was apparently based on a single online combustible-gas detector (CGD) at each location. Such a system lacks the redundancy levels associated with a high-reliability design. If a single CGD did not work, was inhibited, or in some other way was out of commission, the protective functions provided by that device would have been lost.”
The result of the lack of CGD redundancy is one of the key findings of the investigation: “The fire and gas system did not prevent hydrocarbon ignition.” A properly performed criticality analysis should catch key factors like the lack of necessary redundancy in an automated safety system. The great benefit of performing a criticality analysis is that it will indicate the right things to do, in the right order, for the right reasons, and to get the right results that align with the plant’s objectives.
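The redundancy point can be illustrated with a short Python sketch. It assumes a simplified model in which a location is protected as long as at least one of its detectors is working and not inhibited; the detector tags and location names are hypothetical and do not describe the actual rig layout.

```python
from dataclasses import dataclass

@dataclass
class Detector:
    tag: str                 # hypothetical detector tag
    working: bool = True
    inhibited: bool = False

    def available(self):
        # A detector protects only if it works and is not inhibited.
        return self.working and not self.inhibited

def location_protected(detectors):
    """True if at least one detector at the location can still raise an alarm."""
    return any(d.available() for d in detectors)

def single_points_of_failure(locations):
    """Locations that rely on fewer than two detectors."""
    return [name for name, dets in locations.items() if len(dets) < 2]

locations = {
    "location A": [Detector("CGD-1", inhibited=True)],
    "location B": [Detector("CGD-2", inhibited=True), Detector("CGD-3")],
}
protected = {name: location_protected(d) for name, d in locations.items()}
weak_spots = single_points_of_failure(locations)
```

With one detector inhibited at each location, location A loses its protection entirely, while location B remains covered by its second detector. A criticality analysis would be expected to flag location A as a single point of failure.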
From a criticality analysis, reliability and maintenance personnel can prioritize which assets need online monitoring through sensors. Furthermore, plant managers will have confidence that not only are sensors being installed on the right assets, but also that their digital transformation projects will be successful.
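As a rough sketch of how such a prioritization might look, the Python example below ranks assets by a simple consequence-times-likelihood score. The 1-to-5 scales, the scores and the monitoring threshold of 10 are illustrative assumptions; a real criticality analysis weighs many more factors.

```python
def rank_by_criticality(assets):
    """Return (name, score) pairs sorted by consequence x likelihood, highest first."""
    scored = [(name, consequence * likelihood)
              for name, (consequence, likelihood) in assets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical assets with assumed (consequence, likelihood) scores on a 1-5 scale.
assets = {
    "cooling tower fan": (5, 3),
    "boiler feed pump": (4, 4),
    "office HVAC": (1, 2),
}
ranking = rank_by_criticality(assets)

# Assumed cutoff: assets scoring 10 or more are candidates for online monitoring.
monitor = [name for name, score in ranking if score >= 10]
```

Here the boiler feed pump (score 16) and the cooling tower fan (score 15) would be candidates for sensors, while the office HVAC unit (score 2) would not.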
Put the right procedures in place
In the excitement of “going digital,” many manufacturers do not take the time to make the necessary changes in their current work procedures to take advantage of new technologies. One of the reasons for conducting a pilot or proof of concept is to uncover areas that need change management.
For example, consider the experience of conducting an online condition-monitoring pilot project using wireless vibration sensors on rotating equipment. The online condition-monitoring-solution vendor’s systems engineers were monitoring the equipment along with the client’s maintenance department. One of the engineers had noticed an alarm had triggered and yet no acknowledgement of the alarm had taken place within the system. When the engineer called the client, the maintenance person confirmed they had received the alarm, but they had no procedures on how to respond. The problem was rectified by also notifying the reliability engineer and developing and updating the work processes. Any deployment of new technology will warrant a review of affected work processes and additional changes.
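The two gaps exposed by that pilot, an alarm that goes unacknowledged and an alarm with no response procedure, can both be checked automatically. The Python sketch below is one hypothetical way to do so; the 15-minute acknowledgement timeout, the alarm tags and the procedure text are assumptions, not details from the pilot.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlarmEvent:
    tag: str                                 # hypothetical alarm tag
    raised_at: float                         # seconds since a reference time
    acknowledged_at: Optional[float] = None  # None until someone acknowledges

def unacknowledged(events, now, ack_timeout_s=900.0):
    """Tags of alarms still unacknowledged after the assumed timeout."""
    return [e.tag for e in events
            if e.acknowledged_at is None and now - e.raised_at > ack_timeout_s]

def missing_procedures(alarm_tags, procedures):
    """Tags of alarms that have no documented response procedure."""
    return [tag for tag in alarm_tags if tag not in procedures]

events = [
    AlarmEvent("VIB-12", raised_at=0.0),
    AlarmEvent("VIB-13", raised_at=0.0, acknowledged_at=120.0),
]
procedures = {"VIB-13": "Inspect bearing and schedule a follow-up reading."}

overdue = unacknowledged(events, now=1800.0)
gaps = missing_procedures([e.tag for e in events], procedures)
```

In this example, VIB-12 appears in both lists: it has gone 30 minutes without acknowledgement and has no procedure on file, which is the situation the vendor's engineer discovered by phone.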
Alarms and asset management
An alarm is only as good as the responding action. A sensor can quickly identify a fault condition and give operators insight into potential problems months before a failure. However, once an alarm is received, it is imperative that the work procedures for corrective action are readily available to the maintenance crews in the field. Work-order management systems that are integrated into an end-to-end asset-management system can capture the task procedures and display workflow progress in real time to assure supervisors that the corrective actions have been properly followed and completed (Figure 2).
More importantly, an end-to-end asset-lifecycle management system, also known as an enterprise asset-management solution, incorporates the correct asset interventions and strategies that are foundational to an alarm rationalization implementation. As plant operators and maintenance crews start to retire and leave the workforce, asset management systems can capture their institutional knowledge to assure future plant personnel deploy the correct intervention for the various asset classes when alarms are received.
The advent of inexpensive wireless sensors has resulted in a proliferation of online condition-based monitoring systems. Real-time alerts from these sensors allow plant personnel to respond proactively to potential problems. Furthermore, precious maintenance resources are deployed more effectively when assets are serviced based on their condition, as shown by sensor data, which reduces unnecessary time-based preventive maintenance.
Asset and alarm investment plan
Incorporating alarm management into an asset management system can have a major impact in moving manufacturers from a reactive position to a proactive or predictive stance in their maintenance and reliability strategies. One such area is asset lifespan. As enterprise asset management systems take in more real-time data, manufacturers can pinpoint assets that are approaching their end of life far in advance. Building on the criticality analysis, companies can use an asset investment planning (AIP) application to rank the capital improvement projects that will have the greatest impact on reducing risk and improving operational excellence.
As a case in point, a large refinery had a capital improvement plan for a boiler-feed pretreatment system. They had conducted several design, HAZOP and construction reviews. Given the importance of the system, they decided to conduct a criticality analysis to support their planned improvements. After 18 years of operation, they did not expect to find any unidentified issues and were therefore surprised by previously unidentified risks that were uncovered during the analysis. One such risk was a small brine system that would have brought the facility to a full shutdown in less than 8 hours if it failed. The four-week lead time to replace this system was an unknown and unacceptable risk. The capital plan was quickly adjusted to include a relatively small $10,000 asset investment that averted a potential $25 million risk.
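The arithmetic behind that decision can be sketched as a simple ranking of projects by risk averted per dollar invested. This is an illustrative simplification, not how any particular AIP application works: it treats the $25 million figure as the full risk averted without weighting it by probability, and the second project in the list is hypothetical, included only for comparison.

```python
def rank_projects(projects):
    """Sort (name, cost, risk_averted) tuples by risk averted per dollar, highest first."""
    scored = [(name, risk_averted / cost) for name, cost, risk_averted in projects]
    return sorted(scored, key=lambda item: item[1], reverse=True)

projects = [
    # Hypothetical comparison project; the figures are assumed.
    ("hypothetical pump upgrade", 500_000, 2_000_000),
    # Figures from the refinery case: $10,000 invested, $25 million risk averted.
    ("brine system investment", 10_000, 25_000_000),
]
ranking = rank_projects(projects)
```

On these numbers, the brine system investment returns 2,500 dollars of averted risk per dollar spent, far ahead of the hypothetical comparison project at 4 to 1, which is why a small line item can deserve a place at the top of a capital plan.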
Alarm rationalization
Operators can achieve a dramatic reduction of risk by treating alarm rationalization as an asset management strategy. The advent of wireless sensors and the internet has launched a proliferation of alarm management technologies in the industrial marketplace, making it increasingly easy to create a cacophony of alarms. The explosion on Deepwater Horizon demonstrated that more is not necessarily better, and it can, in fact, be deadly. Now more than ever, it is important that all alarms are relevant and meaningful, with clear procedures for response. By supporting alarm rationalization with an asset management strategy through an asset lifecycle-management platform, industrial manufacturers can dramatically improve their operations, reduce their overall risk and optimize their capital investments. ■
Edited by Mary Page Bailey
References
1. U.S. Coast Guard & Bureau of Energy Management, Joint Investigation of Deepwater Horizon Oil Rig Explosion, July 2010.
2. U.S. Securities and Exchange Commission (SEC), EX-99.3 3 dex993.htm: Deepwater Horizon Accident Investigation Report, Sept., 2010.
3. Hilzenrath, D.S., Technician: Deepwater Horizon warning system disabled, Washington Post, July 2010.
4. McKinsey & Co. Survey Results, Losing from day one: Why even successful transformations fall short, Dec. 2021.
Author
Tacoma Zach is co-founder and CEO of MentorAPM asset-performance and work-management software solutions (2416 E Goldenrod St., Phoenix, AZ 85048; Phone: 602-492-6212). Having spent most of his career in the operation and management of both municipal and industrial water/wastewater operations, Zach is an expert on the application of asset management best practices, risk management and ISO 55000 standards to water/wastewater utilities. He is the author of Criticality Analysis Made Simple and speaks frequently at leading asset-management conferences. He holds B.S.Ch.E. and M.S.Ch.E. degrees from the University of Toronto and is a licensed Professional Engineer in Ontario, Canada.