The Next Step Change in Process Safety

October 1, 2014 | By Andrew Soignier, Ventyx; Angela Summers, SIS-Tech; Mike K. Williams, Modern Automation Consulting Services, LLC

From plastics and synthetics to fertilizers and fuel, the general public takes for granted so much of what the chemical process industries (CPI) produce, and the risks associated with CPI production facilities are not typically considered. That is, until the next headline-making industrial catastrophe occurs. The human toll of such disasters plays out across ruined lives, devastated communities and obliterated opportunities. The effects can stretch out for years in the form of chronic health conditions, diminished earning capacities and contaminated environments.

However, those connected to the CPI do not forget the risks, nor can they afford to. Catastrophic incidents, large and small, have increased the focus on process safety management in production facilities around the world. Although enormous progress has been made in preventing them, incidents continue to occur. In fact, while safety incidents have been declining in number since 2008, those that do happen have been increasing in severity, according to the American Institute of Chemical Engineers (AIChE; www.aiche.org).

Today, the industry is seeking its next breakthrough improvement in process safety management: one that can empower facility operators by notifying them when risk increases and enabling them to respond before abnormal operation escalates uncontrollably and causes injuries, monetary loss or environmental damage. This article examines how key elements, such as automation and information-management systems, as well as operating discipline, can begin to converge to make this proactive approach a reality.

Cascading causes of incidents

Numerous studies and analyses have concluded that the majority of safety incidents result from human error. Operating discipline is typically at fault, since these errors spring primarily from defects and deficiencies in following operating or maintenance procedures, and in applying necessary administrative controls to ensure competency, effective communications, performance measurement and change management. For example, work instructions that are incomplete, inaccessible or illegible lead to inconsistent execution of procedures; poor communication causes insufficient worker-to-worker handoffs; stress and excessive fatigue impair decision-making and contribute to procedural missteps; and poor human-machine interfacing can lead to operator confusion and delayed responses.

Many companies minimize the contribution of human error to incident initiation by implementing safeguards or barriers based on the so-called “Swiss Cheese” model of accident causation used in risk management (Figure 1). These layered barriers are meant to reduce the likelihood of an incident occurring, or to reduce its impact if one does occur. However, once these safeguards and barriers are introduced into operations, rigorous procedural controls are necessary to ensure their integrity. Otherwise, inevitable human errors and equipment degradation reduce the effectiveness of the model.
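The effect of layered barriers, and of losing one of them, can be quantified with a simplified, layer-of-protection-analysis (LOPA) style calculation. This is a swapped-in illustration and not a method described by the authors. The mitigated event frequency is the initiating-event frequency multiplied by the probability of failure on demand (PFD) of each independent layer. A bypassed or degraded layer gets no credit (PFD = 1). All numbers below are hypothetical.

```python
from math import prod


def mitigated_frequency(initiating_freq_per_yr: float, layer_pfds: list[float]) -> float:
    """Event frequency after independent barriers, in the style of LOPA.

    Each barrier is credited by its probability of failure on demand (PFD);
    a barrier that is bypassed or failed contributes PFD = 1.0 (no credit).
    """
    return initiating_freq_per_yr * prod(layer_pfds)


# Hypothetical values for illustration only.
initiating = 0.1              # e.g. control-loop failure, events per year
healthy = [0.1, 0.1, 0.01]    # alarm + operator response, relief valve, safety function
degraded = [0.1, 1.0, 0.01]   # same layers, but the relief valve is found isolated

print(f"{mitigated_frequency(initiating, healthy):.0e}")   # 1e-05
print(f"{mitigated_frequency(initiating, degraded):.0e}")  # 1e-04
```

A single unnoticed hole in one slice raises the expected frequency tenfold. This is why the integrity of each barrier has to be managed after installation.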

Shifting tasks from manual to automated operation reduces the potential for direct human errors in initiating an incident. For the automated functions to be effective, however, asset-integrity systems covering the operation and maintenance of these safety-critical pieces of equipment become necessary. Whether manual or automated, consistent and accurate execution of safety-critical tasks requires operating discipline and the ability to monitor for changes in risk. Incident investigations reveal that deterioration of the barriers and safeguards often starts long before the accident occurs, and that no systems existed to detect and report their loss. There are three main contributors to this deterioration of barriers and safeguards: the passage of time, covert risks and complacency.

The passing of time. Just because a safety incident has not occurred for some time does not mean that all is well. If assets are poorly maintained and operating processes not regularly checked for safety effectiveness, they will eventually stop providing the level of risk reduction they were originally designed to provide.
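For an instrumented safeguard, one way to put a number on this erosion is the widely used simplified approximation for a single (1oo1) device: PFDavg ≈ λ_DU × T / 2. Here λ_DU is the dangerous undetected failure rate and T is the proof-test interval. The sketch below assumes a hypothetical failure rate and shows how skipped proof tests lengthen the effective interval. The device's risk reduction then drops out of the low-demand SIL band it was designed for (SIL 2: PFDavg from 1e-3 up to 1e-2; SIL 1: from 1e-2 up to 1e-1).

```python
HOURS_PER_YEAR = 8760


def pfd_avg(lambda_du_per_hr: float, test_interval_hr: float) -> float:
    """Simplified average probability of failure on demand for one (1oo1) device.

    PFDavg ~= lambda_DU * T / 2; reasonable only while lambda_DU * T << 1.
    """
    return lambda_du_per_hr * test_interval_hr / 2


def sil_band(pfd: float) -> str:
    """Low-demand SIL band for a PFDavg value (only SIL 1 and 2 are checked here)."""
    if 1e-3 <= pfd < 1e-2:
        return "SIL 2"
    if 1e-2 <= pfd < 1e-1:
        return "SIL 1"
    return "outside SIL 1-2 range"


lam = 2e-6  # hypothetical dangerous undetected failure rate, per hour

on_schedule = pfd_avg(lam, 1 * HOURS_PER_YEAR)  # proof tested every year as designed
overdue = pfd_avg(lam, 3 * HOURS_PER_YEAR)      # two annual tests skipped
print(f"{on_schedule:.5f} {sil_band(on_schedule)}")  # 0.00876 SIL 2
print(f"{overdue:.5f} {sil_band(overdue)}")          # 0.02628 SIL 1
```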

Covert risk. Risk has a propensity to emerge from the least-expected places. The inability to visualize where the risks are and the source of the next incident is an open door to disaster.

Complacency. Statements like “This is how we always do it” and “Don’t fix it if it isn’t broken” are heard frequently. As months and years pass without incidents, it is all too easy to become complacent, especially when it is not readily apparent where hazards can stealthily develop. That is when poor habits can infiltrate processes, and running overtime becomes the new normal.

Overall, the “Swiss Cheese” model alone is too porous and static to achieve the next safety performance breakthrough. Instead, the confluence of human error, operating data and information-management systems must be addressed. Process-safety challenges must be viewed as a holistic system-integration problem in order to make meaningful progress.

Holistic systems integration

Non-integrated process-safety programs may address some of the “low hanging fruit,” but they do not prevent everything, as the prevalence of catastrophic incidents shows. Integrating manual tasks with automated procedures and smart interface design helps avoid deficiencies in procedure implementation and communication, and minimizes the potential for human error. The key pieces already exist in the operational-technology (OT) and information-technology (IT) realms. They now must be brought together to comprehensively address the growing complexity of the human factors that contribute to safety incidents.

For example, on the operational technology side, companies can implement state-based unit-control schemas to address a range of normal and abnormal operating conditions. They can also employ automation systems to detect abnormal events and take preemptive actions to stop incident propagation. Additionally, they can implement abnormal situation management (ASM) graphical standards to optimize operator navigation during abnormal events and alarm management to promptly focus attention on safety-critical issues.

On the IT side, companies can integrate standard operating procedures (SOPs) with automated delivery of the latest revision of the required tasks, electronic time-stamped signatures, prescribed records and quality tolerances. Any procedure can be further supported with safety rules and automated interlocking functions, to ensure that manual tasks are sequenced properly, that the proper individuals are notified for certain tasks, and that detailed findings are consistently recorded for monitoring purposes. Further, workflow applications can be deployed to aggregate the results to the desired level of granularity to facilitate adequate information sharing and reduce the chances of human error.
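The interlocking idea, where a manual step can be signed off only in sequence and only within tolerance, can be sketched as a small data structure. The `Step` and `Procedure` classes, the step names and the tolerance below are hypothetical and do not represent any particular workflow product. Each accepted step produces a time-stamped record tied to the procedure revision that was delivered to the field.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Step:
    name: str
    low: Optional[float] = None   # lower quality tolerance, if the step records a value
    high: Optional[float] = None  # upper quality tolerance


@dataclass
class Procedure:
    revision: str                 # revision of the SOP delivered to the field
    steps: list
    records: list = field(default_factory=list)

    def complete(self, step_name: str, signer: str, value: Optional[float] = None) -> dict:
        """Sign off a step only if it is next in sequence and within tolerance."""
        if len(self.records) == len(self.steps):
            raise RuntimeError("procedure already complete")
        expected = self.steps[len(self.records)]
        if step_name != expected.name:
            raise RuntimeError(f"out of sequence: expected '{expected.name}'")
        if expected.low is not None or expected.high is not None:
            if value is None:
                raise ValueError(f"{step_name}: a recorded value is required")
            if expected.low is not None and value < expected.low:
                raise ValueError(f"{step_name}: {value} below tolerance {expected.low}")
            if expected.high is not None and value > expected.high:
                raise ValueError(f"{step_name}: {value} above tolerance {expected.high}")
        record = {
            "revision": self.revision,
            "step": step_name,
            "signer": signer,
            "value": value,
            "signed_at": datetime.now(timezone.utc).isoformat(),  # electronic time stamp
        }
        self.records.append(record)
        return record


proc = Procedure("Rev 7", [Step("isolate feed"),
                           Step("check line pressure", high=5.0),
                           Step("open drain")])
proc.complete("isolate feed", signer="op-12")
try:
    proc.complete("open drain", signer="op-12")  # tries to skip a step
except RuntimeError as err:
    print(err)  # out of sequence: expected 'check line pressure'
```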

Most promising of all is the opportunity to integrate realtime analysis into the overall picture to identify operational risk before it translates into incidents, and to drive the most effective risk-mitigation schemes. Figure 2 illustrates how data can be leveraged through proper integrated procedures to help visualize risk.

Static versus dynamic risk

Risk is not static; things are always changing. Safeguards can develop faults, or they can be down for maintenance. New and different activities may be taking place during installations. Organizations, people, resources and logistics can easily and quickly shift. In short, nothing should be taken for granted. Rather, organizations must bolster safeguards and barriers in a dynamic fashion. To do this, they can monitor leading indicators of increased vulnerability to incidents on a day-to-day basis. A daily review of cumulative risk enables organizations to digest new operations information. Where are breaks in the plan that will defer planned work? What defective equipment has been found that cannot be repaired immediately? Which employee is unexpectedly off the job today? The accumulated information can be used to fuel risk assessments, both operational and safety-critical, leading to decisions on whether to shut down or to take compensating measures.
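A daily cumulative-risk review might be organized along the following lines. This is a minimal sketch that assumes a hypothetical point score per item and hypothetical decision thresholds. The article does not prescribe a scoring scheme, and a real site would derive its scoring from its own risk assessments.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    description: str
    category: str  # e.g. "deferred work", "defective equipment", "staffing"
    score: float   # hypothetical risk points assigned during the daily review


def daily_decision(items, compensate_at=10.0, shutdown_at=25.0):
    """Sum today's risk items and map the total to a hypothetical decision band."""
    total = sum(item.score for item in items)
    if total >= shutdown_at:
        return total, "shut down or move to a safe state"
    if total >= compensate_at:
        return total, "continue with compensating measures"
    return total, "continue normal operation"


today = [
    RiskItem("vessel inspection deferred by schedule break", "deferred work", 4),
    RiskItem("level transmitter LT-3 faulty, parts on order", "defective equipment", 6),
    RiskItem("board operator for unit 2 unexpectedly absent", "staffing", 3),
]
print(daily_decision(today))  # (13, 'continue with compensating measures')
```

The point of summing is that no single item above would trigger action by itself. Only the accumulated picture crosses the threshold.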

While this provides a reliable basis for operational decision-making and control, and ensures that levels of cumulative risk remain tolerable, the key step is to leverage the convergence of OT and IT and react ahead of incidents through realtime risk monitoring, analysis and advising. Figure 3 shows a visualized representation of cumulative risk for an operating unit.

Proactive analysis

Traditional approaches to process safety management have leaned heavily on post-mortem analysis, investigating what went wrong and lessons to be learned. Although analysis after the fact is useful, organizations must start asking “How can we be more proactive in addressing operational risk?” in order to truly slow the pace of incident occurrence and decrease incidents’ severity. The answer lies in the integration of operational and information technologies.

The clearest way to be more proactive is to integrate a production facility’s wealth of realtime operational data, such as instrumentation and control data, with technology for dynamic risk analysis. When done comprehensively, this can result in a realtime dynamic risk advisory capacity for monitoring and immediately alerting personnel to risks as they change and develop, and for providing an optimal course of action to maintain the integrity of the facility.

Accessing and amalgamating all of the required data is challenging, because data can reside anywhere: in operations, maintenance and automation systems, log books, operator rounds, mechanical inspections, lock-out or tag-out applications and databases. Additionally, information is being created and changed constantly. Data need to be accessed from disparate systems, validated, transformed, integrated and contextualized before they can drive actionable intelligence. Technologies already exist to handle this heavy-lifting integration work. Once it is done, analytics applications can connect the dots, comparing asset operation and maintenance performance against a safety basis and alerting personnel to any deviations.
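The validate, transform and compare sequence can be illustrated in a few lines. In this sketch the tags (`PT-101`, `TT-204`), their safety-basis limits and the record layout are all hypothetical. Real integrations would pull records from historians, maintenance systems and mobile round applications instead of a literal list.

```python
# Hypothetical safety-basis limits: tag -> (low, high) in the tag's base unit.
SAFETY_BASIS = {"PT-101": (0.0, 150.0),   # pressure, psi
                "TT-204": (20.0, 180.0)}  # temperature, degC
KPA_PER_PSI = 6.894757


def is_valid(rec: dict) -> bool:
    """Validate: reject records with unknown tags or non-numeric values."""
    value = rec.get("value")
    return rec.get("tag") in SAFETY_BASIS and isinstance(value, (int, float))


def normalize(rec: dict) -> dict:
    """Transform: convert pressures reported in kPa to the psi used by the safety basis."""
    if rec.get("unit") == "kPa":
        return {**rec, "value": rec["value"] / KPA_PER_PSI, "unit": "psi"}
    return rec


def check_against_basis(records: list) -> list:
    """Compare each usable record with its safety-basis limits and collect alerts."""
    alerts = []
    for rec in records:
        if not is_valid(rec):
            alerts.append(f"rejected record from {rec.get('source', 'unknown source')}")
            continue
        rec = normalize(rec)
        low, high = SAFETY_BASIS[rec["tag"]]
        if not low <= rec["value"] <= high:
            alerts.append(f"{rec['tag']} = {rec['value']:.1f} outside safety basis "
                          f"[{low}, {high}] (source: {rec.get('source', 'unknown source')})")
    return alerts


records = [
    {"source": "historian", "tag": "PT-101", "value": 1100, "unit": "kPa"},
    {"source": "operator round", "tag": "TT-204", "value": 95, "unit": "degC"},
    {"source": "logbook", "tag": "PT-101", "value": "n/a"},
]
for alert in check_against_basis(records):
    print(alert)
```

The rejected logbook entry is reported rather than silently dropped. Trusting the data depends on knowing which data could not be used.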

Further, the proactive interface can be strengthened even more by using visualization technologies in addition to analytics. Analytic technology can calculate the change in risk dynamically (for instance, weighing the consequences of an uncompleted operator round or a non-implemented proof test on a critical device) and update a facility’s risk matrix in realtime with the revised impact. Visualization technology can then help focus employees’ attention on the change in risk using geospatial representation and color-coded graphics (Figure 4) for impacted facilities, units and equipment.
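Updating a risk matrix as findings come in can be sketched on a conventional 5 × 5 likelihood-by-severity grid. The likelihood increments for each finding type and the color bands below are hypothetical choices for illustration. The output is the kind of per-unit color that a plant map could display.

```python
# Hypothetical likelihood increments for open findings on a unit.
LIKELIHOOD_BUMP = {"missed operator round": 1, "overdue proof test": 2}


def color(score: int) -> str:
    """Map a likelihood x severity score (1-25) to a display color band."""
    if score >= 15:
        return "red"
    if score >= 10:
        return "orange"
    if score >= 5:
        return "yellow"
    return "green"


def unit_risk(base_likelihood: int, severity: int, open_findings: list) -> tuple:
    """Re-rate a unit on a 5x5 matrix after applying its open findings."""
    bump = sum(LIKELIHOOD_BUMP.get(f, 0) for f in open_findings)
    likelihood = min(5, base_likelihood + bump)  # cannot exceed the top row
    score = likelihood * severity
    return likelihood, score, color(score)


units = {
    "Unit 100": (2, 4, []),
    "Unit 200": (2, 4, ["missed operator round"]),
    "Unit 300": (2, 4, ["overdue proof test"]),
}
for name, (lik, sev, findings) in units.items():
    print(name, unit_risk(lik, sev, findings))
# Unit 100 (2, 8, 'yellow')
# Unit 200 (3, 12, 'orange')
# Unit 300 (4, 16, 'red')
```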

Putting intelligence into action

The next step is to automatically direct appropriate corrective actions based on this realtime intelligence. In an example scenario, an operator scans a screen that maps an entire facility and displays the dynamic risk level of each of the facility’s unit operations at that moment. One unit shows an elevated risk level. On investigating, the operator discovers a deviation from the design specification that is raising the likelihood of an incident above what was originally planned into the unit’s design, and possibly raising its potential severity as well. In effect, the operator is viewing a window into the unit’s behavior “as operating” versus “as designed.”

Examining further, the operator consults a safety rulebook to identify the cause of the increasing risk and to see what tasks are necessary to resolve the problem, ensure continued safe operation and restore operational integrity. The operator can then schedule those tasks appropriately, and monitor their completion, while taking the production unit or some of its equipment offline if required by procedure.
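A safety rulebook lookup of this kind can be pictured as a mapping from a recognized deviation to its corrective tasks and to whether procedure requires taking equipment offline. The rulebook entries, task names and work-order format in this sketch are hypothetical. A deviation that the rulebook does not cover is escalated rather than guessed at.

```python
# Hypothetical rulebook: deviation -> corrective tasks and whether the
# procedure requires taking the equipment offline.
RULEBOOK = {
    "overdue proof test": {
        "tasks": ["schedule proof test", "post compensating operator watch"],
        "take_offline": False,
    },
    "relief valve isolated": {
        "tasks": ["restore relief path", "verify valve line-up"],
        "take_offline": True,
    },
}


def plan_response(unit: str, deviation: str) -> dict:
    """Turn a recognized deviation into scheduled work orders."""
    rule = RULEBOOK.get(deviation)
    if rule is None:
        # Not covered by the rulebook: escalate rather than guess.
        return {"unit": unit, "take_offline": None, "work_orders": [], "escalate": True}
    orders = [{"task": task, "status": "scheduled"} for task in rule["tasks"]]
    return {"unit": unit, "take_offline": rule["take_offline"],
            "work_orders": orders, "escalate": False}


def integrity_restored(plan: dict) -> bool:
    """Integrity counts as restored only when every work order is done."""
    return bool(plan["work_orders"]) and all(o["status"] == "done" for o in plan["work_orders"])


plan = plan_response("Unit 300", "relief valve isolated")
print(plan["take_offline"], [o["task"] for o in plan["work_orders"]])
for order in plan["work_orders"]:
    order["status"] = "done"   # updated as field crews close out the tasks
print(integrity_restored(plan))
```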

Multilevel utilization of data

The same collected operational data, along with its attendant data-integration infrastructure, that drives realtime analytics and improved situational awareness can also be leveraged in other ways to increase operational risk awareness, power predictive insights and enable proactivity.

By definition, realtime data have a limited shelf life and quickly become historical data. As today’s realtime data become tomorrow’s historical data, they can be compiled to inform periodic critical reviews with asset managers and technical authorities, monthly governance reviews with asset leadership teams and quarterly reporting to executive committees. Engineers are able to analyze the accumulated data to enhance designs and processes, while managers and executives can utilize the data to improve process safety and enterprise asset-management strategies. Meanwhile, analyzing realtime data in conjunction with historical data can help organizations more effectively identify and track trends, not just within one facility but across many, and help drive successful preemptive actions.
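Rolling event data up for periodic reviews and spotting trends across facilities can be as simple as counting events by facility and month. The facility names, dates and event types below are invented for illustration. "Rising" is defined here, as an assumption, as a strictly increasing monthly count.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (facility, date, event type).
events = [
    ("Plant A", date(2014, 6, 3), "overdue proof test"),
    ("Plant A", date(2014, 7, 9), "overdue proof test"),
    ("Plant A", date(2014, 7, 21), "overdue proof test"),
    ("Plant B", date(2014, 6, 15), "overdue proof test"),
    ("Plant B", date(2014, 7, 2), "missed operator round"),
]


def monthly_rollup(events, kind):
    """Count events of one type per (facility, 'YYYY-MM')."""
    return Counter((fac, d.strftime("%Y-%m")) for fac, d, k in events if k == kind)


def rising(rollup, facility, months):
    """True if the facility's monthly count increases in every consecutive month."""
    counts = [rollup.get((facility, m), 0) for m in months]
    return all(later > earlier for earlier, later in zip(counts, counts[1:]))


rollup = monthly_rollup(events, "overdue proof test")
months = ["2014-06", "2014-07"]
print({fac: rising(rollup, fac, months) for fac in ("Plant A", "Plant B")})
# {'Plant A': True, 'Plant B': False}
```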

However, time-based information alone is not enough to present a complete picture of operational risk. The attendant data-integration infrastructure mentioned above is required to provide the context (for example, the unit processing state) in which the data are collected and presented. This context is a necessity for comparing best-practice historical data with realtime operational performance. Within this process-state context, events (planned or abnormal) can be collated, analyzed and acted upon quickly.

Procedural, state-based process-control standards, such as ISA 106, have been established to provide the appropriate data context and process-control vehicle for realtime event mitigation of an abnormal situation. Without providing such context keys, it is very difficult to provide the multilevel datasets required to enable the analytics engine to work. However, with state-based process automation standards in place, basic process control is fully integrated with safety shutdown systems, providing a key piece of enablement technology for rapid, closed-loop response to an abnormal situation. State-based control strategies also provide the event context to continuous, streaming information, which is vital to analysis and reporting.
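The value of state context can be shown with a small sketch. Each sample is tagged with the unit state active at its time stamp and then compared with a best-practice range for that state. The states, baselines and feed-rate values are hypothetical, and the sketch does not implement ISA 106 itself. Without the state tag, the low startup feed rates would look like a deviation against the running baseline.

```python
import bisect
from collections import defaultdict
from statistics import mean

# Hypothetical best-practice ranges for a unit's feed rate (t/h), per state.
BASELINE = {"startup": (5.0, 20.0), "running": (40.0, 55.0)}


def state_at(t: float, state_log: list) -> str:
    """Unit state active at time t; state_log is [(t_start, state)] sorted by t_start."""
    starts = [start for start, _ in state_log]
    i = bisect.bisect_right(starts, t) - 1
    return state_log[i][1] if i >= 0 else "unknown"


def review(samples: list, state_log: list) -> dict:
    """Group (t, value) samples by unit state and compare each group with its baseline."""
    by_state = defaultdict(list)
    for t, value in samples:
        by_state[state_at(t, state_log)].append(value)
    findings = {}
    for state, values in by_state.items():
        if state not in BASELINE:
            findings[state] = "no baseline"
            continue
        low, high = BASELINE[state]
        findings[state] = "ok" if low <= mean(values) <= high else "deviates from baseline"
    return findings


state_log = [(0, "startup"), (10, "running")]
samples = [(2, 12.0), (5, 15.0), (12, 48.0), (15, 30.0)]
print(review(samples, state_log))
# {'startup': 'ok', 'running': 'deviates from baseline'}
```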

Overall, interactions with integrated data, aggregated at the appropriate level, improve the ability to benchmark performance, plan alternative courses of action and mitigate abnormal situations effectively. Figure 5 illustrates some ways that aggregated data can be used to schedule reviews and discussions about safety-critical tasks.

Employing cloud and mobile

The previously discussed emerging paradigm for process safety management is proactive and driven by data from a myriad of sources. Now, it is time to discuss how to best deliver the data to the appropriate location, using three technologies that have proven to be game-changing in many industries: mobile (display), cloud (database) and industrial Ethernet (IE) communications.

Mobile technology has been problematic from a control standpoint for the CPI. There is a natural propensity for individuals to use mobile technologies to “move things along” outside of normal workflows. Sometimes this works to great effect; sometimes it does not. Nonetheless, there is no arguing the value that the correct application of mobile technology can have in maximizing asset health and process safety.

Areas where mobile is particularly valuable include field-data capture, event logging, operator rounds, safety inspections, audits, and the communication of operations and maintenance instructions, among others. The larger the facility, and the more dispersed its resources and assets, the more valuable mobile capabilities become. Mobile technology is also economical, eliminates paperwork and enhances compliance. Because it is so pervasive in everyday life, its learning curve is far from steep.

The advantages of cloud deployments are clear as well. Cloud-based process-safety management enables organizations to quickly roll out new process-safety capabilities, and can enhance collaboration between individuals and departments. The cloud can also reduce the costs of accessing and integrating data, help eliminate information silos and enable cost-effective enterprise-wide visibility. Importantly, the cloud is also pivotal for transferring data in a controlled and secure fashion.

Another key enabler is IE. Supported by wireless-mesh networking and secure communications protocols, IE enables a flexible, responsive, end-to-end networking architecture that provides connectivity, collaboration and integration. As a result, data can be delivered from the shop floor to the cloud, and then from the cloud to a mobile device efficiently and securely.

Operational integrity windows

The new imperative for CPI companies is knowing with certainty the source of the next incident, and being able to avoid it or at least minimize its effects. Regulatory agencies, shareholders and employees demand this. Doing this will require the ability to:

• Reduce operational risk by unlocking data for realtime insights and enterprise-level visibility

• Examine the “big picture” through enterprise-level benchmarking and trend analyses to guide safety strategies and keep incidents at bay

• Trust the data by leveraging validation, whether it is mobile data, or data created automatically by instrumentation and systems

• Receive instantaneous alerts on developing issues through visualization and alarm or alert technologies on top of realtime analytic applications

• Use mobile and cloud technologies to deploy and access data from the point of work or the process edge

• Implement realtime detection and automated response to abnormal conditions, be they partial process impairment or emergency shutdown events

• Have a single window into operational integrity powered by dynamic risk analysis and leveraging existing risk data, automation systems, maintenance and operations procedures and business systems for planning, scheduling and cost control

When all these abilities fall into place, a company can capitalize in new ways on existing infrastructure and benefit from a breakthrough in process safety performance. A process-safety paradigm shift is achieved through integration of realtime information, best operating practices and closed-loop control. Mobile procedure assistants can help avoid human error. However, in today’s complex processes, human response time may not be quick enough. This is where pre-programmed safety systems provide the final layer of protection to return a process to a safe state.

Unfortunately, safety systems are conservatively designed to fail safe. In some cases, this mode of action may be premature and cause false process trips, with an undesirable loss of productivity. Advances in state-based control allow the control system to respond ahead of the last line of defense and address the abnormal event while it is still less than critical, before the entire process must be shut down. This example of “man-machine-method” integration provides the optimum response to ensure high levels of both safety and productivity.
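The ordering described above can be sketched as a small state machine: a state-based control response acts first, and the safety trip remains the final layer. The pressure trip point, the 85% action threshold and the 75% clear threshold are hypothetical. The "tripped" branch appears only to make the ordering visible. In practice that function belongs to an independent safety instrumented system, not to the basic control logic.

```python
def next_state(state: str, pressure: float, trip: float = 150.0,
               act_at: float = 0.85, clear_at: float = 0.75) -> str:
    """One scan of a hypothetical state-based pressure response.

    The 'tripped' branch stands in for the independent safety system, shown
    only to make the ordering clear; in practice it lives in a separate SIS.
    """
    if state == "tripped" or pressure >= trip:
        return "tripped"        # final layer of protection; latched until manual reset
    if pressure >= act_at * trip:
        return "reduced_rate"   # early, state-based action, e.g. cut feed
    if state == "reduced_rate" and pressure >= clear_at * trip:
        return "reduced_rate"   # hysteresis: hold until well clear of the limit
    return "normal"


def run(pressures: list, state: str = "normal") -> list:
    """Apply next_state to a sequence of pressure readings and return the state history."""
    history = []
    for p in pressures:
        state = next_state(state, p)
        history.append(state)
    return history


print(run([100, 130, 135, 120, 110, 100]))
# ['normal', 'reduced_rate', 'reduced_rate', 'reduced_rate', 'normal', 'normal']
```

In the first trajectory the early action contains the excursion, and the unit returns to normal without a trip. The process shuts down only if pressure still reaches the trip point.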

Edited by Mary Page Bailey

Authors

Andrew Soignier serves as vice president of Oil, Gas & Petrochemical Solutions for Ventyx, an ABB company (800 Town & Country Blvd. Suite 500, Houston, TX 77024; Phone: 225-751-3348). In this role, he directly oversees Ventyx Level 3 solutions in the areas of operations, safety and asset health. Soignier joined Ventyx in 2007 and has held several positions in executive account management, and as the sales director of Oil, Gas & Petrochemical Solutions. Prior to joining Ventyx, he spent 11 years in the process automation industry, with a focus on electrical, process safety, critical controls and rotating equipment. He holds a B.S. in electrical engineering and an executive MBA from Louisiana State University.

Angela Summers is president of SIS-TECH (12621 Featherwood Drive, Suite 120, Houston, TX 77034; Phone: 281-922-8324; Website: www.sis-tech.com). She has more than 25 years of experience in safety controls, alarms and interlocks, process engineering and environmental engineering. She is an active participant in standards committees and has written over 60 papers, technical reports and book chapters. She is currently editing the Center for Chemical Process Safety (CCPS) book “Guidelines for Safe Automation of Chemical Processes.” Summers received a B.S.Ch.E. from Mississippi State University and a Ph.D. in chemical engineering from the University of Alabama.

Mike K. Williams is a consultant with Modern Automation Consulting Services, LLC (6204 Woodview Pass, Midland, MI 48642). With over 39 years of experience in the CPI, Williams provides work-process guidance in the automation and operation of batch and continuous facilities. Prior to starting external consulting, Williams worked for The Dow Chemical Company in the Specialty and Advanced Materials divisions, where he consulted on and provided analysis and investment planning for a number of new and rehabilitation capital projects. He has a B.S.Ch.E. from the University of Michigan, Ann Arbor and is certified in Lean-Six Sigma methodologies.
