Written by Or Shalom
OT (operational technology) networks present a security challenge for cyber security officials. These are kinetic networks found in different sectors, such as the energy industry and production lines (cars, robotics, food, pharma, weapons). They serve as operational networks embedded in command and control systems (for example, the luggage conveyor belt at an airport), in structural systems, and in BMS (Building Management Systems), which are entrusted with, among other things, the safety and security of a building’s occupants. The operational complexity and the security threats call for preparation, tools for business maintenance and continuity, and the ability to recover quickly from cyber attacks in OT sectors and production lines. These are the main focus of this article. When planning these preparations, pay attention to several supporting layers: organizational policy; procedures for mapping and examining risks (as seen by the attacker), founded primarily on core processes and critical areas; equipment acquisition; and training programs to maintain cyber competence.
First, we must distinguish between the BCP (business continuity plan), which here concerns continuing the business after a cyber attack, and the handling of the cyber crisis itself. The goal of a BCP is to ensure consistent business functionality on the production lines even after an unexpected event. From the perspective of handling the cyber crisis itself, this shows up in the containment phase, where the method of crisis management and the ability to overcome the attack are chiefly measured: the aim is to prevent further deterioration and to keep the production line’s routine work going. These abilities manifest in different processes. On the technological side, they include the ability to set and reset PLCs after an attack and return to the production outline, backup processes and restoring functionality, or keeping critical spare devices on hand to replace affected ones after an incident. On the organizational and procedural side, they include the ability to switch to manual work in production, documentation, and customer registration processes.
Risk Assessment – examination and mapping of core processes and the attacker’s viewpoint
There are plenty of methodologies for mapping and examining risk in IT and OT networks. Most of the resources devoted to risk examination should focus on the systems and accompanying processes in the following areas: the PLCs, the Data Historian (as database servers), and the HMIs themselves. The risk assessment should be conducted from the attacker’s viewpoint, using intelligence, and it has to be tailored to the operational area and to the sector itself. Even though operational network attacks follow set patterns, attackers differ in their interests, motivation, skillset, and knowledge. The risk management process has to be comprehensive and cover the major subjects in operational environments, including risks from the supply chain, protocol vulnerabilities, problems in software updates, connectivity between devices (controllers and software), risks from supporting systems (for example, generators), and attention to outdoor devices, mostly physical devices whose compromise might lead to computing and operational damage.
Equipping critical components in the production lines – as part of recovery preparations
From a cyber perspective, the critical components are pinpointed through the risk mapping process known as ICS critical cyber asset identification. This process aims to detect the critical components in the production line itself. Production lines rely on a great many machines, components, devices, and controllers, so the focus should be on digital devices (because of their potential exposure to cyber attacks). One useful criterion for prioritizing components is to map those closest to the IT areas as potential victims of exploitation or attack, specifically the components closest to levels 4 and 5 of the Purdue model, because of their proximity to the organizational IT and internet layers.
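The prioritization idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` record, the `prioritize` helper, the `it_boundary` parameter, and the sample inventory are all hypothetical and not part of any standard or tool. It filters out non-digital equipment and ranks the remaining OT assets by how close they sit to Purdue level 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical inventory record for one production-line component."""
    name: str
    purdue_level: int  # 0 (physical process) up to 5 (enterprise / internet)
    is_digital: bool   # only digital devices are exposed to cyber attack


def prioritize(assets, it_boundary=4):
    """Return digital OT assets, those nearest the IT levels first.

    `it_boundary` is an assumed parameter: the lowest Purdue level
    that counts as IT (level 4 in the classic model).
    """
    candidates = [
        a for a in assets
        if a.is_digital and a.purdue_level < it_boundary
    ]
    # Smaller distance to the IT boundary means higher priority.
    return sorted(candidates, key=lambda a: (it_boundary - a.purdue_level, a.name))


# Illustrative inventory, not taken from a real plant.
inventory = [
    Asset("conveyor-motor", 0, False),
    Asset("line-plc", 1, True),
    Asset("hmi-station", 2, True),
    Asset("data-historian", 3, True),
]

ranked = prioritize(inventory)
print([a.name for a in ranked])
# prints ['data-historian', 'hmi-station', 'line-plc']
```

In practice, proximity to the IT layers would be only one input alongside the process criticality identified in the risk assessment.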
Examining events from the last few years, and considering the impact of COVID-19 on operational arenas, leads to several conclusions. One is the need to equip oneself with redundancies (given the existing danger of a cyber event that halts operations). Another interesting case comes, perhaps surprisingly, from the automotive industry, where a series of circumstances brought about the operational shutdown of a Japanese manufacturer: dependency on a main supplier, acquiring devices by the ‘just in time’ method (which prevented the use of reserves and the ability to lean on existing stock), and more. Despite the losses, the manufacturer was able to return to routine work within a few days, which is an interesting achievement in terms of recovery capabilities. These events lead to a thought-provoking conclusion about logistical planning, equipping, and device reserves that allow for redundancy and the use of devices that were not affected by the event itself. They point to the need for redundancy, for multiple suppliers, and, most importantly, for cyber risk mapping procedures along the supply chain as part of the process and of the contractual agreements with suppliers.
Optimization processes in organizational management decision making – planning of monitoring as a deciding factor
During an active event, time works against the organization. Often there is a window of opportunity in which the organization can adjust and prevent a worse outcome in a cyber crisis. To use it, the IR team needs to be able to intervene and carry out a focused forensic investigation of the event, examining the speed at which it spreads and the resulting consequences. Monitoring of the systems is relevant during an event as well, since it makes it possible to inspect the impact on production capabilities. Planning how monitoring is carried out at the different layers of production line networks allows for investigation, isolation, and research that will support management decisions later on. For example, establishing monitoring processes at layer zero could, during an event, guide the business personnel toward conclusions and decisions about whether to continue production (it makes it possible to examine whether there is a deviation associated with damage to physical activity). It is therefore right to plan the deployment of monitoring systems by layer (according to the Purdue model) and to support management decisions accordingly during an event. Beyond that, such monitoring also helps improve the system and maximize its functionality in both security and operational respects.
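As a rough illustration of what a layer-zero deviation check might look like, the sketch below compares live sensor readings against a baseline recorded during normal operation. Everything here is an assumption made for the example: the `find_deviations` helper, the three-standard-deviation threshold, and the motor speed values are hypothetical, and a real deployment would rely on the plant's own engineering limits rather than a simple statistical rule.

```python
from statistics import mean, stdev


def find_deviations(baseline, readings, threshold=3.0):
    """Return indexes of readings that deviate from the baseline.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations from the baseline mean (an assumed rule).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [i for i, r in enumerate(readings) if r != mu]
    return [i for i, r in enumerate(readings) if abs(r - mu) / sigma > threshold]


# Illustrative motor speeds in RPM, recorded during normal production.
baseline_rpm = [1500, 1502, 1498, 1501, 1499, 1500, 1503, 1497]
# Readings collected during the event under investigation.
live_rpm = [1501, 1499, 1650, 1500]

suspect = find_deviations(baseline_rpm, live_rpm)
print(suspect)
# prints [2]
```

An empty result would support a decision to keep producing while the investigation continues; flagged readings suggest the event may already be affecting the physical process.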
Plan for maintaining IR team readiness and competence
A plan for maintaining cyber competence requires emergency cyber exercises. These exercises must bring together the management personnel and the technical personnel in the organization. Management involvement is needed because management and decision making are required in different areas, for example: legal aspects, the supply chain, the public arena, resource management, and more. This activity helps examine the lessons learnt and the actions that must be taken for future preparation, as seen by management following an exercise (and, of course, by those who experienced it firsthand). The decision making process is complicated: management is required to make fast decisions, recruit the services of professionals, and sometimes communicate directly with the attackers. The Norsk Hydro aluminium company event is an interesting study in complex decision making and its impact on the activities of almost 35,000 workers across different branches: the ability to switch to manual work, decisions tied to manpower, and the need to recruit workers for the production lines. In addition to management training, and within the same exercise, it is necessary to activate the technical personnel to ensure their hands-on ability to cope with an event in the organization. The success parameters should be defined around the ability to identify and track malfunctions, isolate the event, investigate quickly in order to curb it, and return the production lines to routine activity as part of the recovery process. Popular OT environment exercises focus on ransomware events, HMI system attacks, the exploitation and manipulation of protocols, insider threats, abuse of trust, as well as physical events and the destruction of infrastructure and computing systems as part of the attackers’ efforts to cause computing damage via physical attack (such as the UAV attack on the Aramco company).
The measures of success derive from the ability to cross-check and detect changes in the functionality of controllers, the ability to cross-analyze the outputs of different monitoring processes, and so on. Exercises should also be planned to train the workforce and to preserve knowledge against the risks of workforce turnover: employees leaving the organization, changing positions, and the like. The training ladder must therefore be graded so that it incorporates the conclusions and lessons learnt from previous exercises and archives them. Another important aspect is exercising the friction points between OT and IT networks in a way that ensures blocking capabilities in these changing arenas.
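One simple way to practice detecting changes in controllers during an exercise is to compare a fingerprint of each controller's current logic export against a known-good baseline. The sketch below is a minimal illustration under stated assumptions: the `fingerprint` and `changed_controllers` helpers and the controller names are hypothetical, and how the logic is actually exported from a PLC depends on the vendor and is not shown here.

```python
import hashlib


def fingerprint(config_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of an exported controller program."""
    return hashlib.sha256(config_bytes).hexdigest()


def changed_controllers(baseline: dict, current: dict) -> list:
    """List controllers whose logic is missing or differs from the baseline.

    `baseline` maps controller name to a known-good digest;
    `current` maps controller name to the freshly exported bytes.
    """
    changed = []
    for name, known_hash in baseline.items():
        blob = current.get(name)
        if blob is None or fingerprint(blob) != known_hash:
            changed.append(name)
    return sorted(changed)


# Illustrative data only: placeholder exports, not real PLC programs.
good_exports = {
    "filler-plc": b"RUNG 1: start filler",
    "packer-plc": b"RUNG 1: start packer",
}
baseline = {name: fingerprint(blob) for name, blob in good_exports.items()}

# During the exercise, one program has been tampered with.
current_exports = {
    "filler-plc": b"RUNG 1: start filler",
    "packer-plc": b"RUNG 1: start packer; RUNG 2: disable alarm",
}

print(changed_controllers(baseline, current_exports))
# prints ['packer-plc']
```

Archiving the baseline digests together with the exercise results also helps preserve this knowledge when staff change roles or leave.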
Or Shalom – Security and cyber expert and consultant to government ministries and defense industries, and international business development consultant for companies in the fields of HLS and cyber. He leads centers of excellence and advanced training programs in Cyber and HLS for various organizations in the civilian, security, industry, and academic sectors. He holds a master’s degree, as well as civil and national qualifications in the realm of HLS and Cyber Security. He has experience in security, innovation, planning, and characterization of technological security systems, HLS, and Cyber preparedness.