Former Black Hat’s Perspective on Fundamentals of Incident Management
Incidents aren’t planned, unless I was somehow involved. Needless to say, you had better have a plan. Whatever the circumstances, it is prudent and even imperative to have and follow a procedure for the unexpected.
After all, any amount of downtime could cost the business money and deal a blow to its reputation. Research shows that outages resulting in downtime can cost businesses anywhere from tens of thousands to hundreds of thousands per hour, depending on the size of the business.
Outages can affect innumerable individuals. It doesn’t stop with not having access to a service. If people can’t pay their bills or purchase flight tickets or communicate with friends and loved ones, the incident has the ability to affect real life. After all, cyberspace and real life are now symbiotic.
Incidents typically disrupt services and hobble productivity. Incident management is the set of measures followed in responding to incidents in order to restore service to a viable state. The restorative process focuses on quickly re-establishing normal operations in order to mitigate the impact on the business.
There’s a saying of mine, which goes “Technology is like a house of cards”. All it takes is one bug, or one application or network configuration error, that consumes CPU resources and “breaks your internet”, disrupting workflow. Suddenly you find yourself grasping for answers and a quick fix, assuming you ever created contingencies for such incidents. Forget about any general incident scenario. Rather, imagine a large-scale incident that has the potential to cause lengthy downtime or loss of service.
As a former black hat, I had contingencies for incidents. While my methods were not sophisticated enough to necessitate complex remediation procedures, they worked and kept our operations and public image afloat, even during cyberattacks from our rivals.
If our website went offline, our content was unreachable, as was access to third-party web apps such as our e-store. In all, every lost visitor negatively impacted our public perception.
I delegated responsibilities and access to these services to several members based on availability, so I always had team members on call to remediate problems as they happened. The flip side to this was causing incidents against the resources and services used by rival hackers. This created windows of opportunity for me to operate in while every able hand on deck was preoccupied trying to fix the errors I caused.
For this reason, resolving an incident quickly will not only reduce the costs incurred during downtime but also allow the business to maintain a level of control over its public image. Knowing how to communicate during an incident is key to managing it successfully.
Therefore, let’s get you acquainted with some incident management fundamentals.
Plan the Procedure
You will want a team that has been instructed on the incident procedure and can act at a moment’s notice. These people will need access to the incident response plans, a list of contacts to notify in case of an incident, the on-call schedules, the escalation policies, conferencing tools and access credentials, and policy documents for review, as well as technical documentation and run books.
Run books offer step-by-step instructions on what to do during an incident. They’re most useful when your systems administrators are unavailable.
It can be difficult to know the appropriate actions associated with specific events without simulating them. For this reason, deliberately engineering incidents in a controlled setting can teach you how to fortify your systems in an effort to minimize the possibility of an actual incident. It’s not just a lesson in causality, but a lesson in prevention.
Learn from Your Incident Management Software
For companies with a large network, centralized incident management software is a vital asset. These tools are resilient and help remove some of the human dependency that can result in errors and delays.
By delivering defined automated alerts, team schedules, and escalation policies, the software saves precious time so problems can be resolved faster. Relying on multiple incident management tools is arguably impractical when managing the integrity of a large network.
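To make the idea of a defined escalation policy concrete, here is a minimal sketch in Python. The responder roles, the delay values, and the who_to_page helper are all hypothetical and are not taken from any particular incident management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationTier:
    responder: str      # hypothetical on-call role
    after_minutes: int  # minutes an alert may go unacknowledged before this tier is paged


# Hypothetical policy, ordered from first responder to last resort.
POLICY = [
    EscalationTier("primary on-call", 0),
    EscalationTier("secondary on-call", 15),
    EscalationTier("engineering manager", 30),
]


def who_to_page(minutes_unacknowledged: int, policy=POLICY) -> list:
    """Return every responder who should have been paged by now."""
    return [
        tier.responder
        for tier in policy
        if minutes_unacknowledged >= tier.after_minutes
    ]


# An alert that has gone unacknowledged for 20 minutes has reached the second tier.
print(who_to_page(20))
```

Keeping the policy as plain data means the schedule can be reviewed and changed without touching the paging logic.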
Identify and Log the Incidents
Not every incident is going to materialize in the way you expect it to. You might receive a call or an email from an employee or customer. Regardless of the scenario, if there’s an incident, log it.
Incident logs (tickets) must be entered into your service desk, and should typically include the following:
Name of the individual reporting the incident
Date of the incident
Description of the incident
A unique identifier for the incident, for tracking purposes
Once the incident has been logged and properly categorized, prioritize it by severity and begin remediating the problem so the incident ticket can be closed.
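The ticket fields listed above can be captured in a small record type. What follows is a minimal sketch in Python, assuming an in-memory dictionary stands in for the service desk. IncidentTicket, log_incident, and the sample reporter are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IncidentTicket:
    reporter: str        # name of the individual reporting the incident
    incident_date: date  # date of the incident
    description: str     # description of the incident
    # Unique identifier, generated automatically for tracking purposes.
    ticket_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "open"

    def close(self) -> None:
        """Mark the ticket as closed once the problem is remediated."""
        self.status = "closed"


def log_incident(log: dict, reporter: str, incident_date: date, description: str) -> IncidentTicket:
    """Create a ticket and store it in the log under its unique identifier."""
    ticket = IncidentTicket(reporter, incident_date, description)
    log[ticket.ticket_id] = ticket
    return ticket


service_desk = {}  # stand-in for a real service desk
ticket = log_incident(service_desk, "Alice", date(2024, 1, 5), "Checkout page returns errors")
print(ticket.ticket_id, ticket.status)
```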
Categorizing Severity Levels
Not every incident is critical. Therefore, categorizing each incident based on its severity is how you prioritize the event. As a former hacker, I created a threat-level scheme based on color codes, adapted from the Defcon level colors. Each color mapped to an appropriate action to be followed in our procedure memo.
More practically, severity levels typically use the numbers 1-3, with 1 indicating a severe public-facing incident that affects your customers. However, this must be tailored to fit your business model. By assigning a 1-3 severity level, you can define which incidents fall within the scope of each level, and develop appropriate protocols for addressing them.
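A 1-3 scheme like this can be written down as a small lookup. The sketch below is in Python and is only an illustration: level 1 follows the definition above, while the definitions of levels 2 and 3, the classify function, and the protocol wording are assumptions you would replace with your own.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # severe, public-facing incident that affects customers
    SEV2 = 2  # assumed: either public-facing or severe, but not both
    SEV3 = 3  # assumed: minor and internal only


# Hypothetical protocol for each level; replace with your own procedures.
PROTOCOLS = {
    Severity.SEV1: "page the on-call team immediately and notify customers",
    Severity.SEV2: "notify the on-call team and fix within the business day",
    Severity.SEV3: "log a ticket and schedule the fix",
}


def classify(public_facing: bool, severe: bool) -> Severity:
    """Map two yes/no questions about an incident to a severity level."""
    if public_facing and severe:
        return Severity.SEV1
    if public_facing or severe:
        return Severity.SEV2
    return Severity.SEV3


level = classify(public_facing=True, severe=True)
print(level.name, "->", PROTOCOLS[level])
```

Because the levels are integers, a queue of open incidents can be sorted so that level 1 always comes first.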
One time our hacking forum and website experienced a Defcon Red or Severity level 1 data breach. I trusted so much in our defenses and ability to recover quickly from any security incident that I didn’t believe the day would come.
We had never simulated a Defcon Red incident before. Consequently, we wasted precious time during our recovery process. By the time we restored services, the cat was already out of the bag, and everyone knew what had happened, even as we were still trying to understand it.
Incident Communication Plan: When an Incident Goes Public
If knowledge of an incident becomes public, your business needs to have a response ready in order to mitigate any impact on its public image. Having an incident communication plan is key.
If a disruption has taken place, customers need to know that your business has acknowledged it and is taking action to resolve it as quickly as possible. After all, if you aren’t driving the narrative, someone else will.
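One way to be ready is to keep a pre-approved message template on hand. The following is a minimal sketch in Python. The draft_status_update function and the wording of the message are hypothetical, and publishing the text to a status page or social media account is left out.

```python
from datetime import datetime, timedelta


def draft_status_update(service: str, started_at: datetime, now: datetime) -> str:
    """Build a short public acknowledgement of an ongoing disruption."""
    minutes = int((now - started_at).total_seconds() // 60)
    return (
        f"We are aware of a disruption affecting {service} "
        f"that began about {minutes} minutes ago. "
        "We are working to resolve it as quickly as possible "
        "and will post another update soon."
    )


# Example: a disruption to a hypothetical e-store, 25 minutes after it started.
started = datetime(2024, 1, 5, 12, 0)
print(draft_status_update("the e-store", started, started + timedelta(minutes=25)))
```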
As the leader of a public-facing black hat hacking website and community, I failed to have an Incident Communication Plan. After all, I was more focused on our internal workings and not our public relationships.
Thus, when our services went down due to a cyberattack, visitors thought their information might be compromised, and believed we couldn’t adequately protect our online assets. At that point, I had no plan for addressing the public about the incident other than repairing the damage. However, loyalty is not easily repaired when public relations are neglected.
Tools like Statuspage are reliable for distributing information, and many companies also use social media, such as Twitter.
An article by Jesse McGraw
Edited by Anne Caminer