From the Desk of the CISO - The CrowdStrike Incident of 7/19/24

Thomas Pioreck, CISO

The CrowdStrike update identified as the cause of the mass outage of Windows systems across the globe has many organizations asking questions: How could this happen? Could it happen again? What should I do to protect my business in the future? Were my vendors and partners affected, and how have they protected themselves? There is a whole litany of other questions besides. We at Cybersafe are aware of these questions, many of which we have ourselves, and want to address some of them so that you can rest a little easier.

First, what actually happened? Without getting too deep in the weeds on the technical aspects and components, CrowdStrike issued an update for one of the content files (CrowdStrike calls them “channel files”) that its agents use to stay current with threat actor (TA) activity. This was a configuration file update, not an update of the agent itself. CrowdStrike updates these configuration files and other library files multiple times a day. A defect in the file caused a failure within the CrowdStrike agent itself. The impact was magnified because the CrowdStrike agent runs in the kernel. Because the failure occurred at the very core of the Windows OS, it brought down the operating system as a whole, and affected systems crashed with the “blue screen of death” (BSoD).

Cybersafe was not affected internally by the CrowdStrike issue. All systems remained online and operational, which allowed us to respond rapidly to customers who may have been affected. While there is always an element of luck involved in an occurrence of this size and scope, it’s also true that “luck is the residue of design.” We have taken great care in designing our infrastructure for fault tolerance and defense-in-depth, and in building our business continuity and disaster recovery structure. Our ability to maintain an infrastructure that is adaptable across operating environments, with requisite standby systems and other failover capabilities, allows us to maintain operations or, at worst, return to standard operations in a timely manner. Multiple forms of backups, staged environments, and a system of rotation ensure there are always operational capabilities for our organization and its mission.

One takeaway many organizations are drawing is that this was due to a “bad update” or “patch,” and that the way to mitigate this type of risk is to avoid updating and patching systems. While there is some validity to that concern, it’s important to remember that patches and updates are critical to ensuring an organization is well protected. Patch and Vulnerability Management needs to be designed so that new releases can always be tested in an environment where a fatal error would not impact operations. Once tested and proven, updates should be staggered and rolled out across an environment in stages, with consideration given to leaving your most critical systems until the end of the patch cycle. Cybersafe believes in operating at least at an “n-1” design. That means not automatically updating to the latest release until it has been thoroughly vetted “in the wild” and then again within our own internal testing infrastructure. Staying one version, or one minor version, behind keeps organizations off the “bleeding edge” without sacrificing too much from a risk perspective, since falling further behind leaves older vulnerabilities in place, just waiting for a TA to discover and exploit them.
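For readers who want to see the idea in concrete terms, the staged, “n-1” approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of Cybersafe's or any vendor's tooling: the `Host` type, the `criticality` scale, the version strings, and the `health_check` callback are all hypothetical, and the deployment step is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Host:
    name: str
    criticality: int  # hypothetical scale: 1 = test/lab, higher = more critical


def pick_n_minus_1(releases: list[str]) -> str:
    """Return the release one step behind the newest.

    `releases` is assumed to be ordered oldest to newest.
    """
    if not releases:
        raise ValueError("no releases available")
    # With only one release there is nothing older to fall back to.
    return releases[-2] if len(releases) >= 2 else releases[-1]


def plan_rings(hosts: list[Host]) -> list[list[Host]]:
    """Group hosts into rollout rings, least critical ring first."""
    rings: dict[int, list[Host]] = {}
    for host in hosts:
        rings.setdefault(host.criticality, []).append(host)
    return [rings[level] for level in sorted(rings)]


def staged_rollout(
    rings: list[list[Host]],
    version: str,
    health_check: Callable[[Host, str], bool],
) -> list[str]:
    """Update ring by ring and halt after the first ring that fails its check.

    Returns the names of the hosts that received the update.
    """
    updated: list[str] = []
    for ring in rings:
        for host in ring:
            updated.append(host.name)  # placeholder for the real deployment step
        if not all(health_check(host, version) for host in ring):
            break  # later, more critical rings are never touched
    return updated


if __name__ == "__main__":
    fleet = [Host("db-prod", 3), Host("lab-01", 1), Host("file-srv", 2)]
    target = pick_n_minus_1(["7.14", "7.15", "7.16"])
    done = staged_rollout(plan_rings(fleet), target, lambda host, version: True)
    print(target, done)
```

The point of the design is that a bad release is caught in the least critical ring, so the most critical systems are never exposed to it.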

Though dominant, CrowdStrike is not the only EDR/EPP on the market. Many organizations that do not use CrowdStrike are asking about their chosen vendor, especially our SentinelOne clients. CrowdStrike and SentinelOne perform these similar updates in very different ways: SentinelOne's updates take effect in user space, while CrowdStrike's take effect at the kernel level. In a statement, SentinelOne says, “At SentinelOne, our Live Security Updates (LSU) are confined to detection-related logic and models that operate in an isolated user-mode space, separate from the core of our agent. These updates do not affect the kernel or core components of the SentinelOne agent. Since our agent primarily operates in user-space, our Live Security Updates only impact user-space components. This was an intentional design choice to increase stability and significantly decrease interoperability risks.”

We contacted our partners immediately to determine whether a similar failure was possible with other products, and asked them to identify the testing and design elements that allow them to assure their customers that a similar occurrence is not a concern.

Security, like partnerships, is built on trust. We hold the trust of our clients in the highest regard; it is a driving force in everything we do. We hold our partners and vendors to that same level of trust. If there is a perceived crack, we work diligently to get answers to our questions and yours. Though it has been a painstaking process in the days after this event, the work and response demonstrated by our partners have been encouraging and have convinced us that our trust in them is well-deserved. Security never stops. There is no point where we rest, and we will continue to work tirelessly to ensure our solutions meet our high standards.

Thank you for your continued trust in Cybersafe and for giving us the opportunity to help you strengthen your business’s security so that you can focus more on your business’s mission.

For more about bolstering your cybersecurity posture with our services, schedule a consultation or contact us today.