Explaining the CrowdStrike Outage to Virginia Business Owners
Imagine waking up to find your computer displaying the dreaded “Blue Screen of Death” because of an overnight software update. Most of the time, an update is just an update, but on rare occasions it can be a global disaster.
Unfortunately, we learned that the hard way this past week. On Friday, July 19, 2024, a routine software update from CrowdStrike, a leading cybersecurity company, caused a significant issue affecting an estimated 8.5 million Windows computers. This glitch left millions staring at the infamous blue screen, causing chaos for businesses and individuals. Small and large companies lost crucial data, and individuals missed essential deadlines. This incident highlights the delicate balance between maintaining cybersecurity and ensuring a seamless user experience.
This incident led to significant disruptions in many sectors, including airports, governments, banks, grocery stores, media, and more. In Virginia, some state agencies and local governments experienced disruptions. So, what went wrong? Here, we explain what CrowdStrike is, what went wrong with the update, how it impacted businesses, and how to protect your business.
What Is CrowdStrike?
CrowdStrike is a leading cybersecurity company based in the United States. Founded in 2011, it serves as a digital bodyguard for businesses and large organizations, protecting them from cyber threats such as ransomware, malware, and other online attacks. The company is trusted by numerous businesses, including over 500 companies from the Fortune 1000 list. It has a solid reputation for responding quickly to cyber threats and has been involved in investigating major cyber incidents. Their main product is the Falcon sensor program, a cloud-based security system that detects and stops cyber threats in real time.
What Is Falcon Sensor?
Think of your computer as a house. Regular antivirus software is like a security system that looks for specific types of bad guys (like burglars) that it recognizes from before. If it sees any of these known bad guys, it stops them from getting in.
Falcon Sensor is what’s called an Endpoint Detection and Response (EDR) tool. It’s like having a competent security guard for your house. This guard not only looks for the bad guys that the antivirus knows but also keeps an eye out for any strange or suspicious activity. The guard can also investigate unfamiliar situations and take action to protect your house, even if the threat is something new.
So, while an antivirus is good at stopping known threats, an EDR is much better at handling new and unexpected threats to keep your computer safe. The trade-off is that EDR requires a deeper level of access to your system. It also requires rapid updates to keep up with rapidly emerging threats. Unlike many other software updates, these rapid updates were pushed to all customers at once rather than rolled out in stages.
What Happened?
Based on what we know from CrowdStrike’s preliminary post-incident review, on July 19, a routine software update from CrowdStrike caused significant disruption for many businesses worldwide. Early that morning, CrowdStrike released an update to their Falcon Sensor program. This update was intended to improve security by targeting specific cyberattack tools. However, the update contained a coding mistake known as a “logic error.” This mistake caused Windows computers running Falcon Sensor to crash, leading to the infamous “Blue Screen of Death” (BSOD). The impact was immediate and widespread.
Many businesses found their Windows computers completely unusable, resulting in significant disruption. Airports experienced chaos as their systems failed, grocery store checkouts malfunctioned, and journalists faced difficulties reporting on the issue due to crashing equipment. The problem affected millions of devices globally. People reported that their computers went into a reboot loop, making it impossible to use them. CrowdStrike responded quickly. They began working on a fix within an hour of identifying the issue. By 5:27 am UTC, they released an update to correct the faulty configuration files.
However, the recovery process varied. For many, the issue could be resolved remotely by deleting the problematic file if the system was online. For those with offline systems, manual file deletion was necessary, which often required help from IT support.
What Was the Impact on Businesses?
The CrowdStrike outage impacted businesses across various sectors and could be considered one of the most significant IT disasters in history. According to Reuters, Fortune 500 companies are expected to experience over $5.4 billion in financial losses.
- Airports and airlines: The outage led to significant disruptions at airports. Systems that manage flight schedules, ticketing, and customer service were hit, causing confusion and delays. Passengers experienced long lines and delays as airport staff struggled to cope without their usual digital tools.
- Banks and financial services: The financial sector also felt the impact, with banks experiencing system outages that affected transactions and customer service. Online banking services were disrupted, making it difficult for customers to access their accounts or perform financial transactions.
- General business operations: Businesses that relied on Windows systems experienced productivity losses. Employees could not access important files, communicate effectively, or perform tasks. Many companies found it difficult to provide customer support as their systems were down. Call centers and online help desks faced increased queries and complaints, further straining resources.
- Grocery stores and retail: Many grocery store checkouts malfunctioned, making it impossible to process sales. This frustrated customers and cost stores sales as they struggled to operate without point-of-sale systems. Some retailers had to close temporarily until their systems were restored.
- Healthcare: While not as widely reported, some healthcare institutions using affected systems faced delays in accessing patient records, appointment scheduling systems, and other critical operations, impacting patient care.
- Media and journalism: Journalists and media companies faced significant challenges as their computers crashed, leaving them without the essential tools needed to report on the incident. This disrupted news coverage and the ability to provide timely updates to the public.
Overall, the CrowdStrike outage demonstrated how critical reliable cybersecurity tools are to business continuity. It highlighted how interconnected modern business operations are and how widespread the impact of a single software issue can be. Businesses are now likely to review their contingency plans and IT support readiness to handle similar incidents better.
Business Continuity and Disaster Recovery Plans Are Imperative
The most important lesson from this incident is always having a backup plan. It is crucial to have business continuity and disaster recovery plans in place. If you do not have these, you are jeopardizing your whole business. If a ransomware attack occurs, your cloud provider goes down, your data is suddenly deleted, or something else entirely happens, you need a plan with data backups, redundancy, and a written plan of action (not one stored only on your computer, which won’t work in an outage). Whether you are a multinational business (Delta Air Lines still has problems) or a cash-n-carry deli, your employees must understand how to shift from digital to analog seamlessly. It doesn’t have to be complex and may be as simple as an analog plan and a cash box. Decide how you will operate if all systems are down.
Many businesses are now reviewing their disaster recovery plans and business continuity software. They want to be sure they have clear procedures to help mitigate the impact of future disruptions. At AIS Network, we support dozens of businesses, not just in Virginia but all over the country, to stay safe from cyber threats while helping their teams stay productive through excellent IT planning and support. Business continuity and disaster recovery planning are essential strategies that could have significantly mitigated the impact of the CrowdStrike outage. Here’s a brief glimpse of how these plans could have helped:
Business Continuity Planning (BCP)
- Proactive Risk Assessment
  - Identification of Critical Systems: BCP involves identifying systems critical to business operations. Knowing the potential risks associated with software updates can lead to more cautious and phased deployment strategies.
  - Vendor Risk Management: Regular assessment of vendors like CrowdStrike for potential risks and contingency plans.
- Preparation and Prevention
  - Backup Systems: Ensuring all data is backed up regularly and securely to restore data quickly during a crash.
  - Alternative Workflows: Develop workflows that allow essential business operations to continue even if primary systems fail.
- Communication Plans
  - Stakeholder Communication: Predefined communication strategies to inform employees, customers, and partners about the issue and expected resolution times.
  - Training and Awareness: Regular training for employees on what steps to take during a system outage, minimizing panic and ensuring swift action.
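For teams that manage their own software updates, the phased deployment idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CrowdStrike or any particular vendor delivers updates: the function names `plan_rollout` and `roll_out`, the 5%/25%/100% ring sizes, and the `update_is_healthy` check are all hypothetical.

```python
def plan_rollout(devices, ring_percents=(5, 25, 100)):
    """Split devices into successive rollout rings.

    ring_percents are cumulative shares of the fleet (hypothetical defaults):
    a 5% pilot group, then up to 25%, then everyone else.
    """
    rings = []
    start = 0
    total = len(devices)
    for percent in ring_percents:
        # Integer ceiling of total * percent / 100, capped at the fleet size.
        end = min(total, (total * percent + 99) // 100)
        if end > start:
            rings.append(devices[start:end])
            start = end
    return rings


def roll_out(devices, update_is_healthy, ring_percents=(5, 25, 100)):
    """Update one ring at a time and stop after the first ring with failures.

    update_is_healthy is a caller-supplied function (an assumption of this
    sketch) that applies the update to one device and returns True if the
    device still works afterward.
    Returns (devices attempted so far, devices that failed).
    """
    attempted = []
    for ring in plan_rollout(devices, ring_percents):
        attempted.extend(ring)
        failed = [device for device in ring if not update_is_healthy(device)]
        if failed:
            return attempted, failed  # later rings are never touched
    return attempted, []
```

The point of the design is that a bad update is caught while it has reached only a small slice of your computers, so the rest keep working while the problem is investigated.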
Disaster Recovery Planning (DRP)
- Data Recovery
  - Regular Backups: Ensuring that backups are up-to-date and accessible, allowing for data restoration without significant loss.
  - Offsite Backups: Storing backups offsite or in the cloud to avoid data loss from localized issues.
- System Redundancy
  - Failover Systems: Implementing failover systems that can take over if the primary system fails, ensuring continuity of service.
  - Redundant Infrastructure: Maintaining redundant infrastructure, such as servers and networks, to switch over if the central systems are compromised.
- Recovery Procedures
  - Detailed Recovery Plan: A step-by-step recovery plan that outlines procedures for restoring systems, applications, and data.
  - Regular Drills: Conduct disaster recovery drills to ensure the plan is effective and all stakeholders know their roles.
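One simple way to confirm that backups are actually up-to-date is to check the age of the newest file in the backup location. The Python sketch below is a minimal example under stated assumptions: the function names, the 24-hour threshold, and the idea that your backups land as files in a single folder are all hypothetical and would need adjusting to your own setup.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file.

    Returns None if the folder contains no files at all.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_fresh(backup_dir, max_age_hours=24, now=None):
    """True only if at least one backup file exists and is recent enough."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

A check like this only tells you that a recent backup file exists, not that it can be restored, which is why the regular drills above still matter.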
Specific Steps During the CrowdStrike Outage
- Immediate Response
  - Activate BCP/DRP: Quickly activate business continuity and disaster recovery plans.
  - Communicate: Inform all stakeholders of the issue and steps being taken.
- Contain and Mitigate
  - Isolate Affected Systems: Isolate affected systems to prevent the issue from spreading.
  - Switch to Backup Systems: Activate backup and failover procedures to maintain business operations.
- Recovery and Restoration
  - Restore Data: Use backups to restore lost or corrupted data.
  - System Check: Conduct thorough checks to ensure systems are fully operational before resuming normal activities.
Long-term BCP/DRP Strategies
- Regular Review and Update of BCP/DRP
  - Continuous Improvement: Regularly review and update business continuity and disaster recovery plans to address new threats and vulnerabilities.
  - Technology Upgrades: Keep systems and software updated with the latest security patches and improvements.
- Collaboration With Vendors
  - Regular Audits: Conduct audits and assessments of vendor reliability and business continuity plans.
  - Joint Drills: Participate in disaster recovery drills with critical vendors to ensure coordinated response efforts.
Conclusion
In summary, robust business continuity and disaster recovery plans would have helped businesses respond quickly to the CrowdStrike outage. This would have minimized downtime and data loss, ensuring continued operations through well-defined procedures and backup systems. If you want help with that, AIS Network can assist. Why not start today? Don’t wait for the next IT outage. Ask us to review your current operations or plan a strategy to protect your business. Contact us today.