Biggest yet, but not the last: Lessons from a global IT outage
What happened?
On Friday 19 July, organisations around the world across financial services, aviation, health, government, IT, education, retail, entertainment, media and other sectors experienced a major IT outage. The outage resulted from a software update that CrowdStrike pushed to Windows systems running Falcon Sensor, its Endpoint Detection and Response (EDR) platform.
Globally, 8.5 million devices running Windows experienced a critical error. We understand several thousand Australian and New Zealand organisations were affected, and many individuals experienced some form of disruption associated with this outage. While recovery efforts are ongoing at the time of writing, the crisis phase of this outage is now over, and many organisations will start Post Incident Reviews (PIRs).
Why now?
We have been impacted by global IT outages before. However, the scale and impact of this outage were unprecedented. The extensive impact reflects the growing interconnectivity of our IT infrastructure. It also reflects the market share of CrowdStrike, a global leader in EDR, and of Microsoft Windows, one of three dominant operating systems.
How could this impact me and my organisation?
This outage was caused by an accident. Future mistakes by technology vendors are inevitable. We have also seen malicious actors deliberately target commonly used software previously.
Earlier this month, the Australian government attributed widespread cyber espionage to China’s Ministry of State Security. The attribution noted that Chinese government-backed hackers have rapidly exploited flaws in popular software, such as Atlassian Confluence and Microsoft Exchange. In late 2021, a critical vulnerability in Log4j, one of the most ubiquitous digital tools, let malicious actors start seizing control of affected devices. Log4j is embedded in so many different pieces of software that many organisations didn’t know they were exposed.
What should I do?
Since Friday, CyberCX has fielded many questions from customers, partners, media and stakeholders. The most burning questions have been practical in nature. As we transition to PIR mode, the questions that will burn slower, but longer, are also beginning to emerge. To assist executives and Audit and Risk Committees grappling with these questions, here are some signposts for you to follow.
How can we be better prepared for the next outage?
How do we build resilience against future, inevitable outages?
How will we bounce back better, when major incidents do happen?
Security starts in the C-suite. Executives are high-value targets. Well-connected, they’re gateways to their organisation, sensitive information and professional network. High-profile, they’re easy to find. Trusted and influential, their brand is readily exploited. C-Suite Cyber helps business leaders master their cyber risk.
About CyberCX Intelligence
CyberCX Intelligence is a capability focused uniquely on Australia and New Zealand. We have the information, access and context to give executives a decision advantage, whether that’s minimising their personal risk or leading their organisation’s risk posture.
Want more? Contact cyberintel@cybercx.com.au to explore how you could partner with cyber intelligence experts who speak your business language and know your sector. You can also subscribe to Cyber Adviser, our bite-sized monthly intelligence newsletter.
Great post! The insights on the global IT outage are eye-opening. Quick question: What practical steps can organizations take to strengthen their business continuity plans for handling massive IT outages like this one?
1. Don't put all your eggs in the same basket: use a mix of / Windows servers.
2. Test all updates and upgrades thoroughly, and start implementation on non-critical, non-service-impacting systems. Only once the update is validated on these should global, service-impacting systems be upgraded.
3. Keep last-state restoration points so systems can be recovered to their state prior to the update.
A proactive independent risk assessment goes a long way to identify key gaps. Agreed, it's always best to be prepared.
Great and succinct points, CyberCX.