A widespread Microsoft Azure outage, coupled with a faulty CrowdStrike update, has caused major disruptions globally. The simultaneous issues have affected airlines, trains, banks, broadcasters, and telecom providers.
Widespread Disruptions
Microsoft reported that the Azure Central US region is back online after more than five hours of downtime, though services may take longer to resume normal operations. According to a status update, “Mitigation has been confirmed for all Azure Storage clusters, the majority of services are now recovered. A small subset of services is still experiencing residual impact. Impacted customers will be continuing to communicate through the Azure service health portal.”
CrowdStrike’s Blue Screen of Death
CrowdStrike’s update inadvertently caused blue screen of death errors on Windows PCs. The overlap with the Azure outage has made it challenging to pinpoint the exact cause of various service disruptions. CrowdStrike has rolled back the update, but for many users the damage was already done. “Our entire company is offline,” one IT admin said on Reddit, blaming the CrowdStrike update.
A faulty update from cybersecurity provider CrowdStrike is knocking affected PCs with Windows, forcing them into a BSOD (blue screen of death).
These are apparently the workaround steps. pic.twitter.com/ABbtRZQyON
— Massimo (@Rainmaker1973) July 19, 2024
Impact on Airlines
Several airlines, including Frontier, American, and United Airlines, were forced to ground flights. Melbourne airport told customers it was “experiencing a global technology issue which is impacting check-in procedures for some airlines.”
This is MASSIVE.
Delta, United & American Airlines have grounded their flights due to a communication issue, according to the FAA.
A “major Microsoft technical outage” is impacting Azure cloud computing and Microsoft 365 applications.
The timing is remarkable. pic.twitter.com/UNfdOmH2sV
— Kyle Becker (@kylenabecker) July 19, 2024
Train Services Halted
In the UK, Govia Thameslink Railway’s brands (Southern, Thameslink, Gatwick Express, and Great Northern) experienced significant disruptions due to the outages. Because the two seemingly unrelated outages overlapped, it is hard to ascertain which one caused a given service disruption, or whether both contributed.
Microsoft Services Down
Microsoft services such as Xbox Live and Microsoft Teams were down for several hours, affecting countless users worldwide. The disruption highlighted the dependency on cloud services for essential communications and entertainment.
Banking and Retail Impact
Australian banking apps and supermarket systems faced significant outages, disrupting transactions and operations. This incident underscores the vulnerabilities in the financial sector’s reliance on cloud-based services.
Telecom and Media Affected
Telecom firm Telstra reported disruptions due to both Microsoft and CrowdStrike outages. Sky News in the UK and Australia was unable to broadcast live during the incident. This event emphasizes the critical role of IT infrastructure in media and telecommunications.
IT Admins Struggle
IT admins worldwide shared their frustrations on platforms like Reddit, with one admin stating that their entire company was offline because of the CrowdStrike update. Although CrowdStrike has rolled back the update, machines that had already installed it can remain stuck on the blue screen until they are fixed. The overlap with the Azure outage has made recovery harder for IT teams, who must first work out which incident caused each failure.
"Put everything in the cloud," they said.
Global @Microsoft @Azure DevOps #outage pic.twitter.com/ge9So5M8Ir
— Ronny Fritz (@RonnyFritz) July 19, 2024
Microsoft’s Explanation
In a status update, Microsoft explained that a configuration change in a backend cluster management workflow caused the issue. “A backend cluster management workflow deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region. This resulted in the compute resources automatically restarting when connectivity was lost to virtual disks.” While most services have recovered, a small subset continues to experience residual impacts.
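For readers who want to picture the failure mode Microsoft describes, the short Python sketch below is a toy model only. It is not Microsoft's code, and every name in it (`ComputeResource`, `apply_config`, `storage-1`, `vm-a`) is hypothetical. It assumes each compute resource keeps its virtual disk on one storage cluster and restarts once access to that cluster is blocked.

```python
from dataclasses import dataclass


@dataclass
class ComputeResource:
    name: str
    storage_cluster: str  # hypothetical: cluster hosting this resource's virtual disk
    restarts: int = 0


def apply_config(resources, blocked_clusters):
    """Restart every resource whose virtual disk sits on a blocked cluster.

    Returns the names of the resources that restarted.
    """
    restarted = []
    for resource in resources:
        if resource.storage_cluster in blocked_clusters:
            # Connectivity to the virtual disk is lost, so the resource restarts.
            resource.restarts += 1
            restarted.append(resource.name)
    return restarted


fleet = [
    ComputeResource("vm-a", "storage-1"),
    ComputeResource("vm-b", "storage-2"),
    ComputeResource("vm-c", "storage-1"),
]

# Hypothetical bad configuration change: access to storage-1 is blocked.
affected = apply_config(fleet, blocked_clusters={"storage-1"})
print(affected)  # ['vm-a', 'vm-c']
```

In this toy model only the resources tied to the blocked cluster restart, which mirrors Microsoft's statement that the change affected “a subset of Azure Storage clusters and compute resources.”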
How Prepared is Your Organization for Simultaneous IT Outages?
This incident serves as a stark reminder of the importance of robust IT disaster recovery plans.