Email delivery delays that affected a subset of users in North Europe were caused by power issues in the Microsoft datacenter in Dublin. Although the servers hosting CodeTwo services were not directly affected, most of our clusters were indirectly affected by the outage, because connectivity to all clusters in this region was under stress or unavailable. Our high-availability secondary services in this region were partially impacted as well, which led to email processing delays for some tenants and created bottlenecks in the mail transport pipeline until our failover systems, hosted on unaffected nodes, kicked in to mitigate the problem for affected users.
Our failover services mitigated the problem completely within minutes. Once the datacenter was fully operational again, we switched back to primary services.
For more information, please read the RCA provided by Microsoft:
Between 15:40 and 16:20 UTC on 23 Mar 2020, a subset of customers in North Europe may have seen errors connecting to resources hosted in this region.
During an electrical switching procedure that was being performed on a construction site that shares utility power with one of our operational datacenters, an incorrect process was followed. Due to this improper switching, a large voltage sag was seen by our operational datacenter. While there was no loss of power to server racks, the event led to a subset of servers within a single storage scale unit to experience a reboot event. The rebooting of the various servers led to some of the region’s Storage subscriptions and their associated Azure services to be unreachable while the systems recovered.
As this was a transient power sag event, the Storage servers were allowed to automatically recover.
We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
We apologize for any inconvenience this may have caused.