[Global] Users are unable to log in to Admin Panel
Incident Report for CodeTwo
Postmortem

Below you will find the Post Incident Review (PIR/RCA), provided by Microsoft, regarding this incident. In short, Microsoft claims they failed to properly scale out their services, and a high surge of traffic in a couple of regions led to their servers running out of resources, which caused Azure AD B2C availability issues and ultimately overloaded the entire service. CodeTwo uses the Azure AD B2C service to handle some of the sign-in options for our Customers, so our Customers were unable to sign in to CodeTwo Admin Panel during this incident.

MICROSOFT’S POST INCIDENT REVIEW (Azure Portal ID: QTRM-RPZ):

What happened?

A platform issue caused some of our Azure Active Directory B2C customers in Europe to experience elevated failure rates for end users accessing B2C applications between 13:40 UTC on 5 May 2024 and 00:27 UTC on 6 May 2024. This issue impacted approximately 70% of Azure AD B2C customers in Europe, including one or more of your Azure subscriptions.

What went wrong and why?

The incident was caused by an unusually sharp spike in requests to Azure Active Directory B2C service in North Europe and West Europe regions. We’re choosing not to characterize the high surge of traffic at this time. While the service was scaled out to handle the increase in load, this increase in traffic was much higher than the service could handle. As a result of the traffic spike, server machines started running out of resources to handle requests leading to an increase in failed requests.

How did we respond?

At 13:44 UTC on 5 May 2024, Azure AD B2C monitoring detected a decrease in availability, and elevated failure rates for end users in North Europe and West Europe regions, so we began an investigation.

While efforts to understand and remediate the issue began immediately, a combination of elevated traffic, conservative timeout limits, and excessive retries caused the system to become overloaded - leading to resource exhaustion and request failures. Due to the nature of the spike, the failures were happening in both the North Europe and West Europe regions where Azure Active Directory B2C is deployed within the Europe geography - and our mitigation option of failing over to other regions was not available to us.
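
To illustrate why excessive retries can amplify an overload, here is a minimal Python sketch of a client that retries with capped exponential backoff and random jitter, a common way to keep retries from arriving in synchronized waves. It is not Microsoft's implementation; the function name call_with_backoff and all parameter values are assumptions made for this example.

```python
import random
import time


def call_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(); on failure, retry with capped exponential backoff.

    Hypothetical helper for illustration only. The delay ceiling doubles
    after each failed attempt (0.5 s, 1 s, 2 s, ...) up to max_delay, and
    the actual wait is a random fraction of that ceiling ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Give up after the last allowed attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
```

Bounding the number of attempts and spreading them out over time limits how much extra load a failing dependency receives from its own callers.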

Since the failures were happening across multiple regions, we investigated if there was an incident in the services on which we are dependent. Once we confirmed there was not, we identified the cause of the resource exhaustion as a sudden spike in incoming traffic. Once this was understood, we deployed configuration changes to manage the pace of incoming traffic better, and to remove problematic requests more accurately. As a result of these configuration changes, we saw the resource utilization come down and the service start to recover. This configuration change led to partial recovery. Our team deployed an additional set of configuration changes to further reduce resource utilization.

Once this next set of configuration changes was deployed, service recovered to full health as request queues on our infrastructure were cleared. All customer impact was confirmed as fully mitigated by 00:27 UTC on 6 May 2024.
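
The PIR does not say how the pace of incoming traffic was managed. As one possible illustration, the Python sketch below shows a token bucket, a widely used throttling technique: each request spends a token, tokens refill at a fixed rate, and requests that find the bucket empty are rejected instead of queued. The class name TokenBucket and its parameters are assumptions made for this example, not details from Microsoft.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket throttle, for illustration only."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable time source (seconds)
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Rejecting excess requests early keeps queues short, which is consistent with the recovery described above once request queues were cleared.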

How are we making incidents like this less likely or less impactful?

  • Our Azure AD B2C team has rolled out configuration changes to throttle the types of traffic patterns that caused this incident, and to improve timeouts and retry logic, in order to preserve service health for other customers. (Completed)
  • Our Azure AD B2C team is improving automation that identifies customers with elevated end user error rates, so we can notify all potentially impacted customers of issues like this more quickly in future. (Estimated completion: May 2024)
  • Our Azure AD B2C team is working on systematically reviewing failure modes throughout our system, and implementing changes to improve efficiency of managing resources under load. (Estimated completion: June 2024)
  • Our Azure AD B2C team is working on a plan to improve the mitigation process and reduce the time taken to mitigate similar issues in the future. (Estimated completion: June 2024)
Posted May 13, 2024 - 05:01 UTC

Resolved
This incident has been resolved.
Posted May 05, 2024 - 15:33 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 05, 2024 - 15:17 UTC
Update
Microsoft is aware of the problem on their end and is currently investigating it.

If you want to manage signature or autoresponder rules, just log in to CodeTwo from this website: https://app.codetwo.com. Apart from Admin Panel (https://login.codetwo.com), all other services are operational. Signatures are added as normal.
Posted May 05, 2024 - 15:12 UTC
Investigating
We are investigating CodeTwo Admin Panel logon issues affecting some users. It looks like Microsoft Azure Active Directory B2C services are down, which means users are unable to log in to CodeTwo Admin Panel. Logging in to CodeTwo Manage Signatures App is available, so if you want to manage signature or autoresponder rules, just log in to CodeTwo from this website: https://app.codetwo.com. All other services are operational. Signatures are added as normal.
Posted May 05, 2024 - 14:32 UTC
This incident affected the following regions: Australia East, Canada East, North Central US, North Europe, West Europe, West US, UK South, Germany West Central, and UAE North.
In each region, the affected components were: Mail flow, Signature adding (rules processing), Signature management (app.codetwo.com), Sent Items Update, Admin Panel (emailsignatures365.codetwo.com), Azure AD data caching, Onboarding, Outlook add-ins, Autoresponder management (app.codetwo.com), Autoresponder sending (rules processing), Attributes manager (attributes.codetwo.com), My user info editor (user.codetwo.com), One-click surveys (ratings counting), CodeTwo Insights (insights.codetwo.com).