Major Dashboard Outage
Incident Report for Amplitude
Postmortem

Major Dashboard Outage Postmortem


On June 19th, 2018, a DB component of our web reporting system was overloaded for about 27 minutes, causing the web reporting system to behave abnormally, with intermittent access. We apologize for any disruption this may have caused. In all cases, there was no data loss.

WHAT HAPPENED?

At 11:11 PDT on Tuesday, June 19th, a DB component of our web reporting system was overloaded by our notification feature. After identifying the issue, we turned the notification feature off until we could fix it completely. We pushed the fix later that day, and our systems had fully recovered by 20:51 PDT on Tuesday, June 19th.

WHAT ARE THE RAMIFICATIONS?

While the web reporting system was affected, you were not able to use Amplitude's web reporting functionality reliably. After we turned off the notification feature, you could use all web reporting functionality, but you could not receive notifications until we fully resolved the issue and turned the notification feature back on around 20:51 PDT on Tuesday, June 19th.

NEXT STEPS

We have already optimized the notification feature to reduce its load on the DB by a few orders of magnitude. In addition, we have improved the throttling heuristics on other features that place a high load on the DB. We are confident that similar issues are unlikely to happen again. We understand that our customers rely on us to find insights and make decisions. We apologize for the incident and any complications it has caused. We will continue to improve the availability of our web reporting service.

Thank you for your support.

Posted Jun 28, 2018 - 10:47 PDT

Resolved
We have rolled out fixes and improvements to the notifications feature, and it is now re-enabled.
Posted Jun 19, 2018 - 20:51 PDT
Monitoring
We have disabled the notifications feature as a temporary fix. We are working on a long-term fix and expect to roll it out by 9 pm PDT. All other features are working as expected.
Posted Jun 19, 2018 - 12:11 PDT
Identified
One of our critical databases was overloaded. Our notifications feature was used at a significantly larger scale than we had anticipated. We are in the process of temporarily disabling notifications while we work on a long-term fix.
Posted Jun 19, 2018 - 11:45 PDT
Investigating
Dashboards are temporarily down. We are investigating the issue as a high priority.
Posted Jun 19, 2018 - 11:28 PDT
This incident affected: Analytics (Web Reporting).