Partial Event Collection Outage
Incident Report for Amplitude
Resolved
We created a new endpoint, api2.amplitude.com, at 15:30 UTC on May 30 with a new certificate chain, and worked with some of our integration partners to resolve the issue for mutual customers. To address other legacy clients that do not validate certificate chains correctly and cannot easily be updated to talk to a new endpoint, we updated the certificate chain on api.amplitude.com at 22:15 UTC on Jun 2.

Issue
At around 10:50am UTC on May 30, Sectigo's legacy AddTrust External CA Root certificate expired, which impacted many companies across the internet. Amplitude, some of our customers, and our integration partners were impacted by this expiration. Amplitude’s certificate chain is still valid. The expired root certificate has caused a data collection outage for some non-compliant and legacy clients that do not correctly verify certificate chains. Note that most clients are not affected. Details of affected devices and details of the issue can be found here: https://calnetweb.berkeley.edu/calnet-technologists/incommon-sectigo-certificate-service/addtrust-external-root-expiration-may-2020.


Impact
Starting from 10:50am UTC, some clients began failing to send data to Amplitude. Depending on your client, you are likely either completely unaffected or affected to a large degree. This is an ongoing issue, so please use the Ingestion Debugger and Event Segmentation charts to verify that data is being collected. If data is not being collected, there will be a noticeable, unexpected dip in data for May 30 starting at around 10:50am UTC.


Resolution
* We created a new endpoint, api2.amplitude.com, at around 15:30 UTC on May 30 and worked with partners and customers to resolve the issue for most customers. At that point we could not replace the certificate on api.amplitude.com because we support SSL pinning, and that change needed further testing.
* After testing with SSL pinning on different clients, we were able to construct a chain that works for all our existing clients (with the exception of the iOS SDK with SSL pinning) and updated the certificate chain on api.amplitude.com at 22:15 UTC on Jun 2. Please reach out to support if you are still having data collection issues.
* Note that if you use the iOS SDK with SSL pinning, you must upgrade to version 5.2.0: https://github.com/amplitude/Amplitude-iOS/releases/tag/v5.2.0. Previous versions of the SDK break regardless of the certificate our endpoint serves.
Posted May 30, 2020 - 22:02 PDT
Update
We've created a new subdomain, api2.amplitude.com, which will work for clients that do not correctly handle the certificate chain provided by api.amplitude.com. Both domains will continue to work. Clients may use whichever domain is compatible with the SSL/TLS library they use to integrate with Amplitude.
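
For custom integrations, one way to pick between the two domains is to attempt a verified TLS handshake against each and use the first that succeeds. The following is a minimal sketch under stated assumptions, not an official Amplitude client: the helper names `tls_probe` and `first_working_host` are hypothetical, and certificate verification stays enabled throughout.

```python
import socket
import ssl
from typing import Callable, Optional, Sequence

# The two domains named in this update, in order of preference.
HOSTS = ("api.amplitude.com", "api2.amplitude.com")


def tls_probe(host: str, port: int = 443, timeout: float = 5.0) -> None:
    """Hypothetical helper: perform a verified TLS handshake, raising on failure."""
    context = ssl.create_default_context()  # verification stays ON
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host):
            pass  # a completed handshake means the chain validated locally


def first_working_host(
    probe: Callable[[str], None] = tls_probe,
    hosts: Sequence[str] = HOSTS,
) -> Optional[str]:
    """Return the first host whose handshake succeeds, or None if all fail."""
    for host in hosts:
        try:
            probe(host)
        except ssl.SSLError:
            # Includes ssl.SSLCertVerificationError; try the next domain.
            continue
        return host
    return None
```

The probe is injectable so the selection logic can be exercised without network access. Other failures, such as DNS errors or timeouts, are deliberately not caught here, since switching domains would not fix them.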
Posted May 30, 2020 - 15:46 PDT
Update
We're still working to assess the impact of the issue. If your integration with Amplitude relies on openssl, we suggest upgrading to openssl 1.1.1 or greater.
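
As an illustration for Python-based integrations only, the OpenSSL version the interpreter is linked against can be checked with the standard `ssl` module. This is a sketch: the function name `openssl_at_least` is hypothetical, and builds linked against a different TLS library (for example LibreSSL) report their own version numbers, so the result should be read with that in mind.

```python
import ssl
from typing import Optional, Sequence


def openssl_at_least(
    required: Sequence[int] = (1, 1, 1),
    actual: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if the linked OpenSSL version is >= required (major, minor, fix)."""
    if actual is None:
        # ssl.OPENSSL_VERSION_INFO is a 5-tuple; only the first three parts matter here.
        actual = ssl.OPENSSL_VERSION_INFO
    return tuple(actual[:3]) >= tuple(required)


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    print("meets the 1.1.1 suggestion:", openssl_at_least())
```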
Posted May 30, 2020 - 09:18 PDT
Identified
Our outage is due to Sectigo's legacy AddTrust External CA Root certificate which expires on May 30, 2020. Not all clients are affected. Details of affected clients and details of the issue can be found here: https://calnetweb.berkeley.edu/calnet-technologists/incommon-sectigo-certificate-service/addtrust-external-root-expiration-may-2020
Posted May 30, 2020 - 06:04 PDT
Investigating
Our servers are partially unavailable due to an issue with our SSL certificate. We believe about 10% of traffic to our /batch ingestion endpoint is affected. This issue began around 3:50am Pacific Time.

We are investigating the issue and will post an update within 15 minutes, or sooner if we identify the cause. We recommend retrying failed events until you receive a 200 response from Amplitude. Amplitude SDKs already take care of the retry logic.
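
For integrations that call the HTTP API directly rather than through an SDK, the retry recommendation above can be sketched as follows. This is a minimal illustration under stated assumptions: `send_with_retry` is a hypothetical helper, `send` stands in for whatever function posts one batch and returns the HTTP status code, and the attempt limit and backoff values are arbitrary rather than Amplitude recommendations.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns HTTP 200 or the attempts run out."""
    for attempt in range(max_attempts):
        try:
            status = send()
        except OSError:
            # OSError covers ssl.SSLError and connection-level failures.
            status = None
        if status == 200:
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

A caller should keep the failed events queued when this returns False, so they can be resent once the endpoint is reachable again.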
Posted May 30, 2020 - 04:32 PDT
This incident affected: Data Collection.