iOS SDK Crashes

Incident Report for Amplitude

Postmortem

In an effort to live out Amplitude’s commitment to transparency, accountability, and improvement, we're sharing the following FAQ regarding a recent incident that impacted our iOS Session Replay customers. Customer trust is our top priority, and in this instance, we did not meet the high standards you rightfully expect from us. Below, you'll find a detailed explanation of what went wrong, what we learned, and the concrete steps we're taking to ensure this doesn't happen again.

What Happened?

Starting at 10:27am PT on June 16, 2025, all iOS apps using Amplitude-Swift v1.13.0+ and AmplitudeSessionReplay-iOS v0.0.7+ SDKs experienced crashes on startup. These crashes not only interrupted end-user experiences but also stopped Amplitude data collection from these devices during the incident window. The incident lasted 1 hour and 25 minutes and was resolved via a code revert at 11:52am PT. Customers did not have to take any action.

The root cause was a remote configuration change that introduced optional null values in the API response. On iOS, JSON parsing is schema-less and bridged from Objective-C. Null values in JSON are represented as NSNull objects, a special marker in Objective-C collections.

Our parsing logic correctly attempted to cast to expected types and treated NSNull as nil. However, the results were then cached using UserDefaults, which cannot store NSNull objects. This triggered a runtime exception, ultimately causing our customers’ apps to crash.
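The failure mode described above can be illustrated with a minimal sketch. It is not Amplitude's actual SDK code: the payload, the field names, the `example.remoteConfig` key, and the helper functions `strippingNulls` and `sanitizedConfig` are all hypothetical. It assumes the response is parsed with `JSONSerialization`, which yields `NSNull` for JSON `null`, and shows one way to strip those values and confirm the result is a valid property list before handing it to `UserDefaults`.

```swift
import Foundation

// Hypothetical response body; the field names are illustrative only.
let sampleResponse = Data(#"{"sampleRate": 0.5, "endpoint": null, "tags": ["a", null]}"#.utf8)

/// Recursively drops NSNull so that only property-list types remain.
/// Note: null array elements are removed, which shifts later indices.
func strippingNulls(_ value: Any) -> Any? {
    if value is NSNull { return nil }
    if let dict = value as? [String: Any] {
        let cleaned: [String: Any] = dict.compactMapValues { strippingNulls($0) }
        return cleaned
    }
    if let array = value as? [Any] {
        let cleaned: [Any] = array.compactMap { strippingNulls($0) }
        return cleaned
    }
    return value
}

/// Parses a response and returns a dictionary that is safe to cache,
/// or nil if parsing fails or the result is still not a property list.
func sanitizedConfig(from data: Data) -> [String: Any]? {
    guard let parsed = try? JSONSerialization.jsonObject(with: data),
          let cleaned = strippingNulls(parsed) as? [String: Any],
          PropertyListSerialization.propertyList(cleaned, isValidFor: .binary)
    else { return nil }
    return cleaned
}

// Caching the raw parsed dictionary would raise an exception because it
// contains NSNull; caching the sanitized copy does not.
if let config = sanitizedConfig(from: sampleResponse) {
    UserDefaults.standard.set(config, forKey: "example.remoteConfig")
}
```

The `PropertyListSerialization` check is a second line of defense: if any other non-property-list value slips through, the function returns nil and the cache write is skipped instead of crashing the host app.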

Unfortunately, this bug slipped through internal testing because our mobile SDKs lacked end-to-end test coverage, which would have caught issues introduced by remote dependencies.

How do I know if I was impacted?

This issue affected only iOS Session Replay clients. If your iOS app was using Amplitude-Swift v1.13.0+ and AmplitudeSessionReplay-iOS v0.0.7+ SDKs during the incident window (June 16, 2025, 10:27am - 11:52am PT), you were impacted.

What actions am I supposed to take?

Impacted customers do not need to take any action. This was resolved via a code revert on June 16th.

What Did the Amplitude Team Learn from this Incident?

This incident revealed several blind spots in our development and release processes:

  • Test Coverage Gaps: While we had relatively robust unit tests, we lacked end-to-end testing that could simulate real-world usage of our SDKs and their interaction with remote services.
  • Insufficient Canarying: We prioritized velocity over rigor and did not implement staged rollouts for SDK-adjacent changes (i.e., remote dependencies). This led to a widespread impact before we had the opportunity to catch early warning signals.
  • Alerting Delays: Our internal telemetry and monitoring were insufficient and failed to surface crash spikes quickly enough, delaying our response.

These gaps are serious, and addressing them has been a top priority for our team.

What Have You Done to Prevent Something Like this From Happening Again?

Trust is earned every day, and we know we have work to do. We have implemented significant changes to ensure an incident like this does not happen again:

Defensive Coding

  • Added stronger null-handling in our mobile SDKs.
  • Added stricter validation in our remote configuration service to enforce API contracts.
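As one possible illustration of stronger null-handling on the client, the sketch below decodes the response into a typed model and caches the re-encoded bytes rather than a loosely typed dictionary. It is an assumption about how such hardening could look, not the change Amplitude shipped: `RemoteConfig`, its fields, `ConfigCache`, and the storage key are hypothetical names.

```swift
import Foundation

// Hypothetical config shape; the field names are illustrative only.
struct RemoteConfig: Codable, Equatable {
    var sampleRate: Double
    var endpoint: String?  // may be missing or explicitly null in the response
}

enum ConfigCache {
    static let key = "example.remoteConfig"  // hypothetical storage key

    /// Decodes the response and caches it as Data, which UserDefaults can
    /// always store. Returns nil (and caches nothing) if the response does
    /// not match the expected shape.
    static func store(responseData: Data, in defaults: UserDefaults) -> RemoteConfig? {
        guard let config = try? JSONDecoder().decode(RemoteConfig.self, from: responseData),
              let encoded = try? JSONEncoder().encode(config)
        else { return nil }
        defaults.set(encoded, forKey: key)
        return config
    }

    /// Loads the cached config, or nil if nothing valid is cached.
    static func load(from defaults: UserDefaults) -> RemoteConfig? {
        guard let data = defaults.data(forKey: key) else { return nil }
        return try? JSONDecoder().decode(RemoteConfig.self, from: data)
    }
}
```

With this shape, a JSON `null` for an optional field decodes to `nil`, and a response that violates the expected contract is rejected at the decoding step instead of reaching the cache.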

Expanded Test Coverage

  • Increased unit test coverage for both mobile SDKs and remote configuration services.
  • In Progress: Building cross-team end-to-end tests that validate SDK behavior against all remote dependencies.

Process Improvements

  • Introduced feature flags for controlled rollout of all SDK-adjacent changes.
  • Introduced cross-team notifications when SDK-adjacent changes are being submitted.
  • Next: Integrating crash reporting into our SDKs to gain real-time visibility into production incidents.

If I have questions, who do I contact?

Please fill out a request form on support.amplitude.com using the subject line "Question about Session Replay SDK Incident" or, if you would like to talk to the engineering team directly, email us at SessionReplayTeam@amplitude.com. We are immensely thankful for customers who report issues quickly, work with us under pressure, and push us to be better. Your feedback matters deeply.

Posted Aug 08, 2025 - 09:17 PDT

Resolved

Incident Summary:
Between 17:27 UTC and 18:52 UTC, some applications using Amplitude’s Swift SDKs (v1.13.0+) and Session Replay iOS SDKs (v0.0.7+) may have experienced crashes. The root cause was an error when deserializing an invalid API response.

Resolution:
We identified and resolved the issue by deploying a backend fix. No changes or updates are required to the SDKs for customers.

Next Steps:
We sincerely apologize for the disruption. Our engineering teams are implementing additional safeguards to prevent similar issues in the future.
Posted Jun 16, 2025 - 10:30 PDT