
Neon outage, May 2025: Kubernetes IP exhaustion disrupted services

Neon experienced two outages caused by Kubernetes IP exhaustion, impacting service availability. This postmortem covers what went wrong, how Neon responded, the key actions taken, and the lessons learned to improve reliability.

Company

Neon is a fully managed serverless PostgreSQL provider optimised for cloud-native workloads. It decouples storage and compute, offering rapid autoscaling, branching and pay-per-use efficiency. Developers use Neon for production and development databases, benefiting from quick startup, dynamic environments, and cost-effective scaling.

What happened during the Neon outage on May 16 and 19, 2025?

On May 16 and May 19, 2025, Neon experienced two outages totalling 5.5 hours in the AWS us-east-1 region. Customers were unable to start suspended (inactive) databases or create new ones, while already-active databases remained unaffected. Both incidents resulted from exhausted IP addresses in Kubernetes subnets, triggered by control plane overload and misconfiguration of the AWS CNI.
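
To make the failure mode concrete, here is a back-of-the-envelope sketch with invented numbers (not Neon's, which are not public) of how a pod subnet runs out of addresses when every pod consumes a VPC IP and the CNI also keeps a warm pool of spare IPs on each node:

```python
# Hypothetical illustration of Kubernetes pod-subnet IP exhaustion.
# All figures are invented for the example; Neon has not published its
# subnet sizes, node counts, or pod densities.
import ipaddress

SUBNET_CIDR = "10.0.32.0/20"   # example pod subnet (4,096 addresses)
AWS_RESERVED = 5               # AWS reserves 5 addresses in every subnet
NODES = 120                    # example node count after a scale-up
PODS_PER_NODE = 25             # example average pod density
WARM_IPS_PER_NODE = 10         # example warm-pool IPs the CNI pre-allocates

usable = ipaddress.ip_network(SUBNET_CIDR).num_addresses - AWS_RESERVED
demand = NODES * (PODS_PER_NODE + WARM_IPS_PER_NODE)

print(f"usable IPs : {usable}")           # 4091
print(f"demanded   : {demand}")           # 4200
print(f"headroom   : {usable - demand}")  # -109 -> new pods cannot get IPs
```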

Immediate mitigations included reconfiguring IP allocation parameters and scaling prewarmed compute pools. Neon is restructuring its architecture to avoid future occurrences.
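
Neon's postmortem does not name the exact settings changed, but IP pre-allocation in the AWS VPC CNI is commonly tuned through environment variables such as WARM_IP_TARGET and MINIMUM_IP_TARGET, and an early-warning check for this class of failure can be as simple as watching free IPs per pod subnet. A minimal watchdog sketch using boto3 (the subnet IDs and threshold below are placeholders, not Neon's values):

```python
# Minimal free-IP watchdog for pod subnets, sketched with boto3.
# Subnet IDs and the alert threshold are placeholders.
import boto3

POD_SUBNET_IDS = ["subnet-aaaa1111", "subnet-bbbb2222"]  # hypothetical IDs
MIN_FREE_IPS = 500                                       # hypothetical threshold

ec2 = boto3.client("ec2", region_name="us-east-1")
subnets = ec2.describe_subnets(SubnetIds=POD_SUBNET_IDS)["Subnets"]

for s in subnets:
    free = s["AvailableIpAddressCount"]
    status = "ok" if free >= MIN_FREE_IPS else "ALERT: low IP headroom"
    print(f'{s["SubnetId"]}: {free} free IPs ({status})')
```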

Timeline

When did the Neon incident start?

The first incident began at 14:13 UTC on May 16, 2025, when customers started seeing failures to activate databases. The second began at 13:17 UTC on May 19, 2025, triggered when the earlier fixes were reverted.

How was the Neon incident detected and escalated?

Internal alerts detected service disruptions within minutes during both incidents, with initial triage rapidly identifying IP exhaustion issues. Escalation to a high-severity incident occurred promptly as the scope of the impact became clear.

When was the Neon outage resolved?

On May 16, mitigation began within two hours, and the incident was fully resolved roughly 3.5 hours after it started, at approximately 17:43 UTC.

On May 19, mitigation started within 90 minutes, and full resolution came approximately four hours after onset, at 17:10 UTC.

  • May 16 MTTD: ~2 minutes
  • May 16 MTTR: ~3.5 hours
  • May 19 MTTD: ~1 minute
  • May 19 MTTR: ~4 hours

Who was affected by the Neon outage, and how bad was it?

Customers running Neon databases with scale-to-zero (autosuspend) configurations in AWS us-east-1 were directly impacted. They could not wake suspended databases or create new ones, disrupting development workflows and CI/CD pipelines.

Most severely affected:

  • Databases with autosuspend configurations (unable to restart)
  • Database creation workflows

Partial failures:

  • Proxy rate-limit errors ("Rate Limit Exceeded") during mitigation steps; a client-side retry sketch follows this list.

Not affected:

  • Active databases and customers outside AWS us-east-1.
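
For client code, transient errors like the proxy's "Rate Limit Exceeded" during recovery are generally best handled with jittered exponential backoff rather than immediate retries. A generic sketch, not Neon's official client guidance; the connect callable and the error matching are placeholders:

```python
# Generic retry-with-backoff sketch for transient rate-limit errors while a
# service recovers. Narrow the except clause to your driver's real exception
# type instead of matching on the message string.
import random
import time


def connect_with_backoff(connect, max_attempts=6, base_delay=0.5, max_delay=30.0):
    """Call connect() and retry rate-limit failures with full-jitter backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as exc:
            if "rate limit" not in str(exc).lower() or attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))  # full jitter
```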

How did Neon communicate during the outage?

Neon maintained steady, transparent communication throughout both outages, primarily via brief public status updates acknowledging the incidents and promising detailed follow-up postmortems.

While internal mitigation progressed quickly, external messaging lacked real-time granularity, particularly during the regression event on May 19. Customers who depended on detailed updates experienced uncertainty around resolution timelines.

What patterns did the Neon outage reveal?

The two outages revealed recurring risks in large-scale infrastructure:

  • IP exhaustion acting as a hidden infrastructure bottleneck.
  • Configuration regressions introduced during incident remediation.
  • Kubernetes clusters exceeding their designed pod limits under dynamic load; see the capacity sketch after this list.
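
The "designed pod limits" point follows from how the default AWS VPC CNI hands out addresses: each pod takes a secondary IP, and each instance type caps the number of ENIs and the IPs per ENI. A small sketch of the standard ceiling calculation, using illustrative m5.large-class figures:

```python
# Per-node pod ceiling under the default AWS VPC CNI: pods draw secondary IPs
# from the node's ENIs, so the instance type's ENI and per-ENI IP limits bound
# pod density. The figures below are illustrative, not Neon's fleet.

def max_pods(enis: int, ips_per_eni: int) -> int:
    # Each ENI keeps one primary IP for itself; +2 accounts for pods on the
    # host network (this mirrors EKS's max-pods formula).
    return enis * (ips_per_eni - 1) + 2

print(max_pods(enis=3, ips_per_eni=10))  # -> 29 pods on an m5.large-class node
```

Subnet sizing then has to cover this per-node ceiling times the node count, plus whatever warm pool the CNI pre-allocates, or the cluster hits the exhaustion pattern described above.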

Quick summary

On May 16 and May 19, 2025, Neon faced two outages totalling 5.5 hours due to IP exhaustion in Kubernetes subnets in AWS us-east-1. Users were unable to activate databases with scale-to-zero (autosuspend) configurations or create new ones. Neon responded with rapid mitigations and transparent, though brief, communication. The incidents underscored the importance of robust infrastructure safeguards, effective configuration management, and clear, timely updates during critical incidents.
