Postmortem Library

GitHub: Enterprise Importer migrations stalled for more than 5 hours

GitHub Enterprise Importer migrations were stalled for 5h34m after a component of the GEI infrastructure was improperly taken out of service during routine internal improvements. Service recovered once new resources were provisioned. The recovery moved GEI to new egress IP ranges, which customers that enforce IP allow lists must add.

Company & product

GitHub operates the world’s largest developer platform. The affected product, GitHub Enterprise Importer (GEI), is GitHub’s high‑fidelity migration service used to move repositories, orgs, and related collaboration history (PRs, reviews, comments) from sources like GitHub Enterprise Server, Bitbucket, and Azure DevOps to GitHub Enterprise Cloud.

What happened?

On July 28, 2025 at 21:41 UTC, GEI entered a degraded state in which migrations could not be processed. GitHub’s investigation found that a component of the GEI infrastructure was improperly taken out of service during routine internal improvements and could not be restored to its previous configuration, requiring the provisioning of new resources to recover. As part of the fix, GEI moved to new IP ranges that customers must allow if IP allow lists are enabled.

Timeline

  • Start: Mon, July 28, 2025 21:41 UTC — GEI in a degraded state; migrations stalled.
  • End / Recovery: Tue, July 29, 2025 03:15 UTC — Service restored after new resources were provisioned. TTR: 5h34m.
  • Detection and escalation: The report provides the incident window but does not state time-to-detect or the declaration timestamp. 
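
The TTR figure can be checked directly from the two timestamps in the timeline. The snippet below is a minimal sketch of that arithmetic; the helper name `format_duration` is an assumption made for illustration and does not come from the report.

```python
from datetime import datetime, timezone

# Incident window as stated in the timeline (UTC).
start = datetime(2025, 7, 28, 21, 41, tzinfo=timezone.utc)
end = datetime(2025, 7, 29, 3, 15, tzinfo=timezone.utc)


def format_duration(delta):
    """Render a timedelta as 'XhYYm' (hypothetical helper)."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


ttr = format_duration(end - start)
print(ttr)  # 5h34m
```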

Who was affected

Enterprises and organizations running migrations via GEI during the window; follow-up action is required for any customer that enforces IP allow lists.

How GitHub responded

GitHub investigated, traced the issue to an infrastructure component removed during routine changes, and restored service by provisioning new infrastructure resources. After the incident, GitHub improved its ability to recover infrastructure, expanded unit testing, and added validation with test data before changes are made.

They also announced new GEI IP ranges and advised customers to update allow lists across GitHub organizations/enterprises, Azure Blob Storage or Amazon S3 (when used for migrations), and Azure DevOps. The report lists the new ranges and the ranges that can be removed.

How GitHub communicated

  • Status page: GitHub communicated the incident in real time via the status page.
  • Monthly availability report: GitHub published a post‑incident recap (the July 2025 availability report) detailing the window, root cause, remediation, and customer actions.
  • Direct outreach: Email alerts were sent to users who ran migrations in the prior 90 days to prompt IP allow‑list updates.

Key learnings for other teams (with action items)

Guardrails for infra changes

A routine infrastructure improvement removed a critical component from service without a fast restore path. Teams can reduce this risk by enforcing change freezes or additional approvals for migration and egress infrastructure, using canary or blue-green patterns with automated rollback for configuration changes, and maintaining configuration snapshots with one-click restore procedures.

Pre‑deployment validation

Strengthen validation before rollout by expanding unit tests and exercising changes with realistic test data; require pre-flight health checks and synthetic migrations in staging to pass before promotion, and gate deployments with policy checks (e.g., required test suites and explicit success thresholds).
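
One way to picture such a deployment gate is a small policy check that refuses promotion unless every required suite ran and met its threshold. This is a sketch under stated assumptions, not GitHub's implementation: the suite names, the thresholds in `REQUIRED_SUITES`, and the `SuiteResult` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    name: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # An empty suite counts as 0% so it can never satisfy a threshold.
        return self.passed / self.total if self.total else 0.0


# Hypothetical policy: required suite name -> minimum pass rate.
REQUIRED_SUITES = {
    "unit": 1.0,
    "preflight_health": 1.0,
    "synthetic_migration": 0.99,
}


def gate(results):
    """Return (allowed, reasons); promotion is allowed only with no reasons."""
    by_name = {r.name: r for r in results}
    reasons = []
    for name, threshold in REQUIRED_SUITES.items():
        result = by_name.get(name)
        if result is None:
            reasons.append(f"{name}: required suite did not run")
        elif result.pass_rate < threshold:
            reasons.append(
                f"{name}: pass rate {result.pass_rate:.2%} below {threshold:.2%}"
            )
    return (not reasons, reasons)


# Example run: the health check is missing and one synthetic migration failed.
results = [
    SuiteResult("unit", 120, 120),
    SuiteResult("synthetic_migration", 49, 50),
]
allowed, reasons = gate(results)
print(allowed)
for reason in reasons:
    print(reason)
```

Treating a missing suite as a failure is the important design choice here: a gate that only inspects the suites that happened to run can be passed by skipping the check that would have caught the problem.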

Network dependency readiness (IP allow lists & storage egress)

Recovery introduced new egress IPs and forced customers to update their allow lists. To prepare for this kind of change, centralize egress/IP management, publish change windows to customers in advance where possible, automate allow-list updates via APIs across GitHub orgs, cloud storage, and DevOps tools, and keep an emergency runbook for urgent IP changes.
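
A simple readiness check is to compare a provider's published egress ranges against the allow list you actually enforce. The sketch below uses only Python's standard `ipaddress` module. The CIDR blocks are reserved documentation ranges used as placeholders, not GEI's real ranges, and `uncovered_ranges` is a hypothetical helper name.

```python
import ipaddress


def uncovered_ranges(egress_ranges, allow_list):
    """Return the egress CIDRs that no single allow-list entry fully contains."""
    allowed = [ipaddress.ip_network(cidr) for cidr in allow_list]
    missing = []
    for cidr in egress_ranges:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        covered = any(
            net.version == entry.version and net.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(cidr)
    return missing


# Placeholder values only (RFC 5737 documentation ranges).
published_egress = ["192.0.2.0/28", "198.51.100.16/30"]
current_allow_list = ["192.0.2.0/24", "203.0.113.0/24"]

missing = uncovered_ranges(published_egress, current_allow_list)
print(missing)  # ['198.51.100.16/30']
```

This check is deliberately conservative: a range that is covered only by the union of several allow-list entries is still reported as missing, which errs on the side of prompting a review.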

Quick summary

On July 28, 2025, at 21:41 UTC, GitHub’s GEI migrations stalled and remained unprocessed for 5h34m because an infrastructure component was taken out of service during routine improvements. GitHub restored service by 03:15 UTC on July 29 by provisioning new resources, which moved GEI to new IP ranges, and followed up with recovery and validation hardening. Customers with IP allow lists must update them accordingly. Communications included the status page, the public availability report, and email alerts to users who ran GEI migrations in the prior 90 days.

How ilert can help improve incident response

  • Real‑time alerting & on‑call orchestration: Ensure symptom‑level signals (migration queue age, job failures) page the right responder immediately, with automatic escalation policies to keep time to detect (TTD) low.
  • Change Intelligence: Ingest deployment and infrastructure change events into ilert to correlate spikes in migration errors with recent changes and auto-flag likely causes.
  • Stakeholder & customer comms: Publish status page updates using AI within ilert to speed accurate communications.
  • Postmortems with action tracking: Create a structured postmortem with owner-led follow‑ups and track them to completion.
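
As an illustration of a symptom-level signal, the age of the oldest queued migration can be compared against a threshold to decide whether to page. This is a minimal sketch, not an ilert or GitHub API: the 15-minute threshold, the function names, and the sample timestamps are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; tune it to the normal queue latency of the service.
MAX_QUEUE_AGE = timedelta(minutes=15)


def oldest_job_age(enqueued_at, now):
    """Age of the oldest job still waiting in the queue (zero if empty)."""
    if not enqueued_at:
        return timedelta(0)
    return now - min(enqueued_at)


def should_page(enqueued_at, now, max_age=MAX_QUEUE_AGE):
    """True when the oldest queued job has waited longer than max_age."""
    return oldest_job_age(enqueued_at, now) > max_age


# Illustrative timestamps only.
now = datetime(2025, 7, 28, 22, 0, tzinfo=timezone.utc)
queued = [
    datetime(2025, 7, 28, 21, 41, tzinfo=timezone.utc),
    datetime(2025, 7, 28, 21, 55, tzinfo=timezone.utc),
]
print(should_page(queued, now))  # True: the oldest job has waited 19 minutes
```

Alerting on queue age rather than on error counts catches the failure mode in this incident, where jobs stall silently instead of failing.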