1Password: Sign-in outage blocks logins
On Aug 5, 2025, 1Password customers couldn’t sign in for roughly an hour. This overview covers what happened, how it was handled publicly, and the playbook updates you can adopt now to protect your primary user journey.
Company & product
1Password is a cross-platform password manager used by individuals and enterprises to store and autofill credentials, passkeys, and secrets across devices and browsers.
What happened
On August 5, 2025, 1Password experienced an incident that prevented customers from signing in. Users saw errors such as “Can’t sign in. The request took too long.” The company mitigated and then resolved the issue the same day, but it did not publish a root cause on the incident page.
Timeline
- Start: Tue, Aug 5, 2025 16:46 EDT (20:46 UTC / 22:46 CEST).
- Resolution: Tue, Aug 5, 2025 17:59 EDT (21:59 UTC / 23:59 CEST).
- TTD (time to detect): no figure was published. The public status page lists 16:46 EDT as the start, while third-party monitors showed user reports beginning about 11 minutes earlier (around 16:35 EDT).
- TTR (time to resolve): 1h 13m.
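The timeline arithmetic can be checked in a few lines. This sketch uses fixed UTC offsets that are valid for August 2025 (EDT = UTC-4, CEST = UTC+2) rather than a time-zone database. It also treats the ~11-minute lead from third-party monitors as exact, although that figure is only an approximation.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets that hold for August 2025 (daylight saving time in both zones).
# For arbitrary dates, prefer zoneinfo.ZoneInfo("America/New_York") / ("Europe/Berlin").
EDT = timezone(timedelta(hours=-4), "EDT")
CEST = timezone(timedelta(hours=2), "CEST")

start = datetime(2025, 8, 5, 16, 46, tzinfo=EDT)     # status page start
resolved = datetime(2025, 8, 5, 17, 59, tzinfo=EDT)  # marked resolved
# Approximation: third-party monitors saw user reports ~11 minutes before the official start.
first_reports = start - timedelta(minutes=11)


def fmt(dt: datetime) -> str:
    """Render one instant in the three zones used in the timeline."""
    return (f"{dt.astimezone(EDT):%H:%M %Z} / "
            f"{dt.astimezone(timezone.utc):%H:%M %Z} / "
            f"{dt.astimezone(CEST):%H:%M %Z}")


def hm(delta: timedelta) -> str:
    """Format a duration as 'Xh Ym'."""
    hours, minutes = divmod(int(delta.total_seconds()) // 60, 60)
    return f"{hours}h {minutes}m"


print("Start:   ", fmt(start))
print("Resolved:", fmt(resolved))
print("TTR (status page):", hm(resolved - start))
print("Impact incl. early reports (approx.):", hm(resolved - first_reports))
```

If you measure from the first user reports instead of the status-page start, the impact window grows to roughly 1h 24m.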
How 1Password responded
After an initial investigation window, 1Password confirmed and identified the issue. It then “rolled out changes to mitigate” and moved the incident to monitoring. The service recovered and was marked resolved at 17:59 EDT. To preserve continuity in the meantime, 1Password told customers they could access items offline in the app (if their admins permitted it), with the caveat that changes wouldn’t sync until recovery. IsDown also mirrored this update.
How 1Password communicated
- Channels: The status page carried the investigating → identified → monitoring → resolved sequence, with clear, plain-language updates and a practical workaround (offline access).
- Cadence: Multiple updates were posted across the ~1-hour window, ending with a resolved notice.
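A small state machine is one way to keep the investigating → identified → monitoring → resolved sequence consistent. The transition rules below are an assumption and are not 1Password's or any status-page vendor's policy. Under these rules an update may repeat a stage, skip ahead, or drop from monitoring back to investigating if a fix regresses, but nothing may follow resolved. The messages are placeholders.

```python
from enum import Enum


class Status(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


# Assumed policy: which stage may follow which. Repeating the current stage is
# also allowed, except after RESOLVED.
NEXT = {
    Status.INVESTIGATING: {Status.IDENTIFIED, Status.MONITORING, Status.RESOLVED},
    Status.IDENTIFIED: {Status.MONITORING, Status.RESOLVED},
    Status.MONITORING: {Status.RESOLVED, Status.INVESTIGATING},  # a regression reopens
    Status.RESOLVED: set(),
}


class Incident:
    def __init__(self, first_message: str):
        # Every incident opens in INVESTIGATING.
        self.updates = [(Status.INVESTIGATING, first_message)]

    @property
    def status(self) -> Status:
        return self.updates[-1][0]

    def post(self, status: Status, message: str) -> None:
        """Append a public update, rejecting transitions the policy forbids."""
        current = self.status
        same_stage = status is current and current is not Status.RESOLVED
        if not same_stage and status not in NEXT[current]:
            raise ValueError(f"cannot go from {current.value} to {status.value}")
        self.updates.append((status, message))


incident = Incident("Investigating reports of sign-in failures.")
incident.post(Status.IDENTIFIED, "Cause identified; working on a fix.")
incident.post(Status.MONITORING, "Mitigation deployed; monitoring recovery.")
incident.post(Status.RESOLVED, "Sign-in has recovered.")
print([s.value for s, _ in incident.updates])
```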
Key learnings for other teams
- Protect your primary user journey (auth) with canary checks. Run synthetic sign-ins per region and tenant type; alert on elevated auth latency and error spikes to cut TTD.
- Design for offline resilience. If your client apps can safely operate read-only offline, document and pre-approve this path so support can share it immediately (as 1Password did).
- Stage mitigations behind feature flags. The ability to “roll out changes to mitigate” quickly is far easier to guarantee with preflighted toggles and a safe rollback path, so make both standard.
- Own the comms narrative. Include a short impact summary (scope, percentage of failures, regions), known workarounds, and next update time to set expectations.
- Capture auth dependencies. Map third-party/infra dependencies (IdP, network edges, DBs). Pre-define degraded modes (rate limits, circuit breakers) to hold the line during partial failures.
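A synthetic sign-in canary can be sketched as follows. `sign_in` is a hypothetical callable that stands in for a scripted login against one region and tenant type. The region names, tenant types, and latency budget are placeholders. A probe fails if it raises or exceeds the latency budget. Alerts are computed per region, so healthy regions cannot dilute a regional outage.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class ProbeResult:
    region: str
    tenant_type: str
    ok: bool
    latency_s: float


def run_canary(sign_in: Callable[[str, str], None],
               targets: Iterable[Tuple[str, str]],
               latency_budget_s: float = 2.0) -> List[ProbeResult]:
    """Run one synthetic sign-in per (region, tenant_type) target."""
    results = []
    for region, tenant_type in targets:
        started = time.monotonic()
        try:
            sign_in(region, tenant_type)  # hypothetical scripted login
            succeeded = True
        except Exception:
            succeeded = False
        latency = time.monotonic() - started
        # A slow success still counts as a failure: users would see a timeout.
        ok = succeeded and latency <= latency_budget_s
        results.append(ProbeResult(region, tenant_type, ok, latency))
    return results


def regions_to_alert(results: List[ProbeResult], max_error_rate: float = 0.0) -> List[str]:
    """Return regions whose probe failure rate exceeds max_error_rate."""
    outcomes = defaultdict(list)
    for r in results:
        outcomes[r.region].append(r.ok)
    return sorted(region for region, oks in outcomes.items()
                  if oks.count(False) / len(oks) > max_error_rate)
```

In practice a scheduler in each region would run this every minute or so and pass `regions_to_alert` to the paging system. The error-rate threshold trades alert noise against detection speed.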
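A read-only offline fallback can be sketched as a thin wrapper on the client. `fetch` and `push` are hypothetical remote calls, an unreachable backend is modeled as `ConnectionError`, and `offline_allowed` stands in for an admin policy. A real password manager would keep its local copy encrypted at rest, which this in-memory dict does not. Reads fall back to the last synced copy. This sketch refuses writes while offline so they cannot diverge from the server. Queuing writes for later sync is the alternative, but it requires conflict handling.

```python
from typing import Callable, Dict


class ReadOnlyOffline(Exception):
    """Raised when a write is attempted while the backend is unreachable."""


class OfflineCapableVault:
    def __init__(self, fetch: Callable[[str], str],
                 push: Callable[[str, str], None],
                 offline_allowed: bool = True):
        self._fetch = fetch      # hypothetical remote read
        self._push = push        # hypothetical remote write
        self._cache: Dict[str, str] = {}  # last synced copies (plaintext here, for brevity)
        self.offline_allowed = offline_allowed

    def get(self, item_id: str) -> str:
        try:
            value = self._fetch(item_id)
        except ConnectionError:
            # Backend unreachable: serve the last synced copy if policy allows.
            if self.offline_allowed and item_id in self._cache:
                return self._cache[item_id]
            raise
        self._cache[item_id] = value
        return value

    def put(self, item_id: str, value: str) -> None:
        try:
            self._push(item_id, value)
        except ConnectionError as exc:
            # Refuse local edits rather than letting them silently diverge.
            raise ReadOnlyOffline("changes can't sync until the service recovers") from exc
        self._cache[item_id] = value
```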
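A minimal kill switch can be built from an in-memory flag store. The store records every change with a reason, so the last change can be rolled back in one call. The flag name `auth.new_session_store` and both code paths are hypothetical. In production the store would be backed by a config service or flag platform rather than process memory.

```python
import threading
from typing import Dict, List, Tuple


class FlagStore:
    def __init__(self, defaults: Dict[str, bool]):
        self._flags = dict(defaults)
        self._history: List[Tuple[str, bool, str]] = []  # (name, previous value, reason)
        self._lock = threading.Lock()

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._flags[name]

    def set(self, name: str, value: bool, reason: str) -> None:
        """Flip a flag, remembering the previous value for rollback."""
        with self._lock:
            self._history.append((name, self._flags[name], reason))
            self._flags[name] = value

    def rollback(self) -> str:
        """Undo the most recent change and return the flag name it touched."""
        with self._lock:
            name, previous, _reason = self._history.pop()  # IndexError if nothing to undo
            self._flags[name] = previous
            return name


def session_backend(flags: FlagStore) -> str:
    # Hypothetical sign-in path: the new backend sits behind a preflighted flag.
    if flags.is_enabled("auth.new_session_store"):
        return "new-session-store"
    return "legacy-session-store"
```

The recorded reason doubles as an audit trail for the post-incident review.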
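A simple template can make the comms checklist hard to skip: scope, failure rate, regions, workarounds, and the next update time. The field names, the 30-minute default cadence, and all sample values are placeholders rather than figures from this incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class StatusUpdate:
    stage: str                    # e.g. "investigating", "monitoring"
    summary: str                  # one-line description of the symptom
    failure_pct: Optional[float]  # share of failing requests; None if not measured yet
    regions: List[str]            # an empty list means "all regions"
    workarounds: List[str]
    posted_at: datetime           # must be timezone-aware
    next_update_in: timedelta = timedelta(minutes=30)

    def render(self) -> str:
        if self.failure_pct is None:
            impact = "failure rate still being measured"
        else:
            impact = f"{self.failure_pct:.0f}% of sign-in attempts failing"
        lines = [
            f"[{self.stage.upper()}] {self.summary}",
            f"Impact: {impact}; regions: {', '.join(self.regions) or 'all'}",
        ]
        if self.workarounds:
            lines.append("Workaround: " + "; ".join(self.workarounds))
        # Always state the next update time in UTC so readers in any zone can plan.
        next_at = (self.posted_at + self.next_update_in).astimezone(timezone.utc)
        lines.append(f"Next update by {next_at:%H:%M} UTC")
        return "\n".join(lines)
```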
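A circuit breaker is one of the degraded modes mentioned above. This is a minimal, single-threaded sketch rather than a production library. After `failure_threshold` consecutive failures the breaker opens and rejects calls immediately. After `reset_timeout_s` it enters a half-open state and lets a trial call through, closing again if that call succeeds. The injectable `clock` exists only to make the behavior testable. Wrapping calls to a dependency such as an IdP lets the sign-in path fail fast into a degraded mode instead of piling up timeouts.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is currently marked unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # allow a trial call
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Because this sketch is single-threaded, concurrent callers could all pass through in the half-open state. A shared implementation would need a lock and a limit on trial calls.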
How ilert can help
- Reliable escalation policies: Layered on-call schedules and service-based routing notify the right responder fast, with automatic handoffs if there’s no acknowledgement. Fail-safe fallbacks across voice, SMS, push, and chat ensure no alert is dropped.
- AI-assisted incident communications: ilert drafts clear status page updates and stakeholder summaries in seconds, ensuring a consistent tone across all channels.
- Reports for better post-incident learnings: Out-of-the-box dashboards track MTTA/MTTR, alerts, and escalation effectiveness so you can see what’s working and what isn’t. Trend and SLO/SLA impact views prioritize the fixes that matter most.
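The escalation behavior described above (notify a level, wait for an acknowledgement, then hand off) can be modeled generically. This is a hypothetical sketch of the concept, not ilert's API or configuration format. The levels, channels, and timeouts are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Level:
    responders: List[str]
    channels: List[str]   # e.g. "push", "sms", "voice", "chat"
    ack_timeout_min: int  # wait this long for an ack before escalating


def notifications(levels: List[Level],
                  acked_at_min: Optional[int] = None) -> List[Tuple[int, str, str]]:
    """Return (minute, responder, channel) for every notification sent before an ack."""
    sent = []
    minute = 0
    for level in levels:
        if acked_at_min is not None and acked_at_min <= minute:
            break  # someone acknowledged; stop escalating
        for responder in level.responders:
            for channel in level.channels:
                sent.append((minute, responder, channel))
        minute += level.ack_timeout_min
    return sent
```

Notifying each level on several channels at once is what provides the fallback when one channel fails to reach the responder.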
