The Interface Is the Intelligence: Why Action-First UX Beats Conversational AI in Incident Response
The problem with bolting AI onto a product
It’s 2:47 a.m. A P1 alert fires. The on-call engineer opens ilert, sees the AI has already investigated, and is presented with three remediation options. What happens next is the moment we obsessed over.
Most AI tooling at that moment hands the engineer a numbered list in a chat window and waits. The engineer reads, selects mentally, types a reply, and the agent resumes. That sequence takes seconds under pressure, but it also introduces ambiguity, re-reading, and cognitive overhead at exactly the wrong moment.
We’re building an SRE Agent, an AI agent embedded directly inside ilert’s incident response platform to handle everything from RCA and triage to on-call queries and object creation. As we made agents a first-class part of the product, one question kept coming up: what’s the right interface for a human approving an AI decision during an active incident?
Chat is the obvious default. But it’s not always the right one.
Does the agent run as a sidecar? An overlay? Is there a dedicated place to talk to it? Is chat the only interface?
Chat has one strong argument going for it: the agent can meet the user wherever they already are, whether that's Slack, WhatsApp, or Teams. Whenever it needs input, it reaches you on your preferred channel.
But chat also has real drawbacks. In many cases, it’s still too much input. Users don’t always know what to type or where to start. And when you push the interaction into a chat channel, you’re limited to what that channel supports, which usually means text.
Here’s how we’re approaching it at ilert:
The bet is that the best agent UX won’t feel like a chatbot. It’ll feel like the product got smarter. ActionOption Cards are where that thinking gets concrete, and they start by solving one very specific piece of friction.
The problem with plain-text option lists
Back to 2:47 a.m. The AI has already done the hard part: correlated signals across Datadog and GitHub, identified a bad deploy, and narrowed the options to three. That work matters. What happens next can undo it.
Most AI tooling hands the engineer a numbered list in a chat box and waits. That forces them to read each option, mentally select one, and type a follow-up message to confirm their intent. That is friction at exactly the wrong moment, and it introduces ambiguity the agent then has to resolve: did the user mean option 1 exactly, or a variation of it?
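A sketch of that plain-text pattern; the incident and option text here are illustrative:

```
Agent: I correlated the alerts with a deploy of payment-gateway v2.14.1.
       Options:
         1. Roll back to v2.14.0
         2. Scale up payment-gateway replicas
         3. Enable the circuit breaker and keep investigating
       Reply with the number of the option you want to run.
User:  2
```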
The interface should be decisional, not just conversational. During an active incident, engineers operate under cognitive load. Every second spent re-reading, re-parsing, or re-typing is a second the incident continues.
What are A2UI (agent-to-user interface) ActionOption Cards?
We use the A2UI framework to dynamically render interactive UI elements inside the agent conversation thread: components the agent generates on the fly, not static screens. The ActionOption Card is the framework's primary expression in our product: it's what the agent renders instead of a numbered text list whenever a user action is required.
Each card represents a single, discrete course of action and is composed of:
- Title: A short, unambiguous label for the action, e.g. “Option 1: Scale up payment-gateway”.
- Description: An explanation of what the action does and the trade-offs it involves, so engineers can make an informed decision at a glance.
- Tag badge (optional): A colour-coded label: Recommended (green), Immediate (amber), Quick (blue), or Best (green). Only rendered when it meaningfully differentiates an option.
- Action button: A clickable button with a short action verb and an optional icon. One click is all that’s required to proceed.
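As a rough sketch, a single card could be modelled like this. The field names are our illustration, not the actual A2UI schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation of one ActionOption Card; field names are
# illustrative, not the real A2UI schema.
@dataclass
class ActionOptionCard:
    title: str                 # short, unambiguous label
    description: str           # what the action does and its trade-offs
    action_label: str          # short verb shown on the action button
    tag: Optional[str] = None  # e.g. "Recommended", "Immediate", "Quick"

card = ActionOptionCard(
    title="Option 1: Scale up payment-gateway",
    description="Adds two replicas. Fast and low risk, but does not fix the bad deploy.",
    action_label="Scale up",
    tag="Recommended",
)
```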
A simple example: the agent proposes three options. Instead of typing “1”, “2”, or “3”, you click a button. This pattern scales into more complex scenarios: selections, sliders, rich tables.
Technical architecture: How cards are generated and rendered
Three things make it work: the LLM, a thin tool layer, and the frontend.
Step 1: Tool call
We built a dedicated tool that the agent can call whenever it decides structured options make more sense than a plain-text reply. The LLM passes a list of option objects, one per card.
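A minimal sketch of what that tool's interface might look like, assuming a function-calling style declaration; the tool name and schema are hypothetical:

```python
import json

# Hypothetical JSON-schema declaration for the card-rendering tool the
# agent can call instead of replying with a numbered text list.
RENDER_OPTIONS_TOOL = {
    "name": "render_action_options",
    "description": "Render one ActionOption Card per option instead of a plain-text list.",
    "parameters": {
        "type": "object",
        "properties": {
            "options": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "description": {"type": "string"},
                        "action_label": {"type": "string"},
                        "tag": {"type": "string"},  # optional badge
                    },
                    "required": ["title", "description", "action_label"],
                },
            }
        },
        "required": ["options"],
    },
}

# Example arguments the LLM might pass: one object per card.
example_call = {
    "options": [
        {"title": "Option 1: Scale up payment-gateway",
         "description": "Adds replicas; fast, but does not fix the bad deploy.",
         "action_label": "Scale up", "tag": "Recommended"},
        {"title": "Option 2: Roll back v2.14.1",
         "description": "Reverts the deploy; slower, but addresses the root cause.",
         "action_label": "Roll back"},
    ]
}
print(json.dumps(example_call, indent=2))
```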
Step 2: Rendering
For each option, a unique identifier is generated. An A2UI surface update command is then published to the backend message bus. The frontend subscribes to these events and renders the cards in real time within the conversation thread as they arrive: no page reload, no manual polling.
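In outline, the publishing side could look like this; the `publish` function is an in-process stand-in for the real message bus, and the event shape is illustrative:

```python
import uuid

def publish(channel: str, event: dict) -> None:
    # Stand-in for the backend message bus; the frontend subscribes to
    # this channel and renders cards as the events arrive.
    print(channel, event)

def render_cards(thread_id: str, options: list[dict]) -> dict[str, dict]:
    """Assign each option a unique id and publish one surface update per card."""
    cards: dict[str, dict] = {}
    for opt in options:
        option_id = str(uuid.uuid4())
        cards[option_id] = opt
        publish(f"a2ui/{thread_id}", {
            "type": "surface_update",
            "component": "action_option_card",
            "option_id": option_id,
            "payload": opt,
        })
    return cards  # kept server-side to resolve clicks later
```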
Step 3: User interaction and intent injection
When the engineer clicks an action button, an event carrying the option's unique identifier is sent back to the agent. The agent maps this to a pre-configured confirmation sentence, for example, "Yes, scale up the payment-gateway replicas", and injects it into the chat thread as if the user had typed it themselves. This seamlessly resumes the LLM loop with the user's confirmed, unambiguous intent.
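The identifier-to-sentence mapping can be sketched like this; the option ids and confirmation sentences are illustrative:

```python
# Hypothetical mapping from option id to a pre-configured confirmation
# sentence, established when the cards were generated.
CONFIRMATIONS = {
    "opt-scale-up": "Yes, scale up the payment-gateway replicas.",
    "opt-rollback": "Yes, roll back deploy v2.14.1.",
}

def on_card_click(option_id: str, thread: list[dict]) -> None:
    """Inject the confirmed intent into the thread as if the user typed it."""
    sentence = CONFIRMATIONS[option_id]
    thread.append({"role": "user", "content": sentence})
    # The LLM loop now resumes on an unambiguous user message.

thread: list[dict] = []
on_card_click("opt-scale-up", thread)
```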
Step 4: Post-selection state
Once the engineer clicks, the card updates its own state: the action button is replaced with a green checkmark labelled "Selected". This visual confirmation makes it clear the action has been acknowledged and prevents accidental double submissions.
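A sketch of that state transition, guarding against double submission; the card representation is illustrative:

```python
def select_card(card: dict) -> bool:
    """Mark a card as selected; return False if it was already selected."""
    if card.get("state") == "selected":
        return False  # ignore accidental double clicks
    card["state"] = "selected"
    card["button"] = {"icon": "checkmark", "label": "Selected", "disabled": True}
    return True

card = {"title": "Option 1: Scale up payment-gateway", "state": "pending"}
select_card(card)   # first click: card flips to its selected state
select_card(card)   # second click: no-op
```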
Why this pattern matters
This is ilert’s answer to a question every AI SRE vendor is navigating: how much should the agent do autonomously, and when does it hand back to a human? Our answer is that the handoff moment needs to be as frictionless as the investigation that precedes it. ActionOption Cards are built for that moment. Here’s what that means in practice:
- Visual scannability. Cards are spatially separated, visually distinct, and carry structured metadata. An engineer can evaluate three options at a glance rather than reading a paragraph of text.
- Explicit risk and effort signalling. Rather than leaving the risk assessment to intuition, the agent surfaces risk and effort data directly alongside each option, information drawn from runbooks, historical incident data, or its own analysis.
- Unambiguous intent. A clicked button maps to an exact, machine-readable action. There is no natural language ambiguity between “scale it up” and “increase the replicas”. The identifier-to-sentence mapping ensures the LLM receives exactly the intent the engineer confirmed.
- Resumable agent loop. Because the injected confirmation sentence re-enters the chat thread like any other user message, the LLM loop resumes without special-case handling. The agent continues its workflow as if the engineer had typed the response naturally.
The click is the governance
A lot of AI SRE products talk about human-in-the-loop as a safety concept. ActionOption Cards make it a UX reality. The engineer doesn’t approve an action by typing “yes” into a chat box; they click a button that surfaces the risk, the effort, and the trade-off at a glance. The approval is informed, and it’s fast.
That’s the difference between an AI agent bolted on top of a product and one that’s built into it. The agent earns autonomy gradually, and at every step, the human approval moment is designed to be as clear and fast as the AI investigation that preceded it.
Back to 2:47 a.m. The AI investigated. Three options are on screen. One click.


