ilert AI Voice Agent: Deep dive

August 15, 2025

The ilert AI Voice Agent is designed to transform how on-call engineers handle urgent calls. Instead of waking engineers at 3 a.m. with minimal context, the AI Voice Agent collects essential details first and routes calls intelligently based on relevant, up-to-date information.

The agent works hand in hand with ilert’s Call Flow Builder – a visual tool that lets users design custom call flows by connecting configurable nodes. Each node represents a step in the call handling process, and the AI Voice Agent is one such node.

This means you can drop the AI into exactly the right place in your call handling logic, making the process seamless and highly customizable.

In this article, we’ll explore the problem it solves, how it is built, how it delivers natural and context-aware conversations, and how we keep it secure and reliable in production.

Beta Notice: The ilert AI Voice Agent is currently available in Beta. Users with the Call Flow Builder add-on can request early access by contacting support@ilert.com.

The problem we’re solving, and why it matters for on-call engineers

On-call engineers often receive urgent calls with minimal context, forcing them to ask repetitive questions before they can take action. This wastes valuable time in high-pressure situations.

The ilert AI Voice Agent addresses this by:

Saving time: The AI collects key details before an engineer is called, allowing them to start troubleshooting immediately instead of asking basic qualifying questions. It also reduces unnecessary escalations by checking for open incidents and informing callers if the issue is already being handled.
Visual call flow integration: Add AI Voice Agent nodes directly into your call flow with an easy-to-use interface, so it becomes part of your existing logic without manual workarounds.
Customizable information gathering: Define exactly what data is collected, such as caller name, contact number, email, incident description, affected services, or custom fields.

Architecture: How the ilert AI Voice Agent works

Under the hood, the AI Voice Agent is designed for modular, configurable interactions with low latency.

Key components:

WebSockets – Provide a low-latency channel for conversational AI with OpenAI.
Twilio integration – Streams live audio to and from callers.
Visual flow builder – Configure AI Voice Agent nodes directly in the Call Flow Builder.

Modular configuration:

Intents – Pre-built or custom, define how calls are routed based on the caller's purpose.
Gathers – Structured data collection (e.g., contact details, incident descriptions).
Enrichment – Optionally pull data from configured sources such as ilert Status Pages, service states, open incidents, or active maintenance windows.
Audio messages – Fully customizable greetings and prompts.
Fallback handling – A “catch-all” branch for unmatched conversations.
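
To make the modular configuration more concrete, here is a minimal sketch of how such a node could be modeled as data. All class and field names (VoiceAgentNode, Intent, Gather, next_node, and so on) are illustrative assumptions and not ilert's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    name: str         # hypothetical identifier, e.g. "report_incident"
    description: str  # what the caller wants, in plain language
    next_node: str    # call flow node to continue with when matched


@dataclass
class Gather:
    key: str          # e.g. "caller_name" or "incident_description"
    prompt: str       # question the agent asks to collect the value
    required: bool = True


@dataclass
class VoiceAgentNode:
    greeting: str
    intents: List[Intent]
    gathers: List[Gather]
    fallback_node: str  # the "catch-all" branch
    enrichment_sources: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return human-readable configuration problems (empty if none)."""
        problems = []
        if not self.intents:
            problems.append("at least one intent is required")
        intent_names = [intent.name for intent in self.intents]
        if len(intent_names) != len(set(intent_names)):
            problems.append("intent names must be unique")
        gather_keys = [gather.key for gather in self.gathers]
        if len(gather_keys) != len(set(gather_keys)):
            problems.append("gather keys must be unique")
        return problems


# Example node with two intents, two gathers, and optional enrichment.
node = VoiceAgentNode(
    greeting="Thanks for calling. How can I help?",
    intents=[
        Intent("report_incident", "Caller reports a new problem", "route_to_on_call"),
        Intent("ask_status", "Caller asks about a known issue", "play_status"),
    ],
    gathers=[
        Gather("caller_name", "May I have your name?"),
        Gather("incident_description", "What is happening?"),
    ],
    fallback_node="voicemail",
    enrichment_sources=["open_incidents", "maintenance_windows"],
)
print(node.validate())
```

Keeping the node as plain data with a validation step is one way to catch duplicate intents or gathers before a call ever reaches the agent.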

During the development of the AI Voice Agent, the team faced several complex technical challenges.

One of the first hurdles was tracking who was speaking at any given time. Both Twilio and OpenAI send speaker events, and the system needed to reliably determine whether the bot or the user was speaking in real time. This was essential to avoid interruptions or missed messages during a conversation.
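
One way to think about speaker tracking is as a small state machine fed by events from both sides. The sketch below assumes the raw Twilio and OpenAI events have already been mapped to four hypothetical, normalized event names; those names and the SpeakerTracker class are illustrative and not taken from either API.

```python
from enum import Enum


class Speaker(Enum):
    NONE = "none"
    BOT = "bot"
    CALLER = "caller"
    BOTH = "both"  # caller is talking over the bot


class SpeakerTracker:
    """Tracks who is speaking from normalized, hypothetical event names."""

    def __init__(self) -> None:
        self.bot_speaking = False
        self.caller_speaking = False

    def on_event(self, event: str) -> Speaker:
        if event == "bot_audio_started":
            self.bot_speaking = True
        elif event == "bot_audio_finished":
            self.bot_speaking = False
        elif event == "caller_speech_started":
            self.caller_speaking = True
        elif event == "caller_speech_stopped":
            self.caller_speaking = False
        # Any other event leaves the state untouched.
        return self.current()

    def current(self) -> Speaker:
        if self.bot_speaking and self.caller_speaking:
            return Speaker.BOTH
        if self.bot_speaking:
            return Speaker.BOT
        if self.caller_speaking:
            return Speaker.CALLER
        return Speaker.NONE


# Replay a short call: the caller starts talking while the bot is still speaking.
tracker = SpeakerTracker()
history = [
    tracker.on_event(name)
    for name in (
        "bot_audio_started",
        "caller_speech_started",
        "bot_audio_finished",
        "caller_speech_stopped",
    )
]
```

Tracking the two flags independently, instead of a single "current speaker" value, makes the overlap case explicit, which is the situation where interruptions and missed messages tend to occur.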

Another major challenge was ensuring a natural conversation flow. Creating smooth, human-like interactions required extensive prompt engineering and fine-tuning. The pacing, tone, and responsiveness of the AI had to be carefully controlled to make the experience feel intuitive and engaging for users.

Finally, synchronizing multi-stream connections proved to be a critical task. The system had to maintain accurate state information between Twilio streams, OpenAI responses, and ilert’s backend. This synchronization was vital for preserving context consistency throughout the conversation.
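
A common pattern for this kind of synchronization is to funnel events from every connection into one queue and let a single consumer update the call state. The sketch below illustrates that pattern with asyncio; it is not a description of ilert's implementation, and the source labels ("telephony", "model", "backend"), the CallState fields, and the payload shapes are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallState:
    call_id: str
    events_seen: int = 0
    caller_number: Optional[str] = None
    matched_intent: Optional[str] = None
    open_incidents: list = field(default_factory=list)


def apply_event(state: CallState, source: str, payload: dict) -> None:
    """Single place where call state changes; only the consumer calls it."""
    state.events_seen += 1
    if source == "telephony":
        state.caller_number = payload.get("from", state.caller_number)
    elif source == "model":
        state.matched_intent = payload.get("intent", state.matched_intent)
    elif source == "backend":
        state.open_incidents = payload.get("open_incidents", state.open_incidents)


async def producer(queue: asyncio.Queue, source: str, payloads: list) -> None:
    for payload in payloads:
        await queue.put((source, payload))
        await asyncio.sleep(0)  # yield so producers interleave like live streams


async def consumer(queue: asyncio.Queue, state: CallState) -> None:
    while True:
        item = await queue.get()
        if item is None:  # sentinel: all producers are done
            break
        source, payload = item
        apply_event(state, source, payload)


async def run_call() -> CallState:
    state = CallState(call_id="demo-call")
    queue: asyncio.Queue = asyncio.Queue()
    consumer_task = asyncio.create_task(consumer(queue, state))
    await asyncio.gather(
        producer(queue, "telephony", [{"from": "+15550100"}]),
        producer(queue, "model", [{"intent": "report_incident"}]),
        producer(queue, "backend", [{"open_incidents": ["INC-1"]}]),
    )
    await queue.put(None)
    await consumer_task
    return state


final_state = asyncio.run(run_call())
```

Because only the consumer mutates the state, updates from the three streams are applied one at a time in arrival order, which keeps the context consistent without locks.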

Making conversations natural, accurate, and context-aware

The Voice Agent goes beyond traditional voice menus by combining intent recognition with optional context enrichment.

During call initialization, the agent receives the configured intents, gathers the potential follow-up nodes, and captures the caller’s number. If enrichment is enabled, it can also access additional data, such as open incidents, current service states, and active maintenance windows, which allows it to provide more relevant and timely responses.
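
As an illustration, the context handed to the agent at call start could be assembled along these lines. This is a sketch under assumptions: the function names, the dictionary keys, and the idea of rendering the context as JSON text are hypothetical and not confirmed details of the product.

```python
import json
from typing import Optional

# Only these hypothetical enrichment keys are passed on to the agent.
ALLOWED_ENRICHMENT_KEYS = ("open_incidents", "service_states", "maintenance_windows")


def build_call_context(
    intents: list,
    follow_up_nodes: list,
    caller_number: str,
    enrichment: Optional[dict] = None,
) -> dict:
    """Collect what the agent knows at call start; enrichment is optional."""
    context = {
        "caller_number": caller_number,
        "intents": list(intents),
        "follow_up_nodes": list(follow_up_nodes),
    }
    if enrichment is not None:
        context["enrichment"] = {
            key: enrichment[key] for key in ALLOWED_ENRICHMENT_KEYS if key in enrichment
        }
    return context


def render_context(context: dict) -> str:
    """Render the context as text that could be appended to a system prompt."""
    return "Call context (JSON):\n" + json.dumps(context, indent=2, sort_keys=True)


plain = build_call_context(["report_incident"], ["route_to_on_call"], "+15550100")
enriched = build_call_context(
    ["report_incident"],
    ["route_to_on_call"],
    "+15550100",
    enrichment={"open_incidents": ["Checkout API degraded"], "internal_notes": "skip me"},
)
print(render_context(enriched))
```

Filtering enrichment through an allow-list keeps unexpected fields out of the prompt, so the agent only sees data that was deliberately configured.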

Through intent-based routing, the system matches the caller’s intent to the appropriate branch of the call flow, enabling faster and more accurate resolution of requests.
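
In its simplest form, intent-based routing is a lookup from the recognized intent to a branch, with the fallback branch catching everything else. The function and node names below are hypothetical and only meant to show the shape of that lookup.

```python
from typing import Optional


def route_call(matched_intent: Optional[str], branches: dict, fallback_node: str) -> str:
    """Return the call flow node for an intent, or the fallback if none matches."""
    if matched_intent is None:
        return fallback_node
    key = matched_intent.strip().lower()
    return branches.get(key, fallback_node)


branches = {
    "report_incident": "route_to_on_call",
    "ask_status": "play_status",
}
print(route_call("Report_Incident ", branches, "voicemail"))
print(route_call("order_pizza", branches, "voicemail"))
```

Normalizing the intent name before the lookup avoids sending a caller to the fallback branch only because of casing or stray whitespace.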

Security, compliance, and observability in production

Reliability and compliance are built in from the start. Here are three major principles:

Stateless design: No persistent storage of caller data between requests.
System prompts with operational rules: The AI follows strict, pre-defined guidelines to ensure security and consistent responses.
Detailed call logging: Logs all call events for troubleshooting and performance review.
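
For the logging principle, one possible shape is a structured log with one JSON object per call event. The sketch below uses only Python's standard logging module; the logger name, event names, and field names are assumptions made for illustration and do not reflect ilert's actual log format.

```python
import io
import json
import logging
from datetime import datetime, timezone


def make_call_logger(stream) -> logging.Logger:
    """Create a logger that writes one JSON document per line to the stream."""
    logger = logging.getLogger("voice_agent.calls")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_call_event(logger: logging.Logger, call_id: str, event_type: str, **details) -> None:
    """Log a single call event with a UTC timestamp and free-form details."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "call_id": call_id,
        "event": event_type,
        "details": details,
    }
    logger.info(json.dumps(record, sort_keys=True))


# Write two events to an in-memory buffer and read them back.
buffer = io.StringIO()
call_logger = make_call_logger(buffer)
log_call_event(call_logger, "demo-call", "intent_matched", intent="report_incident")
log_call_event(call_logger, "demo-call", "call_ended", reason="caller_hung_up")
lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

One JSON object per line keeps the log easy to filter by call ID or event type during troubleshooting.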

Lessons learned

During development and early Beta testing, we learned a great deal about delivering smooth, reliable AI-powered conversations. Allowing the AI to be interrupted by the user turned out to be a key feature – many callers prefer to skip the rest of a question or add details they forgot earlier.
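
To illustrate what an interruption can involve, the sketch below builds the messages one might send when the caller starts talking over the agent. It relies on two message shapes from the public documentation as we understand it: a "clear" message on a Twilio Media Streams connection to drop buffered audio, and a "response.cancel" client event on the OpenAI Realtime API. Whether the ilert agent uses exactly these messages is an assumption, and the function name and return shape are hypothetical.

```python
import json


def on_caller_speech_started(bot_is_speaking: bool, stream_sid: str) -> dict:
    """Return the JSON messages to send when the caller starts talking."""
    if not bot_is_speaking:
        # Nothing to interrupt; the caller is simply answering.
        return {"to_twilio": [], "to_openai": []}
    return {
        # Assumed Twilio Media Streams message: drop audio already buffered.
        "to_twilio": [json.dumps({"event": "clear", "streamSid": stream_sid})],
        # Assumed OpenAI Realtime client event: stop the in-progress response.
        "to_openai": [json.dumps({"type": "response.cancel"})],
    }


messages = on_caller_speech_started(bot_is_speaking=True, stream_sid="MZ123")
print(messages["to_twilio"][0])
print(messages["to_openai"][0])
```

Returning the messages instead of sending them directly keeps the decision logic testable without live connections.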

However, this made it even more important to track who is speaking at any given time. By monitoring speaker activity, we can detect long periods of silence and prevent calls from running indefinitely when no one is talking.
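
Silence detection of this kind can be sketched as a small watchdog that remembers the last time anyone spoke. The class name, the threshold, and the injected timestamps below are illustrative assumptions; in a live system the timestamps would typically come from a monotonic clock such as time.monotonic().

```python
class SilenceWatchdog:
    """Signals when neither the bot nor the caller has spoken for too long."""

    def __init__(self, max_silence_seconds: float, started_at: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self.last_activity = started_at

    def on_activity(self, now: float) -> None:
        """Call whenever the bot or the caller produces speech."""
        self.last_activity = now

    def should_end_call(self, now: float) -> bool:
        """True once the silence has lasted at least the configured maximum."""
        return now - self.last_activity >= self.max_silence_seconds


# Timestamps are passed in explicitly so the logic is easy to test.
watchdog = SilenceWatchdog(max_silence_seconds=10.0, started_at=100.0)
watchdog.on_activity(now=103.0)
still_active = watchdog.should_end_call(now=108.0)
timed_out = watchdog.should_end_call(now=113.0)
```

A periodic check of should_end_call is enough to end calls where no one is talking, without needing a timer per utterance.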

Coordinating multiple live connections (Twilio, OpenAI, ilert backend) required careful orchestration to ensure the call state stayed synchronized at all times. Prompt engineering proved essential in making conversations sound natural while ensuring the AI followed operational rules and safety guidelines.

What’s next?

The Beta release has already sparked new ideas for improvements. We plan to extend logging capabilities and provide full recordings of conversations for review and compliance purposes.

To improve flexibility, the AI Voice Agent will gain adjustable speaking speed and verbosity settings, allowing teams to fine-tune the interaction style. We are also exploring ways to detect when callers are frustrated and offer them an immediate option to speak with a human operator.

On the transcription side, we aim to enhance the ilert user experience by moving from Twilio’s built-in transcription to AI-powered voice transcription. This will provide more accurate and context-aware briefings for on-call engineers before a call is connected.

Conclusion

The ilert AI Voice Agent bridges the gap between urgent incident calls and the actionable details engineers need to respond quickly. By integrating directly with ilert’s incident management platform, it delivers natural, context-aware, and secure conversations while giving teams the flexibility to adapt the interaction to their workflows.

With upcoming features such as multilingual support, transcripts, and deeper integrations, the Voice Agent will further reduce on-call friction and accelerate incident response.
