AI-First Incident Management. With Privacy in Mind.

Designed to augment teams with intelligent agents while keeping humans in control.
ilert AI

AI-first. All-in-one incident management.

Intelligent agents for every stage of the incident lifecycle.

Discover all AI features

On-call schedule assistant

Share your scheduling needs in a simple, chat-like interface. Add team members, rotation rules, and timeframes — and get a ready-to-use on-call calendar everyone can access.

Let AI take the call

Introducing the ilert AI Voice Agent—your first responder for calls, gathering key details and informing your on-call engineers.

Status updates in no time

ilert AI analyzes your system and incidents, offering quick updates and managing communications for efficient issue resolution.

ilert Responder – your real-time incident advisor

ilert Responder is an intelligent agent that analyzes incidents in real time. It connects to your observability stack, investigates alerts across systems, and surfaces actionable insights, without taking control away from your team.

Features

  • Analyze logs, metrics, and recent changes autonomously
  • Identify root causes and similar past incidents
  • Suggest responders, rollback paths, or related services
  • Ask questions in natural language and get direct, evidence-backed answers
Integrations

Get started immediately using our integrations

ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.

Transform your Incident Response today – start free trial

Start for free
Customers

See how industry leaders achieve 99.9% uptime with ilert

Organizations worldwide trust ilert to streamline incident management, enhance reliability, and minimize downtime. Read what our customers have to say about their experience with our platform.

Stay up to date

Expert insights from our blog

Engineering

Cut alert noise with AI-powered grouping for MSPs

How MSPs can streamline IT incident management with ilert AI

Tim Nguyen Van
Jul 31, 2025 • 5 min read

Managed Service Providers (MSPs) and IT service providers face growing complexity in monitoring client systems – especially when multiple tools are in play. When every minor issue triggers an alert, operations teams quickly drown in noise.

This article shows how ilert’s intelligent alert grouping cuts through that noise by automatically correlating related alerts from the same alert source – reducing alert volume, ticketing overhead, and response time.

We'll walk through realistic examples using N-able N-central monitoring and Freshservice ticketing, simulate alert scenarios, and explain how to configure and fine-tune ilert AI grouping for better IT incident management. These tools serve only as examples; ilert connects with many other monitoring and ITSM tools out of the box.

The problem: Alert overload in MSP environments

MSP tools like N-able N-central are essential for proactive monitoring of client systems. But with detailed metrics and aggressive thresholds, they often generate a high volume of alerts – especially during recurring issues or cascading failures.

Scenario 1: System resource issues from N-central

A monitored Ubuntu server from test_customer (UBUNTU-SRV-01) begins showing signs of resource exhaustion. Over a 10-minute span, N-central triggers the following alerts:

  • CPU usage exceeds 90%
  • Available memory drops below 500MB
  • Multiple failed login attempts
  • Disk space below threshold on root partition (/)

Meanwhile, a separate server from test_customer2 (UBUNTU-SRV-02) triggers:

  • Multiple failed login attempts
  • Disk space below threshold on root partition (/)

Each of these events creates a separate alert. Without intelligent alert grouping, ilert would receive six distinct alerts – all treated independently despite clear contextual overlap. This leads to:

  • Alert noise that distracts from the core issue
  • Increased manual effort to correlate related events
  • Longer response times for the support team

In RMM-heavy environments, these inefficiencies add up. What’s needed is a smarter, context-aware way to consolidate related alerts into a single, actionable view.

Scenario 2: End-user issue escalation

Several users from customer_alpha report problems logging into a shared client portal:

  • “Can’t log into the client portal – getting a timeout.”
  • “Login takes forever, then I get a 502 error.”
  • “Some users can’t access the dashboard at all.”

Each of these creates an alert in ilert via the Freshservice alert source. With alert grouping disabled, they would generate three separate alerts.

The solution: Intelligent alert grouping with ilert AI

To help MSPs manage alert noise and accelerate response, ilert AI introduces intelligent alert grouping – a feature designed to automatically correlate similar alerts from the same alert source into a single, actionable unit.

Let’s revisit the previous example: six alerts triggered by N-able N-central related to CPU, memory, disk space, and login failures. With alert grouping enabled in ilert, these alerts can be automatically bundled together based on shared context, such as:

  • Same target customer (e.g., test_customer)
  • Same target host (e.g., UBUNTU-SRV-01)
  • Short time window (e.g., all within 5 minutes)
  • Similar keywords or tags (e.g., “memory”, “performance”, “server”)

How does it work?

ilert AI uses vector search to group alerts from the same alert source based on their semantic similarity. Each alert is transformed into a vector embedding, and alerts with similar vectors – meaning similar content – are grouped together automatically.

You can control grouping behavior with two key settings:

  • Grouping window – defines the time span in which similar alerts are eligible to be grouped.
  • Similarity threshold – sets how closely alerts must match in vector space to be grouped.

For more details, see the ilert documentation on grouping alerts with ilert AI.
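To make the two settings more tangible, here is a minimal sketch of window-and-threshold grouping. It is not ilert's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `Alert`, `Group`, and `group_alerts` are our own assumptions for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Alert:
    summary: str
    timestamp: float  # seconds since an arbitrary reference point


@dataclass
class Group:
    alerts: list = field(default_factory=list)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_alerts(alerts, window_seconds: float, threshold: float):
    """Attach each alert to the first group whose opening alert is both
    recent enough (grouping window) and similar enough (similarity threshold)."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        vector = embed(alert.summary)
        for group in groups:
            first = group.alerts[0]
            in_window = alert.timestamp - first.timestamp <= window_seconds
            if in_window and cosine(vector, embed(first.summary)) >= threshold:
                group.alerts.append(alert)
                break
        else:
            groups.append(Group([alert]))  # no match: open a new group
    return groups


alerts = [
    Alert("Disk space below threshold on UBUNTU-SRV-01", 0),
    Alert("Disk space below threshold on UBUNTU-SRV-02", 60),
    Alert("CPU usage exceeds 90% on UBUNTU-SRV-01", 120),
]
for group in group_alerts(alerts, window_seconds=300, threshold=0.7):
    print([a.summary for a in group.alerts])
```

With these toy inputs, the two disk-space alerts end up in one group and the CPU alert in another. Lowering the threshold merges more alerts, while shrinking the window splits them apart, which is the trade-off the two settings let you tune.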

Scenario 1: N-able N-central – intelligent alert grouping in action

Let’s continue with the earlier example. UBUNTU-SRV-01 and UBUNTU-SRV-02, both monitored via N-central, trigger six alerts within 10 minutes. With ilert AI grouping enabled, these alerts are automatically consolidated into two grouped alerts.

Scenario 2: Intelligent grouping of Freshservice support tickets

With ilert AI enabled on the Freshservice alert source, semantically similar alerts triggered by support tickets are grouped into a single alert.

Conclusion

For MSPs using tools for remote monitoring or ticketing, ilert's intelligent alert grouping transforms noisy alert streams into focused, high-context alerts. By reducing duplication and speeding up triage, your teams can stay efficient, responsive, and focused on what matters.

Product

New features: Event flows, revamped alert view, sleek reports, and much more

New alert view with AI, smarter event routing, improved reports, and more – explore and test the latest ilert features today.

Daria Yankevich
Jul 22, 2025 • 5 min read

In recent months, we introduced a major update: ilert Responder, the AI agent that helps you run root cause analysis during incidents and recommends steps toward faster resolution. That's not all: there are many more powerful features to share with you. Feel free to reach out to us via chat or at support@ilert.com if you have questions or if you want to propose a feature or improvement.

New Alert view: Built for real-time collaboration and AI assistance

To better support real-time collaboration and prepare for the next round of AI features, we introduced a revamped alert view. It is organized into collapsible sections, so you can open only those that matter to you at the moment; by default, the platform opens the ones most likely to be relevant. Apart from the long-familiar ‘Alert details,’ ‘Deployment events,’ and ‘Incident communications,’ you will notice the ‘Actions’ section with the list of recommendations from the ilert Responder, and ‘Logs and data’ relevant to the received alert.

On the right side, the timeline now shares space with the chat, whose capabilities have also been significantly enhanced. You can use threads to keep communication clean, tag colleagues, and leave emojis. Most importantly, you can communicate with ilert AI in the same environment by simply mentioning it via @. Moreover, ilert chat mirrors the communication happening in the war room in Microsoft Teams. This new view brings alerts, context, and collaboration into one place, helping teams make faster, better-informed decisions in the heat of an incident.

Event Flows: Smarter routing for incoming events

With Event Flows, ilert introduces a powerful and flexible way to process incoming events before they are converted into alerts. The feature allows you to build dynamic, rule-based workflows that determine how events are handled, routed, or filtered – all through a simple visual interface.

This makes Event Flows perfect for organizations that deal with a large volume of alerts or operate across multiple teams. Instead of manually managing routing rules across alert sources, you can centralize your logic in one reusable flow. Whether you want to send database-related events to your DB ops team, ignore low-severity alerts outside of business hours, or escalate critical alerts directly to on-call responders, Event Flows give you the tools to do just that.

At the core of every Event Flow is the Incoming Event block. You can connect it to one or multiple integrations or custom event sources using ilert's Event API. Once connected, you gain full control over how these events should behave. For example, you can add conditional branches that inspect event content, such as custom fields, labels, or summaries, and direct them down different paths depending on the logic you define.

You can also integrate Support hours checks into your workflows, ensuring that notifications respect team availability. If no conditions match, a default "else" path ensures that the event still continues downstream without being lost.
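As a rough mental model, an Event Flow behaves like an ordered set of branches with a fallback. The sketch below is only an illustration of that idea in code, not the visual builder or any ilert API; the `Event` fields, the route names, and the 9-to-5 weekday support hours are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class Event:
    summary: str
    severity: str
    labels: dict = field(default_factory=dict)


def in_support_hours(now: datetime, start=time(9, 0), end=time(17, 0)) -> bool:
    """Assumed policy: weekdays between start and end are support hours."""
    return now.weekday() < 5 and start <= now.time() < end


def route(event: Event, now: datetime) -> str:
    """Evaluate branches in order; the last return is the 'else' path."""
    if event.labels.get("component") == "database":
        return "db-ops"          # database events go to the DB ops team
    if event.severity == "critical":
        return "on-call"         # critical events escalate immediately
    if event.severity == "low" and not in_support_hours(now):
        return "ignore"          # low severity outside support hours
    return "default"             # nothing matched: continue downstream


night = datetime(2025, 7, 22, 3, 0)
print(route(Event("Replica lag rising", "low", {"component": "database"}), night))
print(route(Event("Disk at 70%", "low"), night))
```

Because the branches are checked top to bottom, their order matters: in this sketch a low-severity database event at night still reaches the DB ops team rather than being ignored.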

Built with teams in mind, Event Flows can be assigned to one or more teams in ilert, making them easily reusable and manageable across larger organizations. 

If you have suggestions for other nodes, don't hesitate to contact our support team or submit your idea in the ilert Roadmap.

Smarter insights with Reports 2.0

Check out the refreshed experience for all Reports, including Notifications and On-call reports. With a sleek design and enhanced filtering options, you can now quickly break down notifications and on-call activities by user, team, or custom time periods – helping you detect patterns and gain clarity.

The updated On-call reports show detailed logs of shifts, including time spent on each alert. Here, you also have more filtering options to fine-tune reports to various needs and audiences. This update enables better compensation tracking and fairness across teams. With Reports 2.0, ilert gives you deeper visibility into alert fatigue, delivery success, and overall incident response performance.

Overlay public holidays directly in your on-call schedules

Creating one-time schedule overrides just got easier. With the new holiday calendar overlay, ilert now displays relevant national holidays directly within the on-call schedule detail view. This removes the need to check external calendars and reduces setup errors. Simply spot holiday conflicts at a glance and create overrides with fewer clicks, improving coverage and reducing time spent managing schedules. You will probably also notice a refreshed look for on-call schedules overall, as we overhauled their design.

‘Undo’ and ‘Regenerate’ options in AI-assisted incident communication

Managing incidents with AI just got more flexible. The latest ilert update enhances the AI-assisted incident comms workflow by giving users more control over the generated content. Now, when you press ‘Generate,’ ilert creates the incident summary and message based on your input and automatically displays a preview. Once generation completes, the Generate button transforms into a menu with two new actions:

  • Undo: Reverts to your previously entered summary and message.
  • Regenerate: Creates a new version of the incident text based on your latest changes.

This allows for fast iteration without losing your original input, saving time and reducing errors in high-pressure moments. Additionally, the notification preview box at the bottom of the screen now clearly shows which status pages the incident will be posted on and how many subscribers will be notified. This ensures full visibility before you click ‘Create new incident’.
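One way to picture the Generate, Undo, and Regenerate behaviour is as a small state holder that keeps a snapshot of your own text. This is a sketch under assumptions, not ilert's code: the class names are hypothetical, `fake_generator` stands in for the AI call, and we assume Undo always returns to the text entered before the first generation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass
class IncidentDraft:
    summary: str
    message: str


class AssistedDraft:
    def __init__(self, draft: IncidentDraft,
                 generator: Callable[[IncidentDraft], IncidentDraft]):
        self.current = draft
        self._generator = generator
        self._user_input: Optional[IncidentDraft] = None  # snapshot for Undo

    def generate(self) -> None:
        # Keep a copy of what the user typed so Undo can restore it.
        self._user_input = replace(self.current)
        self.current = self._generator(self.current)

    def regenerate(self) -> None:
        # Build a new version from the latest text, including manual edits,
        # while leaving the original snapshot untouched.
        self.current = self._generator(self.current)

    def undo(self) -> None:
        if self._user_input is not None:
            self.current = replace(self._user_input)


def fake_generator(draft: IncidentDraft) -> IncidentDraft:
    """Placeholder for the AI call; it only tags the text."""
    return IncidentDraft(f"[AI] {draft.summary}", f"[AI] {draft.message}")


draft = AssistedDraft(IncidentDraft("DB down", "We are checking."), fake_generator)
draft.generate()
draft.undo()
print(draft.current)
```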

A few more improvements

Bulk-link alerts to incidents from the alert list. Managing multiple alerts just became more efficient. The alert list page now supports bulk actions, allowing you to select multiple alerts and link them to a single incident in one go. This speeds up incident management, especially during larger outages or correlated alert storms, reducing manual work and ensuring better alert-to-incident traceability.

ilert now supports labels. Labels are key-value pairs that add structured context to alerts and events. Labels make it easier to filter, route, and analyze incidents based on relevant information. They’re fully integrated with ICL and ITL, allowing dynamic routing, filtering, and automation based on runtime context. While we started with the Event API and alerts, we are looking forward to bringing new filter options to all entities across the board.

Even better heartbeats. To prevent misconfigurations, ilert now prompts you if you try to save a heartbeat without selecting an alert source, ensuring you don’t accidentally create silent monitors. Additionally, you can customize the message for heartbeat pings. You’ll also now see your current heartbeat monitor usage directly in the ‘Usage & limits’ section (top right corner of the screen, under a cog icon), giving you better visibility and control.

Haven't yet tried ilert Heartbeat 2.0? Test it out together with the fully revamped Email alert source.

Alert actions are displayed in the ilert Event Explorer. The Event Explorer is a real-time view into alert activity, showing detailed logs for every event sent to ilert. With the latest update, alert actions are now fully visible within the Event Explorer.

Markdown support for maintenance windows. You can now use Markdown in maintenance window descriptions. Whether editing in the management UI or displaying on status pages, your formatting – like bullet points, links, or code snippets – is now fully supported, helping you communicate planned downtime more clearly and professionally.

Auto-accept alerts for connected calls in Call flows. ilert’s Create Alert node in call routing now supports auto-accepting alerts on successful call connections. When the “Accept alert on answer” option is enabled, the first responder who picks up the call automatically accepts the alert, speeding up ownership assignment and eliminating manual steps. This feature improves clarity and reduces lag during voice-based incident acknowledgement. It also lets you replicate legacy call routing behaviour when migrating to call flows.

Integrations

ConnectWise. Automatically turn ConnectWise service tickets into ilert alerts. Keep your operations team in sync with real-time updates and streamline incident workflows between ITSM and on-call responders.

Alibaba CloudMonitor. Forward alerts from Alibaba Cloud CloudMonitor directly into ilert. Ensure critical metrics and events from your cloud infrastructure trigger the right on-call actions without delay. 

TeamCity. Receive build and deployment failure alerts from JetBrains TeamCity in ilert. Stay on top of CI/CD issues and route incidents to the right developers instantly.

LibreNMS. Send network monitoring alerts from LibreNMS to ilert. Enhance your incident response by bringing SNMP and performance data into your centralized alerting and on-call system.

Engineering

How we built agentic incident response

Discover how ilert turned the vision of agentic incident response into a production reality, cutting toil, MTTR, and alert fatigue for modern on-call teams.

Tim Gühnemann
Jul 02, 2025 • 5 min read

AI already transforms how we detect, respond to, and resolve outages. Traditional workflows often force responders to switch between dashboards, sift through logs, and coordinate across fragmented channels under stress. This reactive, manual approach leads to slower resolution, higher operational costs, and burnout, especially as IT systems grow more complex.

At ilert, we are not just discussing the future of incident management – we are actively building it. We have brought agentic incident response into production, enabling operational excellence while reducing manual toil and cognitive load for on-call teams. Here is how we made this vision a reality.

Read our blog post on agentic incident response and the introduction of ilert Responder.

Building the foundation: Hive and the ilert AI voice agent

Our journey into agentic incident response began with architectural decisions prioritising flexibility, scalability, and intelligent action across all stages of the incident lifecycle.

Hive: Our LLM orchestration layer

Hive is our proprietary proxy and orchestration layer for large language models (LLMs). It powers intelligent incident summaries, contextual recommendations, and advanced workflows across ilert, enabling us to manage multiple model providers, optimise workload routing, and ensure a secure, consistent, and high-performance AI backbone for all use cases.

Hive allows us to seamlessly integrate new LLMs as they emerge, control cost efficiency by routing tasks to the best-fit model, and maintain data privacy while delivering highly contextual intelligence in real time.
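To illustrate what "routing tasks to the best-fit model" can mean in practice, here is a deliberately simple sketch. Hive's actual routing logic is not described in this post; the model names, prices, and the rule "cheapest model that supports the task and fits the context" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float
    max_context_tokens: int
    capabilities: frozenset


def pick_model(models, task: str, context_tokens: int) -> Model:
    """Return the cheapest model that supports the task and fits the context."""
    candidates = [
        m for m in models
        if task in m.capabilities and context_tokens <= m.max_context_tokens
    ]
    if not candidates:
        raise LookupError(f"no model can handle task {task!r}")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


# Hypothetical catalogue; names and numbers are placeholders.
MODELS = [
    Model("small-model", 0.1, 8_000, frozenset({"summary"})),
    Model("large-model", 1.0, 100_000, frozenset({"summary", "root-cause"})),
]

print(pick_model(MODELS, "summary", 2_000).name)
print(pick_model(MODELS, "root-cause", 2_000).name)
```

Keeping this decision behind one function is also what makes it cheap to add a new provider later: only the catalogue changes, not the callers.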

AI voice agent for seamless responder interaction

Communication is critical during incidents, especially when responders need to act without being tethered to keyboards. Our AI voice agent enables responders to gather updates or report incidents verbally, integrating into existing call flows as a natural part of the process. It transforms voice interactions into structured, actionable alerts while synthesising updates from diverse data sources, bridging human intuition with automated data-driven action.

What is MCP (Model Context Protocol)?

The Model Context Protocol (MCP) is an open protocol introduced by Anthropic for connecting AI models to external data and tools. At ilert, it connects your data to the ilert Responder, providing the rich, structured context our agents need to act intelligently during incidents.

Why did we build on MCP?

Traditional integrations often leave systems disconnected, requiring manual correlation across telemetry, logs, and infrastructure state during incidents. MCP was designed to eliminate these silos by automatically aggregating, structuring, and transmitting incident-relevant context in real time.

How does MCP work?

MCP gathers data from your monitoring systems, log aggregators, deployment pipelines, and infrastructure environments, processes it within a secure, EU-compliant, multi-tenant architecture, and delivers only the necessary data to our agentic responders. By doing so, MCP:

  • Ensures your agent has real-time, granular awareness of incidents;
  • Maintains strict data security, isolation, and compliance;
  • Reduces manual correlation and cognitive load during critical moments;
  • Enables low-latency, context-rich interactions with the ilert Responder.

Think of MCP as the nervous system that links your observability stack, code repositories, and infrastructure directly to our AI systems, ensuring that decisions and suggestions are always contextually accurate, actionable, and relevant.
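On the wire, MCP messages are JSON-RPC 2.0, and an agent asks a server to run a tool with a `tools/call` request. The sketch below only builds such a request to show its shape; the tool name `query_logs` and its arguments are hypothetical and do not describe the tools ilert exposes.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and arguments, for illustration only.
print(build_tool_call(1, "query_logs", {"service": "checkout", "minutes": 15}))
```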

The ilert Responder pipeline: From alert to agent-proposed actions

We designed an end-to-end pipeline that transforms monitoring signals into intelligent, actionable workflows to accelerate incident resolution.

Event Flow → Alert

ilert Event Flow ingests monitoring signals and applies your rules and thresholds to trigger alerts when specific conditions are met. This ensures the right teams are notified the moment an incident requires attention, without unnecessary noise.

MCP (Model Context Protocol) comes into play

Immediately upon alert generation, MCP retrieves and structures relevant telemetry data, logs, recent deployment changes, and infrastructure status, delivering it securely to the ilert Responder. This ensures the Responder has comprehensive situational awareness, eliminating the manual task of gathering context during incidents. This is possible through context-aware integrations with:

  • Observability tools: To pull telemetry and time-series data from Prometheus and Grafana;
  • Code repositories: To access commit history and deployment metadata from GitHub;
  • Infrastructure environments: To gain real-time status and configurations from Kubernetes.
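Conceptually, the step above narrows everything those integrations return down to what matters for one alert. Here is a minimal sketch of that filtering, assuming plain in-memory lists in place of real Prometheus, GitHub, and Kubernetes responses; `ContextBundle`, `build_context`, and the 30-minute lookback are our own illustrative choices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ContextBundle:
    alert_id: str
    metrics: list = field(default_factory=list)
    commits: list = field(default_factory=list)
    pod_status: list = field(default_factory=list)


def build_context(alert_id, service, now, metrics, commits, pods,
                  lookback=timedelta(minutes=30)) -> ContextBundle:
    """Keep only records for the affected service, and only recent ones
    for time-stamped data, so the agent receives the necessary context."""
    def for_service(items):
        return [i for i in items if i["service"] == service]

    def recent(items):
        return [i for i in items if now - i["at"] <= lookback]

    return ContextBundle(
        alert_id=alert_id,
        metrics=recent(for_service(metrics)),
        commits=recent(for_service(commits)),
        pod_status=for_service(pods),  # current state, no time filter
    )


now = datetime(2025, 7, 2, 12, 0)
bundle = build_context(
    "alert-1", "checkout", now,
    metrics=[{"service": "checkout", "at": now - timedelta(minutes=5), "cpu": 0.93}],
    commits=[{"service": "checkout", "at": now - timedelta(hours=3), "sha": "abc123"}],
    pods=[{"service": "checkout", "pod": "checkout-1", "phase": "Running"}],
)
print(bundle)
```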

ilert Responder proposes actions

The ilert Responder ingests and correlates data in real time, becoming an intelligent participant in incident response rather than a passive notification system. Leveraging its deep, contextual understanding, the ilert Responder formulates actionable recommendations such as:

  • Root-cause suggestions,
  • Step-by-step remediation instructions,
  • Escalation paths and dependency insights.

These are presented within the ilert chat interface, allowing responders to review, approve, or modify actions for safe execution during live incidents. The interactive chat UI evolves into a command centre, enabling responders to:

  • Request deeper insights dynamically,
  • Perform direct actions like scaling Kubernetes pods,
  • Drill down into suggested root causes and metrics seamlessly.
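The review step can be thought of as a small gate between a proposed action and its execution. The following sketch shows that gate only; it never runs the command, and the names `ProposedAction`, `Decision`, and `review`, as well as the `checkout` deployment, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass
class ProposedAction:
    description: str
    command: str


def review(action: ProposedAction, decision: Decision,
           modified_command: Optional[str] = None) -> Optional[str]:
    """Return the command cleared for execution, or None if rejected.
    A responder may approve the proposal as-is or supply a modified command."""
    if decision is Decision.REJECT:
        return None
    return modified_command or action.command


proposal = ProposedAction(
    description="Scale the checkout deployment to absorb load",
    command="kubectl scale deployment/checkout --replicas=5",
)
# The responder approves, but with a smaller replica count.
print(review(proposal, Decision.APPROVE,
             "kubectl scale deployment/checkout --replicas=3"))
```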

Operational improvements

Agentic incident response at ilert is delivering tangible results for engineering and operations teams:

  • Real-time log correlation and root cause inference to pinpoint likely causes within moments;
  • Diagnostic summaries providing human-readable, actionable overviews of incidents;
  • Interactive natural language Q&A with the agent for fast data retrieval and contextual clarity;
  • Actionable remediation proposals with direct, safe execution workflows;
  • Automated post-mortems and timelines to reduce manual documentation effort post-incident.

By reducing manual toil and accelerating clarity, teams are spending less time managing incidents and more time focusing on delivering reliable services.

Key learnings and best practices

Building and operating agentic systems for mission-critical incident management at ilert has taught us:

  • Trust through transparency: Autonomous data collection, correlation, and safe, pre-approved actions happen without manual steps, ensuring speed and reducing cognitive load for responders. For actions with higher risk or business impact, teams can choose to add approval steps if desired. Full transparency into what the agent is doing and why builds trust, enabling responders to understand and oversee agentic actions without slowing down resolution.
  • Guarding against hallucinations: Rich, structured, and verified context via MCP ensures the agent works with coherent, reliable information, significantly reducing the risk of inaccurate suggestions or actions.
  • Performance tuning for low latency: Incident response is time-critical. Through speculative tool calls and optimised data pathways, we ensure that insights and actions are generated in near real-time, reducing MTTR when every second counts.
  • Continuous learning: Feedback loops integrated into workflows help our agent refine its recommendations and actions over time, improving accuracy and effectiveness with every incident.
  • Safe autonomous execution: By defining safe, controlled scopes for automated remediation, the agent can execute corrective actions independently where appropriate, accelerating resolution while retaining operational safety and rollback capabilities.
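For readers unfamiliar with the speculative tool calls mentioned under performance tuning, the general idea is to start likely lookups before the model has decided which one it needs, so the result is often ready by the time it asks. The sketch below shows that pattern with Python threads; it is a generic illustration with made-up tools and timings, not a description of how ilert implements it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_logs(service: str) -> str:
    time.sleep(0.05)  # simulated I/O latency
    return f"logs for {service}"


def fetch_deploys(service: str) -> str:
    time.sleep(0.05)  # simulated I/O latency
    return f"deploys for {service}"


TOOLS = {"logs": fetch_logs, "deploys": fetch_deploys}


def investigate(service: str, choose_tool) -> str:
    """Start every likely tool call up front, then use the one the
    (slower) decision step actually asks for."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(tool, service) for name, tool in TOOLS.items()}
        wanted = choose_tool()  # stands in for the model deciding what it needs
        result = futures[wanted].result()
        for name, future in futures.items():
            if name != wanted:
                future.cancel()  # best effort; running calls simply finish
    return result


print(investigate("checkout", lambda: "logs"))
```

The trade-off is extra work on calls that turn out to be unnecessary in exchange for lower latency on the one that is needed, so this pattern suits cheap, read-only lookups rather than actions with side effects.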

Conclusion: Agentic incident response is already here

At ilert, we believe that the era of manual, reactive incident management is ending, and the benefits of agentic automation are too significant to delay. We are proud to bring these advanced capabilities into production, reducing toil, cutting MTTR, and empowering teams to focus on what matters most: reliability and innovation.

While ilert Responder already automates data gathering, analysis, and remediation suggestions, this release is just the first milestone. Our next goal is to let ilert Responder resolve well-understood, low-risk incidents – like flaky health checks or transient latency spikes – entirely on its own. Human responders stay in control, but much of the routine toil will fade away.

Want to see it in action? Explore the ilert Responder, join our beta program, or contact us for a personalised demo to bring agentic incident response into your on-call workflow.
