AI-First Incident Management. With Privacy in Mind.

Designed to augment teams with intelligent agents while keeping humans in control.
ilert AI

AI-first. All-in-one incident management.

Intelligent agents for every stage of the incident lifecycle.

Discover all AI features

On-call schedule assistant

Share your scheduling needs in a simple, chat-like interface. Add team members, rotation rules, and timeframes — and get a ready-to-use on-call calendar everyone can access.

Let AI take the call

Introducing the ilert AI Voice Agent—your first responder for calls, gathering key details and informing your on-call engineers.

Status updates in no time

ilert AI analyzes your system and incidents, offering quick updates and managing communications for efficient issue resolution.

ilert Responder – your real-time incident advisor

ilert Responder is an intelligent agent that analyzes incidents in real time. It connects to your observability stack, investigates alerts across systems, and surfaces actionable insights, without taking control away from your team.

Features

  • Analyze logs, metrics, and recent changes autonomously
  • Identify root causes and similar past incidents
  • Suggest responders, rollback paths, or related services
  • Ask questions in natural language and get direct, evidence-backed answers
Integrations

Get started immediately using our integrations

ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.

Transform your Incident Response today – start free trial
Start for free
Customers

See how industry leaders achieve 99.9% uptime with ilert

Organizations worldwide trust ilert to streamline incident management, enhance reliability, and minimize downtime. Read what our customers have to say about their experience with our platform.

Stay up to date

Expert insights from our blog

Insights

On-call compensation for IT engineers in 2025

2025 guide to on-call pay: TVöD benchmark, global stipends, pay models, standby laws, and well-being best practices.

Daniel Weiß
Jun 30, 2025 • 5 min read

Imagine it’s 2 AM and a critical system flatlines without warning. A bleary-eyed on-call engineer scrambles to restore service, shielding customers from a major outage that could torpedo your next Service Level Objective (SLO) review. Yet when daylight returns, debates over fair on-call compensation start all over again: What’s “just” pay for sleepless nights, unpredictable pings, and rapid-fire incident responses?

What counts as on-call?

On-call is a special working-time arrangement under employment law. It applies when the employee is obliged to be reachable, at least by phone, so they can start work in an emergency. On-call duty is time set aside specifically so that work can begin when needed.

In practice, this means that employees normally do not work while on call unless an emergency arises. However, there may be exceptions: for example, on-call employees may also work from home if they can be reached through their work device.

What's the difference between on-call and stand-by service?

There’s a time-and-location gap between the two models:

  • On-call – employees stay reachable (phone, pager, or on-call management app) and can log in from anywhere when an alert fires.
  • Stand-by – staff must be physically present on site and ready to act immediately. German labour law labels this Bereitschaftsdienst as working time and treats it accordingly.

In IT operations, remote on-call service is usually preferred because most incidents (code rollbacks, config tweaks) can be resolved over VPN. Stand-by still matters for latency-critical environments, for example, trading platforms or industrial control systems, where a technician must monitor hardware and intervene within seconds to meet strict service-level agreements.

Are on-call hours the same as work hours?

Whether on-call duty counts as working hours isn’t as clear-cut as it looks. Under most labour-law frameworks – including Occupational Safety and Health guidance and the U.S. FLSA Fact Sheet #22 – passive on-call time is treated as rest time as long as no alert comes in. The moment you’re paged and start troubleshooting, those minutes flip to active working time. In borderline cases, courts (e.g., Germany’s BAG, Oct 2023 ruling 6 AZR 210/22) decide which periods qualify, so definitions often vary by jurisdiction and company policy.

There’s also no universal rule on pay. Many employers treat on-call duty as paid working time and compensate it accordingly; others classify passive standby as unpaid availability. If your firm uses the latter model, remember you won’t be reimbursed for simply being reachable.

Bottom line: on-call time isn’t always the same as working time – it hinges on the organisation’s compensation policy. Some U.S. big-tech companies (Airbnb, Apple, Netflix) don’t pay for passive standby, while many European tech firms do.

On-call duty times

On-call scheduling is usually confined to specific nights or weekends agreed in advance and written into the employment contract. Because fewer staff are on site during those hours, reliable night- and weekend coverage is essential.

In Germany, the ICT trade association Bitkom, in its guideline Rufbereitschaft im IT-Betrieb, recommends capping on-call assignments at 56 days per calendar year and guaranteeing at least 8 consecutive hours of rest per shift. On-call duty is generally classified as non-working time, so the 11-hour uninterrupted rest period required by § 5 (1) of the German Working Hours Act (Arbeitszeitgesetz) does not apply until the engineer has actively worked on an incident.
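A planned rota can be checked against these two recommendations programmatically. The Python sketch below is illustrative only: the `Shift` type, the function names, recording pages as intervals of active work, and counting every calendar day a shift touches as an on-call day are assumptions made for this example, not definitions from the Bitkom guideline.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta
from typing import List, Set, Tuple

MAX_ON_CALL_DAYS_PER_YEAR = 56            # Bitkom recommendation
MIN_CONSECUTIVE_REST = timedelta(hours=8)  # Bitkom recommendation per shift


@dataclass
class Shift:
    start: datetime
    end: datetime
    # Hypothetical record of pages: intervals of active incident work inside the shift.
    active_work: List[Tuple[datetime, datetime]] = field(default_factory=list)


def on_call_days(shifts: List[Shift], year: int) -> Set[date]:
    """Calendar days in `year` touched by at least one shift (assumed counting rule)."""
    days: Set[date] = set()
    for shift in shifts:
        day = shift.start.date()
        last_day = (shift.end - timedelta(microseconds=1)).date()
        while day <= last_day:
            if day.year == year:
                days.add(day)
            day += timedelta(days=1)
    return days


def longest_rest(shift: Shift) -> timedelta:
    """Longest stretch of the shift not interrupted by active incident work."""
    cursor, best = shift.start, timedelta(0)
    for begin, end in sorted(shift.active_work):
        best = max(best, begin - cursor)
        cursor = max(cursor, end)
    return max(best, shift.end - cursor)


def check_bitkom_limits(shifts: List[Shift], year: int) -> List[str]:
    """Return human-readable violations; an empty list means the rota looks compliant."""
    problems = []
    n_days = len(on_call_days(shifts, year))
    if n_days > MAX_ON_CALL_DAYS_PER_YEAR:
        problems.append(f"{n_days} on-call days in {year} (limit {MAX_ON_CALL_DAYS_PER_YEAR})")
    for shift in shifts:
        if longest_rest(shift) < MIN_CONSECUTIVE_REST:
            problems.append(f"shift starting {shift.start:%Y-%m-%d %H:%M} lacks 8 h of consecutive rest")
    return problems
```

For example, an 18:00 to 08:00 shift with a single page from 23:00 to 23:30 still contains 8.5 hours of uninterrupted rest, whereas a second page at 02:00 cuts the longest rest window to 5.5 hours and gets flagged.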

Need an easy way to keep those limits visible? ilert’s on-call scheduling shows every planned rotation and the actual shifts at a glance, so teams stay compliant without spreadsheets.

How is payment settled for on-call service in IT companies?

In IT companies, on-call hours are usually considered working time and paid as such. As mentioned above, clarify this with your employer in advance and check what your contract says.

Large corporations such as Airbnb or Apple, which do not pay for on-call time, argue that their employees are already among the top earners: even without an on-call allowance, they still earn considerably more than they would at most companies that pay for on-call time on top of salary.

In Germany, there is no specific law regarding how on-call hours should be paid, so this is left to the employer’s discretion. In practice, however, on-call duty is most commonly paid: the employee receives compensation for the time they are on call. This can be structured in different ways.

In practice, on-call time is often compensated either with a supplement on top of the standard hourly wage or with time off. In many companies, on-call time is also counted as working time and paid accordingly. However, this is only possible if the employee is actually working rather than merely being available by phone, which, as mentioned above, can be the case when working from home.

In most tech organisations, hours spent on-call count as paid working time, yet the formula changes from company to company. Before you join a rota, double-check your contract or the internal on-call compensation policy.

In practice, you’ll see two common models:

  • Hourly uplift – a percentage on top of the standard rate for every scheduled standby hour.
  • Time-off swap – on-call hours are converted into paid leave; for example, eight hours on call earn four hours of paid leave.

Remember, only the minutes you actively work are universally classed as working time; simply being reachable may stay unpaid unless your company’s policy says otherwise.
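To compare the two models for a given rota, a few lines of Python are enough. This is a sketch with placeholder numbers: the 25 % uplift is an assumed example (no percentage is given above), the 2:1 leave ratio mirrors the eight-for-four example, and paying actively worked minutes at the normal hourly rate is an assumption consistent with treating active work as working time.

```python
from dataclasses import dataclass


@dataclass
class OnCallPeriod:
    standby_hours: float   # scheduled hours of being reachable
    active_minutes: int    # minutes actually spent working on incidents


def hourly_uplift_pay(period: OnCallPeriod, hourly_rate: float, uplift: float = 0.25) -> float:
    """Model 1: a percentage of the hourly rate for every standby hour.

    Active minutes are paid as regular working time (an assumption of this sketch).
    """
    standby_pay = period.standby_hours * hourly_rate * uplift
    active_pay = period.active_minutes / 60 * hourly_rate
    return round(standby_pay + active_pay, 2)


def time_off_swap_hours(period: OnCallPeriod, leave_per_standby_hour: float = 0.5) -> float:
    """Model 2: standby hours converted into paid leave (8 h on call -> 4 h leave)."""
    return period.standby_hours * leave_per_standby_hour


night = OnCallPeriod(standby_hours=8, active_minutes=30)
print(hourly_uplift_pay(night, hourly_rate=40.0))  # 80.0 uplift + 20.0 active work = 100.0
print(time_off_swap_hours(night))                  # 4.0
```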

How much are on-call services paid in IT companies?

Pay still varies by company size, sector, and risk profile. The federal collective agreement for public employees (TVöD) specifies the following allowances in § 8 Abs. 3:

  • On-call shifts (Rufbereitschaft) of 12 hours or longer: a flat daily allowance of twice the hourly rate on weekdays (Mon–Fri) and four times the hourly rate on weekends and public holidays.
  • Shorter on-call windows (under 12 h): 12.5 % of the hourly rate for each started hour on call.
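The TVöD rules above reduce to a small function for a single on-call day. In this sketch the hourly rate is a placeholder (TVöD takes it from the pay table for the employee's grade), the weekend/public-holiday flag is supplied by the caller, and partial hours are rounded up because the agreement's wording counts each started hour.

```python
import math


def tvoed_on_call_allowance(hours: float, hourly_rate: float, weekend_or_holiday: bool) -> float:
    """Allowance for one on-call day following TVöD § 8 Abs. 3 as summarised above."""
    if hours >= 12:
        # Flat daily allowance: 2x the hourly rate Mon-Fri, 4x on weekends and public holidays.
        factor = 4 if weekend_or_holiday else 2
        return round(factor * hourly_rate, 2)
    # Under 12 h: 12.5 % of the hourly rate for each started hour.
    return round(math.ceil(hours) * 0.125 * hourly_rate, 2)


rate = 30.0  # placeholder hourly rate in EUR
print(tvoed_on_call_allowance(16, rate, weekend_or_holiday=False))   # 60.0
print(tvoed_on_call_allowance(24, rate, weekend_or_holiday=True))    # 120.0
print(tvoed_on_call_allowance(6.5, rate, weekend_or_holiday=False))  # 26.25 (7 started hours)
```

Summing five 16-hour weekday shifts and two 24-hour weekend shifts at this placeholder rate gives 5 × 60 + 2 × 120 = 540 EUR for the week.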

At a large corporation or a successful start-up, you can expect on-call pay of about €1,000 per week. At Zalando, on-call compensation is roughly €1,050 per week; at the start-up HelloFresh, €1,000; and at Amazon Germany, about €800. Several companies in the financial sector offer comparable rates, although exact amounts vary. Here are the figures reported by The Pragmatic Engineer blog:

  • SumUp (Germany): €1,050 per week
  • N26 (Germany): €880 per week
  • Klarna (Europe): €500 per week
  • Mastercard (UK): £470 per week
  • PayPal (Germany): $350 per week
  • Wise (UK): £300 per week

Recent engineer forums and community posts add further reference points:

  • Google – Tier-1 SRE rota (five-minute response): paid for 40 minutes of every on-call hour outside office hours (66% of the base hourly rate). Tier-2 (30-minute response): 20 minutes per hour (33 %).
  • AWS (EU Tier-0 services) – 25% of base pay for each out-of-hours on-call hour, plus a half-day of paid time off for every Saturday or night-time page.
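Both reported schemes are easy to turn into numbers. The sketch below uses the figures listed above as reported by community sources; the function names, the placeholder hourly rate, and the assumption that every out-of-hours on-call hour counts the same way are illustrative only, not a description of either company's payroll.

```python
from typing import Tuple

# Google, as reported: paid minutes per out-of-hours on-call hour by response tier.
TIER_PAID_MINUTES_PER_HOUR = {1: 40, 2: 20}  # tier 1: 5-min response, tier 2: 30-min response


def google_style_pay(tier: int, off_hours_on_call: float, hourly_rate: float) -> float:
    """Pay a share of each out-of-hours on-call hour depending on the response tier."""
    minutes = TIER_PAID_MINUTES_PER_HOUR[tier]
    return round(off_hours_on_call * minutes * hourly_rate / 60, 2)


def aws_style_comp(off_hours_on_call: float, hourly_rate: float, qualifying_pages: int) -> Tuple[float, float]:
    """25 % of base pay per out-of-hours on-call hour, plus half a day off per
    Saturday or night-time page (as reported). Returns (pay, days_off)."""
    pay = round(off_hours_on_call * hourly_rate * 0.25, 2)
    days_off = 0.5 * qualifying_pages
    return pay, days_off


# 100 out-of-hours on-call hours at a placeholder rate of 60 per hour:
print(google_style_pay(1, 100, 60.0))                 # 4000.0
print(google_style_pay(2, 100, 60.0))                 # 2000.0
print(aws_style_comp(100, 60.0, qualifying_pages=3))  # (1500.0, 1.5)
```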

Beyond payment: safeguarding on-call well-being

Pay isn’t the only lever that matters. On-call duty disrupts normal sleep patterns and life outside work, so protecting responders’ well-being is critical. Your team will cope far better if you follow these five practices:

  1. Set crystal-clear expectations for response windows and escalation paths.
  2. Rotate shifts fairly with primary and secondary roles; use an automated on-call schedule so the rota is transparent.
  3. Watch the workload: track pages per engineer with on-call reports and cap consecutive overnight shifts.
  4. Leverage tooling: alert deduplication and smart escalations in ilert’s on-call management cut noise and shorten time-to-sleep.
  5. Provide regular training and support: run quarterly fire drills or game days so responders stay confident under pressure.
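Practices 2 and 3 can be supported with very little code. The following sketch, with hypothetical names and an assumed weekly cadence, builds a round-robin rota in which each week's secondary becomes the next week's primary, and counts pages per engineer so uneven load becomes visible.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def build_rota(engineers: List[str], start: date, weeks: int) -> List[Tuple[date, str, str]]:
    """Weekly round-robin returning (week_start, primary, secondary).

    The secondary is the next engineer in the list, so this week's backup is
    next week's primary and may already have context on recent incidents.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary + secondary roles")
    n = len(engineers)
    return [
        (start + timedelta(weeks=w), engineers[w % n], engineers[(w + 1) % n])
        for w in range(weeks)
    ]


def pages_per_engineer(paged: Iterable[str]) -> Counter:
    """Count pages by the name of the engineer who received them."""
    return Counter(paged)


rota = build_rota(["ana", "ben", "chen"], date(2025, 7, 7), weeks=3)
for week_start, primary, secondary in rota:
    print(week_start, primary, secondary)
print(pages_per_engineer(["ana", "ben", "ana"]).most_common(1))  # [('ana', 2)]
```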

Quick summary

On-call duty in IT means being reachable outside normal hours to respond to incidents, usually remotely. It differs from stand-by service, which requires physical presence and, under German labour law, counts as working time. Legally, on-call time isn’t always paid; only active incident response typically counts as working time. Compensation varies: some companies offer hourly uplifts or time-off swaps, while others, like Apple or Airbnb, don’t pay for passive standby. In Germany, Bitkom recommends no more than 56 on-call days per year and at least 8 consecutive hours of rest per shift. Weekly stipends at firms such as Amazon Germany, HelloFresh, Zalando, and SumUp range from about €800 to €1,050. To protect engineers, best practices include fair rotations, clear escalation paths, tooling to reduce alert noise, and regular training.

Announcements

ilert introduces Agentic Incident Response: Entering the AI-first era

Meet ilert Responder: your AI incident co-pilot that investigates alerts, accelerates incident resolution, and empowers your SRE team without taking control away.

Birol Yildiz
Jun 16, 2025 • 5 min read

Imagine incidents resolved through insights, not manual investigations.

Picture an incident management future where you're never alone during critical alerts. Imagine your best engineer always available, tirelessly investigating issues, analyzing logs, correlating metrics, checking recent code changes, and delivering actionable insights, instantly. Today, ilert is stepping boldly into this future with our first intelligent agent: ilert Responder.

Why AI-first?

Incident management is evolving rapidly. Systems grow complex, alert volumes surge, and pressure on teams intensifies. SREs often find themselves overwhelmed by noise, urgently navigating logs, metrics, and dashboards to uncover root causes.

At ilert, we've been pioneering AI in incident management for over three years, launching intelligent alert grouping, automated post-mortem creation, and more. ilert Responder is not a beginning, but a leap forward, building on years of experience, foundational work, and customer feedback.

We're laser-focused on helping companies significantly reduce Mean Time to Resolution (MTTR). Every decision, every feature we develop revolves around one question: How does this contribute to lowering MTTR? With GenAI and agentic systems, we see transformative potential to contribute to this goal. We’re betting on a future where you're only paged about an incident if AI can't autonomously resolve it first. Imagine no longer waking up at 3 a.m. just to restart a service or roll back a deployment.

That’s why we’re committed to becoming an AI-first platform, embedding artificial intelligence at the heart of everything we do. This isn't just adding AI as a feature; it's fundamentally reimagining incident response for the better.

Meet ilert Responder: Your 24/7 incident co-pilot

ilert Responder is your trusted teammate and is built directly into the ilert platform. It:

  • Connects directly with your observability stack, your cloud infrastructure, and your code repository.
  • Analyzes incidents in real time using various data sources, pinpointing root causes.
  • Provides clear, prioritized recommendations for remediation.

Interact via a chat-based interface: ask questions, share context, and receive guidance precisely when you need it. Every insight from ilert Responder is clear, actionable, and backed by supporting data, even under pressure.

Under the hood, we’re using MCP (Model Context Protocol) to connect the ilert Responder agent with your tools and infrastructure. MCP is to AI what HTTP is to the web – a standardized protocol for connecting LLM-based agents to the systems where real data lives. It solves two key challenges: LLMs’ limited, outdated knowledge and lack of context, and the complexity of maintaining custom integration layers between AI apps and external data sources.

With MCP, ilert Responder can securely and contextually interact with tools like Grafana, Prometheus, GitHub and Kubernetes – fetching logs, metrics, deployment data, code changes, and more in real time. We've built a scalable, multi-tenant architecture around MCP that allows us to easily add new data sources (MCP servers), continuously expanding Responder’s investigative capabilities with every integration.
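For readers curious what travels over MCP: the protocol frames messages as JSON-RPC 2.0, and tools exposed by an MCP server are discovered with `tools/list` and invoked with `tools/call`. The Python sketch below only builds those request messages. It is not ilert's implementation, it omits the transport (stdio or HTTP) and the initialization handshake, and the tool name `query_prometheus` with its `query` argument is hypothetical, since every MCP server defines its own tools.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the message format MCP uses."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools() -> str:
    """Ask an MCP server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Invoke one tool on an MCP server with structured arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Hypothetical tool on a Prometheus MCP server; real tool names come from tools/list.
print(call_tool("query_prometheus", {"query": 'rate(http_requests_total{status="500"}[5m])'}))
```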

See the ilert Responder in action

Introducing Agentic Incident Management

ilert Responder marks the start of what we call Agentic Incident Management. Here, intelligent agents:

  • Reason and investigate like seasoned engineers.
  • Learn from each interaction, growing continuously smarter.
  • Work alongside humans transparently, always with clear oversight.

By default, the new ilert Responder operates in read-only mode and provides you with recommendations for faster resolution. It doesn't replace your on-call team but augments it.

Join the ilert Agentic Incident Response beta program

We're inviting innovative teams to join our Beta program, granting early access to ilert Responder. Beta testers will:

  • Directly shape future capabilities.
  • Enjoy early benefits and competitive advantages.
  • Lead their industries into the future of AI-powered incident management.

Interested? Email us at support@ilert.com and become a pioneer in the AI-first incident response revolution.

AI-first incident management – for everyone

AI features have been an integrated part of ilert for a few years now. With this next step, AI features are no longer reserved for premium plans or add-ons; they're foundational. 

We have already introduced some significant changes in our pricing. While in the Beta phase, ilert AI Responder is available at no additional cost. But that's not all. You will notice that we’re discontinuing our AIOps add-on and making AI features such as intelligent alert grouping available in the Scale plan. Even Free ilert customers now have access to ilert AI features. 

Soon, all ilert customers will have flexible AI credits to unlock advanced capabilities – from ilert Responder to ilert postmortem creator. Details on our transparent, credit-based pricing model will follow shortly. Stay tuned to our blog and newsletter.

Privacy-first, always

Privacy and security of your data remain our highest priority. We champion an AI-first approach to accelerate incident resolution while putting privacy first through data sovereignty, end-to-end encryption, region-specific AI processing, and GDPR compliance. From day one, we’re building intelligent systems that are as protective of confidentiality as they are innovative. ilert AI Responder is built on this foundation.

For ilert AI, we use foundation models hosted by AWS, Microsoft Azure, or OpenAI, depending on your location. For EU customers, all AI processing happens within Europe using AWS or Microsoft infrastructure: no data leaves the EU, and no personal or sensitive information is sent to OpenAI’s global endpoints. Customers outside the EU may use OpenAI or AWS, always under strict access controls and with encryption at rest and in transit.
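A region rule like this can be written down as an explicit policy check, which makes it easy to test that EU traffic never reaches a global endpoint. This is a hypothetical sketch of the stated policy, not ilert's code; the provider labels and the simple EU/non-EU split are assumptions.

```python
from enum import Enum
from typing import Optional, Tuple


class Provider(Enum):
    AWS_EU = "aws-eu"        # AWS infrastructure inside the EU
    AZURE_EU = "azure-eu"    # Microsoft infrastructure inside the EU
    AWS = "aws"
    OPENAI = "openai"        # OpenAI global endpoints


ALLOWED_PROVIDERS = {
    "EU": (Provider.AWS_EU, Provider.AZURE_EU),   # all processing stays within Europe
    "NON_EU": (Provider.OPENAI, Provider.AWS),
}


def allowed_providers(customer_region: str) -> Tuple[Provider, ...]:
    """Providers a customer's data may be sent to, by (simplified) region."""
    return ALLOWED_PROVIDERS["EU" if customer_region == "EU" else "NON_EU"]


def pick_provider(customer_region: str, preferred: Optional[Provider] = None) -> Provider:
    """Return the preferred provider if the region allows it, else the region default."""
    options = allowed_providers(customer_region)
    if preferred is None:
        return options[0]
    if preferred not in options:
        raise ValueError(f"{preferred.value} is not allowed for region {customer_region}")
    return preferred
```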

Moreover, we only use alert- and incident-related data and don’t share personal or user-level performance data with external AI models. We also don't use your data to train models and have opted out of data training with all of our LLM providers.

Learn more about our security and privacy commitment in the Q&A section.

We're entering a new era in incident response, one where AI doesn't replace SREs, it elevates them. ilert Responder is just the beginning. The future is collaborative, intelligent, and human-centered. Let's build it together.

Engineering

Under the hood: Request coverage feature

Discover the process of developing one of the most frequently used features in ilert's mobile app.

Marko Simon
May 23, 2025 • 5 min read

The ilert mobile app is primarily used by responders to receive notifications about critical alerts, react to them on the go, and check their current on-call status. Its capabilities include critical push notifications, quick actions for alerts, and critical alert settings. The app enables responders to view their current on-call shifts and escalation policies, take over on-call shifts from someone else, and create coverage requests that ask a colleague to take over an on-call shift. Coverage requests are a new ilert feature that has proven very useful as a communication tool between users. This post takes a deeper dive into how we developed the feature and the challenges we faced along the way.

Why were coverage requests introduced?

Since we introduced on-call schedules, users have been able to create overrides—special shifts that take priority over regular ones. An override lets you assign another user to take over on-call duty, either for a full shift or just part of it. Overrides don’t have to follow existing shifts—they can be created for any time period, even outside of configured shifts.

Later on, we introduced the "Take on-call" feature, which works in the opposite direction: instead of handing your own shift to someone else, you take over another user's shift. Both methods create overrides, but neither ensures that the other user is notified of any action taken on their on-call shifts. Furthermore, creating overrides for other users could give them responsibility they might not be aware of, which can be critical.

The solution to this problem was a flow in which one user asks another to take over specific on-call duties, resulting in a short request-and-response exchange.

Designing the coverage request REST API

The general flow of a coverage request should be:

1. User A creates a coverage request, asking User B to take over one or more shifts, fully or partially.
2. User B gets notified and either accepts or declines the coverage request.
3. User A gets notified of User B's decision.

The logic behind ilert request coverage feature

We needed to design the API around a coverage request entity, which had to have at least the following fields:

- sender
- receiver
- shifts

Additionally, we added a message field so users can share further details about their request. For the user interface, we also provide the current state and the createdAt date string as read-only properties. When a user declines a coverage request, a reply to the sender can be useful too, so the receiver can add an optional declineComment. Lastly, to show multiple coverage requests in a list view and apply meaningful filters, we use the state field in combination with an `expired` state calculated in the frontend. A coverage request is considered expired when the last shift it covers has ended.
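The entity described above maps naturally onto a small data type. This Python sketch keeps the field names from the text (sender, receiver, shifts, message, state, createdAt, declineComment) but assumes the types, the state names, user-ID strings for sender and receiver, and that createdAt has already been parsed from its date string into a `datetime`. The `expired` flag is derived rather than stored, mirroring the frontend calculation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class CoverageState(Enum):
    PENDING = "PENDING"
    ACCEPTED = "ACCEPTED"
    DECLINED = "DECLINED"
    CANCELLED = "CANCELLED"


@dataclass
class Shift:
    start: datetime
    end: datetime


@dataclass
class CoverageRequest:
    sender: str                       # user ID of the requester (assumed type)
    receiver: str                     # user ID asked to take over
    shifts: List[Shift]               # one or more (partial) shifts
    message: Optional[str] = None     # optional details from the sender
    state: CoverageState = CoverageState.PENDING                # read-only for clients
    createdAt: datetime = field(default_factory=datetime.now)   # read-only for clients
    declineComment: Optional[str] = None                        # optional reply when declining


def is_expired(request: CoverageRequest, now: datetime) -> bool:
    """Expired once the last shift covered by the request has ended (computed client-side)."""
    return max(shift.end for shift in request.shifts) <= now
```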

Beyond the classic Create and Read operations on the coverage request entity, we needed specific endpoints to perform actions: accept, decline, and cancel. Update and Delete operations are not part of the flow right now and won't be implemented.
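The three actions behave like transitions of a small state machine. The sketch below illustrates this under assumptions the text does not spell out: only a pending request can be acted on, the receiver accepts or declines, and the sender cancels (consistent with who gets notified for each action). The state names and the `apply_action` helper are hypothetical; in a REST API, each action would sit behind its own endpoint rather than a generic update.

```python
from enum import Enum
from typing import Optional


class CoverageState(Enum):
    PENDING = "PENDING"
    ACCEPTED = "ACCEPTED"
    DECLINED = "DECLINED"
    CANCELLED = "CANCELLED"


# action -> (role allowed to perform it, resulting state)
ACTIONS = {
    "accept": ("receiver", CoverageState.ACCEPTED),
    "decline": ("receiver", CoverageState.DECLINED),
    "cancel": ("sender", CoverageState.CANCELLED),
}


class CoverageRequest:
    def __init__(self, sender: str, receiver: str) -> None:
        self.sender = sender
        self.receiver = receiver
        self.state = CoverageState.PENDING
        self.decline_comment: Optional[str] = None

    def apply_action(self, action: str, actor: str, comment: Optional[str] = None) -> CoverageState:
        """Validate and perform accept/decline/cancel; there is no generic update."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        role, new_state = ACTIONS[action]
        if getattr(self, role) != actor:
            raise PermissionError(f"only the {role} may {action} this request")
        if self.state is not CoverageState.PENDING:
            raise ValueError(f"request is already {self.state.value}")
        self.state = new_state
        if action == "decline":
            self.decline_comment = comment
        return self.state
```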

From mockup to polished UI

ilert Request coverage feature: mockups and final view

There are no significant differences between the mockup and the final version of the coverage request creation view. The styles have been adjusted, and an additional timezone information box has been included. The final versions of the list view and the detail view look like this:

Communication is key

A general goal of this feature is to encourage users to see and respond to coverage requests as early as possible, since on-call shifts are time-bound and requests sometimes come at short notice. Another goal is to keep all relevant communication in the ilert mobile app, eliminating the need to switch between tools. To achieve this, we introduced several communication channels.

Push notifications

Whenever an action related to a coverage request is taken, a push notification is sent to the relevant person.

  • Coverage request created: receiver gets notified
  • Coverage request accepted/declined: sender gets notified
  • Coverage request cancelled: receiver gets notified

But what if the receiver doesn't have a mobile app?

Email

ilert checks whether each relevant user has at least one registered push notification token (a unique ID for a user on a device, which ilert uses to route push notifications). If a user has none, ilert sends an email containing information about the coverage request to that user’s primary email address.
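The recipient rules listed under push notifications and the email fallback can be combined into one decision function. This sketch is illustrative: the `User` type, the event names, and the return format are assumptions, and it only decides the channel and targets instead of actually sending anything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class User:
    name: str
    primary_email: str
    push_tokens: List[str] = field(default_factory=list)  # one per registered device


# Which side of the request is notified for each event.
RECIPIENT_ROLE = {
    "created": "receiver",
    "accepted": "sender",
    "declined": "sender",
    "cancelled": "receiver",
}


def plan_notification(event: str, sender: User, receiver: User) -> Tuple[str, List[str]]:
    """Return (channel, targets): push to all devices, or email if none is registered."""
    user = sender if RECIPIENT_ROLE[event] == "sender" else receiver
    if user.push_tokens:
        return "push", list(user.push_tokens)
    return "email", [user.primary_email]
```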

In-app badge

Sometimes push notifications get dismissed by accident before the user has read them, possibly swiping away a time-critical coverage request. To make pending requests more visible in the app, a small red circle (badge) is shown at the top left of the menu icon in each list view. It indicates that there are one or more pending coverage requests to review. Additionally, the main menu item always shows the number of pending requests.

Provide filters, but keep the UI clean

Giving users the ability to filter coverage requests in the list view is necessary. An obvious filter is Received vs. Sent requests. Another important but tricky filter is showing relevant requests only, meaning that requests that are expired or no longer pending are hidden by default. But as we already had the Received/Sent toggle, a second Current/All toggle would have cluttered the UI too much.

One idea was to introduce a filter toolbar (similar to the one on the alert list), but we discarded it because the toolbar would have held only a single filter at release, which would have looked odd. Another idea was to make the default view show only requests in state Pending and let users access all requests with a button click. We ultimately settled on this solution for its simplicity and ease of use.

Everyday usage reveals papercuts

After release, the ilert team also started using the feature internally and quickly noticed a flaw: when acting on a coverage request (accept, decline, or cancel), the request would instantly disappear from the list without clearly confirming its change of state.

Two improvements were put in place:

  • Stay on the detail view after an action happens to see the updated state of the request
  • Keep relevant coverage requests in the list view for 24 more hours after performing an action

Previously, the list was built on the state field alone, so a request disappeared from the list as soon as it was accepted, and users had to open past requests to see a just-accepted one. We therefore added a query parameter to the API that lets the frontend specify a past creation date; the response then also includes all coverage requests created between that date and now, regardless of their status. Now users see all pending coverage requests plus recently accepted, declined, or cancelled ones (from the last 24 hours).
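The resulting list logic can be sketched as a merge of two result sets: everything still pending, plus everything created within the last 24 hours regardless of status. The sketch works on plain dictionaries with assumed keys (`id`, `sender`, `receiver`, `state`, `createdAt`) and filters in memory; in the app, the second set comes from the API's creation-date query parameter, whose actual name is not given here.

```python
from datetime import datetime, timedelta
from typing import Dict, List

RECENT_WINDOW = timedelta(hours=24)


def visible_requests(requests: List[Dict], user: str, direction: str, now: datetime) -> List[Dict]:
    """Requests shown in the Received or Sent list, newest first."""
    key = "receiver" if direction == "received" else "sender"
    mine = [r for r in requests if r[key] == user]
    pending = [r for r in mine if r["state"] == "PENDING"]
    # Equivalent of asking the API for everything created since now - 24 h.
    since = now - RECENT_WINDOW
    recent = [r for r in mine if r["createdAt"] >= since]
    merged = {r["id"]: r for r in pending + recent}  # de-duplicate by id
    return sorted(merged.values(), key=lambda r: r["createdAt"], reverse=True)
```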

Haven't installed the ilert app yet? Give it a try! Download the app for Android or iOS.
