One platform for alerting, on-call management and status pages.

Manage on-call, respond to incidents and communicate them via status pages using a single application.

Trusted by leading companies

Highlights

The features you need to operate always-on services

Every feature in ilert is built to help you to respond to incidents faster and increase uptime.

Explore our features

Harness the power of generative AI

Enhance incident communication and streamline post-mortem creation with ilert AI. ilert AI helps your business respond faster to incidents.

Read more
Integrations

Get started immediately using our integrations

ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.

Ready to elevate your incident management?
Start for free
Customers

What customers are saying about us

We have transformed our incident management process with ilert. The platform is intuitive, reliable, and has greatly improved our team's response time.

ilert is a low-maintenance solution; it simply delivers [...] As a result, the mental load has gone.

Tim Dauer
VP Tech

We even recommend ilert to our own customers.

Maximilian Krieg
Leader Of Managed Network & Security

We are using ilert to fix our problems before our customers even notice them. ilert gives our engineering and operations teams the confidence that we will react in time.

Dr. Robert Zores
Chief Technology Officer

ilert has proven to be a reliable and stable solution. Support for the very minor issues that occurred within seven years has been outstanding, and more than 7,000 incidents have been handled via ilert.

Stefan Hierlmeier
Service Delivery Manager

The overall experience is actually absolutely great and I'm very happy that we decided to use this product and your services.

Timo Manuel Junge
Head Of Microsoft Systems & Services

The easy integration of alert sources and the reliability of the alerts convinced us. The app offers our employees an easy way to respond to incidents.

Stephan Mund
ASP Manager
Stay up to date

New from our blog

Insights

6 Steps to Create Actionable Postmortems

Best practices for creating effective postmortems, ensuring that your incident analysis won't be forgotten as soon as the danger has passed

Daria Yankevich
Jun 17, 2024 • 5 min read

In DevOps and IT operations, conducting a thorough postmortem after an incident is crucial for continuous improvement. This article explores best practices for creating effective postmortems, ensuring that your incident analysis won't be forgotten as soon as the danger has passed but will be comprehensive and actionable.

What is a Postmortem?

A postmortem in DevOps is a structured process conducted after an incident or failure to analyze what happened, identify the root cause, and implement corrective actions to prevent future occurrences. It involves a detailed examination of the timeline, impact assessment, and lessons learned, fostering a culture of continuous improvement and transparency without assigning blame. The postmortem document is the final output of this process, encapsulating all the gathered information, analyses, and planned actions to be shared with relevant stakeholders.

Benefits of Conducting Postmortems

By fostering a culture focused on learning and improvement through postmortems, organizations can strengthen their infrastructure and incident response processes, making them better prepared for future incidents. The benefits of conducting postmortems include:

  • Improved recovery times.
  • Enhanced team learning and knowledge sharing.
  • Prevention of future incidents.
  • Building a culture of continuous improvement.
ilert interface: create a postmortem right from the incident

Postmortem Key Steps

As recommended in ilert's Incident Management Guide, once a major incident is resolved, the incident response lead quickly designates one of the responders to manage the postmortem process.

Step 1: Assign a Postmortem Owner

While creating the postmortem is a collaborative task, assigning a specific owner is essential for ensuring it is completed effectively. The postmortem owner is entrusted with several responsibilities, including:

  • Scheduling the postmortem meeting
  • Investigating the incident (drawing in the necessary expertise from other teams as required)
  • Updating the postmortem document
  • Creating follow-up action items to prevent a similar occurrence in the future.

Step 2: Schedule a Meeting

It's crucial to invite people with relevant experience and expertise, so we highly recommend including the following specialists:

  • The Incident Response Lead
  • Owners of the services involved in the incident
  • Key engineers/responders who were involved in resolving the incident
  • Engineering and Product Managers for the impacted systems

Step 3: Build a Timeline

ilert incident timeline

Document the sequence of events objectively, without interpreting or judging the causes of the incident. The timeline should begin before the incident starts and continue until it is resolved, noting significant changes in status or impact and key actions taken by responders.

Examine the incident log in Slack or Microsoft Teams for critical decisions and actions. Also, include information that the team lacked during the incident but would have been helpful in hindsight. This information can be found in the monitoring data, logs, and deployments of the affected services.
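The merging described above can be sketched in Python. This is an illustrative data shape, not an ilert schema: the field names, sources, and events are all made up for the example.

```python
from datetime import datetime, timezone

def build_timeline(*event_sources):
    """Merge events from several sources (chat log, monitoring, deploys)
    into one chronologically ordered timeline."""
    merged = [event for source in event_sources for event in source]
    return sorted(merged, key=lambda event: event["at"])

# Illustrative entries; in practice these come from Slack/Teams exports,
# monitoring data, and deployment logs of the affected services.
chat_log = [
    {"at": datetime(2024, 6, 1, 10, 12, tzinfo=timezone.utc),
     "source": "slack", "entry": "Responder restarts API pods"},
]
monitoring = [
    {"at": datetime(2024, 6, 1, 10, 3, tzinfo=timezone.utc),
     "source": "monitoring", "entry": "Error rate exceeds 5%"},
]
deploys = [
    {"at": datetime(2024, 6, 1, 9, 58, tzinfo=timezone.utc),
     "source": "deploy", "entry": "v2.41 rolled out to production"},
]

timeline = build_timeline(chat_log, monitoring, deploys)
for event in timeline:
    print(event["at"].isoformat(), event["source"], "-", event["entry"])
```

Sorting the merged events puts the deployment before the first alert, which is exactly the hindsight information the timeline is meant to surface.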

Step 4: Document the Impact

Capture the incident impact from various angles. Note the duration of the observable impact, the total number of affected customers, how many reported the issue, and the severity of the functional disruption. Measure the impact using a business metric relevant to your product, such as the increase in API errors, performance slowdowns, or delays in notification delivery. If applicable, compile a list of all affected customers and share it with your support team for follow-up actions. Including any customer feedback or complaints received during the incident would also be helpful and provide context on user experience.
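A minimal sketch of condensing those raw numbers into an impact summary. All figures and field names are invented for illustration; substitute whichever business metric fits your product.

```python
from datetime import datetime, timezone

def summarize_impact(started, resolved, affected_customers, reported_by,
                     baseline_error_rate, incident_error_rate):
    """Condense raw incident numbers into the impact section of a postmortem."""
    duration_minutes = (resolved - started).total_seconds() / 60
    return {
        "duration_minutes": round(duration_minutes),
        "affected_customers": affected_customers,
        "customer_reports": reported_by,
        "error_rate_increase_pct": round(
            (incident_error_rate - baseline_error_rate) * 100, 1),
    }

# Hypothetical incident: 105 minutes of observable impact.
impact = summarize_impact(
    started=datetime(2024, 6, 1, 10, 3, tzinfo=timezone.utc),
    resolved=datetime(2024, 6, 1, 11, 48, tzinfo=timezone.utc),
    affected_customers=412,
    reported_by=37,
    baseline_error_rate=0.002,
    incident_error_rate=0.051,
)
print(impact)
```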

Step 5: Root Cause Analysis

After thoroughly understanding the incident's timeline and impact, proceed to the Root Cause Analysis (RCA) to explore the contributing factors, recognizing that complex systems often fail due to a combination of interacting elements rather than a single cause.

Begin by reviewing the monitoring data of affected services, looking for irregularities such as sudden spikes or flatlining around the time of the incident. Include relevant queries, commands, graphs, or links from monitoring tools to illustrate the data collection process. If monitoring for this service is lacking, list the development of such monitoring as an action item in your postmortem.

Next, identify the underlying causes by examining why the system's design allowed the incident, investigating past design decisions, and determining whether they were part of a larger trend or a specific issue. Evaluate the processes, considering whether collaboration, communication, and work reviews contributed to the incident, and use this stage to improve the incident response process.

Summarize your findings in the postmortem, ensuring thorough documentation for a productive discussion during the postmortem meeting while remaining open to additional insights that may emerge.

Generating a postmortem using ilert AI

Step 6: Prepare Action Items

Now, it's crucial to determine steps to prevent similar issues in the future. While it may not always be feasible to completely eliminate the possibility of such incidents, focus on improving detection and mitigation measures for future events. This involves enhancing monitoring and alerting systems and developing strategies to reduce the severity or duration of incidents.

Create tickets for all proposed actions in your task management tool, ensuring each ticket includes sufficient context and a proposed direction. This will help the product owner prioritize the task and enable the assignee to carry it out efficiently. Each action item should be specific and actionable.

If any proposed actions require further discussion, add them to the postmortem meeting agenda. These could be proposals needing team validation or clarification. Discussing these items in the meeting will help determine the best course of action.
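The ticket structure described above can be sketched as follows. The fields are illustrative rather than any specific tracker's schema, and the two action items are hypothetical examples.

```python
def make_ticket(title, context, proposed_direction, needs_discussion=False):
    """Build one action-item ticket with enough context for the product owner
    to prioritize it and for the assignee to execute it."""
    return {
        "title": title,
        "context": context,
        "proposed_direction": proposed_direction,
        "needs_discussion": needs_discussion,
    }

action_items = [
    make_ticket(
        "Add latency alerting for the checkout service",
        "During the incident, the p99 latency regression went unnoticed for 40 minutes.",
        "Alert when p99 exceeds 800 ms for 5 minutes.",
    ),
    make_ticket(
        "Introduce canary deployments for the API gateway",
        "The faulty release reached 100% of traffic immediately.",
        "Roll out to 5% of pods first; needs team validation.",
        needs_discussion=True,
    ),
]

# Items flagged for discussion go on the postmortem meeting agenda.
meeting_agenda = [item["title"] for item in action_items if item["needs_discussion"]]
print(meeting_agenda)
```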

Insights

AI-Assisted Incident Management Communication

Learn how AI-assisted incident management communication can automate incident updates, ensuring that stakeholders receive clear, concise, and timely information about the incident, all while freeing up your engineers' time to focus on faster incident resolution.

Sirine Karray
Jun 11, 2024 • 5 min read

AI across the Incident Management Process

AI has revolutionized various aspects of incident response, from preparation to resolution. Across the incident response lifecycle, AI is being leveraged to streamline processes, reduce noise, and improve overall efficiency. One critical area where AI is making a significant impact is in incident communication. Effective and efficient communication is crucial during incidents, as it ensures that stakeholders are informed and aligned with the incident status and resolution efforts. In this blog, we will explore how AI-assisted incident communication is transforming the way incidents are managed and communicated.

Leveraging AI for Incident Communication

Incident communication is a critical component of incident response. It involves keeping stakeholders informed about the incident status, resolution efforts, and any necessary actions. Traditionally, this process has been manual, with engineers and incident responders spending significant time crafting updates and communicating with stakeholders. However, AI-assisted incident communication is changing this landscape. By leveraging Large Language Models (LLMs), AI can automate updates, ensuring that stakeholders receive clear, concise, and timely information about the incident.

AI-assisted incident communication involves using LLMs to generate incident reports, updates, and messages. These models are trained on vast amounts of text data, enabling them to understand the context and nuances of incident communication. When an incident occurs, AI can quickly generate a detailed incident report, including the incident status, summary, description, and affected services. This report is then used to inform stakeholders, ensuring that they are aware of the incident and its impact.
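To make the report structure concrete, here is a plain template standing in for the LLM call; the field set (status, summary, description, affected services) follows the paragraph above, but the function, wording, and values are illustrative and not ilert AI's actual prompt or output format.

```python
def draft_incident_report(status, summary, description, affected_services):
    """Assemble the fields an AI-generated incident update typically carries
    into a stakeholder-facing message."""
    services = ", ".join(affected_services)
    return (
        f"Status: {status}\n"
        f"Summary: {summary}\n"
        f"Affected services: {services}\n\n"
        f"{description}"
    )

# Hypothetical incident update.
report = draft_incident_report(
    status="identified",
    summary="Elevated error rates on the public API",
    description=("We have identified a faulty deployment as the cause of "
                 "elevated error rates and are rolling it back."),
    affected_services=["Public API", "Webhooks"],
)
print(report)
```

In the AI-assisted version, the LLM fills these fields from a short responder prompt plus the service catalog, rather than an engineer writing each update by hand.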

Benefits of AI-Assisted Incident Communication

AI-assisted incident communication offers several benefits, including:

  • Consistency and Clarity: AI ensures that all communications are consistent in style and tone, reducing confusion and maintaining professionalism.
  • Efficiency: By automating updates, engineers are freed up to focus on resolving the incident, speeding up the overall response time.
  • Objectivity: AI minimizes the potential for bias or oversight, offering an objective account of events.
  • Depth of Insight: AI can uncover insights that might be overlooked in manual analysis, providing a deeper understanding of underlying issues.

AI-Assisted Incident Communication with ilertAI

We have integrated AI-assisted incident communication into ilert AI, enabling seamless automation of incident updates. The example below demonstrates how a prompt can be transformed into a comprehensive incident report. This process includes generating a summary and message, setting the incident status, and selecting the impacted services from the provided prompt and the available services in the service catalog.

AI-assisted incident communication is transforming the way incidents are managed and communicated. By leveraging LLMs, AI can automate updates, ensuring that stakeholders receive clear, concise, and timely information about the incident. This approach not only enhances efficiency but also provides consistency, objectivity, and depth of insight. With solutions like ilert, implementing AI across your incident management process will be a breeze.

Engineering

How to Deploy Qdrant Database to Kubernetes Using Terraform: A Step-by-Step Guide with Examples

There is no Terraform deployment guide for Qdrant on the internet, only the Helm variant, so we decided to publish this article.

Roman Frey
Jun 04, 2024 • 5 min read

When it comes to managing large-scale vector search operations, Qdrant is rapidly becoming a go-to choice. It is an open-source vector database that excels in storing, managing, and performing similarity search on vectors. For those leveraging Kubernetes for orchestration, integrating Qdrant via Terraform can streamline your deployment process, boosting your infrastructure’s scalability and reproducibility.

ilert AI evolves rapidly, and we have introduced a significant list of AI-supported features. Intelligent alert grouping is one of the newest, and we have used Qdrant as a backend for it. Our quick research has shown that there is no Terraform deployment guide for Qdrant on the internet, only the Helm variant, so we decided to publish this article. In this blog post, we will take you through the process of deploying Qdrant on a Kubernetes cluster using Terraform, complete with step-by-step examples to ensure you can follow along, even if you’re relatively new to these technologies.

A word about Qdrant: A High-Performance Vector Database

At its core, Qdrant is designed to store vectors—essentially lists of floating-point numbers that represent the features of items in a high-dimensional space. These vectors might represent anything from user preferences in a recommendation system to feature descriptors in image recognition systems.


Qdrant distinguishes itself with several robust features:


  • Persistence and High Availability: Unlike some vector databases designed only for in-memory use, Qdrant supports data persistence. It ensures high availability and durability by storing data on disk without sacrificing the query performance.
  • Efficient Similarity Search: Using state-of-the-art indexing techniques like HNSW (Hierarchical Navigable Small World graphs), Qdrant provides quick nearest-neighbor searches in high-dimensional spaces, which are crucial for real-time applications.
  • Scalable Architecture: Qdrant is designed with a focus on scalability. It supports horizontal scaling, which is a perfect match for deployment on Kubernetes clusters.
  • Flexible Data Management: Besides vectors, Qdrant allows storing additional payload that can be used for filtering and providing more context during searches.
Qdrant Features

Prerequisites

Before beginning the deployment process, ensure you have:

  • A Kubernetes cluster set up and accessible
  • Terraform installed on your machine
  • kubectl installed and configured to communicate with your Kubernetes cluster
  • Basic understanding of Kubernetes and Terraform concepts

Step 1: Setting Up Your Terraform Configuration

First, you’ll need to set up your Terraform configuration to deploy Qdrant. Create a directory where you will keep all your Terraform configurations.

mkdir qdrant-deployment

cd qdrant-deployment 

Create a providers.tf file to define the Kubernetes providers.

terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

Step 2: Defining the Qdrant Deployment

Create a qdrant-deployment.tf file in the same directory. This file will define the deployment resource for Qdrant. Update the deployment spec according to your specific configuration needs.

module "qdrant" {
  source  = "iLert/qdrant/kubernetes"
  version = "1.0.0"  # Check for the latest version on the Terraform Registry

  # You can customize your deployment by specifying module variables here
  namespace      = "qdrant"
  replica_count  = 3
  qdrant_version = "latest"  # Use a specific version if necessary
}

Note: Modify the namespace, replica_count, and qdrant_version according to your deployment requirements.

Step 3: Deploy Qdrant Using Terraform

Initialize Terraform to download and set up the Qdrant module.

terraform init 

Apply the configuration. Terraform will compute the changes to be made and present a plan.

terraform apply 

Confirm the deployment by typing yes when prompted. Terraform will proceed to deploy Qdrant to your Kubernetes cluster using the configurations specified through the module.

Step 4: Verify the Deployment

Once Terraform successfully applies your configurations, ensure the Qdrant pods and services are up and running.

kubectl get pods -n qdrant 

kubectl get services -n qdrant 

You should see the Qdrant pods running and a service set up to expose Qdrant to other applications or services.

Step 5: Interacting with Qdrant

At this point, Qdrant is deployed and running in your Kubernetes environment. You can begin interacting with it via REST API or any of the client libraries available for Qdrant to perform vector searches or manage vectors and payloads.
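As a sketch of what such a REST call looks like, the snippet below builds a vector search request against Qdrant's points search endpoint using only the Python standard library. It assumes the service has been made reachable locally (e.g. via `kubectl port-forward -n qdrant` to port 6333); the collection name "demo" and the query vector are placeholders, and the request is constructed but not sent so the example stays self-contained.

```python
import json
import urllib.request

# Qdrant's REST search endpoint; "demo" is a placeholder collection name,
# and localhost:6333 assumes a port-forward to the qdrant service.
QDRANT_URL = "http://localhost:6333/collections/demo/points/search"

body = {
    "vector": [0.05, 0.61, 0.76, 0.74],  # query vector (collection's dimension)
    "limit": 3,                          # return the top-3 nearest neighbors
    "with_payload": True,                # include stored payload with each hit
}
payload = json.dumps(body).encode()

request = urllib.request.Request(
    QDRANT_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Against a live cluster, urllib.request.urlopen(request) would execute
# the search and return the scored nearest neighbors as JSON.
print(request.full_url, len(payload), "bytes")
```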

Conclusion

The integration of Qdrant into Kubernetes represents a powerful solution for businesses and developers looking to leverage advanced vector search capabilities. The automated, infrastructure-as-code approach not only simplifies the deployment process but also enhances the robustness and scalability of your applications. As AI and machine learning continue to evolve, efficiently handling and searching large datasets becomes increasingly critical, and Qdrant, with its sophisticated vector storage and precise similarity search algorithms, provides an excellent foundation. For further customization and advanced configurations, refer to the ilert Qdrant module documentation on the Terraform Registry, the official Qdrant documentation to explore the full capabilities of your new deployment, and the Terraform provider documentation for Kubernetes.
