Building a full end-to-end, cloud-native alerting pipeline

Cover illustration: a cloud-native alerting pipeline, with a Prometheus cloud feeding into Alertmanager, which routes alerts onward to webhook.site and a generic notification endpoint.

Last Updated: July 14, 2025

Why Cloud-Native Alerting Pipelines Are Vital Today

In modern cloud operations, systems are dynamic, distributed, and constantly scaling. Whether you’re deploying microservices across Kubernetes clusters, serverless functions in AWS, or containers in GCP, things break fast, and often without warning. That’s where a cloud-native alerting pipeline shines:

  • Catch issues early: Alerting empowers teams to detect anomalies (like rising error rates or resource exhaustion) before downtime impacts customers.
  • Reduce noise, increase signal: With over-alerting often causing “alert fatigue,” integrating tools such as Prometheus and Alertmanager helps you apply silencing, grouping, and routing logic effectively.
  • Enable rapid response: Pushing alerts through channels like Slack, PagerDuty, or via webhooks ensures your on-call team gets notified with context fast.
  • Scale confidently: As your infrastructure scales out or shifts, having alert pipelines that are declaratively managed (e.g. via IaC or GitOps) prevents configuration drift.

These foundational capabilities convert metrics and logs into actionable warnings, minimizing downtime and giving reliability teams a fighting chance.

In this deep dive you will build a full end-to-end, cloud-native alerting pipeline on WSL Ubuntu. We start by installing and configuring Prometheus and Alertmanager, then route alerts to webhook receivers. You will learn how to:

  • Set up a unified workspace for monitoring tools
  • Install and configure Alertmanager with routing and inhibition rules
  • Install Prometheus and define custom alerting rules
  • Launch both services and observe alerts flowing through the pipeline
  • Trigger and resolve test alerts to verify your configuration

Before You Begin

Assumptions

  1. You have a compatible Linux environment (WSL Ubuntu or similar).
    • Note: WSL gives you a full Linux shell that works just like a native Ubuntu machine, so you’ll install and run Prometheus, Alertmanager, exporters, and Ansible playbooks there. If you prefer an EC2 Ubuntu free-tier box, the steps are nearly identical (just omit the WSL install).
  2. You have wget, tar, nano, and basic shell tools installed.
    • Install basic tools if missing:
      sudo apt update && sudo apt install -y wget tar nano
    • If you already have these tools, you’re good to go.
  3. Ports 9090 and 9093 are available on localhost.

Prerequisites

  • A free account at webhook.site to capture HTTP posts
  • Familiarity with basic Linux file operations and shell commands

Getting Your Webhook URL

Initially I wanted to use OpsGenie as the alerting service in this deep dive. However, since new OpsGenie accounts now require an Atlassian account, I opted to use webhook.site for hands-on testing. In production, I would swap in OpsGenie or PagerDuty URLs with an otherwise identical Alertmanager config.

  • Visit https://webhook.site in your browser.
  • Copy the unique URL it generates (it looks like https://webhook.site/<YOUR_ID>).
  • That endpoint will collect any POST requests you send, so you can inspect payloads.

1. Prepare Your Workspace

Create a single folder to hold all monitoring binaries, configs, and logs:
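For example (assuming the ~/monitoring path used throughout this guide):

    mkdir -p ~/monitoring
    cd ~/monitoring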

This keeps your setup organized and makes cleanup easier.

Recommended Directory structure:
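One possible layout that matches the paths used in later steps (the log files are optional and only appear if you run the services in the background):

    ~/monitoring/
    ├── alertmanager/            # Alertmanager binary, amtool, config/alertmanager.yml
    │   └── config/
    ├── prometheus/              # Prometheus binary, promtool, prometheus.yml, rules/
    │   └── rules/
    ├── alertmanager.log         # optional: background-run output
    └── prometheus.log           # optional: background-run output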

2. Install Alertmanager v0.27.0

  1. Download the release and unpack it:
    cd ~/monitoring
    wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
    tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
    mv alertmanager-0.27.0.linux-amd64 alertmanager
  2. Verify the binaries are in place:
    ls alertmanager
    # expect: alertmanager amtool alertmanager.yml LICENSE NOTICE

Why v0.27.0

At the time this deep dive was written, it was the latest stable (non-RC) release with the v2 API support we need.

3. Configure Alertmanager

  1. Create the config folder and open the YAML:
    mkdir -p ~/monitoring/alertmanager/config
    nano ~/monitoring/alertmanager/config/alertmanager.yml
  2. Paste the following, replacing <YOUR_ID> with the ID you copied from webhook.site:
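Here is a minimal sketch that matches the receivers, routing, and inhibition described under “Key points” below. The query strings on the webhook URLs follow the pattern you will see in step 8; the ?severity=critical parameter and the continue: true flag on the warning route (so the test alert reaches both the warning and infra receivers) are assumptions:

    route:
      receiver: webhook-critical          # default for anything not matched below
      group_by: ['alertname']
      routes:
        - matchers:
            - severity = "warning"
          receiver: webhook-warning
          continue: true                  # keep evaluating so the infra route can also match
        - matchers:
            - team = "infra"
          receiver: webhook-infra

    receivers:
      - name: webhook-critical
        webhook_configs:
          - url: 'https://webhook.site/<YOUR_ID>?severity=critical'
      - name: webhook-warning
        webhook_configs:
          - url: 'https://webhook.site/<YOUR_ID>?severity=warning'
      - name: webhook-infra
        webhook_configs:
          - url: 'https://webhook.site/<YOUR_ID>?team=infra'

    inhibit_rules:
      - source_matchers:
          - severity = "critical"
        target_matchers:
          - severity = "warning"
        equal: ['alertname']              # suppress warnings when a critical alert with the same name fires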

Key points

  • route: default to webhook-critical, with child routes for warnings and infra alerts
  • group_by: batch alerts by alert name
  • inhibit_rules: suppress warnings when a critical alert with the same name is firing
Screenshot of alertmanager.yml.

4. Launch and (Re)load Alertmanager

A. Run in Background
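A sketch, assuming the directory layout from step 1 (the log file name is arbitrary):

    cd ~/monitoring/alertmanager
    nohup ./alertmanager --config.file=config/alertmanager.yml > ~/monitoring/alertmanager.log 2>&1 &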

B. Run in Foreground (Quick Testing)
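Same binary and config, run interactively so logs stream to your terminal (stop with Ctrl+C):

    cd ~/monitoring/alertmanager
    ./alertmanager --config.file=config/alertmanager.yml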

C. Reload Configuration (Apply Changes)

If Alertmanager is already running, reload its config without downtime:
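For example, either signal the process or use the HTTP reload endpoint:

    kill -HUP $(pgrep -x alertmanager)
    # or:
    curl -X POST http://localhost:9093/-/reload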

Or fully restart:
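For example:

    pkill -x alertmanager
    cd ~/monitoring/alertmanager
    nohup ./alertmanager --config.file=config/alertmanager.yml > ~/monitoring/alertmanager.log 2>&1 &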

Why both?

  • Foreground is great for a quick check and real-time logs.
  • Background frees your shell and persists after logout.
  • SIGHUP tells Alertmanager to re-read its config, avoiding a full restart.

Verify it is running or reloaded:
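For example, hit the health endpoint or check the process:

    curl -s http://localhost:9093/-/healthy
    pgrep -a alertmanager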

Open the UI at http://localhost:9093/.

Screenshot of the Alertmanager UI.

5. Install Prometheus v2.47.0

  1. Download and unpack:
    cd ~/monitoring
    wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
    tar xzf prometheus-2.47.0.linux-amd64.tar.gz
    mv prometheus-2.47.0.linux-amd64 prometheus
  2. Inspect the directory layout:
    ls prometheus
    # expect: prometheus promtool consoles console_libraries

6. Configure Prometheus

  1. Create and edit the main config:
    nano ~/monitoring/prometheus/prometheus.yml
  2. Paste:
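A minimal sketch that points Prometheus at the Alertmanager on localhost:9093 and loads the rule file created in the next step (the scrape and evaluation intervals and the self-scrape job are assumptions):

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']

    rule_files:
      - rules/custom_rules.yml

    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['localhost:9090']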
  3. Create a rule file for a test alert:
    mkdir -p ~/monitoring/prometheus/rules
    nano ~/monitoring/prometheus/rules/custom_rules.yml
  4. Paste:
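A sketch of a rule that always fires, labelled so it matches both the warning and infra routes configured earlier (the group name and annotations are illustrative):

    groups:
      - name: test-alerts
        rules:
          - alert: TestWarningAlert
            expr: vector(1)          # always true, so the alert fires immediately
            labels:
              severity: warning
              team: infra
            annotations:
              summary: "Synthetic test alert"
              description: "Always-firing alert used to verify the Prometheus -> Alertmanager -> webhook pipeline."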

Why this rule?

It always evaluates true so you can immediately see a warning+infra alert in action.

Custom rules YAML config – Screenshot of custom_rules.yml.

7. Launch Prometheus

Start Prometheus and log output:
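A sketch, assuming the layout from step 1 (the log file name is arbitrary):

    cd ~/monitoring/prometheus
    nohup ./prometheus --config.file=prometheus.yml > ~/monitoring/prometheus.log 2>&1 &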

Verify it is running:
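For example:

    curl -s http://localhost:9090/-/healthy
    pgrep -a prometheus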

Open the UI at http://localhost:9090/.

8. Trigger and Observe Alerts

  1. In Prometheus UI (/alerts), you should see TestWarningAlert firing immediately.
  2. In Alertmanager UI (/alerts), confirm it routes under webhook-warning and webhook-infra.
  3. In your webhook.site inbox, refresh to see two POST entries:
    • One with ?severity=warning
    • One with ?team=infra
Alert Firing – Screenshot of webhook.site inbox.

9. (Optional) Resolve the Alert

To test resolution, clear the rule and reload Prometheus:
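One way to do it, assuming the paths from earlier steps: remove (or comment out) the TestWarningAlert rule, then send Prometheus a SIGHUP so it re-reads its rule files:

    nano ~/monitoring/prometheus/rules/custom_rules.yml
    kill -HUP $(pgrep -x prometheus)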

Within seconds, Alertmanager will send status: resolved payloads to each webhook.

Alert Resolved – Screenshot of webhook.site inbox.

10. Additional Notes

  • Prometheus UI only shows alerts it evaluates itself. Alerts posted directly to Alertmanager via the /api/v2/alerts endpoint appear only in the Alertmanager UI.
  • You can use Pushgateway to simulate a push flow that goes through Prometheus.
  • OpsGenie migration: since new OpsGenie accounts require Atlassian, we used webhook.site for hands-on testing. In production, I would swap to OpsGenie or PagerDuty URLs with identical Alertmanager config.

Troubleshooting & Common Pitfalls

Even with a solid alerting pipeline, issues can crop up. Here’s how to catch and resolve the most common ones.

  1. Alerts Not Appearing in Alertmanager
  • Prometheus misconfigured alerting block: Ensure prometheus.yml has:
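For example, pointing at the local Alertmanager used in this setup:

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']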

Without this, alerts won’t reach Alertmanager.

  • Network issues: Confirm connectivity with:
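For example:

    nc -zv localhost 9093
    curl -v http://localhost:9093/-/healthy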

This uncovers port, DNS, or TLS problems.

  2. Rules Not Loading or Firing
  • Syntax errors in rules: Validate your .yml files with:
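For example, using the bundled promtool (and amtool for the Alertmanager config):

    ~/monitoring/prometheus/promtool check rules ~/monitoring/prometheus/rules/custom_rules.yml
    ~/monitoring/prometheus/promtool check config ~/monitoring/prometheus/prometheus.yml
    ~/monitoring/alertmanager/amtool check-config ~/monitoring/alertmanager/config/alertmanager.yml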
  • PromQL not evaluating: Use Prometheus UI to manually test query logic.
  3. Alertmanager Isn’t Routing/Notifying
  • No matching route/receiver: Inspect your YAML; routing labels and receiver names must align. If nothing matches, alerts fall back to the top-level default receiver.
  • Bad templates or encoding errors: Non-ASCII labels or empty template expansions can silently drop alerts. Check logs for warnings like Message has been modified because the content was empty.
  4. Flapping & Duplicate Notifications
  • Mismatch in evaluation intervals: If alerts re-fire too quickly after resolving, increase the for: duration (e.g., 3–4× your scrape interval) to prevent flapping.
  • Overlapping routing with continue: true: This can trigger the same alert across multiple receivers; review routing logic to avoid duplicates.
  5. Silent Failures or “Dead” Pipelines
  • No health-check alert like DeadMansSwitch: Include a synthetic alert and monitor it. If it stops, you’ll know the pipeline is broken.
  • Receiver permissions or message truncation: Verify that Alertmanager has write access to notification channels (e.g., SNS policies, email authentication). Look for “invalid key/value” errors in logs.
  6. Performance & Scaling Concerns
  • High metric cardinality: Too many labels lead to resource strain; trim unnecessary dimensions and aggregate metrics.
  • Slow PromQL queries: Optimize your rules to avoid queries that fetch large time ranges or use expensive functions.

Troubleshooting Checklist

Symptom                    Quick Checks
Alerts never fire          promtool check rules, query in Prom UI
Prom → Alertmgr fails      nc, curl, verify alerting block
No notification            Examine Alertmanager logs (journalctl, debug output)
Messages truncated         Adjust templates and encoding
Flapping alerts            Add for: clause, review repeat_interval, suppress overlaps
Pipeline silent death      Use DeadMansSwitch, verify write permissions

Summary

Most issues stem from misconfiguration, network hiccups, or noisy or faulty rule setups. Start with logs and connectivity tests, validate syntax with promtool, and use synthetic health-check alerts to catch pipeline failures early. Combine this with regular pipeline reviews and you can sleep through the night, alert-free or not.

Next Steps

  • Replace webhook.site with a real alerting service like OpsGenie or PagerDuty by updating webhook_configs URLs
  • Add real scrape targets and meaningful alert rules for your infrastructure
  • Secure Alertmanager and Prometheus endpoints behind authentication or a VPN
  • Integrate silences and notification templates for richer alert context

Feel free to drop a comment or question below to share feedback or if you run into any issues. Happy alerting!

Download the Sample Files

You can download all 3 YAML files here as a ZIP archive.

Want more tutorials like this?

Subscribe and get actionable DevOps & Linux automation tips straight to your inbox.

