Table of contents
- Understanding the Flow Diagram
- Components and Their Roles
- Prerequisites
- Setup Steps
- Node Exporter Setup (VM-1)
- Prometheus Setup (VM-2)
- Alertmanager Setup (VM-2)
- Setting Up Blackbox Exporter
- Step 1: Download Blackbox Exporter
- Step 2: Extract Blackbox Exporter
- Step 3: Move Files to /usr/local/bin
- Step 4: Create a Dedicated User to Run the Exporter
- Step 5: Set Up Configuration Directory
- Step 6: Create a Systemd Service File
- Step 7: Reload Systemd and Start the Service
- Step 8: Verify the Installation
- Comprehensive Setup for Monitoring Applications with Prometheus and Alertmanager
- Step 1: Clone the Project Repository
- Step 2: Install Java and Maven
- Step 3: Package the Application into a JAR File
- Step 4: Configure Prometheus Alert Rules
- Step 5: Update Prometheus Configuration
- Step 6: Add Node Exporter and Blackbox Exporter Configurations
- Step 7: Configure Email Notifications with Alertmanager
- Step 8: Test the Configuration
In this technical guide, we'll set up a robust monitoring solution for EC2 instances running Ubuntu OS. This project incorporates Prometheus, Alertmanager, Node Exporter, and Blackbox Exporter for effective monitoring and alerting.
Understanding the Flow Diagram
Components and Their Roles
VM-1:
Node Exporter:
- Installed on VM-1 to expose system metrics such as CPU usage, memory usage, and disk space in a format that Prometheus can scrape.
Nginx:
- A web server running on VM-1 that serves as an HTTP endpoint for testing availability and performance using Blackbox Exporter.
VM-2:
Prometheus:
Metrics Collection:
- Continuously scrapes system metrics from Node Exporter on VM-1.
- Retrieves probe results and HTTP status checks for endpoints such as Nginx from Blackbox Exporter.
Alert Rules Evaluation:
- Processes the collected metrics based on pre-defined rules (e.g., detecting high memory usage or service unavailability).
Data Flow:
- Metrics from Node Exporter and Blackbox Exporter are combined for real-time analysis and visualization.
Blackbox Exporter:
- Used for probing external targets like HTTP endpoints (e.g., Nginx) to check availability and performance.
- Exposes probe success or failure metrics for Prometheus to scrape.
Alertmanager:
Alert Routing:
- Receives alerts from Prometheus when certain thresholds or conditions are breached (e.g., service downtime, high CPU usage).
- Routes these alerts to configured notification channels (e.g., email).
Inhibition Rules:
- Suppresses less critical alerts when higher-priority alerts are active, ensuring relevant notifications are sent.
Email Notifications:
- Alertmanager integrates with an email system (e.g., Gmail).
- Sends email alerts to notify administrators about critical issues like service downtime or resource exhaustion.
Prerequisites
Install Required Tools: Ensure that wget and tar are installed on both VMs.
Permissions: Ensure you have permissions to download, extract, and run the binaries.
Networking: Configure the firewall to allow the following ports:
- Prometheus: 9090
- Alertmanager: 9093
- Blackbox Exporter: 9115
- Node Exporter: 9100
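Once the services from the later sections are running, a quick reachability check can confirm that the firewall rules are in place. The following is a minimal sketch, assuming curl is installed; the function names service_for_port and check_port are illustrative and not part of any of the tools.

```shell
#!/bin/sh
# Map each port from the list above to the service that listens on it.
service_for_port() {
  case "$1" in
    9090) echo "Prometheus" ;;
    9093) echo "Alertmanager" ;;
    9115) echo "Blackbox Exporter" ;;
    9100) echo "Node Exporter" ;;
    *)    echo "unknown"; return 1 ;;
  esac
}

# Report whether anything answers HTTP on host:port (assumes curl is installed).
check_port() {
  host="$1"; port="$2"
  if curl -s -o /dev/null --max-time 3 "http://${host}:${port}/"; then
    echo "$(service_for_port "$port") (${port}): reachable"
  else
    echo "$(service_for_port "$port") (${port}): NOT reachable"
  fi
}

# Usage (run from your workstation after the services are installed):
#   for p in 9090 9093 9115; do check_port <vm-2-public-ip> "$p"; done
#   check_port <vm-1-public-ip> 9100
```

A "NOT reachable" result for a running service usually points at the EC2 security group or a host firewall rather than the service itself.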
Setup Steps
1. Create 2 EC2 Instances
Use t2.medium instances with 20GB storage and Ubuntu OS.
VM-1:
- Services: Node Exporter, Nginx.
VM-2:
- Services: Prometheus, Alertmanager, Blackbox Exporter.
Node Exporter Setup (VM-1)
Update the Packages:
sudo apt update -y
Download Node Exporter:
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
Extract Node Exporter:
tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz
Set Up Node Exporter as a Systemd Service:
- Create a Service File:
sudo vim /etc/systemd/system/node_exporter.service
- Configuration:
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=ubuntu
ExecStart=/home/ubuntu/node_exporter-1.8.1.linux-amd64/node_exporter
[Install]
WantedBy=multi-user.target
- Reload and Enable the Service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
Check Service Status:
sudo systemctl status node_exporter
Verify Node Exporter: Access http://<public-ip>:9100
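Besides opening the page in a browser, the endpoint can be checked from a terminal. The sketch below assumes curl and awk are available; metric_value is a hypothetical helper that reads the Prometheus text format from standard input and prints the value of one unlabelled metric.

```shell
#!/bin/sh
# Print the value of the first sample whose name matches $1 exactly.
# Input: Prometheus text-format metrics on stdin (one "name value" pair per line).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (on any machine that can reach VM-1):
#   curl -s http://<public-ip>:9100/metrics | metric_value node_load1
```

Comment lines in the metrics output start with "#", so they never match a metric name and are skipped automatically.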
Prometheus Setup (VM-2)
Update the Packages:
sudo apt update -y
Download Prometheus:
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
Extract Prometheus:
tar xvfz prometheus-2.52.0.linux-amd64.tar.gz
Move Files and Configure:
sudo mv prometheus-2.52.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.52.0.linux-amd64/promtool /usr/local/bin/
Create Prometheus User and Directories:
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
Copy the Configuration File:
sudo cp prometheus-2.52.0.linux-amd64/prometheus.yml /etc/prometheus/prometheus.yml
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
Set Up as a Systemd Service:
- Create a Service File:
sudo vim /etc/systemd/system/prometheus.service
- Configuration:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/
[Install]
WantedBy=multi-user.target
- Reload and Enable the Service:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
Access Prometheus Web Interface: http://<your-ec2-public-ip>:9090
Check Logs (Optional):
journalctl -u prometheus -f
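Prometheus can take a few seconds to become ready after the service starts. As a rough sketch, the hypothetical helper below polls the readiness endpoint (/-/ready) with curl until it responds or a retry limit is reached; the retry count and delay are arbitrary defaults, not recommended values.

```shell
#!/bin/sh
# Poll a URL until it returns a successful HTTP status.
# Arguments: url [tries] [delay-seconds]
wait_for_http() {
  url="$1"; tries="${2:-10}"; delay="${3:-2}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f makes curl fail on HTTP errors such as 503 while the server is still starting.
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$tries" ] && sleep "$delay"
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}

# Usage (on VM-2):
#   wait_for_http http://localhost:9090/-/ready
#   promtool check config /etc/prometheus/prometheus.yml
```

The promtool check config command shown in the usage comment validates the configuration file without restarting the service.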
Alertmanager Setup (VM-2)
Download and Install Alertmanager
Download Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
Extract Alertmanager:
tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz
Move Alertmanager Binaries:
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
Create Alertmanager User and Directories
Create a Dedicated User:
sudo useradd --no-create-home --shell /bin/false alertmanager
Create Directories for Data and Configuration:
sudo mkdir /etc/alertmanager
sudo mkdir /var/lib/alertmanager
sudo chown alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
Configure Alertmanager
Move Configuration File:
sudo mv ~/alertmanager-0.27.0.linux-amd64/alertmanager.yml /etc/alertmanager/
sudo chown alertmanager:alertmanager /etc/alertmanager/alertmanager.yml
Create a Systemd Service for Alertmanager
Create the Service File:
sudo vim /etc/systemd/system/alertmanager.service
Add the Following Configuration:
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager/
[Install]
WantedBy=multi-user.target
Start and Enable Alertmanager
Reload Systemd Configuration:
sudo systemctl daemon-reload
Start the Service:
sudo systemctl start alertmanager
Enable the Service:
sudo systemctl enable alertmanager
Check Service Status:
sudo systemctl status alertmanager
Setting Up Blackbox Exporter
Step 1: Download Blackbox Exporter
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
Step 2: Extract Blackbox Exporter
tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz
Step 3: Move Files to /usr/local/bin
sudo mv blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter /usr/local/bin/
Step 4: Create a Dedicated User to Run the Exporter
sudo useradd --no-create-home --shell /bin/false blackbox_exporter
Step 5: Set Up Configuration Directory
Create the directory:
sudo mkdir /etc/blackbox_exporter
Move the configuration file (if required):
sudo mv blackbox_exporter-0.25.0.linux-amd64/blackbox.yml /etc/blackbox_exporter/
Set proper permissions:
sudo chown blackbox_exporter:blackbox_exporter /etc/blackbox_exporter -R
Step 6: Create a Systemd Service File
Open a new systemd service file:
sudo vim /etc/systemd/system/blackbox_exporter.service
Add the following content:
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=blackbox_exporter
Group=blackbox_exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox_exporter/blackbox.yml
[Install]
WantedBy=multi-user.target
Step 7: Reload Systemd and Start the Service
Reload systemd to recognize the new service:
sudo systemctl daemon-reload
Start the Blackbox Exporter service:
sudo systemctl start blackbox_exporter
Enable the service to start on boot:
sudo systemctl enable blackbox_exporter
Step 8: Verify the Installation
Check the service status:
sudo systemctl status blackbox_exporter
Test that the exporter is working by accessing it on its default port (9115):
curl http://localhost:9115/metrics
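The /metrics endpoint only reports on the exporter itself. To see an actual probe, you can call the /probe endpoint by hand. The sketch below assumes curl and awk are installed and that the http_2xx module from the default blackbox.yml is present; probe_status is an illustrative helper that turns the probe_success metric into a readable word.

```shell
#!/bin/sh
# Read Blackbox Exporter probe output on stdin and summarize probe_success.
# Prints "up" for 1, "down" for any other value, "unknown" if the metric is absent.
probe_status() {
  awk '$1 == "probe_success" { print ($2 == 1 ? "up" : "down"); found = 1; exit }
       END { if (!found) print "unknown" }'
}

# Usage (on VM-2, where the exporter listens on port 9115):
#   curl -s -G \
#     --data-urlencode 'target=https://prometheus.io' \
#     --data-urlencode 'module=http_2xx' \
#     http://localhost:9115/probe | probe_status
```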
Comprehensive Setup for Monitoring Applications with Prometheus and Alertmanager
Step 1: Clone the Project Repository
Download the repository of the project you want to monitor:
git clone https://github.com/imkiran13/Boardgame.git
Step 2: Install Java and Maven
Since the application is Java-based, you need to install Java and Maven:
sudo apt install openjdk-17-jre-headless -y
sudo apt install maven -y
Step 3: Package the Application into a JAR File
cd Boardgame/
mvn package
Navigate to the target directory and run the JAR file:
cd target/
java -jar jar-file-name.jar
Access the application on port 8080.
Step 4: Configure Prometheus Alert Rules
Go to the Prometheus configuration folder and create an alert rules file:
sudo vim /etc/prometheus/alert_rules.yml
Add the following content:
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Endpoint {{ $labels.instance }} down
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      - alert: WebsiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Website down
          description: The website at {{ $labels.instance }} is down.
      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: |
            Node memory is filling up (< 25% left)
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 50
        for: 1s
        labels:
          severity: warning
        annotations:
          summary: Host out of disk space (instance {{ $labels.instance }})
          description: |
            Disk is almost full (< 50% left)
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
      - alert: HostHighCpuLoad
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node_exporter", mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host high CPU load (instance {{ $labels.instance }})
          description: |
            CPU load is > 80%
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
      - alert: ServiceUnavailable
        expr: up{job="node_exporter"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Service Unavailable (instance {{ $labels.instance }})
          description: |
            The service {{ $labels.job }} is not available
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
      - alert: HighMemoryUsage
        expr: (node_memory_Active_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: High Memory Usage (instance {{ $labels.instance }})
          description: |
            Memory usage is > 90%
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
      - alert: FileSystemFull
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: File System Almost Full (instance {{ $labels.instance }})
          description: |
            File system has < 10% free space
            VALUE = {{ $value }}
            LABELS: {{ $labels }}
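To make the threshold logic concrete, here is a small worked example of the HostOutOfMemory arithmetic: available memory divided by total memory, times 100, compared against 25. This is only a sketch of the calculation, not how Prometheus evaluates rules internally; the helper names mem_available_pct and would_alert are made up for illustration.

```shell
#!/bin/sh
# Percentage of memory still available: available / total * 100.
# Arguments: available-bytes total-bytes
mem_available_pct() {
  awk -v avail="$1" -v total="$2" 'BEGIN { printf "%.1f\n", avail / total * 100 }'
}

# Exit 0 when the percentage is strictly below the threshold (default 25),
# which is the condition under which the rule would start counting its "for" period.
would_alert() {
  awk -v pct="$1" -v limit="${2:-25}" 'BEGIN { exit !(pct < limit) }'
}

# Example: 1 GiB available out of 4 GiB total.
pct=$(mem_available_pct 1073741824 4294967296)
echo "available: ${pct}%"
if would_alert "$pct"; then echo "below threshold"; else echo "within limits"; fi
```

Before restarting Prometheus, the rule file itself can be validated with promtool check rules /etc/prometheus/alert_rules.yml, which reports YAML and expression syntax errors.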
Step 5: Update Prometheus Configuration
Edit the Prometheus configuration file to include the alert rules file:
sudo vim /etc/prometheus/prometheus.yml
Add the following:
rule_files:
  - "alert_rules.yml"
Restart Prometheus:
sudo systemctl restart prometheus
Step 6: Add Node Exporter and Blackbox Exporter Configurations
Edit the Prometheus configuration file:
sudo vim /etc/prometheus/prometheus.yml
Add the following:
---
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - alert_rules.yml
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
  - job_name: node_exporter
    static_configs:
      - targets:
          - 13.127.62.132:9100 # Node Exporter IP & Port
  - job_name: blackbox
    metrics_path: /probe
    params:
      module:
        - http_2xx
    static_configs:
      - targets:
          - http://prometheus.io
          - https://prometheus.io
          - http://13.127.62.132:8080/ # Application Server IP & Port
    relabel_configs:
      - source_labels:
          - __address__
        target_label: __param_target
      - source_labels:
          - __param_target
        target_label: instance
      - target_label: __address__
        replacement: 43.205.120.90:9115 # Blackbox Exporter IP & Port
Restart Blackbox Exporter:
sudo systemctl restart blackbox_exporter
Restart Prometheus:
sudo systemctl restart prometheus
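The relabel_configs section in the blackbox job can be confusing at first: each listed target becomes the target query parameter, and the scrape address is swapped for the Blackbox Exporter's own address. The sketch below shows, under that reading, the request Prometheus ends up making for one target. The function blackbox_probe_url is purely illustrative, and the target is left unencoded for readability, whereas Prometheus URL-encodes it.

```shell
#!/bin/sh
# Build the probe URL that results from the relabeling rules.
# Arguments: exporter-host:port target-url [module]
blackbox_probe_url() {
  exporter="$1"; target="$2"; module="${3:-http_2xx}"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Example using the addresses from the configuration above.
blackbox_probe_url 43.205.120.90:9115 https://prometheus.io
```

The instance label keeps the original target URL, which is why alerts such as WebsiteDown report the probed site rather than the exporter's address.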
Step 7: Configure Email Notifications with Alertmanager
Edit the Alertmanager configuration file:
sudo vim /etc/alertmanager/alertmanager.yml
Add the following:
route:
  group_by:
    - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: email-notifications
receivers:
  - name: email-notifications
    email_configs:
      - to: imkiran13@gmail.com
        from: test@gmail.com
        smarthost: smtp.gmail.com:587
        auth_username: imkiran13@gmail.com
        auth_identity: imkiran13@gmail.com
        auth_password: <your-gmail-app-password>
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal:
      - alertname
      - dev
      - instance
Generate an app password for your Gmail account:
- Go to Manage My Google Account.
- Search for App Passwords.
- Generate and use the password in the configuration.
Restart Alertmanager:
sudo systemctl restart alertmanager
Restart Prometheus:
sudo systemctl restart prometheus
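To check email delivery without waiting for a real outage, you can push a hand-made alert into Alertmanager through its v2 API. This is a minimal sketch: the helper test_alert_json and the label values (TestAlert, manual-test) are invented for the test, and it assumes Alertmanager is listening on localhost:9093 as configured above.

```shell
#!/bin/sh
# Build a one-element JSON array describing a test alert.
# Arguments: [alertname] [severity]
test_alert_json() {
  name="${1:-TestAlert}"; severity="${2:-warning}"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"}}]\n' \
    "$name" "$severity"
}

# Usage (on VM-2):
#   amtool check-config /etc/alertmanager/alertmanager.yml
#   test_alert_json | curl -s -X POST -H 'Content-Type: application/json' \
#     --data @- http://localhost:9093/api/v2/alerts
```

With the group_wait value from the route above, the notification should be sent roughly 30 seconds after the alert is posted.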
Step 8: Test the Configuration
Stop the application running on port 8080.
Verify the WebsiteDown alert in Prometheus. You should see the alert transition to the firing state.
Check your email for notification alerts.
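The same check can be done from the command line by querying the built-in ALERTS series through the Prometheus HTTP API. The sketch below is deliberately simple: firing_count is a hypothetical helper that pulls the count out of the JSON response with sed instead of a proper JSON parser, so treat it as a convenience for this one query rather than a general tool.

```shell
#!/bin/sh
# Extract the numeric result of a count() instant query from the JSON
# returned by /api/v1/query. Prints 0 when the result set is empty.
firing_count() {
  n=$(sed -n 's/.*"value":\[[^,]*,"\([0-9][0-9]*\)"].*/\1/p')
  echo "${n:-0}"
}

# Usage (on VM-2, after stopping the application):
#   curl -s -G \
#     --data-urlencode 'query=count(ALERTS{alertname="WebsiteDown",alertstate="firing"})' \
#     http://localhost:9090/api/v1/query | firing_count
```

A result of 0 shortly after stopping the application is expected, because the rule must hold for its full "for" duration before the alert moves from pending to firing.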
This comprehensive setup ensures proactive monitoring and timely alerting for your EC2 instances. Happy monitoring!