Monitoring Linux Hosts with Node Exporter, Prometheus & Grafana — Step-by-Step
A practical guide to install node_exporter on hosts, configure Prometheus as the metrics scraper, and visualize metrics in Grafana (including provisioning and example queries).
- Overview
- Prerequisites
- 1) Install node_exporter on hosts
- 2) Configure Prometheus to scrape node_exporter
- 3) Configure Grafana & add Prometheus datasource
- 4) Dashboards & example PromQL
- 5) Alerts (Prometheus Alertmanager or Grafana)
- Appendix: Docker Compose quickstart
- Best practices & troubleshooting
Overview
This post shows a minimal, production-ready approach to host-level observability using:
- node_exporter: exposes OS and hardware metrics (CPU, memory, disk, network) via /metrics on port 9100.
- Prometheus: scrapes metrics from node_exporter and stores time-series data.
- Grafana: visualizes metrics using Prometheus as the datasource; dashboards show host health and trends.
Prerequisites
- Linux hosts (Ubuntu/CentOS) where you can install node_exporter (or run as container).
- A server for Prometheus and Grafana (can be one VM or container host).
- Network connectivity: Prometheus must be able to reach HOST:9100.
- Optional: firewall rules allowing port 9100 from the Prometheus server only.
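As a rough sketch of the optional firewall rule, the helper below only prints candidate commands for review; it does not change any firewall state. The function name print_exporter_firewall_rules and the address 10.0.0.5 are placeholders, and the ufw and firewalld forms assume those are the tools your distribution uses.

```shell
#!/bin/sh
# Print (not run) firewall commands that admit only the Prometheus server
# to the node_exporter port. The function name and the sample IP are
# illustrative assumptions, not part of any tool.
print_exporter_firewall_rules() {
    prom_ip="$1"          # address of the Prometheus server
    port="${2:-9100}"     # node_exporter port, 9100 by default

    # Ubuntu (ufw)
    printf 'sudo ufw allow from %s to any port %s proto tcp\n' "$prom_ip" "$port"

    # CentOS (firewalld rich rule, then reload to apply it)
    printf "sudo firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$prom_ip" "$port"
    printf 'sudo firewall-cmd --reload\n'
}

print_exporter_firewall_rules 10.0.0.5
```

Printing the commands first makes it easy to check the source address before anything is applied on a remote host.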
1) Install node_exporter on each host
Here’s a typical systemd-based installation on an Ubuntu host (replace version as needed):
# Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# Create systemd unit
sudo tee /etc/systemd/system/node_exporter.service >/dev/null <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Restart=always
[Install]
WantedBy=multi-user.target
EOF
# Start service
sudo mkdir -p /var/lib/node_exporter/textfile_collector
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
# Verify
curl http://localhost:9100/metrics | head -n 20
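The --collector.textfile.directory flag makes node_exporter read custom metrics from *.prom files in that directory, which is useful for exposing values produced by scripts or cron jobs. Run the service as an unprivileged account (User=nobody or a dedicated user) for security.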
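A minimal sketch of a script that feeds the textfile collector follows. The helper name write_textfile_metric and the metric example_backup_last_success_timestamp_seconds are hypothetical; the demo writes to a local directory so it can be tried without root.

```shell
#!/bin/sh
# Write one gauge to a .prom file for the textfile collector.
# write_textfile_metric and the metric name used below are made-up examples.
write_textfile_metric() {
    dir="$1"; name="$2"; value="$3"; help="$4"
    tmp="$dir/$name.prom.$$"      # temp file in the same directory
    {
        printf '# HELP %s %s\n' "$name" "$help"
        printf '# TYPE %s gauge\n' "$name"
        printf '%s %s\n' "$name" "$value"
    } > "$tmp" &&
    mv "$tmp" "$dir/$name.prom"   # rename, so a scrape never sees a half-written file
}

# Demo against a local directory; on a real host point TEXTFILE_DIR at
# /var/lib/node_exporter/textfile_collector (writable by the script's user).
TEXTFILE_DIR="${TEXTFILE_DIR:-./textfile_collector}"
mkdir -p "$TEXTFILE_DIR"
write_textfile_metric "$TEXTFILE_DIR" example_backup_last_success_timestamp_seconds \
    "$(date +%s)" "Unix time of the last successful backup (example metric)."
cat "$TEXTFILE_DIR/example_backup_last_success_timestamp_seconds.prom"
```

Writing to a temporary name and renaming it keeps the collector from reading a partial file, since only names ending in .prom are picked up.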
2) Configure Prometheus to scrape node_exporter
Prometheus configuration example (prometheus.yml). Add a job that scrapes all your nodes:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'host1.example.com:9100'
          - 'host2.example.com:9100'
    # Optional relabeling to simplify instance label
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):.*'
        target_label: instance
        replacement: '$1'
After updating prometheus.yml, restart Prometheus and check the Targets page (http://PROM_SERVER:9090/targets) to ensure node exporters are UP.
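The same check can be scripted. The sketch below counts targets by health from the JSON that Prometheus returns at /api/v1/targets; count_targets is a hypothetical helper that matches the compact "health":"up" form in the response rather than using a real JSON parser, and the sample response is trimmed and invented for the demo.

```shell
#!/bin/sh
# Summarise target health from the JSON returned by /api/v1/targets.
# count_targets is a hypothetical helper; it pattern-matches the compact
# "health":"<state>" field instead of parsing JSON properly.
count_targets() {
    state="$1"   # "up" or "down"
    grep -o "\"health\":\"$state\"" | wc -l | tr -d ' '
}

# Offline demo with a trimmed, made-up sample response:
sample='{"status":"success","data":{"activeTargets":[{"labels":{"instance":"host1.example.com"},"health":"up"},{"labels":{"instance":"host2.example.com"},"health":"down"}]}}'
printf '%s' "$sample" | count_targets up
printf '%s' "$sample" | count_targets down

# Against a live server (PROM_SERVER is a placeholder, the config path an assumption):
#   promtool check config /etc/prometheus/prometheus.yml
#   curl -s "http://PROM_SERVER:9090/api/v1/targets" | count_targets down
```

Running promtool check config before the restart catches YAML and syntax mistakes without taking Prometheus down.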
Dynamic service discovery
If you run in a cloud environment, replace static_configs with the matching service discovery block (ec2_sd_configs for EC2, gce_sd_configs for GCE, kubernetes_sd_configs for Kubernetes). On Kubernetes, the Prometheus Operator can generate these scrape configurations for you.
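As an illustration of the EC2 case, the snippet below writes an example scrape job to a scratch file for merging into prometheus.yml by hand. The region, the Monitoring and Name tag names, and the output path are all assumptions; credentials are expected to come from the usual AWS sources such as an instance role.

```shell
#!/bin/sh
# Write an example scrape job that discovers EC2 instances instead of
# listing hosts by hand. Region, tag names and the output path are assumptions.
OUT_DIR="${OUT_DIR:-.}"
cat > "$OUT_DIR/node_exporter_ec2_job.yml" <<'EOF'
scrape_configs:
  - job_name: 'node_exporter'
    ec2_sd_configs:
      - region: us-east-1
        port: 9100
    relabel_configs:
      # Keep only instances tagged Monitoring=enabled
      - source_labels: [__meta_ec2_tag_Monitoring]
        regex: enabled
        action: keep
      # Use the Name tag as the instance label
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
EOF
echo "wrote $OUT_DIR/node_exporter_ec2_job.yml"
```

Filtering on a tag keeps newly launched instances out of the target list until they are deliberately opted in.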
3) Configure Grafana & add Prometheus datasource
Two ways to add the Prometheus datasource:
Option A — UI (Quick)
- Open the Grafana web UI (e.g., http://GRAFANA_HOST:3000) and log in (default admin/admin).
- Go to Configuration → Data Sources → Add data source (Connections → Data sources in newer Grafana releases).
- Select Prometheus, set URL to http://PROMETHEUS_HOST:9090, and click Save & Test.
Option B — Provisioning (recommended for automation)
Create a YAML file at /etc/grafana/provisioning/datasources/prometheus.yaml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
Restart Grafana — the datasource will be automatically created. This is ideal for repeatable infra-as-code deployments.
4) Dashboards & example PromQL queries
You can import community dashboards (search “Node Exporter Full” on grafana.com) or create custom panels. Below are useful PromQL queries for common panels:
CPU Usage (per instance)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory Used (bytes)
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
Memory Usage %
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
Disk Used % (root)
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)
Load (1m)
node_load1{instance="$instance"}
Network Bytes In/Out
rate(node_network_receive_bytes_total{device!="lo"}[5m])
rate(node_network_transmit_bytes_total{device!="lo"}[5m])
When creating dashboards, group panels by purpose (CPU, Memory, Disk, Network, Processes) and include host selector variables for easy filtering.
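To make the CPU usage query above less abstract, here is the same arithmetic on made-up counter samples. The helper cpu_usage_percent is hypothetical: it takes the idle counter of each core at two points in time, turns each pair into a per-second idle rate, averages across cores, and subtracts from 100.

```shell
#!/bin/sh
# cpu_usage_percent reads lines of "idle_before idle_after" (one per CPU core)
# and prints 100 - avg(idle rate) * 100, mirroring the PromQL expression.
# The helper name and the sample numbers are invented for illustration.
cpu_usage_percent() {
    interval="$1"   # seconds between the two samples
    awk -v dt="$interval" '
        { idle_rate += ($2 - $1) / dt; cores++ }
        END { if (cores > 0) printf "%.1f\n", 100 - (idle_rate / cores) * 100 }
    '
}

# Two cores sampled 15 s apart: core 0 was idle for 12 s, core 1 for 9 s.
# Idle rates are 0.8 and 0.6, the average is 0.7, so usage is 30.0 percent.
printf '1000 1012\n2000 2009\n' | cpu_usage_percent 15
```

The counter values themselves do not matter, only their increase over the interval, which is why the query works on node_cpu_seconds_total regardless of uptime.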
5) Alerts (Prometheus Alertmanager or Grafana)
Two common approaches:
- Prometheus rules + Alertmanager: Define Prometheus alerting rules and route alerts via Alertmanager to email/Slack/PagerDuty.
- Grafana alerting: Grafana’s unified alerting can evaluate Prometheus queries and send notifications. This centralizes alert management in Grafana.
Example Prometheus alert rule (high CPU)
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        # rate() is used here instead of irate(): it averages over the whole window, which suits alerting
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 85
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage has been above 85% for more than 2 minutes."
Place rules in a file (e.g., node_rules.yml) and reference it in prometheus.yml under rule_files. Configure Alertmanager endpoints in prometheus.yml under alerting.
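A sketch of what those two sections might look like, written to a scratch file for merging into prometheus.yml by hand. The rule file location, the Alertmanager address alertmanager:9093 and the output path are assumptions; adjust them to your layout.

```shell
#!/bin/sh
# Write an example fragment to merge into prometheus.yml by hand.
# The output path, rule file location and Alertmanager address are assumptions.
OUT_DIR="${OUT_DIR:-.}"
cat > "$OUT_DIR/alerting-fragment.yml" <<'EOF'
rule_files:
  - /etc/prometheus/rules/node_rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
EOF

# promtool ships with Prometheus; validate the rule file if both are present.
if command -v promtool >/dev/null 2>&1 && [ -f node_rules.yml ]; then
    promtool check rules node_rules.yml
fi
```

Checking rules with promtool before a restart is cheap insurance, since a malformed rule file prevents Prometheus from loading its configuration.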
Appendix: Docker Compose quickstart (Prometheus + Grafana + Node Exporter)
Simple docker-compose.yml for local testing:
version: '3.7'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules/:/etc/prometheus/rules/:ro
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - ./provisioning/:/etc/grafana/provisioning/:ro
    ports:
      - "3000:3000"
  node-exporter-host1:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    command:
      - '--collector.textfile.directory=/var/lib/node_exporter/textfile_collector'
    volumes:
      - ./textfile_collector/:/var/lib/node_exporter/textfile_collector:ro
Note: For production, run node_exporter on every host you want to monitor; running everything on one machine, as here, is only suitable for lab or testing use.
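The compose file mounts several local paths that must exist before the stack starts. The sketch below creates them and writes a prometheus.yml whose targets use the compose service names. LAB_DIR is a hypothetical working directory (defaulting to ./monitoring-lab) and is assumed to be where docker-compose.yml lives.

```shell
#!/bin/sh
# Prepare the files and directories that the compose file mounts.
# LAB_DIR is a hypothetical working directory; keep docker-compose.yml in it.
LAB_DIR="${LAB_DIR:-./monitoring-lab}"
mkdir -p "$LAB_DIR/rules" "$LAB_DIR/provisioning/datasources" "$LAB_DIR/textfile_collector"

# Inside the compose network, containers reach each other by service name.
cat > "$LAB_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
rule_files:
  - /etc/prometheus/rules/*.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter-host1:9100']
EOF
echo "lab files ready in $LAB_DIR"

# Then, from LAB_DIR:
#   docker compose up -d
#   curl -s http://localhost:9090/api/v1/targets
```

The datasource file from Option B can be dropped into provisioning/datasources/ unchanged, because its URL already points at the prometheus service name.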
Best practices & troubleshooting
- Scrape intervals: 15s is common for host metrics; shorter intervals increase scrape load and storage use.
- Security: Limit access to node_exporter (firewall or private network). Use mTLS or a proxy if exposing metrics across untrusted networks.
- Labeling: Use meaningful labels (environment, role, datacenter) so queries can aggregate effectively.
- Retention & downsampling: Plan Prometheus retention and use remote_write (Thanos, Cortex, VictoriaMetrics) for long-term storage.
- Textfile collector: Use scripts to export custom metrics into the textfile directory for application-specific metrics.
- Troubleshooting: If metrics don’t appear, check the Prometheus targets page, verify node_exporter is reachable (curl host:9100/metrics), and inspect firewall/security groups.
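For the reachability step, a small loop over hosts can save some typing. This is a sketch under the assumption that curl is installed on the machine running it (ideally the Prometheus server, so the network path matches); check_exporter is a hypothetical helper name.

```shell
#!/bin/sh
# check_exporter is a hypothetical helper: it succeeds only if the target
# answers on /metrics and the payload contains node_exporter's build info metric.
check_exporter() {
    target="$1"   # host:port, e.g. host1.example.com:9100
    curl -fsS --max-time 5 "http://$target/metrics" 2>/dev/null |
        grep -q '^node_exporter_build_info'
}

# Usage: sh check_exporters.sh host1.example.com:9100 host2.example.com:9100
for t in "$@"; do
    if check_exporter "$t"; then
        echo "OK   $t"
    else
        echo "FAIL $t (check the service, firewall and security groups)"
    fi
done
```

Matching on a node_exporter metric rather than just the HTTP status helps tell a healthy exporter from some other service that happens to answer on the same port.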