Step-by-Step Guide: Monitoring Linux Hosts with Node Exporter, Prometheus & Grafana

Posted on September 7, 2024 (updated September 7, 2025) by vikash sinha
DevOps Tools, Monitoring & Logging

Monitoring Linux Hosts with Node Exporter, Prometheus & Grafana — Step-by-Step

A practical guide to install node_exporter on hosts, configure Prometheus as the metrics scraper, and visualize metrics in Grafana (including provisioning and example queries).

Table of contents

  • Overview
  • Prerequisites
  • 1) Install node_exporter on hosts
  • 2) Configure Prometheus to scrape node_exporter
  • 3) Configure Grafana & add Prometheus datasource
  • 4) Dashboards & example PromQL
  • 5) Alerts (Prometheus Alertmanager or Grafana)
  • Appendix: Docker Compose quickstart
  • Best practices & troubleshooting

Overview

This post shows a minimal, production-ready approach to host-level observability using:

  • node_exporter — exposes OS and hardware metrics (CPU, memory, disk, network) via /metrics on port 9100.
  • Prometheus — scrapes metrics from node_exporter and stores time-series data.
  • Grafana — visualizes metrics using Prometheus as the datasource; dashboards show host health and trends.

Prerequisites

  • Linux hosts (Ubuntu/CentOS) where you can install node_exporter (or run it as a container).
  • A server for Prometheus and Grafana (can be one VM or container host).
  • Network connectivity: Prometheus must be able to reach HOST:9100.
  • Optional: firewall rules allowing port 9100 from the Prometheus server only.
Tip: For security, restrict node_exporter access to only Prometheus (via firewall or by binding to a private interface).

1) Install node_exporter on each host

Here’s a typical systemd-based installation on an Ubuntu host (replace version as needed):

# Download (v1.7.0 shown; check the GitHub releases page for the current version)
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# Create systemd unit
sudo tee /etc/systemd/system/node_exporter.service >/dev/null <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Restart=always

[Install]
WantedBy=multi-user.target
EOF
# Start service
sudo mkdir -p /var/lib/node_exporter/textfile_collector
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
# Verify (-s hides curl's progress meter)
curl -s http://localhost:9100/metrics | head -n 20

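The --collector.textfile.directory flag points the textfile collector at a directory; node_exporter then exposes the metrics found in any *.prom files there, which is useful for scripts and cron jobs that produce custom metrics. User=nobody keeps the process unprivileged; a dedicated service user is a cleaner alternative.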
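
As an illustration of how a script might feed the textfile collector, here is a minimal Python sketch. The function name, the file name and the metric name are hypothetical; only the directory comes from the unit file above. It writes to a temporary file first and then renames it, so node_exporter should never read a half-written file.

```python
import os
import tempfile


def write_textfile_metrics(directory, filename, metrics):
    """Atomically write gauge metrics in the Prometheus text exposition format.

    `metrics` maps a metric name to a (help_text, value) tuple.
    """
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = "\n".join(lines) + "\n"

    # The temp file lives in the same directory so the rename is atomic.
    # Its .tmp suffix means the collector (which reads *.prom) ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # mkstemp creates the file with mode 0600; the exporter user must read it.
    os.chmod(tmp_path, 0o644)

    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path
```

A cron job could call, for example, write_textfile_metrics("/var/lib/node_exporter/textfile_collector", "backup.prom", {"backup_last_success_timestamp_seconds": ("Unix time of the last successful backup.", 1725700000)}) after each run; the metric then appears on /metrics at the next scrape.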

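If you run node_exporter as a Docker container, expose port 9100 and mount the textfile directory if you use it. A container only sees its own namespaces by default, so to report host metrics the upstream README recommends host networking, the host PID namespace, and a read-only bind mount of / that is passed to the exporter with --path.rootfs.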

2) Configure Prometheus to scrape node_exporter

Prometheus configuration example (prometheus.yml). Add a job that scrapes all your nodes:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets:
        - 'host1.example.com:9100'
        - 'host2.example.com:9100'
    # Optional relabeling to simplify instance label
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):.*'
        target_label: instance
        replacement: '$1'
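
To see what the optional relabeling rule does, the following Python sketch mimics it outside Prometheus. It is only an approximation: Prometheus uses RE2 and writes the capture group as $1, while Python writes it as \1. The helper name is hypothetical.

```python
import re


def relabel_instance(address, regex=r"([^:]+):.*", replacement=r"\1"):
    """Mimic the `replace` relabel action for the instance label."""
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(regex, address)
    if match is None:
        # No match: Prometheus performs no replacement for this rule.
        return None
    return match.expand(replacement)


print(relabel_instance("host1.example.com:9100"))
```

With the targets from the configuration above, host1.example.com:9100 becomes host1.example.com. The pattern stops at the first colon, so a bracketed IPv6 target such as [::1]:9100 would be reduced to just "[" and needs a different regex.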

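After updating prometheus.yml, validate it with promtool check config prometheus.yml, then restart Prometheus (or reload it by sending SIGHUP to the process) and check the Targets page (http://PROM_SERVER:9090/targets) to confirm that the node exporters are UP.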
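
The same check can be scripted against the Prometheus HTTP API, which serves target state at /api/v1/targets. The sketch below is one possible way to do it; the function names are hypothetical and the base URL is whatever your Prometheus server answers on.

```python
import json
import urllib.request


def fetch_targets(base_url, timeout=5):
    """Return the parsed JSON body of GET <base_url>/api/v1/targets."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload, job="node_exporter"):
    """List (scrape_url, last_error) for targets of `job` that are not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") != job:
            continue
        if target["health"] != "up":
            problems.append((target["scrapeUrl"], target.get("lastError", "")))
    return problems
```

Calling unhealthy_targets(fetch_targets("http://PROM_SERVER:9090")) returns an empty list when every node_exporter target is UP, which makes it easy to wire into a deployment check.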

Dynamic service discovery

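If you run in cloud environments, replace static_configs with the matching service discovery block (ec2_sd_configs, gce_sd_configs, or kubernetes_sd_configs). On Kubernetes you can use kubernetes_sd_configs directly, or run the Prometheus Operator, which generates scrape configuration from ServiceMonitor and PodMonitor resources.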

3) Configure Grafana & add Prometheus datasource

Two ways to add the Prometheus datasource:

Option A — UI (Quick)

  1. Open Grafana web UI (e.g., http://GRAFANA_HOST:3000), log in (default admin/admin).
  2. Go to Configuration → Data Sources → Add data source (in newer Grafana releases the menu is Connections → Data sources).
  3. Select Prometheus, set URL to http://PROMETHEUS_HOST:9090, and click Save & Test.

Option B — Provisioning (recommended for automation)

Create a YAML file at /etc/grafana/provisioning/datasources/prometheus.yaml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

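Restart Grafana and the datasource is created automatically. The url must resolve from the Grafana server: http://prometheus:9090 matches the Compose service name used in the appendix, so substitute your own Prometheus host elsewhere. With editable: false the datasource cannot be changed from the UI, which suits repeatable infrastructure-as-code deployments.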
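
To confirm that provisioning took effect without opening the UI, you could look the datasource up by name through Grafana's HTTP API (GET /api/datasources/name/<name>). This is a sketch under assumptions: it authenticates with a service account token that you create yourself, and all function names are hypothetical.

```python
import json
import urllib.request


def datasource_lookup_request(grafana_url, token, name="Prometheus"):
    """Build (but do not send) GET /api/datasources/name/<name>."""
    return urllib.request.Request(
        f"{grafana_url}/api/datasources/name/{name}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def matches_provisioning(info, expected_url="http://prometheus:9090"):
    """True if the returned datasource looks like the provisioned one."""
    return (
        info.get("type") == "prometheus"
        and info.get("url") == expected_url
        and info.get("isDefault") is True
    )


def check_datasource(grafana_url, token):
    """Send the lookup and compare the answer with the provisioning file."""
    request = datasource_lookup_request(grafana_url, token)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return matches_provisioning(json.load(resp))
```

check_datasource("http://GRAFANA_HOST:3000", token) should return True once Grafana has restarted with the provisioning file in place; an HTTP 404 error means no datasource with that name exists.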

4) Dashboards & example PromQL queries

You can import community dashboards (search “Node Exporter Full” on grafana.com) or create custom panels. Below are useful PromQL queries for common panels:

CPU Usage (per instance)

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
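
To make the arithmetic behind this query concrete, the sketch below reproduces it in Python for a single host. It assumes the simplified view that irate takes the last two samples of each idle counter and divides the increase by the elapsed time; the sample values are invented for illustration.

```python
def cpu_usage_percent(idle_samples):
    """Compute 100 - avg(idle rate) * 100 for one instance.

    `idle_samples` maps a CPU id to ((t0, v0), (t1, v1)): the last two
    samples of node_cpu_seconds_total{mode="idle"} for that core.
    """
    idle_rates = []
    for (t0, v0), (t1, v1) in idle_samples.values():
        # Idle seconds accumulated per wall-clock second (0.0 to 1.0).
        idle_rates.append((v1 - v0) / (t1 - t0))
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


samples = {
    "cpu0": ((0, 1000.0), (15, 1012.0)),  # idle 12s out of 15s
    "cpu1": ((0, 1000.0), (15, 1006.0)),  # idle 6s out of 15s
}
print(round(cpu_usage_percent(samples), 1))
```

Here one core is 80% idle and the other 40% idle, so the average idle fraction is 0.6 and the panel would show roughly 40% CPU usage for the host.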

Memory Used (bytes)

node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

Memory Usage %

(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

Disk Used % (root)

100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)

Load (1m)

node_load1{instance="$instance"}

Network Bytes In/Out

rate(node_network_receive_bytes_total{device!="lo"}[5m])
rate(node_network_transmit_bytes_total{device!="lo"}[5m])

When creating dashboards, group panels by purpose (CPU, Memory, Disk, Network, Processes) and include host selector variables for easy filtering.
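
Any of the queries above can also be tested outside Grafana through the Prometheus instant-query endpoint, /api/v1/query. The following sketch builds the request URL and reads a vector result; the function names are hypothetical, and the base URL is an assumption about where your Prometheus server listens.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the URL for an instant query, URL-encoding the expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Map each series' instance label to its float value."""
    if payload.get("status") != "success":
        raise ValueError("query failed")
    if payload["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(base_url, promql, timeout=5):
    """Execute the query and return {instance: value}."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, run_query("http://PROM_SERVER:9090", "node_load1") returns one load value per host, which is a quick way to check a query before building a panel around it.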

5) Alerts (Prometheus Alertmanager or Grafana)

Two common approaches:

  1. Prometheus rules + Alertmanager: Define Prometheus alerting rules and route alerts via Alertmanager to email/Slack/PagerDuty.
  2. Grafana alerting: Grafana’s unified alerting can evaluate Prometheus queries and send notifications. This centralizes alert management in Grafana.

Example Prometheus alert rule (high CPU)

groups:
- name: node_alerts
  rules:
  - alert: HighCPUUsage
    # rate() is used here instead of irate(): the Prometheus docs recommend rate for alerts
    expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 85
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High CPU on {{ $labels.instance }}"
      description: "CPU usage is > 85% for more than 2 minutes."

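Place the rules in a file (e.g., node_rules.yml) and list that file under the top-level rule_files key in prometheus.yml. Point Prometheus at Alertmanager under alerting → alertmanagers in the same file. You can validate the rule file with promtool check rules node_rules.yml before reloading Prometheus.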
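
The for: 2m clause is easy to misread, so here is a small Python simulation of how it behaves. It is a simplified model, not Prometheus code: the condition is evaluated once per evaluation_interval, the alert is pending while the condition has held for less than the for duration, and any false evaluation resets it.

```python
def alert_states(condition_by_eval, eval_interval_s=15, for_s=120):
    """Return the alert state after each rule evaluation.

    `condition_by_eval` is a list of booleans: whether the alert
    expression returned a result at each evaluation.
    """
    states = []
    active_since = None
    for index, condition in enumerate(condition_by_eval):
        now = index * eval_interval_s
        if not condition:
            # A single false evaluation resets the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        elapsed = now - active_since
        states.append("firing" if elapsed >= for_s else "pending")
    return states


print(alert_states([True] * 9))
```

With the 15s evaluation interval from prometheus.yml, the alert is pending for eight evaluations and fires on the ninth, once the condition has held for the full two minutes. A short dip below 85% in between starts the count again.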

Appendix: Docker Compose quickstart (Prometheus + Grafana + Node Exporter)

Simple docker-compose.yml for local testing:

version: '3.7'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules/:/etc/prometheus/rules/:ro
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - ./provisioning/:/etc/grafana/provisioning/:ro
    ports:
      - "3000:3000"

  node-exporter-host1:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    command:
      - '--collector.textfile.directory=/var/lib/node_exporter/textfile_collector'
    volumes:
      - ./textfile_collector/:/var/lib/node_exporter/textfile_collector:ro

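Note: In this Compose setup Prometheus reaches the exporter by service name, so the scrape target in prometheus.yml should be node-exporter-host1:9100. A containerized exporter without host mounts reports the container's view rather than the host's, so treat this as a lab setup; for production, run node_exporter directly on every host you want to monitor.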
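
Once the stack is up, a quick smoke test can confirm that all three services answer. The sketch below assumes the port mappings from the Compose file above and uses Prometheus's /-/healthy endpoint, Grafana's /api/health endpoint, and the exporter's /metrics page; the function names are hypothetical.

```python
import urllib.error
import urllib.request

# URLs assume the published ports from the Compose file above.
ENDPOINTS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000/api/health",
    "node_exporter": "http://localhost:9100/metrics",
}


def http_ok(url, timeout=3):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def smoke_test(endpoints=ENDPOINTS, probe=http_ok):
    """Probe every endpoint and return {service: reachable}."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Running smoke_test() after docker compose up -d should return True for every service; a False entry tells you which container to inspect with docker compose logs.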

Best practices & troubleshooting

  • Scrape intervals: 15s is common for host metrics; shorter intervals increase scrape load and storage proportionally.
  • Security: Limit access to node_exporter (firewall or private network). Use mTLS or a proxy if exposing metrics across untrusted networks.
  • Labeling: Use meaningful labels (environment, role, datacenter) so queries can aggregate effectively.
  • Retention & downsampling: Plan Prometheus retention up front. Prometheus does not downsample local data, so use remote_write to a long-term store (Thanos, Cortex, VictoriaMetrics) when you need longer history.
  • Textfile collector: Use scripts to export custom metrics into the textfile directory for application-specific metrics.
  • Troubleshooting: If metrics don’t appear, check Prometheus targets page, verify node_exporter is reachable (curl host:9100/metrics), and inspect firewall/security groups.
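
To put numbers on the scrape-interval and retention points above, the Prometheus storage documentation gives a rough sizing formula: disk space = retention seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample. The sketch below applies it; the figure of 1,000 series per host is an assumption for illustration, so measure your own hosts before relying on it.

```python
def estimate_disk_bytes(hosts, series_per_host, scrape_interval_s,
                        retention_days, bytes_per_sample=2.0):
    """Rough Prometheus disk estimate for node_exporter metrics."""
    # Each series yields one sample per scrape.
    samples_per_second = hosts * series_per_host / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# 10 hosts, an assumed 1,000 series each, 15s scrapes, 15 days retention.
estimate = estimate_disk_bytes(10, 1000, 15, 15)
print(f"{estimate / 1e9:.2f} GB")
```

Under these assumptions the estimate comes to about 1.7 GB. Dropping the scrape interval from 15s to 5s triples it, which is the trade-off the first bullet describes.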

Next steps: provision dashboards and alerts as code (Grafana provisioning plus Prometheus rules in Git), and consider running Prometheus behind a load balancer or adopting a managed Prometheus service for scale.

Copyright © 2025 feenixdv
