Kubelet Volume Stats Exporter
A Kubernetes DaemonSet application that addresses the regression in Kubernetes 1.34 where kubelet_volume_stats_* metrics are no longer exposed by the kubelet. This exporter retrieves volume statistics from the kubelet's /stats/summary API endpoint and exposes them in Prometheus format for backward compatibility.
Problem Statement
Starting with Kubernetes 1.34, CSI volume statistics disappeared from both the kubelet /stats/summary and /metrics endpoints (see kubernetes/kubernetes#133961). This breaks monitoring and alerting for persistent volume usage across clusters.
Solution
This application:
- Runs as a DaemonSet on every node in your cluster
- Queries the local kubelet's `/stats/summary` API endpoint (see the Go sketch after this list)
- Extracts volume statistics for all pods on the node
- Exposes metrics in Prometheus format with the same metric names as the original `kubelet_volume_stats_*` metrics
- Provides backward compatibility for existing monitoring dashboards and alerts
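As a rough illustration of the first two steps, here is a minimal, self-contained Go sketch that queries the kubelet's `/stats/summary` endpoint with the pod's service-account token and decodes per-PVC volume stats. The JSON field names follow the kubelet Summary API; the endpoint, token path, and struct names are illustrative, and the actual code in `main.go` may differ.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// Minimal subset of the kubelet /stats/summary response that this sketch
// cares about: per-pod, PVC-backed volume statistics.
type summary struct {
	Pods []struct {
		PodRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"podRef"`
		Volumes []struct {
			Name           string  `json:"name"`
			CapacityBytes  *uint64 `json:"capacityBytes"`
			UsedBytes      *uint64 `json:"usedBytes"`
			AvailableBytes *uint64 `json:"availableBytes"`
			PVCRef         *struct {
				Name      string `json:"name"`
				Namespace string `json:"namespace"`
			} `json:"pvcRef"`
		} `json:"volume"`
	} `json:"pods"`
}

func main() {
	endpoint := "https://127.0.0.1:10250" // --kubelet-endpoint
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}

	client := &http.Client{
		Timeout: 10 * time.Second,
		// Corresponds to --insecure-skip-tls-verify; prefer proper CA verification.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	req, _ := http.NewRequest(http.MethodGet, endpoint+"/stats/summary", nil)
	req.Header.Set("Authorization", "Bearer "+string(token))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var s summary
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		panic(err)
	}

	for _, pod := range s.Pods {
		for _, vol := range pod.Volumes {
			if vol.PVCRef == nil || vol.CapacityBytes == nil || vol.UsedBytes == nil {
				continue // only PVC-backed volumes with filesystem stats
			}
			fmt.Printf("ns=%s pvc=%s pod=%s used=%d capacity=%d\n",
				vol.PVCRef.Namespace, vol.PVCRef.Name, pod.PodRef.Name,
				*vol.UsedBytes, *vol.CapacityBytes)
		}
	}
}
```

Only volumes with a `pvcRef` and filesystem stats are kept, which matches the PVC-centric labels listed in the next section.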
Metrics Exposed
The exporter provides the following metrics with labels `namespace`, `persistentvolumeclaim`, and `pod` (a registration sketch in Go follows the lists below):
- `kubelet_volume_stats_capacity_bytes` - Capacity in bytes of the volume
- `kubelet_volume_stats_available_bytes` - Number of available bytes in the volume
- `kubelet_volume_stats_used_bytes` - Number of used bytes in the volume
- `kubelet_volume_stats_inodes` - Maximum number of inodes in the volume
- `kubelet_volume_stats_inodes_free` - Number of free inodes in the volume
- `kubelet_volume_stats_inodes_used` - Number of used inodes in the volume
Additional operational metrics:
- `kubelet_volume_stats_scrape_errors_total` - Total number of errors while scraping kubelet stats
- `kubelet_volume_stats_last_scrape_timestamp_seconds` - Timestamp of the last successful scrape
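Below is a minimal sketch, using the Prometheus Go client (`prometheus/client_golang`), of how metrics with these names and labels could be registered and served. The values set in `main` are placeholders for illustration; the real exporter fills them from `/stats/summary` on every scrape interval.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	volumeLabels = []string{"namespace", "persistentvolumeclaim", "pod"}

	capacityBytes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kubelet_volume_stats_capacity_bytes",
		Help: "Capacity in bytes of the volume",
	}, volumeLabels)
	usedBytes = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kubelet_volume_stats_used_bytes",
		Help: "Number of used bytes in the volume",
	}, volumeLabels)
	// ...the remaining gauges (available_bytes, inodes, inodes_free,
	// inodes_used) follow the same pattern.

	scrapeErrors = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "kubelet_volume_stats_scrape_errors_total",
		Help: "Total number of errors while scraping kubelet stats",
	})
	lastScrape = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "kubelet_volume_stats_last_scrape_timestamp_seconds",
		Help: "Timestamp of the last successful scrape",
	})
)

func main() {
	prometheus.MustRegister(capacityBytes, usedBytes, scrapeErrors, lastScrape)

	// Placeholder values; the real exporter sets these from kubelet stats.
	capacityBytes.WithLabelValues("default", "data-pvc", "my-app-0").Set(10 << 30) // 10 GiB
	usedBytes.WithLabelValues("default", "data-pvc", "my-app-0").Set(4 << 30)      // 4 GiB
	lastScrape.SetToCurrentTime()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil)) // --metrics-port
}
```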
Prerequisites
- Kubernetes cluster version 1.34+ (or any version where volume stats are missing)
- `kubectl` configured to access your cluster
- Docker or compatible container runtime for building the image
- (Optional) Prometheus Operator for ServiceMonitor support
Quick Start
Deploy with Helm
```bash
# Add the Helm repository
helm repo add vbeaucha https://vbeaucha.github.io/helm-charts
helm repo update

# Install the chart
helm upgrade --install kubelet-volume-stats vbeaucha/kubelet-volume-stats-exporter \
  -n kubelet-volume-stats \
  --create-namespace

# Verify deployment
kubectl get daemonset -n kubelet-volume-stats
kubectl get pods -n kubelet-volume-stats

# Test metrics endpoint
kubectl port-forward -n kubelet-volume-stats daemonset/kubelet-volume-stats-exporter 8080:8080
curl http://localhost:8080/metrics | grep kubelet_volume_stats
```
Configuration
The exporter supports the following command-line flags:
| Flag | Default | Description |
|---|---|---|
| `--kubelet-endpoint` | `https://127.0.0.1:10250` | Kubelet endpoint URL |
| `--metrics-port` | `8080` | Port to expose Prometheus metrics |
| `--scrape-interval` | `30s` | Interval to scrape kubelet stats |
| `--token-path` | `/var/run/secrets/kubernetes.io/serviceaccount/token` | Path to service account token |
| `--insecure-skip-tls-verify` | `false` | Skip TLS certificate verification |
You can modify these in the DaemonSet manifest under the args section.
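For orientation, flags like these would typically be declared with Go's standard `flag` package. The snippet below is a sketch of plausible declarations, not necessarily the exact ones in `main.go`:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Hypothetical flag declarations mirroring the table above.
	kubeletEndpoint := flag.String("kubelet-endpoint", "https://127.0.0.1:10250", "Kubelet endpoint URL")
	metricsPort := flag.Int("metrics-port", 8080, "Port to expose Prometheus metrics")
	scrapeInterval := flag.Duration("scrape-interval", 30*time.Second, "Interval to scrape kubelet stats")
	tokenPath := flag.String("token-path", "/var/run/secrets/kubernetes.io/serviceaccount/token", "Path to service account token")
	insecure := flag.Bool("insecure-skip-tls-verify", false, "Skip TLS certificate verification")
	flag.Parse()

	fmt.Println(*kubeletEndpoint, *metricsPort, *scrapeInterval, *tokenPath, *insecure)
}
```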
Prometheus Integration
Standard Prometheus
The Service includes annotations for automatic discovery:
```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
```
Complete Scrape Configuration
For standard Prometheus (non-Operator), add this scrape configuration to handle label conflicts:
```yaml
scrape_configs:
  - job_name: 'kubelet-volume-stats-exporter'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kubelet-volume-stats
    relabel_configs:
      # Keep only endpoints for the kubelet-volume-stats-exporter service
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: kubelet-volume-stats-exporter
      # Use pod name as instance label
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: instance
      # Add node name label
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
    # Fix label conflicts: Prometheus adds namespace/pod labels from Kubernetes metadata,
    # which conflict with the metric's own namespace/pod labels, causing them to be
    # renamed to exported_namespace/exported_pod. These rules rename them back.
    metric_relabel_configs:
      # Rename exported_namespace back to namespace
      - source_labels: [exported_namespace]
        target_label: namespace
        action: replace
      # Drop the exported_namespace label
      - regex: exported_namespace
        action: labeldrop
      # Rename exported_pod back to pod
      - source_labels: [exported_pod]
        target_label: pod
        action: replace
      # Drop the exported_pod label
      - regex: exported_pod
        action: labeldrop
```
Why these relabeling rules are needed: Prometheus automatically adds namespace and pod labels from Kubernetes service discovery metadata (the exporter's namespace/pod). These conflict with the metric's own namespace and pod labels (the PVC's namespace/pod), causing Prometheus to rename the metric labels to exported_namespace and exported_pod. The metric_relabel_configs above fix this by renaming them back.
Prometheus Operator
Enable ServiceMonitor for automatic scraping:
```bash
helm upgrade --install kubelet-volume-stats vbeaucha/kubelet-volume-stats-exporter \
  -n kubelet-volume-stats \
  --set serviceMonitor.enabled=true
```
The ServiceMonitor includes the necessary metricRelabelings to automatically handle the label conflict issue described above.
Label Conflict Issue: exported_namespace and exported_pod
Symptom: In Grafana or Prometheus, you see labels named exported_namespace and exported_pod instead of namespace and pod.
Root Cause: This is a common Prometheus label conflict issue:
- The exporter exports metrics with labels `namespace="default"` (the PVC's namespace) and `pod="my-app-pod"` (the pod using the PVC)
- Prometheus adds metadata labels from Kubernetes service discovery: `namespace="kubelet-volume-stats"` (the exporter's namespace) and `pod="exporter-pod"` (the exporter pod)
- Conflict detected: two labels with the same name but different values
- Prometheus renames: to avoid the conflict, Prometheus renames the metric's labels to `exported_namespace` and `exported_pod`
- Result: your queries and dashboards see the wrong label names
Solution 1: Use ServiceMonitor (Recommended for Prometheus Operator)
Enable ServiceMonitor which includes automatic label fixing:
```bash
helm upgrade --install kubelet-volume-stats vbeaucha/kubelet-volume-stats-exporter \
  -n kubelet-volume-stats \
  --set serviceMonitor.enabled=true
```
The ServiceMonitor includes metricRelabelings that automatically rename exported_namespace → namespace and exported_pod → pod.
Solution 2: Manual Prometheus Configuration (Standard Prometheus)
Add metric_relabel_configs to your Prometheus scrape configuration (see "Complete Scrape Configuration" section above).
Verification:
```bash
# Query Prometheus to check label names
curl -s 'http://prometheus:9090/api/v1/query?query=kubelet_volume_stats_capacity_bytes' | \
  jq '.data.result[0].metric | keys'

# Should include "namespace" and "pod", NOT "exported_namespace" or "exported_pod"
```
Example Prometheus Queries
```promql
# Volume usage percentage
100 * (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes)

# Volumes with less than 10% free space
kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.1

# Total volume capacity by namespace
sum by (namespace) (kubelet_volume_stats_capacity_bytes)

# Volumes by pod
sum by (namespace, pod, persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes)
```
Troubleshooting
Labels show as exported_namespace and exported_pod
See the "Label Conflict Issue" section under Prometheus Integration above.
High memory usage
Adjust resource limits in the DaemonSet:
```yaml
resources:
  limits:
    memory: 256Mi  # Increase if needed
```
TLS certificate errors
If you encounter TLS certificate verification errors, you can enable insecure mode (not recommended for production):
```yaml
args:
  - --insecure-skip-tls-verify=true
```
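If you prefer to keep verification enabled, one alternative, assuming the kubelet's serving certificate is signed by a CA bundle you can mount into the exporter pod (which is not the case on every cluster), is to load that CA into the HTTP client's root pool. A minimal sketch; the mount path is hypothetical:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
)

// newKubeletClient builds an HTTP client that verifies the kubelet's TLS
// certificate against a CA bundle mounted at caPath instead of skipping
// verification entirely.
func newKubeletClient(caPath string) (*http.Client, error) {
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	// Example path; adjust to wherever your kubelet CA bundle is mounted.
	client, err := newKubeletClient("/etc/kubelet-ca/ca.crt")
	if err != nil {
		panic(err)
	}
	_ = client // use for requests to https://<node>:10250/stats/summary
}
```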
Security Considerations
The exporter follows Kubernetes security best practices:
- Runs as non-root user (UID 1000)
- Uses read-only root filesystem
- Drops all Linux capabilities
- Implements seccomp profile
- Uses service account tokens for authentication
- Minimal RBAC permissions (only access to node stats)
Development
Local Development
```bash
# Install dependencies
go mod download

# Run locally (requires kubeconfig)
go run main.go --kubelet-endpoint=https://your-node:10250 --insecure-skip-tls-verify=true

# Run tests
go test ./...

# Build binary
go build -o kubelet-volume-stats-exporter .
```
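If you add tests around the summary parsing, a small decode test is a reasonable starting point. The struct below is a stand-in defined locally so the example is self-contained; the real types in `main.go` may be named and shaped differently.

```go
package main

import (
	"encoding/json"
	"testing"
)

// testSummary is a local stand-in for the exporter's summary types.
type testSummary struct {
	Pods []struct {
		Volumes []struct {
			CapacityBytes *uint64 `json:"capacityBytes"`
			UsedBytes     *uint64 `json:"usedBytes"`
			PVCRef        *struct {
				Name      string `json:"name"`
				Namespace string `json:"namespace"`
			} `json:"pvcRef"`
		} `json:"volume"`
	} `json:"pods"`
}

func TestDecodeSummary(t *testing.T) {
	fixture := `{
	  "pods": [{
	    "volume": [{
	      "capacityBytes": 10737418240,
	      "usedBytes": 4294967296,
	      "pvcRef": {"name": "data-pvc", "namespace": "default"}
	    }]
	  }]
	}`

	var s testSummary
	if err := json.Unmarshal([]byte(fixture), &s); err != nil {
		t.Fatalf("unmarshal: %v", err)
	}
	vol := s.Pods[0].Volumes[0]
	if vol.PVCRef == nil || vol.PVCRef.Name != "data-pvc" {
		t.Fatalf("expected pvcRef data-pvc, got %+v", vol.PVCRef)
	}
	if *vol.CapacityBytes != 10737418240 {
		t.Fatalf("unexpected capacity: %d", *vol.CapacityBytes)
	}
}
```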
Building for Multiple Architectures
```bash
# Build for AMD64
GOOS=linux GOARCH=amd64 go build -o kubelet-volume-stats-exporter-amd64 .

# Build for ARM64
GOOS=linux GOARCH=arm64 go build -o kubelet-volume-stats-exporter-arm64 .
```
Architecture
```
┌──────────────────────────────────────────────────────────┐
│                     Kubernetes Node                       │
│                                                            │
│  ┌──────────────┐                    ┌──────────────────┐ │
│  │   Kubelet    │◄───────────────────│  Volume Stats    │ │
│  │              │   /stats/summary   │    Exporter      │ │
│  │  Port 10250  │                    │   (DaemonSet)    │ │
│  └──────────────┘                    └────────┬─────────┘ │
│                                                │           │
│                                                │ :8080     │
└────────────────────────────────────────────────┼───────────┘
                                                 │
                                                 │
                                    ┌────────────▼─────────────┐
                                    │        Prometheus        │
                                    │    (scrapes metrics)     │
                                    └──────────────────────────┘
```
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
This project is licensed under the MIT License. See the LICENSE file for details.
References
- Kubernetes Issue #133961 - CSI volume statistics missing
- Kubelet Stats Summary API
- Kubernetes Metrics Reference