---
title: "Metrics"
layout: default
sort: 7
---
# Metrics
Metrics are configured to be exposed using the Prometheus Operator APIs by default. To expose metrics this way, you need to install the Prometheus Operator in your cluster. By default, nfd-master and nfd-worker expose metrics on port 8081.
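As a quick check of that endpoint, the following shell sketch keeps only the NFD series from a scrape. It assumes the metrics are served in the Prometheus text exposition format at the conventional `/metrics` path; the `nfd_metrics` function is a hypothetical helper, not part of NFD.

```shell
# Hypothetical helper: print only NFD samples from Prometheus text-format
# input on stdin. An optional argument narrows the match to a name prefix.
# Comment lines ("# HELP", "# TYPE") never start with "nfd_", so they are dropped.
nfd_metrics() {
  local prefix="${1:-nfd_}"
  grep -- "^${prefix}"
}
```

For example, with `kubectl -n node-feature-discovery port-forward deployment/nfd-master 8081:8081` running in another terminal (the namespace and deployment name are assumptions), `curl -s http://localhost:8081/metrics | nfd_metrics` would list the nfd-master series, and `nfd_metrics nfd_node_update` would narrow the output to the node update counters.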
The following metrics are exposed:
| Metric | Type | Description |
|---|---|---|
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built |
| `nfd_node_update_requests_total` | Counter | Number of node update/sync requests processed by the master. When the gRPC API is in use, this is the number of gRPC requests received. When the NodeFeature API is enabled, it counts the requests initiated by the NFD API controller, i.e. updates triggered by changes in NodeFeature or NodeFeatureRule objects plus updates initiated by the controller resync period |
| `nfd_node_updates_total` | Counter | Number of nodes updated |
| `nfd_node_update_failures_total` | Counter | Number of node update failures |
| `nfd_node_labels_rejected_total` | Counter | Number of node labels rejected by nfd-master |
| `nfd_node_extendedresources_rejected_total` | Counter | Number of node extended resources rejected by nfd-master |
| `nfd_node_taints_rejected_total` | Counter | Number of node taints rejected by nfd-master |
| `nfd_nodefeaturerule_processing_duration_seconds` | Histogram | Time taken to process NodeFeatureRule objects |
| `nfd_nodefeaturerule_processing_errors_total` | Counter | Number of errors encountered while processing NodeFeatureRule objects |
| `nfd_feature_discovery_duration_seconds` | Histogram | Time taken to discover features on a node |
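Counters and histograms are cumulative since the process started, so their raw values are most informative as ratios. The sketch below uses a hypothetical `nfd_ratio` helper that divides the total of one metric by the total of another in a text-format scrape. Dividing a histogram's `_sum` by its `_count` gives the mean duration, and dividing `nfd_node_update_failures_total` by `nfd_node_update_requests_total` gives a failure fraction. It assumes the standard Prometheus text exposition format, in which a histogram appears as `_bucket`, `_sum` and `_count` series.

```shell
# Hypothetical helper: read Prometheus text-format samples on stdin and print
# total(NUMERATOR) / total(DENOMINATOR) with three decimals, or "n/a" when the
# denominator is missing or zero. Samples of the same name with different
# label sets are summed.
nfd_ratio() {
  awk -v num="$1" -v den="$2" '
    /^#/ { next }                  # skip "# HELP" / "# TYPE" lines
    {
      line = $0
      sub(/\{[^}]*\}/, "", line)   # drop the optional {label set}
      split(line, f, " ")          # f[1] = metric name, f[2] = value
      if (f[1] == num) n += f[2]
      if (f[1] == den) d += f[2]
    }
    END {
      if (d == 0) { print "n/a"; exit }
      printf "%.3f\n", n / d
    }'
}
```

For example, `nfd_ratio nfd_feature_discovery_duration_seconds_sum nfd_feature_discovery_duration_seconds_count` reads a scrape from stdin and prints the mean feature discovery time in seconds. Because both inputs are lifetime totals, the result is an average since startup; for recent behaviour, a Prometheus query using `rate()` over a time window is the usual approach.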
## Via Kustomize
To deploy NFD with metrics enabled using Kustomize, use the Metrics Overlay.
## Via Helm
By default, metrics are enabled when deploying NFD via Helm. To enable Prometheus to scrape metrics from NFD, pass the following value to Helm:

```bash
--set prometheus.enable=true
```

For more information on Helm deployment, see Helm.
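A minimal sketch of a complete Helm invocation with this value set follows. The `nfd_helm_install` function, the release name `nfd`, the namespace `node-feature-discovery` and the chart reference `nfd/node-feature-discovery` (a chart repository added under the name `nfd`) are assumptions for illustration.

```shell
# Hypothetical helper: install or upgrade NFD with Prometheus scraping enabled.
# Release name, namespace and chart reference are assumptions; adjust as needed.
# Extra arguments are passed through to Helm. Setting HELM=echo prints the
# command instead of running it.
nfd_helm_install() {
  local helm="${HELM:-helm}"
  "$helm" upgrade --install nfd nfd/node-feature-discovery \
    --namespace node-feature-discovery --create-namespace \
    --set prometheus.enable=true \
    "$@"
}
```

The chart repository can be added beforehand with `helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts`. Running `nfd_helm_install --dry-run` previews the release without installing it.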
When deploying prometheus-operator via Helm, we recommend setting
`--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false`
so that prometheus-operator scrapes metrics from any PodMonitor.
Alternatively, set labels on the PodMonitor via the Helm parameter `prometheus.labels`
to control which Prometheus instances scrape this PodMonitor.
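The two alternatives can be sketched as shell functions. This assumes prometheus-operator is deployed with the kube-prometheus-stack chart (where `prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues` is defined), whose default PodMonitor selector matches the label `release: <kube-prometheus-stack release name>`. It also assumes the NFD chart accepts `prometheus.labels` as a map of label keys to values. The function names, release names, namespaces and the chart repositories `prometheus-community` and `nfd` are illustrative assumptions.

```shell
# Option 1 (assumes the kube-prometheus-stack chart): make the operator
# select PodMonitors from any release, not only its own.
install_prometheus_stack() {
  local helm="${HELM:-helm}"
  "$helm" upgrade --install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace \
    --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
}

# Option 2: keep the default selector and label the NFD PodMonitor so that it
# matches. $1 is the kube-prometheus-stack release name (assumed "prometheus").
install_nfd_labeled() {
  local helm="${HELM:-helm}"
  local release="${1:-prometheus}"
  "$helm" upgrade --install nfd nfd/node-feature-discovery \
    --namespace node-feature-discovery --create-namespace \
    --set prometheus.enable=true \
    --set "prometheus.labels.release=${release}"
}
```

Option 1 changes the Prometheus side once, for all PodMonitors in the cluster. Option 2 leaves Prometheus untouched and makes only the NFD PodMonitor match the existing selector. Setting `HELM=echo` prints each command instead of running it.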