mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Markus Lehtonen b90f2c318e metrics: add nfd_node_update_failures_total counter

Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.

2023-08-07 09:37:27 +03:00



Metrics

By default, metrics are configured to be exposed using the Prometheus Operator APIs. To expose metrics this way, you need to install the Prometheus Operator in your cluster. NFD Master and Worker expose metrics on port 8081 by default.
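To check what an NFD pod actually exposes, you can fetch its metrics endpoint directly. The Go sketch below is a minimal illustration and not part of NFD: the helper names `metricsURL` and `fetchMetrics` are hypothetical, and the `/metrics` path is an assumption based on the usual Prometheus convention. Only the default port 8081 comes from the text above.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"strconv"
	"time"
)

// metricsURL builds the scrape URL for an NFD pod (hypothetical helper).
// ASSUMPTION: metrics are served under the conventional "/metrics" path.
// net.JoinHostPort adds brackets around IPv6 addresses.
func metricsURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/metrics"
}

// fetchMetrics downloads the raw Prometheus text exposition from a pod
// (hypothetical helper). Pass 8081 as the port for the NFD default.
func fetchMetrics(host string, port int) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	url := metricsURL(host, port)
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s from %s", resp.Status, url)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}
```

Calling `fetchMetrics(podIP, 8081)` from inside the cluster (or through a port-forward to `localhost`) returns the same text that Prometheus scrapes.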

The following metrics are exposed:

Metric	Type	Description
`nfd_master_build_info`	Gauge	Version from which nfd-master was built
`nfd_worker_build_info`	Gauge	Version from which nfd-worker was built
`nfd_node_updates_total`	Counter	Number of nodes updated
`nfd_node_update_failures_total`	Counter	Number of node update failures
`nfd_nodefeaturerule_processing_duration_seconds`	Histogram	Time taken to process NodeFeatureRule objects
`nfd_feature_discovery_duration_seconds`	Histogram	Time taken to discover features on a node
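The counters and histograms above are published in the Prometheus text exposition format. As a rough sketch for reading them outside of Prometheus (for example, in a quick diagnostic tool), the hypothetical `parseSamples` function below extracts sample values keyed by series name plus label set. It assumes simple, well-formed input and is not a complete implementation of the format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples is a minimal, hypothetical parser for the Prometheus text
// exposition format. It handles "name value" and "name{labels} value"
// lines, ignores an optional trailing timestamp, and skips "# HELP" /
// "# TYPE" comments. It does not handle every edge case of the format
// (e.g. a '}' inside a label value).
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The series key ends after the label set if there is one,
		// otherwise at the first whitespace.
		var key, rest string
		if end := strings.LastIndex(line, "}"); strings.Contains(line, "{") && end >= 0 {
			key, rest = line[:end+1], line[end+1:]
		} else {
			fields := strings.Fields(line)
			key, rest = fields[0], strings.Join(fields[1:], " ")
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value in line %q", line)
		}
		// fields[0] is the sample value; a timestamp may follow and is ignored.
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		samples[key] = v
	}
	return samples, scanner.Err()
}
```

For example, `samples["nfd_node_update_failures_total"]` gives the current failure count. Because that counter carries no labels, the nfd-master logs remain the place to look for details on individual failures.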

Via Kustomize

To deploy NFD with metrics enabled using Kustomize, use the Metrics Overlay.

Via Helm

Metrics are enabled by default when deploying NFD via Helm. To let Prometheus scrape metrics from NFD, pass the following value to Helm:

--set prometheus.enable=true

For more info on Helm deployment, see Helm.

When deploying prometheus-operator via Helm, we recommend setting --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false so that the prometheus-operator scrapes metrics from any PodMonitor.

Alternatively, set labels on the PodMonitor via the Helm parameter prometheus.labels to control which Prometheus instances scrape this PodMonitor.
