mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
18 commits

Author SHA1 Message Date
googs1025
e631a52374 chore: add metrics system prefix 2024-11-28 09:57:40 +08:00
Kubernetes Prow Robot
b997ade5b3
Merge pull request #1942 from marquiz/devel/drop-grpc
nfd-master: drop stale unreachable deprecation notices
2024-11-04 11:16:31 +01:00
Markus Lehtonen
6471a1f185 docs: second fix to the prometheus kustomize overlay name 2023-12-21 18:40:14 +02:00
Markus Lehtonen
2f3cfbf209 docs: don't use "we" 2023-12-01 15:47:18 +02:00
Carlos Eduardo Arango Gutierrez
150c394374
Make mdlint v0.13 happy
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-25 21:21:11 +02:00
Markus Lehtonen
b75de2a283 examples: add example grafana dashboard
Example visualization for all metrics except
nfd_node_update_requests_total which counts the deprecated (and
disabled-by-default) gRPC requests.
2023-10-10 17:47:51 +00:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
Markus Lehtonen
f0a3581ca3 docs: document nfd_topology_updater_build_info metric 2023-10-09 13:06:36 +00:00
Markus Lehtonen
24574724e2 docs: clarify nfd_node_update_requests_total metric 2023-10-09 12:45:03 +00:00
Kubernetes Prow Robot
6d95e59cd0
Merge pull request #1290 from marquiz/devel/metrics-new
metrics: additional metrics for nfd-master
2023-08-28 02:07:42 -07:00
Markus Lehtonen
5ad2294c14 metrics: add nfd_node_update_requests_total counter
Add a counter for total number of node update/sync requests. In
practice, this counts the number of gRPC requests received if the gRPC
API is in use. If the NodeFeature API is enabled, this counts the
requests initiated by the NFD API controller, i.e. updates triggered by
changes in NodeFeature or NodeFeatureRule objects plus updates initiated
by the controller resync period.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
4b24cc1afa metrics: counters for rejected labels, extended resources and taints
Add counters for labels, extended resources and taints rejected/filtered
out by nfd-master.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
a8a29e6df2 metrics: add nfd_nodefeaturerule_processing_errors_total counter
Add a counter for errors encountered when processing NodeFeatureRules.
Another simple counter without any additional prometheus labels -
nfd-master logs can provide further details.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
b90f2c318e metrics: add nfd_node_update_failures_total counter
Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.
2023-08-07 09:37:27 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
a1406767a9 docs: align metrics documentation with latest changes on naming
Also change table formatting and fix one incorrect description.
2023-08-01 15:53:06 +03:00
Pat Riehecky
0523257d1a Add optional labels to the podmonitor
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
2023-07-21 10:03:50 -05:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric | Type | Meaning |
| ------ | ---- | ------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Time taken to label a node |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
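
The metrics listed in the table above are exposed for Prometheus, which scrapes them in its text exposition format. As a rough illustration of what a consumer sees, the sketch below parses a small sample of that format using only the Python standard library. The metric names come from the commit message; the sample values, the `version` label, and the `parse_metrics` helper are assumptions made up for this example and are not taken from the NFD code base. The parser is deliberately simplified and ignores optional timestamps.

```python
import re

# Hypothetical scrape output. Metric names are from the commit message above;
# the label name, label value and sample values are invented for illustration.
SAMPLE = """\
# HELP nfd_master_build_info Version from which nfd-master was built.
# TYPE nfd_master_build_info gauge
nfd_master_build_info{version="v0.0.0-example"} 1
# TYPE nfd_updated_nodes counter
nfd_updated_nodes 3
"""

# metric_name, optional {labels}, then a single value (no timestamp support).
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# label="value" pairs, allowing backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Not a sample line this simplified parser understands.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In a real deployment a Prometheus server or an existing client library would normally do this parsing; the sketch only shows the shape of the data behind the metric names.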