1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-06 16:57:10 +00:00
Commit graph

133 commits

Author SHA1 Message Date
Kubernetes Prow Robot
797c66ffaa
Merge pull request #1947 from marquiz/devel/health-topology-updater
nfd-topology-updater: serve metrics and healthz on the same port
2025-02-17 02:02:22 -08:00
Markus Lehtonen
0b0aed2318 nfd-master: serve metrics and healthz endpoint on the same port
Changes the gRPC health endpoint to plain http. At the same time starts
serving both the metrics and healthz endpoints on a single port.
Replaces the -metrics and -grpc-health command line flags with a single
-port flag.

Changes the Helm and kustomize deployments correspondingly.
2025-02-04 10:30:58 +02:00
Markus Lehtonen
0b9a8cf120 nfd-topology-updater: serve metrics and healthz on the same port
Changes the gRPC health endpoint to plain http. At the same time starts
serving both the metrics and healthz endpoints on a single port.
Replaces the -metrics and -grpc-health command line flags with a single
-port flag.

Changes the Helm and kustomize deployments correspondingly.
2025-02-04 10:30:26 +02:00
Markus Lehtonen
4959a13a07 nfd-worker: replace --metrics with --port
Use a single port for serving http. In addition to metrics we will have
the healthz endpoint.
2025-01-31 07:52:12 +02:00
Markus Lehtonen
25914ec06e nfd-worker: drop the gRPC health port
To be replaced with plain http.
2025-01-31 07:50:00 +02:00
Markus Lehtonen
98cd96312e Drop setup of grpc logging 2025-01-07 16:13:54 +02:00
Marcin Franczyk
0b7661bf17
Add experimental note and fix subcmds flags naming
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
2024-12-18 15:39:18 +01:00
Marcin Franczyk
efc299ecf6 Introduce nfd client tool with a subset of image compatibility commands
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
2024-12-18 10:49:02 +01:00
Kubernetes Prow Robot
3e87c97ac2
Merge pull request #1976 from marquiz/devel/grpc-api-cleanup
Cleanup for NodeFeature API being GA
2024-12-13 15:14:26 +01:00
Markus Lehtonen
fc103a6028 Cleanup for NodeFeature API being GA
Drop references to the gRPC API and don't suggest that NodeFeatureAPI
could be disabled.

Also update the developer guide for instructions running nfd components
outside the cluster.
2024-12-13 15:40:46 +02:00
Kubernetes Prow Robot
caaac59eba
Merge pull request #1860 from ozhuraki/no-owner-refs
nfd-worker: Add an option to disable setting the owner references
2024-12-13 13:12:26 +01:00
Markus Lehtonen
047d0314aa Fix version parsing
The fix in a416af51a4 was not enough by
itself and that needs to be applied comprehensively.
2024-12-12 22:14:50 +02:00
Oleg Zhurakivskyy
20ef877ab1 nfd-worker: Add an option to disable setting the owner references
In some cases it's desirable to control automatic garbage collection
of NodeFeature object.

Add an option to disable setting the owner references to Pod
for NodeFeature object.

Closes: 1817

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-11-28 16:50:10 +02:00
Markus Lehtonen
45f49d574a nfd-master: drop resourceLabels
Drop the resourceLabels config file option and the corresponding
-resource-labels command line flag. They were deprecated in NFD v0.13 so
it's time to let them go. NodeFeatureRule(s) should be used to manage
ERs, instead.
2024-11-07 15:16:52 +02:00
Markus Lehtonen
65c08ebd5d nfd-master: drop stale unreachable deprecation notices 2024-11-04 11:24:57 +02:00
Carlos Eduardo Arango Gutierrez
0bd82cf82a
Drop NFD gRPC API
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-29 15:15:18 +01:00
Tobias Giese
53ddf081da
Add parameter to configure health endpoint port
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-09-24 15:15:50 +02:00
Markus Lehtonen
a269bf4d25 Drop the -enable-nodefeature-api flag
Was marked to be removed in v0.17.
2024-07-10 15:20:07 +03:00
Carlos Eduardo Arango Gutierrez
e33e68ad5b
Add optionable arguments to NewWorker
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 15:08:26 +02:00
Markus Lehtonen
dfbd63b728 topology-updater: properly handle IPv6 from NODE_ADDRESS
Fix the usage of IPv6 addresses for default kubelet configz endpoint.

The default host:port we use for kubelet configz endpoint is
${NODE_ADDRESS}:10250. Previously we errored out if NODE_ADDRESS was an
IPv6 address because we used an incorrect notation (without brackets).
The (IPv6) needs to be enclosed in brackets if specifying the port.
2024-06-04 14:19:57 +03:00
Markus Lehtonen
560bd11d85 Re-add -enable-nodefeature-api cmdline flag
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.

The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.

This patch selectively reverts parts of
06c4733bc5.
2024-05-16 10:53:49 +03:00
Markus Lehtonen
fcb8d3cda4 nfd-master: implement opts for modifying NfdMaster instance
This provides a more controlled way for setting up the NfdMaster
instance for testing.
2024-04-05 20:21:19 +03:00
Oleg Zhurakivskyy
f2e9557a2d nfd-topology-updater: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-04-03 13:15:54 +03:00
Kubernetes Prow Robot
0ad5e50f24
Merge pull request #1609 from ozhuraki/worker-health
nfd-worker: Add liveness probe
2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy
8b63d17af7 nfd-worker: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Markus Lehtonen
6f891ce1d2 Remove references to -enable-nodefeature-api flag
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Markus Lehtonen
638e7744f1 nfd-master: mark the -crd-controller flag as deprecated
Plan the removal of the -crd-controller flag along with the gRPC API.
This flag does not make much sense after that as all communication with
nfd-worker is based on CRDs - with the CRD controller disabled
nfd-master is virtually a functionless stub.
2024-03-13 15:10:35 +02:00
leemingeer
b6d8ce7a5a nfd-topology-updater add pods fingerprint by default 2024-01-26 17:55:34 +08:00
Markus Lehtonen
d7ec0bf674 topology-updater: document the -no-publish flag correctly 2024-01-22 14:21:02 +02:00
Markus Lehtonen
a053efda64 nfd-master: run a separate gRPC health server
This patch separates the gRPC health server from the deprecated gRPC
server (disabled by default, replaced by the NodeFeature CRD API) used
for node labeling requests. The new health server runs on hardcoded TCP
port number 8082.

The main motivation for this change is to make the Kubernetes' built-in
gRPC liveness probes to function if TLS is enabled (as they don't
support TLS).

The health server itself is a naive implementation (as it was before),
basically only checking that nfd-master has started and hasn't crashed.
The patch adds a TODO note to improve the functionality.
2024-01-04 13:58:26 +02:00
Carlos Eduardo Arango Gutierrez
57b6035b71
Add kubectl-nfd
kubectl-nfd is a kubectl plugin for debbuging NodeFeatureRules

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-21 16:00:19 +01:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
AhmedGrati
7ab6314bdc chore: introduce a commong klog handling for cmd/nfd-*
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-07 22:38:15 +01:00
Kubernetes Prow Robot
c0c1b89a92
Merge pull request #1334 from ArangoGutierrez/grpc_gone_v2
Deprecate gRPC API
2023-09-07 00:38:59 -07:00
Carlos Eduardo Arango Gutierrez
9966d2ae12
Deprecate gRPC API
Now that the NodeFeature API has been set enabled by default, the gRPC
mode will be deprecated and with it all flags and features around it.

For nfd-master, flags
-port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api
are now marked as deprecated.

For nfd-worker flags
-enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override
are now marked as deprecated.

Deprecated flags, as well as gRPC related code will be removed in future
releases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-07 06:48:15 +02:00
AhmedGrati
b0be40aa09 feat: add logging parameters in configuration file for nfd master
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 15:27:27 +01:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot
a658c54de3
Merge pull request #1297 from marquiz/devel/topology-updater-version
topology-updater: make -version always runnable
2023-08-28 04:05:43 -07:00
Markus Lehtonen
01c08d67b6 Rename nfd-topology-gc to nfd-gc
This is preparation for making it a generic garbage collector for all
nfd-managed api objects.
2023-08-21 21:46:11 +03:00
Markus Lehtonen
5ba8d14b86 topology-updater: make -version always runnable
Make it possible to run -version in an environment whithout the
NODE_ADDRESS environment variable set.
2023-08-07 11:56:58 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Kubernetes Prow Robot
e0f10a81de
Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment
Fix Topology Manager policy and scope not being updated after NRT creation
2023-07-28 00:25:54 -07:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
pprokop
6d98b6150b Fix Topology Manager policy and scope not being updated properly
NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist.
This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if kubelet config was changed to use other TopologyManager policy and scope.

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-07-20 16:31:12 +02:00
Carlos Eduardo Arango Gutierrez
c02c3d83ed
Fix a typo on nfd-master cmd
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-06-06 20:05:07 +02:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
Markus Lehtonen
6e3b181ab4 topology-updater: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
7be08f9e7f nfd-worker: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
8113d651c2 nfd-master: migrate to structured logging 2023-05-31 14:43:05 +03:00