mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

125 commits

Author SHA1 Message Date
Kubernetes Prow Robot
3e87c97ac2
Merge pull request #1976 from marquiz/devel/grpc-api-cleanup
Cleanup for NodeFeature API being GA
2024-12-13 15:14:26 +01:00
Markus Lehtonen
fc103a6028 Cleanup for NodeFeature API being GA
Drop references to the gRPC API and don't suggest that NodeFeatureAPI
could be disabled.

Also update the developer guide with instructions for running NFD
components outside the cluster.
2024-12-13 15:40:46 +02:00
Kubernetes Prow Robot
caaac59eba
Merge pull request #1860 from ozhuraki/no-owner-refs
nfd-worker: Add an option to disable setting the owner references
2024-12-13 13:12:26 +01:00
Markus Lehtonen
047d0314aa Fix version parsing
The fix in a416af51a4 was not enough by
itself; it needs to be applied comprehensively.
2024-12-12 22:14:50 +02:00
Oleg Zhurakivskyy
20ef877ab1 nfd-worker: Add an option to disable setting the owner references
In some cases it's desirable to control automatic garbage collection
of NodeFeature objects.

Add an option to disable setting the owner references to the Pod
on NodeFeature objects.

Closes: 1817

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-11-28 16:50:10 +02:00
Markus Lehtonen
45f49d574a nfd-master: drop resourceLabels
Drop the resourceLabels config file option and the corresponding
-resource-labels command line flag. They were deprecated in NFD v0.13 so
it's time to let them go. NodeFeatureRule(s) should be used to manage
ERs, instead.
2024-11-07 15:16:52 +02:00
Markus Lehtonen
65c08ebd5d nfd-master: drop stale unreachable deprecation notices 2024-11-04 11:24:57 +02:00
Carlos Eduardo Arango Gutierrez
0bd82cf82a
Drop NFD gRPC API
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-29 15:15:18 +01:00
Tobias Giese
53ddf081da
Add parameter to configure health endpoint port
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-09-24 15:15:50 +02:00
Markus Lehtonen
a269bf4d25 Drop the -enable-nodefeature-api flag
Was marked to be removed in v0.17.
2024-07-10 15:20:07 +03:00
Carlos Eduardo Arango Gutierrez
e33e68ad5b
Add optional arguments to NewWorker
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 15:08:26 +02:00
Markus Lehtonen
dfbd63b728 topology-updater: properly handle IPv6 from NODE_ADDRESS
Fix the usage of IPv6 addresses for default kubelet configz endpoint.

The default host:port we use for kubelet configz endpoint is
${NODE_ADDRESS}:10250. Previously we errored out if NODE_ADDRESS was an
IPv6 address because we used an incorrect notation (without brackets).
An IPv6 address must be enclosed in brackets when a port is specified.
2024-06-04 14:19:57 +03:00
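The bracket rule described in dfbd63b728 can be illustrated with a minimal Go sketch. It is not the code from the commit: the helper name configzURI, the https scheme and the /configz path are assumptions made for illustration; only the port 10250 comes from the message above. The standard library function net.JoinHostPort adds the brackets for IPv6 literals.

```go
package main

import (
	"fmt"
	"net"
)

// configzURI builds a kubelet configz URI for a node address.
// Hypothetical helper: the scheme and path are assumptions; only the
// port 10250 is taken from the commit message.
func configzURI(nodeAddress string) string {
	// JoinHostPort encloses IPv6 literals in brackets ("[fd00::1]:10250")
	// and leaves IPv4 addresses and host names untouched.
	hostPort := net.JoinHostPort(nodeAddress, "10250")
	return "https://" + hostPort + "/configz"
}

func main() {
	fmt.Println(configzURI("10.0.0.5"))
	fmt.Println(configzURI("fd00::1"))
}
```

Plain string concatenation of address and port would produce fd00::1:10250, which cannot be parsed unambiguously as host and port.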
Markus Lehtonen
560bd11d85 Re-add -enable-nodefeature-api cmdline flag
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.

The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.

This patch selectively reverts parts of
06c4733bc5.
2024-05-16 10:53:49 +03:00
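The interaction described in 560bd11d85 amounts to a logical AND of the two settings. A minimal sketch, assuming a hypothetical helper name (nodeFeatureAPIEnabled) that does not come from the NFD code base:

```go
package main

import "fmt"

// nodeFeatureAPIEnabled is a hypothetical helper showing the rule from the
// commit message: the NodeFeature API is active only if both the feature
// gate and the deprecated flag are true; otherwise gRPC is used instead.
func nodeFeatureAPIEnabled(featureGate, deprecatedFlag bool) bool {
	return featureGate && deprecatedFlag
}

func main() {
	for _, gate := range []bool{true, false} {
		for _, flag := range []bool{true, false} {
			fmt.Printf("gate=%v flag=%v -> NodeFeature API enabled: %v\n",
				gate, flag, nodeFeatureAPIEnabled(gate, flag))
		}
	}
}
```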
Markus Lehtonen
fcb8d3cda4 nfd-master: implement opts for modifying NfdMaster instance
This provides a more controlled way for setting up the NfdMaster
instance for testing.
2024-04-05 20:21:19 +03:00
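Commit fcb8d3cda4 mentions "opts" for modifying an instance. One common way to do this in Go is the functional options pattern, sketched below. All names here (master, Option, WithPort, WithNamespace, newMaster) are hypothetical and are not taken from the NFD source; the sketch only shows the general technique.

```go
package main

import "fmt"

// master is a hypothetical stand-in for a component with tunable settings.
type master struct {
	port      int
	namespace string
}

// Option mutates a master instance during construction.
type Option func(*master)

// WithPort overrides the default port.
func WithPort(p int) Option {
	return func(m *master) { m.port = p }
}

// WithNamespace overrides the default namespace.
func WithNamespace(ns string) Option {
	return func(m *master) { m.namespace = ns }
}

// newMaster applies defaults first, then each option in order.
func newMaster(opts ...Option) *master {
	m := &master{port: 8080, namespace: "default"}
	for _, o := range opts {
		o(m)
	}
	return m
}

func main() {
	m := newMaster(WithPort(9090))
	fmt.Println(m.port, m.namespace)
}
```

In tests, this keeps construction explicit: each test passes only the options it cares about and inherits the defaults for everything else.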
Oleg Zhurakivskyy
f2e9557a2d nfd-topology-updater: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-04-03 13:15:54 +03:00
Kubernetes Prow Robot
0ad5e50f24
Merge pull request #1609 from ozhuraki/worker-health
nfd-worker: Add liveness probe
2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy
8b63d17af7 nfd-worker: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Markus Lehtonen
6f891ce1d2 Remove references to -enable-nodefeature-api flag
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Markus Lehtonen
638e7744f1 nfd-master: mark the -crd-controller flag as deprecated
Plan the removal of the -crd-controller flag along with the gRPC API.
This flag does not make much sense after that as all communication with
nfd-worker is based on CRDs - with the CRD controller disabled
nfd-master is virtually a functionless stub.
2024-03-13 15:10:35 +02:00
leemingeer
b6d8ce7a5a nfd-topology-updater: add pods fingerprint by default 2024-01-26 17:55:34 +08:00
Markus Lehtonen
d7ec0bf674 topology-updater: document the -no-publish flag correctly 2024-01-22 14:21:02 +02:00
Markus Lehtonen
a053efda64 nfd-master: run a separate gRPC health server
This patch separates the gRPC health server from the deprecated gRPC
server (disabled by default, replaced by the NodeFeature CRD API) used
for node labeling requests. The new health server runs on hardcoded TCP
port number 8082.

The main motivation for this change is to make Kubernetes' built-in
gRPC liveness probes work when TLS is enabled (the probes do not
support TLS).

The health server itself is a naive implementation (as it was before),
basically only checking that nfd-master has started and hasn't crashed.
The patch adds a TODO note to improve the functionality.
2024-01-04 13:58:26 +02:00
Carlos Eduardo Arango Gutierrez
57b6035b71
Add kubectl-nfd
kubectl-nfd is a kubectl plugin for debugging NodeFeatureRules

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-21 16:00:19 +01:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
AhmedGrati
7ab6314bdc chore: introduce common klog handling for cmd/nfd-*
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-07 22:38:15 +01:00
Kubernetes Prow Robot
c0c1b89a92
Merge pull request #1334 from ArangoGutierrez/grpc_gone_v2
Deprecate gRPC API
2023-09-07 00:38:59 -07:00
Carlos Eduardo Arango Gutierrez
9966d2ae12
Deprecate gRPC API
Now that the NodeFeature API has been set enabled by default, the gRPC
mode will be deprecated and with it all flags and features around it.

For nfd-master, flags
-port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api
are now marked as deprecated.

For nfd-worker, flags
-enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override
are now marked as deprecated.

Deprecated flags, as well as gRPC related code will be removed in future
releases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-07 06:48:15 +02:00
AhmedGrati
b0be40aa09 feat: add logging parameters in configuration file for nfd master
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 15:27:27 +01:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot
a658c54de3
Merge pull request #1297 from marquiz/devel/topology-updater-version
topology-updater: make -version always runnable
2023-08-28 04:05:43 -07:00
Markus Lehtonen
01c08d67b6 Rename nfd-topology-gc to nfd-gc
This is preparation for making it a generic garbage collector for all
nfd-managed api objects.
2023-08-21 21:46:11 +03:00
Markus Lehtonen
5ba8d14b86 topology-updater: make -version always runnable
Make it possible to run -version in an environment without the
NODE_ADDRESS environment variable set.
2023-08-07 11:56:58 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Kubernetes Prow Robot
e0f10a81de
Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment
Fix Topology Manager policy and scope not being updated after NRT creation
2023-07-28 00:25:54 -07:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric                                   | Type         | Meaning                                     |
| ---------------------------------------- | ------------ | ------------------------------------------- |
| `nfd_master_build_info`                  | Gauge        | Version from which nfd-master was built.    |
| `nfd_worker_build_info`                  | Gauge        | Version from which nfd-worker was built.    |
| `nfd_updated_nodes`                      | Counter      | Time taken to label a node                  |
| `nfd_crd_processing_time`                | Gauge        | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node   |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
pprokop
6d98b6150b Fix Topology Manager policy and scope not being updated properly
NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist.
This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if kubelet config was changed to use other TopologyManager policy and scope.

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-07-20 16:31:12 +02:00
Carlos Eduardo Arango Gutierrez
c02c3d83ed
Fix a typo on nfd-master cmd
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-06-06 20:05:07 +02:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with their
corresponding features. Previously, nodes were updated sequentially
even though they are independent of each other. This PR introduces
two new components: LabelersNodePool, which is responsible for
spinning up a goroutine whenever there's a request for updating nodes,
and a Workqueue, which holds the names of the nodes that should be
updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
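The idea behind b3cfe17392, a queue of node names drained by concurrent goroutines, can be sketched with the standard library alone. This is not the LabelersNodePool or Workqueue implementation from the commit: it swaps in a fixed-size worker pool over a channel, and the function name updateNodes and its signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// updateNodes drains a queue of node names with a fixed number of worker
// goroutines and returns the processed names in sorted order.
// Hypothetical helper; the real NFD code is structured differently.
func updateNodes(nodes []string, workers int, update func(string)) []string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the queue never drains
	}
	queue := make(chan string)
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done []string
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range queue {
				update(name) // nodes are independent, so no ordering is needed
				mu.Lock()
				done = append(done, name)
				mu.Unlock()
			}
		}()
	}
	for _, n := range nodes {
		queue <- n
	}
	close(queue) // signals the workers to exit once the queue is empty
	wg.Wait()
	sort.Strings(done)
	return done
}

func main() {
	got := updateNodes([]string{"node-b", "node-a", "node-c"}, 2, func(name string) {
		fmt.Println("updating", name)
	})
	fmt.Println(got)
}
```

Because the updates are independent, the only shared state is the result slice, which is guarded by a mutex.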
Markus Lehtonen
6e3b181ab4 topology-updater: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
7be08f9e7f nfd-worker: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
8113d651c2 nfd-master: migrate to structured logging 2023-05-31 14:43:05 +03:00
Kubernetes Prow Robot
70d5ef477f
Merge pull request #1219 from PiotrProkop/leader-elect
Add leader election for nfd-master
2023-05-22 00:36:21 -07:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows nfd-master to run in an active-passive way when multiple
instances of nfd-master are deployed, preventing multiple components
from updating the same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
1200fd05c5 topology-updater: use node IP in the default configz URI
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where the node name does not resolve to the
node IP, e.g. when nodename != hostname.
2023-05-05 13:29:51 +03:00
AhmedGrati
87c2d7e184 nfd-master: fix resync period config option
This PR fixes the resync-period configuration option of nfd-master.
Previously, changes to the option were not reflected in nfd-master at
runtime. e2e tests are also added to verify that the fix works as
expected.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-02 13:17:01 +02:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Markus Lehtonen
8511980bf4 nfd-master: deprecate the -resource-labels flag
Mark the -resource-labels flag (and the corresponding resourceLabels
config option) as deprecated. We now support managing extended resources
via NodeFeatureRule objects. This kludge deserves to go, eventually.
2023-04-13 11:30:58 +03:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to nfd-worker, this PR adds support for dynamic run-time
configuration of nfd-master through a config file.

We use a JSON or YAML configuration file together with fsnotify to
watch for changes in the config file. As a result, we allow dynamic
control of logging params, allowed namespaces, extended resources,
label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
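Parsing such a config file can be sketched as follows. This is an illustration only: the struct masterConfig and its field names are hypothetical, loosely modeled on the categories named in 3fff409f6d, and the sketch covers JSON parsing with the standard library. Watching the file for changes with fsnotify is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// masterConfig is a hypothetical subset of run-time settings. The field
// names are assumptions for illustration, not the actual NFD schema.
type masterConfig struct {
	ExtraLabelNs   []string `json:"extraLabelNs"`
	DenyLabelNs    []string `json:"denyLabelNs"`
	LabelWhiteList string   `json:"labelWhiteList"`
}

// parseConfig decodes a JSON document into a masterConfig.
func parseConfig(data []byte) (masterConfig, error) {
	var c masterConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	cfg, err := parseConfig([]byte(`{"denyLabelNs": ["denied.example.com"]}`))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.DenyLabelNs)
}
```

On a change notification, a component would typically re-read the file, parse it as above, and swap in the new settings only if parsing succeeds.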