Bump the required Kubernetes version to v1.24. In practice this is the
minimum Kubernetes version as our deployment (both kustomize and Helm)
depend on the gRPC container probes feature of Kubernetes.
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.
The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.
This patch selectively reverts parts of
06c4733bc5.
Problem: memory requests and limits has been set for `master` process in
PR #1631. It does not follow best practices for setting those values,
but the intention was provide default values for a wide variety of
clusters, including small ones.
Solution: provide solid documentation about the problems that might
happen in production environments when
`resource.memory.requests << resource.memory.limits`. Add a link to
relevant external sources, which includes the advise from Tim Hockin:
> Always set memory limit == request
Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>
Drop the deprecated and broken sample overlay. This was an example for
enabling TLS with cert-manager. However, the overlay has been broken
(and useless) since NodeFeature API was enabled by default - and gRPC
disabled - in v0.14.
This patch adds a post-delete hook to the Helm chart that runs
"nfd-master --prune" in the cluster. This cleans up the node of labels,
annotations, taints and extended resources that were created by NFD.
The "combined" overlay, deploying nfd-master and nfd-worker in the same
pod (with a daemonset) doesn't make sense anymore as we have enabled
NodeFeature API. There is no direct communication between nfd-master and
nfd-worker anymore, Moreover, the combined deployment can be seen as
broken as there is one NodeFeature controller (i.e. nfd-master) on each
node, causing them to race against each other, all processing all
NodeFeature objects.
Implements three metrics for nfd-gc:
- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
deleting NodeFeature and NodeResourceTopology objects.
Switch to fully statically linked binaries and use scratch as a base
image.
Switching to the virtually empty scratch base image means that the
default/minimal NFD image only supports running hooks that are truly
statically linked (e.g. normal go binaries that are "almost" statically
linked stop working). The documentation has been already stating this
(i.e. that only statically-linked binaries are supported) - i.e. we have
had no promise of supporting other than that. Also, hooks are now
deprecated and even disabled by default so the possibility of real user
impact should be small.
Now that the NodeFeature API has been set enabled by default, the gRPC
mode will be deprecated and with it all flags and features around it.
For nfd-master, flags
-port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api
are now marked as deprecated.
For nfd-worker flags
-enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override
are now marked as deprecated.
Deprecated flags, as well as gRPC related code will be removed in future
releases.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>