Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuosly touches the (every minute) mounted file (configmap) on the
filesystem.
Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slght downside of this is the name of the
config map(s) depends on the content, so every time a user customizes
the config data, the old unused configmap will be left and must be
garbage-collected manually.
The upstream repo (and the release downloads)
github.com/rundocs/jekyll-rtd-theme has been deleted. This broke our
docs generation as the remote theme configuration depended on
downloading the release artefact.
This patch changes the docs building to use a Ruby gem instead of the
remote theme setting. To complicate matters, the gem has an seemingly
incorrect (too strict) version dependency. To mitigate this, we now
install bundler-override plugin to ignore this particular dependency.
The netlify conf is a hack, but I wasn't able to figure out a way how to
install the bundler-override plugin without doing all ruby
initialization in the build command.
The link to feature gates documentation is pointing to the
feature-gates.md in master-commandline-reference.html and
worker-commandline-reference.html, it should be updated to
linking html file.
Signed-off-by: joehuang <joehuang.sweden@gmail.com>
The link to feature gates documentation is pointing to the
upward folder in master-commandline-reference.md, it should
be updated to linking file in the same folder.
Signed-off-by: joehuang <joehuang.sweden@gmail.com>
The feature gate is locked to true. That is, it is not possible to revert
back to the gPRC-based communication which makes the gRPC API ready for
removal.
Bump the required Kubernetes version to v1.24. In practice this is the
minimum Kubernetes version as our deployment (both kustomize and Helm)
depend on the gRPC container probes feature of Kubernetes.
Disable AVX10 as unnecessary as AVX10_LEVEL is better suited for
checking AVX10 compatibility. There is not yet any hardware with the
feature so disabling it shouldn't cause problems for users.
Add new cpuid label "feature.node.kubernetes.io/cpu-cpuid.AVX10_VERSION"
that advertises the supported version of AVX10 vector ISA.
Correspondingly, the patch adds AVX10_VERSION to the "cpu.cpuid" feature
for NodeFeatureRules to consume.
This makes cpu.cpuid on amd64 architecture a "multi-type" feature in
that it contains "flags" and potentially also "attributes" (the only
cpuid attribute so far is the AVX10_VERSION).
This patch changes the handling of NodeFeatureRules so that one feature
name (say "cpu.cpuid") can hold different types of features (flags,
attributes and/or instances). Requiring features to choose one single
type has not been a limitation of the API itself (and there has been no
validation on this) but an implementation decision.
The new evalutation logic of match expressions is such that "flags" and
"attributes" are basically evaluated as an union - they are both maps
but "flags" just don't have any value associated with the key. However,
"instances" are handled separately as that is basically an array of
maps and needs to be evaluated in a different way (loop over the array
of instances and evaluate expressions against the attributes of each).
Because of this difference care must be taken if mixing "instance"
features with "flag" and/or "attribute" features.
Note that the API types or their validation is not changed - just the
implementation of how the NodeFeatureRules are evaluated.
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.
The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.
This patch selectively reverts parts of
06c4733bc5.
Now that we have support for feature gates deprecate the autoDefaultNs
config option of nfd-master and replace it with a new alpha feature gate
DisableAutoPrefix (defaults to false). Using a feature gate to handle
and communicate these kind of changes, where the default behavior is
intended to be changed in a future release, feels much more natural than
using random flags/options.
The combined logic of the feature gate and the config option is a
logical OR over disabling auto-prefixing. That is, auto-prefixing is
disabled if either the feature gate or the config options is used set to
disable it:
| DisableAutoPrefix (feature gate)
| false | true
-------------------- | --------------------------------
autoDefaultNs true | ON | OFF
(config opt) false | OFF | OFF
Problem: memory requests and limits has been set for `master` process in
PR #1631. It does not follow best practices for setting those values,
but the intention was provide default values for a wide variety of
clusters, including small ones.
Solution: provide solid documentation about the problems that might
happen in production environments when
`resource.memory.requests << resource.memory.limits`. Add a link to
relevant external sources, which includes the advise from Tim Hockin:
> Always set memory limit == request
Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>
Plan the removal of the -crd-controller flag along with the gRPC API.
This flag does not make much sense after that as all communication with
nfd-worker is based on CRDs - with the CRD controller disabled
nfd-master is virtually a functionless stub.