Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. This also
eliminates repeated re-configuration in scenarios where kubelet
continuously (every minute) touches the mounted configmap file on the
filesystem.
Also modifies the Helm and kustomize deployments so that the nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slight downside of this is that the name of the
generated configmap(s) depends on the content, so every time a user
customizes the config data, the old unused configmap is left behind and
must be garbage-collected manually.
Stop blocking on event channels when the api controller is stopped.
Ensures that the nfd API informer factory is properly shut down and all
resources released when stop() is called. This eliminates a memory leak
on re-configure events when leader election is enabled.
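A minimal sketch of the stop()-time shutdown, assuming an informer factory
with a client-go style interface; the type and function names here are
illustrative, not NFD's actual controller code:

    package controller

    import (
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
    )

    // apiController is an illustrative stand-in for the nfd api controller.
    type apiController struct {
        factory informers.SharedInformerFactory
        stopCh  chan struct{}
    }

    func newAPIController(cli kubernetes.Interface) *apiController {
        c := &apiController{
            factory: informers.NewSharedInformerFactory(cli, 0),
            stopCh:  make(chan struct{}),
        }
        c.factory.Start(c.stopCh)
        return c
    }

    // stop closes the stop channel so nothing blocks on informer events any
    // longer, and shuts down the factory so all informer goroutines exit and
    // their caches can be released.
    func (c *apiController) stop() {
        close(c.stopCh)
        c.factory.Shutdown()
    }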
List NodeFeature and NodeResourceTopology objects in pages of 200 items.
This reduces memory consumption and eliminates timeouts (on the
apiserver side) in big clusters with thousands of nodes.
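A hedged sketch of how the page limit can be applied through list-option
tweaking; the factory and option helpers mirror client-go, the rest is
illustrative:

    package controller

    import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
    )

    // newPagedFactory asks the apiserver for LIST results in pages of 200
    // items; the client-go reflector transparently walks through all pages.
    func newPagedFactory(cli kubernetes.Interface) informers.SharedInformerFactory {
        return informers.NewSharedInformerFactoryWithOptions(cli, 0,
            informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
                opts.Limit = 200
            }))
    }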
Fix cache syncing problems on big clusters with thousands of NodeFeature
objects.
On the initial list (sync) the client-go cache reflector sets the
ResourceVersion to "0" (instead of leaving it empty). This causes
problems in the apiserver, with apiserver-side logs like:
E writers.go:122] apiserver was unable to write a JSON response: http:
Handler timeout
E status.go:71] apiserver received an error that is not an
metav1.Status: &errors.errorString{s:"http: Handler timeout"}:
http: Handler timeout
On the nfd-master side we see corresponding log snippets like:
W reflector.go:547] failed to list *v1alpha1.NodeFeature: stream error
when reading response body, may be caused by closed
connection. Please retry. Original error: stream
error: stream ID 1521; INTERNAL_ERROR; received from
peer
I trace.go:236] "Reflector ListAndWatch" name:*** (***) (total time:
61126ms): ---"Objects listed" error:stream error when
reading response body, may be caused by closed
connection. Please retry. Original error: stream
error: stream ID 1521; INTERNAL_ERROR; received from
peer 61126ms (***)
Decreasing the page size (opts.Limit) does not have any effect on the
timeouts. However, setting ResourceVersion to an empty value seems to
get the paging back on track, eliminating the timeouts.
TODO: investigate in Kubernetes upstream the root cause of the timeouts
with ResourceVersion="0".
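A sketch of the workaround described above, assuming the list options are
tweaked before each LIST call; whether this is the exact shape of the NFD
change is not guaranteed, and the function name is illustrative:

    package controller

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // tweakListOpts keeps the 200-item pages but forces an empty
    // ResourceVersion so the initial list is not done with
    // ResourceVersion="0", which triggered the timeouts quoted above.
    func tweakListOpts(opts *metav1.ListOptions) {
        opts.Limit = 200
        opts.ResourceVersion = ""
    }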
The feature gate is locked to true. That is, it is no longer possible to
revert back to the gRPC-based communication, which makes the gRPC API
ready for removal.
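A minimal sketch of a locked gate using k8s.io/component-base/featuregate;
the gate name comes from the text above, the surrounding code is
illustrative:

    package features

    import "k8s.io/component-base/featuregate"

    const NodeFeatureAPI featuregate.Feature = "NodeFeatureAPI"

    // LockToDefault means the gate can no longer be switched off on the
    // command line, so the gRPC code path cannot be re-enabled.
    var defaultFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
        NodeFeatureAPI: {Default: true, PreRelease: featuregate.GA, LockToDefault: true},
    }

    func NewFeatureGate() (featuregate.MutableFeatureGate, error) {
        fg := featuregate.NewFeatureGate()
        if err := fg.Add(defaultFeatureGates); err != nil {
            return nil, err
        }
        return fg, nil
    }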
Run nfd-worker with the NodeFeature API enabled (against a fake apiserver)
instead of using the deprecated gRPC API (against an nfd-master instance).
Expand the test to verify the features and labels that are advertised as
a NodeFeature object.
We faced a problem where nfd-master deleted some of the labels on startup.
Sometimes it does not get NodeFeature objects, even though they are present
in the cluster, because the informer cache is still empty.
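A small sketch of the kind of guard that avoids acting on an unsynced
cache (illustrative, not necessarily the exact fix):

    package controller

    import (
        "fmt"

        "k8s.io/client-go/tools/cache"
    )

    // waitForSync blocks until the NodeFeature informer cache has synced so
    // that the lister is not consulted while it is still empty.
    func waitForSync(stopCh <-chan struct{}, informer cache.SharedIndexInformer) error {
        if !cache.WaitForCacheSync(stopCh, informer.HasSynced) {
            return fmt.Errorf("informer cache did not sync")
        }
        return nil
    }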
This patch changes the handling of NodeFeatureRules so that one feature
name (say "cpu.cpuid") can hold different types of features (flags,
attributes and/or instances). Requiring a feature to be of one single
type has not been a limitation of the API itself (there has been no
validation for this) but an implementation decision.
The new evaluation logic of match expressions is such that "flags" and
"attributes" are basically evaluated as a union: both are maps, but
"flags" simply have no value associated with the key. However,
"instances" are handled separately, as that is basically an array of
maps and needs to be evaluated in a different way (loop over the array
of instances and evaluate expressions against the attributes of each).
Because of this difference, care must be taken when mixing "instance"
features with "flag" and/or "attribute" features.
Note that neither the API types nor their validation are changed, just
the implementation of how the NodeFeatureRules are evaluated.
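A simplified, self-contained sketch of the union semantics described
above; the real NodeFeatureRule expressions are richer, this only
illustrates how flag, attribute and instance elements are walked:

    package main

    import "fmt"

    // evaluate treats flags and attributes of one feature name as a single
    // map (flags simply carry no value), while instances are evaluated one
    // by one. The "expression" here just matches an element by name.
    func evaluate(name string, flags map[string]struct{}, attrs map[string]string,
        instances []map[string]string) bool {
        elements := make(map[string]string, len(flags)+len(attrs))
        for f := range flags {
            elements[f] = "" // flag: key only, no value
        }
        for a, v := range attrs {
            elements[a] = v
        }
        if _, ok := elements[name]; ok {
            return true
        }
        // Instances: an array of attribute maps, each evaluated separately.
        for _, inst := range instances {
            if _, ok := inst[name]; ok {
                return true
            }
        }
        return false
    }

    func main() {
        flags := map[string]struct{}{"AVX512F": {}}
        attrs := map[string]string{"AVX512VNNI": "true"}
        fmt.Println(evaluate("AVX512F", flags, attrs, nil)) // true
    }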
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
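A rough sketch of what the Go API types for such a CR could look like; the
field and type names here are assumptions for illustration, not the
canonical NFD API:

    package v1alpha1

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // NodeFeatureGroup groups nodes based on their features; nfd-master
    // fills in the status with the nodes matching the group rules.
    type NodeFeatureGroup struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec   NodeFeatureGroupSpec   `json:"spec"`
        Status NodeFeatureGroupStatus `json:"status,omitempty"`
    }

    type NodeFeatureGroupSpec struct {
        // Rules follow the same syntax as NodeFeatureRule rules.
        Rules []GroupRule `json:"featureGroupRules"`
    }

    type NodeFeatureGroupStatus struct {
        // Nodes currently matching the group rules.
        Nodes []FeatureGroupNode `json:"nodes,omitempty"`
    }

    type GroupRule struct {
        Name string `json:"name"`
        // Match expressions omitted from this sketch.
    }

    type FeatureGroupNode struct {
        Name string `json:"name"`
    }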
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.
The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.
This patch selectively reverts parts of
06c4733bc5.
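A tiny sketch of the combined logic; variable names are illustrative:

    package main

    import "fmt"

    // nodeFeatureAPIEnabled: the NodeFeature API stays enabled only if
    // neither the NodeFeatureAPI feature gate nor the deprecated
    // -enable-nodefeature-api flag is set to false.
    func nodeFeatureAPIEnabled(featureGateEnabled, flagEnabled bool) bool {
        return featureGateEnabled && flagEnabled
    }

    func main() {
        fmt.Println(nodeFeatureAPIEnabled(true, false)) // false -> gRPC is used
        fmt.Println(nodeFeatureAPIEnabled(true, true))  // true  -> NodeFeature API
    }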
Now that we have support for feature gates, deprecate the autoDefaultNs
config option of nfd-master and replace it with a new alpha feature gate
DisableAutoPrefix (defaults to false). Using a feature gate to handle
and communicate this kind of change, where the default behavior is
intended to be changed in a future release, feels much more natural than
using ad-hoc flags/options.
The combined logic of the feature gate and the config option is a
logical OR over disabling auto-prefixing. That is, auto-prefixing is
disabled if either the feature gate or the config option is set to
disable it:
                      | DisableAutoPrefix (feature gate)
                      |      false      |      true
----------------------|-----------------|-----------------
autoDefaultNs  true   |       ON        |       OFF
(config opt)   false  |       OFF       |       OFF
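The same logic as the table, as a tiny sketch (names are illustrative):

    package main

    import "fmt"

    // autoPrefixDisabled: auto-prefixing is off if either the
    // DisableAutoPrefix feature gate is enabled or the autoDefaultNs config
    // option is set to false.
    func autoPrefixDisabled(disableAutoPrefixGate, autoDefaultNs bool) bool {
        return disableAutoPrefixGate || !autoDefaultNs
    }

    func main() {
        fmt.Println(autoPrefixDisabled(false, true))  // false -> prefixing ON
        fmt.Println(autoPrefixDisabled(true, true))   // true  -> OFF
        fmt.Println(autoPrefixDisabled(false, false)) // true  -> OFF
    }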
Return false (i.e. "did not match") but no error when evaluating a match
expression against a "flag" type feature (which doesn't have any
associated value, just the name) if a MatchOp that can never match is
used.
This is preparation for supporting multi-type features, i.e. one
feature, like "cpu.cpuid", having e.g. both "flag" and "attribute" type
elements.
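A hedged sketch of the behavior for flag-type features; the operator names
here are only for illustration:

    package main

    import "fmt"

    // matchFlag evaluates an operator against a "flag" feature, which has a
    // name but no value: operators that would need a value simply evaluate
    // to "no match" (false) instead of returning an error.
    func matchFlag(op string, flagExists bool) (bool, error) {
        switch op {
        case "Exists":
            return flagExists, nil
        case "DoesNotExist":
            return !flagExists, nil
        default:
            // value-based operators (e.g. Lt, Gt) can never match a flag
            return false, nil
        }
    }

    func main() {
        matched, err := matchFlag("Lt", true)
        fmt.Println(matched, err) // false <nil>
    }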
Sharing the same client between updater threads virtually serializes
access, in practice making the effective parallelism close to 1.
With this patch, in my bench cluster of 300 nodes, the time taken by
updating all nodes drops from ~2 minutes to ~12 seconds (with the
default parallelism of 10 node updater threads). This matches the
10-fold increase in effective parallelism (from ~1 to 10).
There might be other solutions that could be explored, e.g. caching
nodes with an indexer/lister, but on the other hand nfd doesn't
necessarily need or want to watch every little change in each node. We
only need to get the node when something in our own CRDs changes (we
don't react to any changes in the node object itself). Using multiple
clients was the most obvious choice to solve the problem for now.
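A sketch of the per-thread clients, assuming a plain client-go clientset;
this is illustrative, not NFD's actual updater-pool code:

    package updaterpool

    import (
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // newClients creates one clientset per node-updater thread so API calls
    // from the updaters are not serialized through a single shared client.
    func newClients(cfg *rest.Config, parallelism int) ([]kubernetes.Interface, error) {
        clients := make([]kubernetes.Interface, parallelism)
        for i := range clients {
            cli, err := kubernetes.NewForConfig(cfg)
            if err != nil {
                return nil, err
            }
            clients[i] = cli
        }
        return clients, nil
    }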
Don't require that the annotation value must conform to the (strict)
requirements of label values. In the Kubernetes API, annotation values
have no restrictions other than that the total size (keys and values)
of _all_ annotations combined on an object must not exceed 256kB.
This patch sets a maximum size limit of 1kB for the value of a single
feature annotation created by NFD. This limit is rather arbitrary but
should be enough for the NFD usage scenarios (until proven wrong).
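A minimal sketch of the size check; the function and error text are
illustrative:

    package validate

    import "fmt"

    // maxAnnotationValueLen is the 1kB per-value limit described above.
    const maxAnnotationValueLen = 1024

    // Annotation checks a single feature annotation value; unlike label
    // values, only the size is restricted.
    func Annotation(value string) error {
        if len(value) > maxAnnotationValueLen {
            return fmt.Errorf("annotation value too long: %d bytes (max %d)",
                len(value), maxAnnotationValueLen)
        }
        return nil
    }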
Prevents potential race between node-updater pool and the api-controller
when re-configuring nfd-master. Reconfiguration causes a new
api-controller instance to be created, so the NFD API listers might
change in the middle of processing a node update (if the pool was
running). No
actual issues related to this have been identified but races (like this)
should still be avoided.
Don't try to be too smart when kubeconfig is needed. In practice, the
nfd-master really doesn't work anymore (with the NodeFeature API
enabled) without a kubeconfig set. This patch fixes crashes happening
when NoPublish is enabled, e.g. in listing all nodes in the nfd api
handler and in getting single node objects in the node updater pool.
This patch changes the kubeconfig parsing to happen at the creation of
the nfd-master instance. We don't need to do that at reconfigure time as
none of the dynamic config options affect it. Unit tests are adjusted
accordingly.
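A sketch of resolving the kubeconfig once at instance creation, assuming
client-go's clientcmd; the type and field names are illustrative:

    package master

    import (
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/clientcmd"
    )

    // nfdMaster is a stand-in for the real instance; only what the sketch
    // needs is shown.
    type nfdMaster struct {
        restConfig *rest.Config
    }

    // newNfdMaster resolves the kubeconfig when the instance is created;
    // dynamic re-configuration does not touch it. With an empty path,
    // clientcmd falls back to in-cluster config.
    func newNfdMaster(kubeconfigPath string) (*nfdMaster, error) {
        cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
        if err != nil {
            return nil, err
        }
        return &nfdMaster{restConfig: cfg}, nil
    }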