This patch changes the handling of NodeFeatureRules so that one feature
name (say "cpu.cpuid") can hold different types of features (flags,
attributes and/or instances). Requiring features to choose one single
type has not been a limitation of the API itself (and there has been no
validation on this) but an implementation decision.
The new evaluation logic of match expressions is such that "flags" and
"attributes" are basically evaluated as a union - they are both maps
but "flags" just don't have any value associated with the key. However,
"instances" are handled separately as that is basically an array of
maps and needs to be evaluated in a different way (loop over the array
of instances and evaluate expressions against the attributes of each).
Because of this difference care must be taken if mixing "instance"
features with "flag" and/or "attribute" features.
Note that the API types and their validation are not changed - just the
implementation of how the NodeFeatureRules are evaluated.
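The evaluation split described above can be sketched roughly as follows. This is a minimal illustration only: the Features type, keyExists and matchingInstances are hypothetical names made up for this sketch, not the NFD API types or the actual nfd-master code.

```go
package main

import "fmt"

// Features is a hypothetical, simplified stand-in for everything stored
// under one feature name such as "cpu.cpuid". It is NOT the real API type.
type Features struct {
	Flags      map[string]struct{} // names only, no associated value
	Attributes map[string]string   // name -> value
	Instances  []map[string]string // each instance carries its own attributes
}

// keyExists looks the key up in the union of flags and attributes.
func (f Features) keyExists(key string) bool {
	if _, ok := f.Flags[key]; ok {
		return true
	}
	_, ok := f.Attributes[key]
	return ok
}

// matchingInstances loops over the instances and returns those whose
// attribute "key" equals "value". Instances never mix with flags/attributes.
func (f Features) matchingInstances(key, value string) []map[string]string {
	var out []map[string]string
	for _, inst := range f.Instances {
		if v, ok := inst[key]; ok && v == value {
			out = append(out, inst)
		}
	}
	return out
}

func main() {
	f := Features{
		Flags:      map[string]struct{}{"AVX2": {}},
		Attributes: map[string]string{"AVX10_VERSION": "1"},
		Instances:  []map[string]string{{"class": "0300"}, {"class": "1200"}},
	}
	fmt.Println(f.keyExists("AVX2"), f.keyExists("AVX10_VERSION"), f.keyExists("SSE"))
	fmt.Println(len(f.matchingInstances("class", "0300")))
}
```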
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Return false (i.e. "did not match") but no error when evaluating a match
expression against a "flag" type feature (which doesn't have any
associated value, just the name) if a MatchOp that never matches is
used.
This is preparation for supporting multi-type features, i.e. one
feature, like "cpu.cpuid", having e.g. "flag" and "attribute" type
features.
Don't require that the annotation value must conform to the (strict)
requirements of label values. In the Kubernetes API annotation values do
not have other restrictions than that the total size (keys and values)
of _all_ annotations combined of an object must not exceed 256kB.
This patch sets a maximum size limit of 1kB for the value of a single
feature annotation created by NFD. This limit is rather arbitrary but
should be enough for the NFD usage scenarios (until proven wrong).
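A size check of this kind could look roughly like the sketch below. The names validateAnnotationValue and maxAnnotationValueLen are hypothetical, and 1kB is assumed here to mean 1024 bytes.

```go
package main

import "fmt"

// maxAnnotationValueLen is the per-value limit described above.
// Assumption: "1kB" is interpreted as 1024 bytes.
const maxAnnotationValueLen = 1024

// validateAnnotationValue (hypothetical helper) only checks the size of the
// value; no label-value syntax rules are applied.
func validateAnnotationValue(value string) error {
	if len(value) > maxAnnotationValueLen {
		return fmt.Errorf("annotation value is %d bytes, exceeding the limit of %d bytes",
			len(value), maxAnnotationValueLen)
	}
	return nil
}

func main() {
	// A value that would not be a valid label value but is fine as an annotation.
	fmt.Println(validateAnnotationValue("any value, with spaces & symbols!"))
	// A value one byte over the limit is rejected.
	fmt.Println(validateAnnotationValue(string(make([]byte, maxAnnotationValueLen+1))))
}
```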
Rewrite generate.sh as update_codegen, inspired by the
k8s.io/code-generator documentation.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Extend the format of feature matcher terms (the elements of the
array specified under the matchFeatures field) with a new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).
The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names is present.
If both matchExpressions and matchName are specified for a feature
matcher term, they must both match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.
An example of creating an "avx512" label if any AVX512* CPUID feature is
present:
  - name: "avx wildcard rule"
    labels:
      avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}
An example of a template rule creating a dynamic set of labels based on
the existence of certain kconfig options.
  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}
NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
Drop the private fields (that were supposed to be used for caching parsed
templates) from the Rule type. This keeps the API typedefs cleaner and
simpler. Moreover, the caching was not even used in practice,
effectively complicating code without any benefit: the way the types
are used in nfd-master creates a local copy of the Rule type and stores
the cached template in that copy, so it is lost to any future users.
There are also other possible caveats in caching the way we tried to do it.
For example the objects returned by the api lister are supposed to be
treated as read-only - in particular, if we were to modify them there
should at least be proper locking in place as nfd-master potentially
processes the same rule (the same Go object) in parallel for multiple
nodes. If any optimization like this is pursued it should be done
properly, probably with private type(s) at the consumer's end, not
contaminating the API types.
Drop the creation helper functions as one step in an effort to tidy up
the api package. These functions were not much used outside unit tests
anyway, the static rules of the nfd-worker custom feature source being
the only exception (and if those happened to be invalid we'd catch that
e.g. in the e2e-tests).
Drop the private field for caching parsed regexp from the
MatchExpression type. This tidies up the API type definition so that it
is not tied to particular implementation details. The change also
eliminates potential concurrency problems as no locking is in place in
the API types.
If caching is desired in the future, it's better to do it properly
in a separate package, not directly in the API types.
Fix flakiness of unit tests by adding back the sorting of matched
feature elements that was unadvisedly removed in
63c22551df. This might help debugging some
corner cases in real-life scenarios (when using templating), too.
Fix NodeFeatureRule templating in cases where multiple matchFeatures
terms are targeting the same feature. Previously, only matched feature
elements from the last matcher term were used as the input to the
template. However, the input should contain all matched elements from
all matcher terms.
For example, consider the example rule snippet below:
  ...
  labelsTemplate: |
    {{ range .pci.device }}vendor.io/pci-device.{{ .class }}-{{ .device }}=exists
    {{ end }}
  matchFeatures:
    - feature: pci.device
      matchExpressions:
        class: {op: InRegexp, value: ["^03"]}
        vendor: {op: In, value: ["1234"]}
    - feature: pci.device
      matchExpressions:
        class: {op: InRegexp, value: ["^12"]}
This rule matches if both a PCI device of class 03 from vendor 1234
exists and a PCI device of class 12 (from any vendor) exists.
Previously, the template would only generate labels from the devices in
class 12 (as that's the last term). With this patch the template creates
device labels from devices in both classes 03 and 12.
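The difference between the old and the new behavior boils down to appending instead of overwriting when collecting the template input. A minimal sketch, using hypothetical types (instance, termResult) and a hypothetical collectTemplateInput helper that are not part of the actual code:

```go
package main

import "fmt"

// instance is a hypothetical stand-in for one matched feature element.
type instance map[string]string

// termResult is the (hypothetical) outcome of evaluating one matcher term.
type termResult struct {
	feature string
	matched []instance
}

// collectTemplateInput gathers the matched elements of all terms, keyed by
// feature name. Appending (instead of assigning) is what keeps the elements
// of earlier terms that target the same feature. No de-duplication is done
// in this sketch.
func collectTemplateInput(results []termResult) map[string][]instance {
	input := map[string][]instance{}
	for _, r := range results {
		input[r.feature] = append(input[r.feature], r.matched...)
	}
	return input
}

func main() {
	results := []termResult{
		{feature: "pci.device", matched: []instance{{"class": "0300", "vendor": "1234"}}},
		{feature: "pci.device", matched: []instance{{"class": "1200", "vendor": "abcd"}}},
	}
	for _, dev := range collectTemplateInput(results)["pci.device"] {
		fmt.Println(dev["class"], dev["vendor"])
	}
}
```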
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.
One problem with the annotations was also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.
Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on a node other than the one
where it was previously running.
Drop the KlogDump helper in favor of klog.InfoS. However, this patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of objects unless really evaluated by the logging function.
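One way such a lazy helper can work is to wrap the object in a type implementing fmt.Stringer, so that marshalling happens only if something actually formats the value. The sketch below is an assumption about the general technique, not the code of this patch: only the DelayedDumper name comes from the description above, while the delayedDumper type and the choice of JSON as output format are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// delayedDumper (hypothetical type) holds the object without processing it.
type delayedDumper struct {
	obj interface{}
}

// String marshals the wrapped object. It runs only when a formatter or
// logger actually evaluates the value.
func (d delayedDumper) String() string {
	out, err := json.MarshalIndent(d.obj, "", "  ")
	if err != nil {
		return fmt.Sprintf("<failed to marshal %T: %v>", d.obj, err)
	}
	return string(out)
}

// DelayedDumper wraps obj so that it can be passed cheaply to logging calls.
// Assumption: JSON is used here; the real helper may use another format.
func DelayedDumper(obj interface{}) fmt.Stringer {
	return delayedDumper{obj: obj}
}

func main() {
	d := DelayedDumper(map[string]int{"a": 1}) // nothing is marshalled yet
	fmt.Println(d)                             // String() is invoked here
}
```

Passing the wrapper as a value to a logging call that is filtered out by verbosity then costs only the allocation of the small wrapper struct.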
Fix a bug where nfd-master with NodeFeature API enabled would crash
when NodeFeatureRule objects were processed in the case where no
NodeFeature objects existed. This was caused by trying to insert values
into a non-initialized NodeFeatureSpec in the code.
This patch adds two safety measures to prevent that from happening in
the future. First, add a constructor function for the NodeFeatureSpec
type, and second, check for uninitialized object in the function
inserting new features.
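The two safety measures follow a common Go pattern, sketched below with a hypothetical FeatureSpec type (a simplified stand-in, not the real NodeFeatureSpec) and hypothetical NewFeatureSpec and InsertAttribute functions. Writing to a nil map panics in Go, which is the class of crash being guarded against.

```go
package main

import "fmt"

// FeatureSpec is a hypothetical, simplified stand-in for NodeFeatureSpec.
type FeatureSpec struct {
	Attributes map[string]map[string]string // feature name -> key -> value
}

// NewFeatureSpec is the constructor: it returns a spec with initialized maps.
func NewFeatureSpec() *FeatureSpec {
	return &FeatureSpec{Attributes: map[string]map[string]string{}}
}

// InsertAttribute also guards against a spec that was not created with the
// constructor, initializing the maps lazily instead of panicking.
func (s *FeatureSpec) InsertAttribute(feature, key, value string) {
	if s.Attributes == nil {
		s.Attributes = map[string]map[string]string{}
	}
	if s.Attributes[feature] == nil {
		s.Attributes[feature] = map[string]string{}
	}
	s.Attributes[feature][key] = value
}

func main() {
	var zero FeatureSpec // not created with the constructor
	zero.InsertAttribute("rule.matched", "my-rule", "true")
	fmt.Println(zero.Attributes)
}
```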
TODO: add unit tests for the API helper functions.
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the discovery of extended resources via annotations
and patching of node.status.capacity and node.status.allocatable, using
the NodeFeatureRule API.
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.
However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.
Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
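The key checks can be sketched as follows. This is an illustration under assumptions, not the nfd-master code: validateTaintKey is a hypothetical function, and the exempt namespace is passed in as a parameter rather than hard-coded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateTaintKey (hypothetical helper) applies the rules described above.
// allowedNs is the one NFD-specific namespace that is exempt from the
// kubernetes.io restriction; its sub-namespaces are exempt as well.
func validateTaintKey(key, allowedNs string) error {
	ns, _, found := strings.Cut(key, "/")
	if !found || ns == "" {
		return errors.New("unprefixed taint keys are not allowed")
	}
	if ns == allowedNs || strings.HasSuffix(ns, "."+allowedNs) {
		return nil
	}
	if ns == "kubernetes.io" || strings.HasSuffix(ns, ".kubernetes.io") {
		return fmt.Errorf("prefix %q is reserved for Kubernetes", ns)
	}
	return nil
}

func main() {
	const allowed = "feature.node.kubernetes.io" // assumption for this example
	for _, key := range []string{
		"my-taint",
		"kubernetes.io/my-taint",
		"node.kubernetes.io/my-taint",
		"feature.node.kubernetes.io/my-taint",
		"example.com/my-taint",
	} {
		fmt.Printf("%s: %v\n", key, validateTaintKey(key, allowed))
	}
}
```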