Remove a lot of boilerplate code by defining reusable functions.
Also, test the Run() method instead of the functions callees of Run() as
it is the top level functionality that was tested in practice (we don't
have separate unit tests for the callee functions).
Add a counter for total number of node update/sync requests. In
practice, this counts the number of gRPC requests received if the gRPC
API is in use. If the NodeFeature API is enabled, this counts the
requests initiated by the NFD API controller, i.e. updates triggered by
changes in NodeFeature or NodeFeatureRule objects plus updates initiated
by the controller resync period.
Add a counter for errors encountered when processing NodeFeatureRules.
Another simple counter without any additional prometheus labels -
nfd-master logs can provide further details.
Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.
Rename the "NodeName" prometheus label to "node", aligning with
common prometheus/kubernetes conventions. Also reconfigure the
prometheus histogram buckets (now 10ms to 1s) to better match the
expected sample range.
Rename the metric, better describe what we're measuring and better
comply with prometheus naming conventions. Also change it to represent
actual updates of the node object on the Kubernetes apiserver.
Change the metric from a simple gauge (that basically was a single value
for the whole cluster) into a HistogramVec, aligning with the feature
discovery duration metric in nfd-worker. This improved metric now has
prometheus labels for the NFR name and node name, i.e. it is tracking
per-NFR metric for each node being processed. Also, change the naming to
better comply with prometheus suggested conventions.
Expose metrics via prometheus.monitoring.coreos.com/v1
The exposed metrics are
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Time taken to label a node |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist.
This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if kubelet config was changed to use other TopologyManager policy and scope.
Signed-off-by: pprokop <pprokop@nvidia.com>
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Drop the KlogDump helper in favor of klog.InfoS. However, that patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of object unless really evaluated by the logging function.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
This PR fixes the resync-period configuration option of the nfd-master.
In fact, previously, changes were not reflected in the nfd-master at
runtime. e2e tests are also implemented to make sure that the fix is
already working as expected.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Split out resolving of node name (of the node to be updated) into a
separate function. Makes it possible to add unit tests. Also. do
unconditional type casting in the handler functions – that shouldn't
fail unless there is a really serious internal inconsistency in the
codebase so it should be ok to panic.
- kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins not /var/lib/kubelet
fsWatcher doesn't watch dirs recursively
- e.Name returned from fsWatcher events is a full path not a basename
Signed-off-by: pprokop <pprokop@nvidia.com>
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Reject malformed extended resource dynamic capacity assignment
capacity should be in the form of domain.feature.element,
add logic at func filterExtendedResources to check if true or ignore
ExtendedResource, logging as an error.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Fix a a bug where nfd-master with NodeFeature API enabled would crash
when NodeFeatureRule objects were processed in the case where no
NodeFeature objects existed. This was caused by trying to insert values
into a non-initialized NodeFeatureSpec in the code.
This patch adds two safety measures to prevent that from happening in
the future. First, add a constructor function for the NodeFeatureSpec
type, and second, check for uninitialized object in the function
inserting new functions.
TODO: add unit tests for the API helper functions.
Make the nfd.node.kubernetes.io/feature-labels and
nfd.node.kubernetes.io/extended-resources annotations behave similary to
the taints annotation: only create the annotations if some labels or
extended resources are created.
Update mocked implementation of
k8s.io/kubelet/pkg/apis/podresources/v1.PodResourcesListerClient. The
mocked implementation is moved to a separate "mocks" subpackage as it's
for an external interface.
This patch also adds code for auto-generation for the mocked interface.
Change the NFD API handler to re-try on node update failures. Will work
around transient failures, making sure that failed nodes (i.e. nodes
that we failed to update) don't need to wait for the 1 hour resync
period before being tried again.
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Increase the NFD API controller resync period from 5 minutes to 1 hour.
The resync causes nfd-master to replay all NodeFeature and
NodeFeatureRule objects, being effectively a "big hammer reset all"
button. This should only be needed as an "insurance" to fix labels et al
in case they have been manually tampered (outside NFD) and against
certain bugs in nfd itself. NFD is not supposed to manage anything
fast-changing so 1 hour should be enough.
This change only affects behavior when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Update node status before node metadata. This fixes a problem where we
lose track of NFD-managed extended resources in case patching node
status fails. Previously we removed all labels and annotations
(including the one listing our ERs) and only after that updated node
status. If node status update failed we had lost the annotation but
extended resources were still there, leaving them orphaned.
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.
However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.
Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.
We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
When a message received via the channel,
the main loop updates the `NodeResourceTopology` objects.
The notifier will send a message via the channel if:
1. It reached the sleep timeout.
2. It detected a change in Kubelet state files
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
On different Kubernetes flavors like OpenShift for exmaple,
the Kubelet state directory path is different. make it configurable
for maximum flexability.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Enabling reactive update for nfd-topology-updater
by detecting changes in Kubelet state/checkpoint files,
and signaling to the main loop to update the NodeResourceTopology
objects.
This has high value when scaling is an issue.
Having multiple pods deployed in between single update instance
might reflect incorrect resource accounting in the NRT CRs.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
time elapsed between t0-t2 < 5 seconds,
IOW the update on t0 is the recent update.
In t2 the resource accounting reflected by NRT
is not aligned with the actual accounting because
NRT CRs doesn't reflect the change happened in t1.
With this reactive update feature we expect an update to be trigger
between t1 and t2 so the NRT objects will reflect more accurate
picture.
There still might be a scenario when the updates
aren't fast enough, but this is an additional
future planned optimization.
The notifier has two event types:
1. Time based - keeping the old behavior, trigger
an update per interval.
2. FS event - trigger an update when Kubelet state/checkpoint files modified.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes.
As of now node-feature-discovery daemons are used to advertise those
resources but there is no service responsible for removing obsolete
objects(without corresponding Kubernetes node).
This patch adds new daemon called nfd-topology-gc which removes old
NRTs.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Don't require features to be specified. The creator possibly only wants
to create labels or only some types of features. No need to specify
empty structs for the unused fields.
Correctly handle the case where no NodeFeature objects exist for certain
node (and NodeFeature API has been enabled with
-enable-nodefeature-api). In this case all the labels should be removed.
We want to always update all nodes at startup. Without this patch we
don't get any update event from the controller if no NodeFeature or
NodeFeatureRule objects exist in the cluster. Thus all nodes would stay
untouched whereas we really want to remove all labels from all nodes in
this case.
Implement a naive ratelimiter for node update events originating from
the nfd API. We might get a ton of events in short interval. The
simplest example is startup when we get a separate Add event for every
NodeFeature and NodeFeatureRule object. Without rate limiting we
run "update all nodes" separately for each NodeFeatureRule object, plus,
we would run "update node X" separately for each NodeFeature object
targeting node X. This is a huge amount of wasted work because in
principle just running "update all nodes" once should be enough.
Implement handling of multiple NodeFeature objects by merging all
objects (targeting a certain node) into one before processing the data.
This patch implements MergeInto() methods for all required data types.
With support for multiple NodeFeature objects per node, The "nfd api
workflow" can be easily demonstrated and tested from the command line.
Creating the folloiwing object (assuming node-n exists in the cluster):
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeature
metadata:
labels:
nfd.node.kubernetes.io/node-name: node-n
name: my-features-for-node-n
spec:
# Features for NodeFeatureRule matching
features:
flags:
vendor.domain-a:
elements:
feature-x: {}
attributes:
vendor.domain-b:
elements:
feature-y: "foo"
feature-z: "123"
instances:
vendor.domain-c:
elements:
- attributes:
name: "elem-1"
vendor: "acme"
- attributes:
name: "elem-2"
vendor: "acme"
# Labels to be created
labels:
vendor-feature.enabled: "true"
vendor-setting.value: "100"
will create two feature labes:
feature.node.kubernetes.io/vendor-feature.enabled: "true"
feature.node.kubernetes.io/vendor-setting.value: "100"
In addition it will advertise hidden/raw features that can be used for
custom rules in NodeFeatureRule objects. Now, creating a NodeFeatureRule
object:
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: my-rule
spec:
rules:
- name: "my feature rule"
labels:
"my-feature": "true"
matchFeatures:
- feature: vendor.domain-a
matchExpressions:
feature-x: {op: Exists}
- feature: vendor.domain-c
matchExpressions:
vendor: {op: In, value: ["acme"]}
will match the features in the NodeFeature object above and cause one
more label to be created:
feature.node.kubernetes.io/my-feature: "true"
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.
Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.
-enable-nodefeature-api enable NodeFeature CRD API for incoming
feature requests, will disable the gRPC
interface (defaults to false)
It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:
-kubeconfig specifies the kubeconfig to use for
connecting k8s api (defaults to empty which
implies in-cluster config)
-enable-nodefeature-api enable the NodeFeature CRD API for
communicating node features to nfd-master,
will also automatically disable gRPC
(defgault to false)
No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.
Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).
The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node
features over K8s api objects instead of gRPC. The new resource is
namespaced which will help the management of multiple NodeFeature
objects per node. This aims at enabling 3rd party detectors for custom
features.
In addition to communicating raw features the NodeFeature object also
has a field for directly requesting labels that should be applied on the
node object.
Rename the crd deployment file to nfd-api-crds.yaml so that it matches
the new content of the file. Also, rename the Helm subdir for CRDs to
match the expected chart directory structure.
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also update deployment files and documentation to reflect
this change.
Implement detection of kubernetes namespace by reading file
/var/run/secrets/kubernetes.io/serviceaccount/namespace
Aa a fallback (if the file is not accessible) we take namespace from
KUBERNETES_NAMESPACE environment variable. This is useful for e.g.
testing and development where you might run nfd-worker directly from the
command line on a host system.
This commits extends NFD master code to support adding node taints
from NodeFeatureRule CR. We also introduce a new annotation for
taints which helps to identify if the taint set on node is owned
by NFD or not. When user deletes the taint entry from
NodeFeatureRule CR, NFD will remove the taint from the node. But
to avoid accidental deletion of taints not owned by the NFD, it
needs to know the owner. Keeping track of NFD set taints in the
annotation can be used during the filtering of the owner. Also
enable-taints flag is added to allow users opt in/out for node
tainting feature. The flag takes precedence over taints defined
in NodeFeatureRule CR. In other words, if enbale-taints is set to
false(disabled) and user still defines taints on the CR, NFD will
ignore those taints and skip them from setting on the node.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Extend NodeFeatureRule Spec with taints field to allow users to
specify the list of the taints they want to be set on the node if
rule matches.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Drop the following flags that were deprecated already in v0.8.0:
-sleep-interval (replaced by core.sleepInterval config file option)
-label-whitelist (replaced by core.labelWhiteList config file option)
-sources (replaced by -label-sources flag)
The exclude-list allows to filter specific resource accounting
from NRT's objects per node basis.
The CRs created by the topology-updater are used by the scheduler-plugin
as a source of truth for making scheduling decisions.
As such, this feature allows to hide specific information
from the scheduler, which in turn
will affect the scheduling decision.
A common use case is when user would like to perform scheduling
decisions which are based on a specific resource.
In that case, we can exclude all the other resources
which we don't want the scheduler to exemine.
The exclude-list is provided to the topology-updater via a ConfigMap.
Resource type's names specified in the list should match the names
as shown here: https://pkg.go.dev/k8s.io/api/core/v1#ResourceName
This is a resurrection of an old work started here:
https://github.com/kubernetes-sigs/node-feature-discovery/pull/545
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Fix handling of templates that got broken in
b907d07d7e when "flattening" the internal
data structure of features. That happened because the golang
text/template format uses dots to reference fields of a struct /
elements of a map (i.e. 'foo.bar' means that 'bar' must be a sub-element
of foo). Thus, using dots in our feature names (e.g. 'cpu.cpuid') means
that that hierarchy must be reflected in the data structure that is fed
to the templating engine. Thus, for templates we're now stuck stuck with
two level hierarchy. It doesn't really matter for now as all our
features follow that naming patter. We might be able to overcome this
limitation e.g. by using reflect but that's left as a future exercise.
Scanning podresources can temporarily fail; the previous code was
mistakenly not rearming the loop condition when this occurred,
effectively stopping the monitoring.
Rather, we should always pool and bail out on unrecoverable
error or when asked to stop.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Flatten the data structure that stores features, dropping the "domain"
level from the data model. That extra level of hierarchy brought little
benefit but just caused some extra complexity, instead. The new
structure nicely matches what we have in the NodeFeatureRule object (the
matchFeatures field of uses the same flat structure with the "feature"
field having a value <domain>.<feature>, e.g. "kernel.version").
This is pre-work for introducing a new "node feature" CRD that contains
the raw feature data. It makes the life of both users and developers
easier when both CRDs, plus our internal code, handle feature data in a
similar flat structure.
Move the previously-protobuf-only internal "feature api" over to the
public "nfd api" package. This is in preparation for introducing a new
CRD API for communicating features.
This patch carries no functional changes. Just moving code around.
Error strings should not be capitalized (ST1005) & remove the
redundancy from array, slice or map composite literals.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Make the NoPublish config flag a more direct control point for
whether to publishing features. This patch is pre-work for adding
support for other clients (upcoming new CRD API) in nfd-worker.
Refactor the code so that the initialization and running of the gRPC
server is done in a separate function. The goal is to make the code more
maintainable in terms of disabling (and eventually removing) the gRPC
functionality in the future.
Refactor the code, moving the hostpath helper functionality to new
"pkg/utils/hostpath" package. This breaks odd-ish dependency
"pkg/utils" -> "source".
This patch adds a kubebuilder marker to add a short name nfr for
NodeFeatureRule CRD.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Remove the cleanup code that removes ancient NFD labels with the
node.alpha.kubernetes-incubator.io/ prefix. This label namespace was
deprecated/dropped already in v0.4.0 so it should be safe to drop this
code.
Replace deprecated grpc.WithInsecure() with
grpc.WithTransportCredentials and insecure.NewCredentials(). Makes
golangci-lint pass muster.
enter the commit message for your changes. Lines starting
test that NodeFeatureRule templates work with empty MatchFeatures, but
with MatchAny.
this test would fail, higligting an issue which is fixed in next commit.
see #864.
Signed-off-by: Viktor Oreshkin <imselfish@stek29.rocks>
Revert the hack that was a workaround for issues with k8s deepcopy-gen.
New deepcopy-gen is able to generate code correctly without issues so
this is not needed anymore.
Also, removing this hack solves issues with object validation when
creating NodeFeatureRules programmatically with nfd go-client. This is
needed later with NodeFeatureRules e2e-tests.
Logically reverts f3cc109f99.
This patch changes a rare corner case of custom label rules with an
empty set of matchexpressions. The patch removes a special case where an
empty match expression set matched everything and returned all feature
elements for templates to consume. With this patch the match expression
set logically evaluates all expressions in the set and returns all
matches - if there are no expressions there are no matches and no
matched features are returned. However, the overall match result
(determining if "non-template" labels will be created) in this special
case will be "true" as before as none of the zero match expressions
failed.
The former behavior was somewhat illogical and counterintuitive: having
1 to N expressions matched and returned 1 to N features (at most), but,
having 0 expressions always matched everything and returned all
features. This was some leftover proof-of-concept functionality (for
some possible future extensions) that should have been removed before
merging.
It's possible for device plugins to advertise non-existent
numa node ids that cause topology updater to crash.
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
* fix linter issues for few files
* fix linter issue of exported const Name should have comment or be unexported
* fix name lint issue and resolve lints
* add changes to comments
Do not prefix label names from the new matchFeatures/matchAny custom
rules with "custom-". We want to have the same result (set of labels)
from a rule independent of whether it has been specified in worker
config or in a NodeFeatureRule CRs. Legacy matchOn rules (not available
in NodeFeatureRule CRs) are intact, i.e. still prefixed, in order to
retain backwards compatibility.
Add a configuration option for controlling the enabled "raw" feature
sources. This is useful e.g. in testing and development, plus it also
allows fully shutting down discovery of features that are not needed in
a deployment. Supplements core.labelSources which controls the
enablement of label sources.
Provide backwards compatibility via a deprecated 'core.sources' config
file option. This will override 'core.labelSources'. A warning is
printed in the log if this option is detected.
The goal is to make the name more descriptive. Also keeping in mind a
possible future addition a 'featureSources' option (or similar) for
controlling the feature discovery.
Use the single-dash (i.e. '-option' instead of '--option') format
consistently accross log messages and documentation. This is the format
that was mostly used, already, and shown by command line help of the
binaries, for example.
Support templating of var names in a similar manner as labels. Add
support for a new 'varsTemplate' field to the feature rule spec which is
treated similarly to the 'labelsTemplate' field. The value of the field
is processed through the golang "text/template" template engine and the
expanded value must contain variables in <key>=<value> format, separated
by newlines i.e.:
- name: <rule-name>
varsTemplate: |
<label-1>=<value-1>
<label-2>=<value-2>
...
Similar rules as for 'labelsTemplate' apply, i.e.
1. In case of matchAny is specified, the template is executed separately
against each individual matchFeatures matcher.
2. 'vars' field has priority over 'varsTemplate'
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).
If referencing rules accross multiple configs/CRDs care must be taken
with the ordering. Processing order of rules in nfd-worker:
1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
in alphabetical order. Subdirectories are processed by reading their
files in alphabetical order.
3. Custom rules from main nfd-worker.conf
In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).
This patch also adds new 'vars' fields to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rules.matched' feature. This may by desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.
An example setup:
- name: "kernel feature"
labels:
kernel-feature:
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Gt, value: ["4"]}
- name: "intermediate var feature"
vars:
nolabel-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]}
device: {op: In, value: ["1234", "1235"]}
- name: top-level-feature
matchFeatures:
- feature: rule.matched
matchExpressions:
kernel-feature: "true"
nolabel-feature: "true"