The NodeResourceTopology API has been made cluster
scoped as in the current context a CR corresponds to
a Node and since Node is a cluster scoped resource it
makes sense to make NRT cluster scoped as well.
Ref: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/18
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Implement a new 'matchAny' directive in the new rule format, building on
top of the previously implemented 'matchFeatures' matcher. MatchAny
applies a logical OR over multiple matchFeatures directives. That is, it
allows specifying multiple alternative matchers (at least one of which
must match) in a single label rule.
The configuration format for the new matchers is
matchAny:
- matchFeatures:
- feature: <domain>.<feature>
matchExpressions:
<attribute>:
op: <operator>
value:
- <list-of-values>
- matchFeatures:
...
A configuration example. In order to require a cpu feature, kernel
module and one of two specific PCI devices (taking use of the shortform
notation):
- name: multi-device-test
labels:
multi-device-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions: [driver-module]
- feature: cpu.cpuid
matchExpressions: [AVX512F]
matchAny:
- matchFeatures:
- feature; pci.device
matchExpressions:
vendor: "8086"
device: "1234"
- matchFeatures:
- feature: pci.device
matchExpressions:
vendor: "8086"
device: "abcd"
Implement generic feature matchers that cover all feature sources (that
implement the FeatureSource interface). The implementation relies on the
unified data model provided by the FeatureSource interface as well as
the generic expression-based rule processing framework that was added to
the source/custom/expression package.
With this patch any new features added will be automatically available
for custom rules, without any additional work. Rule hierarchy follows
the source/feature hierarchy by design.
This patch introduces a new format for custom rule specifications,
dropping the 'value' field and introducing new 'labels' field which
makes it possible to specify multiple labels per rule. Also, in the new
format the 'name' field is just for reference and no matching label is
created. The new generic rules are available in this new rule format
under a 'matchFeatures. MatchFeatures implements a logical AND over
an array of per-feature matchers - i.e. a match for all of the matchers
is required. The goal of the new rule format is to make it better follow
K8s API design guidelines and make it extensible for future enhancements
(e.g. addition of templating, taints, annotations, extended resources
etc).
The old rule format (with cpuID, kConfig, loadedKMod, nodename, pciID,
usbID rules) is still supported. The rule format (new vs. old) is
determined at config parsing time based on the existence of the
'matchOn' field.
The new rule format and the configuration format for the new
matchFeatures field is
- name: <rule-name>
labels:
<key>: <value>
...
matchFeatures:
- feature: <domain>.<feature>
matchExpressions:
<attribute>:
op: <operator>
value:
- <list-of-values>
- feature: <domain>.<feature>
...
Currently, "cpu", "kernel", "pci", "system", "usb" and "local" sources
are covered by the matshers/feature selectors. Thus, the following
features are available for matching with this patch:
- cpu.cpuid:
<cpuid-flag>: <exists/does-not-exist>
- cpu.cstate:
enabled: <bool>
- cpu.pstate:
status: <string>
turbo: <bool>
scaling_governor: <string>
- cpu.rdt:
<rdt-feature>: <exists/does-not-exist>
- cpu.sst:
bf.enabled: <bool>
- cpu.topology:
hardware_multithreading: <bool>
- kernel.config:
<flag-name>: <string>
- kernel.loadedmodule:
<module-name>: <exists/does-not-exist>
- kernel.selinux:
enabled: <bool>
- kernel.version:
major: <int>
minor: <int>
revision: <int>
full: <string>
- system.osrelease:
<key-name>: <string>
VERSION_ID.major: <int>
VERSION_ID.minor: <int>
- system.name:
nodename: <string>
- pci.device:
<device-instance>:
class: <string>
vendor: <string>
device: <string>
subsystem_vendor: <string>
susbystem_device: <string>
sriov_totalvfs: <int>
- usb.device:
<device-instance>:
class: <string>
vendor: <string>
device: <string>
serial: <string>
- local.label:
<label-name>: <string>
The configuration also supports some "shortforms" for convenience:
matchExpressions: [<attr-1>, <attr-2>=<val-2>]
---
matchExpressions:
<attr-3>:
<attr-4>: <val-4>
is equal to:
matchExpressions:
<attr-1>: {op: Exists}
<attr-2>: {op: In, value: [<val-2>]}
---
matchExpressions:
<attr-3>: {op: Exists}
<attr-4>: {op: In, value: [<val-4>]}
In other words:
- feature: kernel.config
matchExpressions: ["X86", "INIT_ENV_ARG_LIMIT=32"]
- feature: pci.device
matchExpressions:
vendor: "8086"
is the same as:
- feature: kernel.config
matchExpressions:
X86: {op: Exists}
INIT_ENV_ARG_LIMIT: {op: In, values: ["32"]}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]
Some configuration examples below. In order to match a CPUID feature the
following snippet can be used:
- name: cpu-test-1
labels:
cpu-custom-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AESNI: {op: Exists}
AVX: {op: Exists}
In order to match against a loaded kernel module and OS version:
- name: kernel-test-1
labels:
kernel-custom-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
e1000: {op: Exists}
- feature: system.osrelease
matchExpressions:
NAME: {op: InRegexp, values: ["^openSUSE"]}
VERSION_ID.major: {op: Gt, values: ["14"]}
In order to require a kernel module and both of two specific PCI devices:
- name: multi-device-test
labels:
multi-device-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
driver-module: {op: Exists}
- pci.device:
vendor: "8086"
device: "1234"
- pci.device:
vendor: "8086"
device: "abcd"
* Simplify NFD worker service configuration in Helm
Signed-off-by: Elias Koromilas <elias.koromilas@gmail.com>
* Update docs/get-started/deployment-and-usage.md
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Drop --sleep-interval from the template. We really don't want to do that
as. First, it's the default value so no use repeating that in the
template. And more importantly, the commandline flag will override
anything that will be provided in the worker config file, making it
impossible for users to specify the sleep interval (other than by
editing the template directly).
The base should really have the very bare minimum. Remove all redundant
(at default-value) args and move the others to the specific
topologyupdater kustomize component. This also makes these settings
re-usable in user-specific overlays (that are not based on
topologyupdater-daemonset).
Align "topologyupdater" overlay with "topologyupdater-job". Both should
deploy topologyupdater as a standalone application. Previously the
topologyupdater overlay did not deploy nfd-master at all (but deployed
nfd-worker instead) causing the pods to end up in crashloopbackoff as
there was no master to communicate with.
- create an overlay for deployment of all components
- create an overlay for just topologyupdater deployment (to be deployed in
conjunction with the default overlay)
- create a separate overlay for deployment of master and topologyupdater-job
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
This commit makes the mount of /usr/src optional in the Helm chart, and
removes it from the kustomization. Reason is that some systems do not
have a /usr/src (such as Talos) *and* have a R/O filesystem. Since
/usr/src is optional per FHS 3.0, NFD should not assume its presence.
Signed-off-by: Jorik Jonker <jorik@kippendief.biz>
Replicates nfd-daemonset-combined.yaml.template.
In addition to the overlay we need to add a separate set of patches
under components/common in order to handle the double-container pod.
Implement functionality virtually replicating deployment templates for
nfd-master and nfd-worker daemonset (nfd-master.yaml.template and
nfd-worker-daemonset.yaml.template) by adding a kustomize overlay named
"default".
We split the resources into multiple bases (rbac, master and
worker-daemonset) so that relevant parts are re-usable in
other deployment scenarios added later (e.g. "one-shot job", and
"combined daemonset").
This patch adds one component (components/common) doing the required
kustomization for the example deployment.
The naming was changed in when with cpuid v2
(github.com/klauspost/cpuid/v2) and we didn't catch this in NFD. No
issue reports of the inadvertent naming change so let's just adapt to
the updated naming in NFD configuration. The SSE4* labels are disabled
by default so they're not widely used, if at all.
Mount /usr/lib and /usr/src as /host-usr/lib and /host-usr/src inside the pod
to allow NFD to search for the kernel configuration file inside /usr.
This solves the problem of the kernel config file not being present in /boot
on s390x RHCOS.
Signed-off-by: Jan Schintag <jan.schintag@de.ibm.com>
There are cases when the only available metadata for discovering
features is the node's name. The "nodename" rule extends the custom
source and matches when the node's name matches one of the given
nodename regexp patterns.
It is also possible now to set an optional "value" on custom rules,
which overrides the default "true" label value in case the rule matches.
In order to allow more dynamic configurations without having to modify
the complete worker configuration, custom rules are additionally read
from a "custom.d" directory now. Typically that directory will be filled
by mounting one or more ConfigMaps.
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
This commit adds Helm chart for node-feature-discovery
Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>