Let's refactor part of the getCgroupMiscCapacity() out to its own
retrieveCgroupMiscCapacityValue(), for the legibility sake.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been only considering cgroupsv2 when trying to read misc.capacity.
However, there are still a bunch of systems out there relying on
cgroupsv1.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX. Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).
In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).
Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Drop the KlogDump helper in favor of klog.InfoS. However, that patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of object unless really evaluated by the logging function.
This patch add SEV ASIDs and the related (but distinct) SEV Encrypted State
(SEV-ES) IDs as two quantities to be exposed via extended resources.
In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the
root control group will have a misc.capacity file that shows the number of
available IDs in each category.
The added extended resources are:
- sev.asids
- sev.encrypted_state_ids
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:
```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```
The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.
An example of how it ends up being exposed via the NFD:
```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}' | jq | grep tdx.total_keys
"feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Scan the mounted host sysfs instead of hard coded /sys mount point.
Currently, sysfs devices subdir is not namespaced in Linux (containers
have the same view as the host) so this wasn't an issue in practice.
However, this change should make the code more future proof and align
usb with other sysfs detection in nfd.
Flatten the data structure that stores features, dropping the "domain"
level from the data model. That extra level of hierarchy brought little
benefit but just caused some extra complexity, instead. The new
structure nicely matches what we have in the NodeFeatureRule object (the
matchFeatures field of uses the same flat structure with the "feature"
field having a value <domain>.<feature>, e.g. "kernel.version").
This is pre-work for introducing a new "node feature" CRD that contains
the raw feature data. It makes the life of both users and developers
easier when both CRDs, plus our internal code, handle feature data in a
similar flat structure.
Move the previously-protobuf-only internal "feature api" over to the
public "nfd api" package. This is in preparation for introducing a new
CRD API for communicating features.
This patch carries no functional changes. Just moving code around.
Refactor the code, moving the hostpath helper functionality to new
"pkg/utils/hostpath" package. This breaks odd-ish dependency
"pkg/utils" -> "source".
Set `cpu-security.tdx.enable` to `true` when TDX is avialable and has
been enabled. otherwise it'll be set to `false`.
`/sys/module/kvm_intel/parameters/tdx` presence and content is used to
detect whether a CPU is Intel TDX capable.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Revert the hack that was a workaround for issues with k8s deepcopy-gen.
New deepcopy-gen is able to generate code correctly without issues so
this is not needed anymore.
Also, removing this hack solves issues with object validation when
creating NodeFeatureRules programmatically with nfd go-client. This is
needed later with NodeFeatureRules e2e-tests.
Logically reverts f3cc109f99.
Move existing security/trusted-execution related features (i.e. SGX and
SE) under the same "security" feature, deprecating the old features. The
motivation for the change is to keep the source code and user interface
more organized as we experience a constant inflow of similar security
related features. This change will affect the user interface so it is
less painful to do it early on.
New feature labels will be:
feature.node.kubernetes.io/cpu-security.se.enabled
feature.node.kubernetes.io/cpu-security.sgx.enabled
and correspondingly new "cpu.security" feature with "se.enabled" and
"sgx.enabled" elements will be available for custom rules, for example:
- name: "sample sgx rule"
labels:
sgx.sample.feature: "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
"sgx.enabled": {op: IsTrue}
At the same time deprecate old labels "cpu-sgx.enabled" and
"cpu-se.enabled" feature labels and the corresponding features for
custom rules. These will be removed in the future causing an effective
change in NFDs user interface.
Ignore the operational state of network interface when creating the
network SR-IOV labels. Previously NFD only considered interfaces which
were "up".
Pre v0.9 we used to check the "administrative state" of interfaces
(managed by the sysadmin with e.g. with ip link set dev <dev> down/up).
In v0.10 we changed to checking the "operational state" of interfaces,
reflecting whether the it is actually able to transfer data. Both these
checks have caused confusion among users and it is more understandable
and more aligned with other HW discovery functions in NFD to just drop
the state check. Also, the documentation is aligned with this behavior.
Set `cpu.se-enabled` to `true` when IBM Secure Execution for Linux
(IBM Z & LinuxONE) is available and has been enabled.
Uses `/sys/firmware/uv/prot_virt_host`, which is available in kernels
>=5.12 + backports. For simplicity, skip more complicated facility &
kernel cmdline lookups.
Discover "iommu/intel-iommu/version" sysfs attribute for pci devices.
This information is available for custom label rules.
An example custom rule:
- name: "iommu version rule"
labels:
iommu.version_1: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu/intel-iommu/version": {op: In, value: ["1:0"]}
* fix linter issues for few files
* fix linter issue of exported const Name should have comment or be unexported
* fix name lint issue and resolve lints
* add changes to comments
Add "iommu_group/type" to the list of PCI device attributes that are
discovered. The value is the raw value from sysfs (i.e DMA, DMA-FQ or
identity).
No built-in (automatic) labels are generated based on this, but, the
attribute is available for custom label rules to use. Examples of custom
rules:
- name: "iommu enabled rule"
labels:
iommu.enabled: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu_group/type": {op: NotIn, value: ["unknown"]}
- name: "iommu passthrough rule"
labels:
iommu.passthrough: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu_group/type": {op: In, value: ["identity"]}
Implicitly injecting the filename of the hook/featurefile into the name
of the label is confusing, counter-intuitive and unnecessarily complex
to understand. It's much clearer to advertise features and labels as
presented in the feature file / output of the hook.
NOTE: this breaks backwards compatibility with usage scenarios that rely
on prefixing the label with the filename.
Do not prefix label names from the new matchFeatures/matchAny custom
rules with "custom-". We want to have the same result (set of labels)
from a rule independent of whether it has been specified in worker
config or in a NodeFeatureRule CRs. Legacy matchOn rules (not available
in NodeFeatureRule CRs) are intact, i.e. still prefixed, in order to
retain backwards compatibility.
Stop converting "=y" and "=m" to "true" for the raw feature values used
in "kernel.config" custom rule processing.
In practice, this means that to check if a kernel config flag has been
set to "y" or "m", one needs to explicitly check for both of the values:
matchFeatures:
- feature: kernel.config
matchExpressions:
FOO: {op: In, value: ["y", "m"]}
instead of (how it used to be):
matchFeatures:
- feature: kernel.config
matchExpressions:
FOO: {op: IsTrue}
The legacy kconfig custom rule is unchanged as are the
kernel-config.<flag> feature labels.
Do not do length checking here. We do not need/want to limit the values
here because they could still be used in custom rules. Moreover, we do
more proper validation of label all label values in nfd-worker, anyway.
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).
If referencing rules accross multiple configs/CRDs care must be taken
with the ordering. Processing order of rules in nfd-worker:
1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
in alphabetical order. Subdirectories are processed by reading their
files in alphabetical order.
3. Custom rules from main nfd-worker.conf
In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).
This patch also adds new 'vars' fields to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rules.matched' feature. This may by desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.
An example setup:
- name: "kernel feature"
labels:
kernel-feature:
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Gt, value: ["4"]}
- name: "intermediate var feature"
vars:
nolabel-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]}
device: {op: In, value: ["1234", "1235"]}
- name: top-level-feature
matchFeatures:
- feature: rule.matched
matchExpressions:
kernel-feature: "true"
nolabel-feature: "true"
Commit 0945019161 changed the behavior so
that NFD started to advertise also "false" status of selinux.enabled
label. This patch reverts this behavior (i.e. we only have
selinux.enabled=true). The rationale behind is avoiding any excess
labels - selinux.enabled=false label would be pointless noise in most
deployments.
Separate feature discovery and creation of feature labels.
Generalize the discovery of nvdimm devices so that they can be matched
in custom label rules in a similar fashion as pci and usb devices.
Available attributes for matching nvdimm devices are limited to:
- devtype
- mode
For numa we now detect the number of numa nodes which can be matched
agains in custom label rules.
Labels created by the memory feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rule:
- name: "my memory rule"
labels:
my-memory-feature: "true"
matchFeatures:
- feature: memory.numa
matchExpressions:
"node_count": {op: Gt, value: ["3"]}
- feature: memory.nv
matchExpressions:
"devtype" {op: In, value: ["nd_dax"]}
Also, add minimalist unit test.
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that network devices can be matched in custom
label rules in a similar fashion as pci and usb devices. Available
attributes for matching are:
- operstate
- speed
- sriov_numvfs
- sriov_totalvfs
Labels created by the network feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rule:
- name: "my network rule"
labels:
my-network-feature: "true"
matchFeatures:
- feature: network.device
matchExpressions:
"operstate": { op: In, value: ["up"] }
"sriov_numvfs": { op: Gt, value: ["9"] }
Also, add minimalist unit test.
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that block devices can be matched in custom
label rules in a similar fashion as pci and usb devices. This extends
the discovery to other block queue attributes than 'rotational': now we
also detect 'dax', 'nr_zones' and 'zoned'.
Labels created by the storage feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rules:
- name: "my block rule 1"
labels:
my-block-feature-1: "true"
matchFeatures:
- feature: storage.block
"rotational": {op: In, value: ["0"]}
- name: "my block rule 2"
labels:
my-block-feature-2: "true"
matchFeatures:
- feature: storage.block
"zoned": {op: In, value: [“host-aware”, “host-managed”]}
Also, add minimalist unit test.
Move the rule processing of matchFeatures and matchAny from
source/custom package over to pkg/apis/nfd, aiming for better integrity
and re-usability of the code. Does not change the CRD API as such, just
adds more supportive functions.
Create a new package pkg/apis/nfd/v1alpha1 and migrate the custom rule
expressions over there. This is the first step in creating a new CRD API
for custom rules.
Print out the result of applying an expression. Also, truncate the
output to max 10 elements (of items matched against) unless '-v 4'
verbosity level is in use.
Implement a new 'matchAny' directive in the new rule format, building on
top of the previously implemented 'matchFeatures' matcher. MatchAny
applies a logical OR over multiple matchFeatures directives. That is, it
allows specifying multiple alternative matchers (at least one of which
must match) in a single label rule.
The configuration format for the new matchers is
matchAny:
- matchFeatures:
- feature: <domain>.<feature>
matchExpressions:
<attribute>:
op: <operator>
value:
- <list-of-values>
- matchFeatures:
...
A configuration example. In order to require a cpu feature, kernel
module and one of two specific PCI devices (taking use of the shortform
notation):
- name: multi-device-test
labels:
multi-device-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions: [driver-module]
- feature: cpu.cpuid
matchExpressions: [AVX512F]
matchAny:
- matchFeatures:
- feature; pci.device
matchExpressions:
vendor: "8086"
device: "1234"
- matchFeatures:
- feature: pci.device
matchExpressions:
vendor: "8086"
device: "abcd"
Implement generic feature matchers that cover all feature sources (that
implement the FeatureSource interface). The implementation relies on the
unified data model provided by the FeatureSource interface as well as
the generic expression-based rule processing framework that was added to
the source/custom/expression package.
With this patch any new features added will be automatically available
for custom rules, without any additional work. Rule hierarchy follows
the source/feature hierarchy by design.
This patch introduces a new format for custom rule specifications,
dropping the 'value' field and introducing new 'labels' field which
makes it possible to specify multiple labels per rule. Also, in the new
format the 'name' field is just for reference and no matching label is
created. The new generic rules are available in this new rule format
under a 'matchFeatures. MatchFeatures implements a logical AND over
an array of per-feature matchers - i.e. a match for all of the matchers
is required. The goal of the new rule format is to make it better follow
K8s API design guidelines and make it extensible for future enhancements
(e.g. addition of templating, taints, annotations, extended resources
etc).
The old rule format (with cpuID, kConfig, loadedKMod, nodename, pciID,
usbID rules) is still supported. The rule format (new vs. old) is
determined at config parsing time based on the existence of the
'matchOn' field.
The new rule format and the configuration format for the new
matchFeatures field is
- name: <rule-name>
labels:
<key>: <value>
...
matchFeatures:
- feature: <domain>.<feature>
matchExpressions:
<attribute>:
op: <operator>
value:
- <list-of-values>
- feature: <domain>.<feature>
...
Currently, "cpu", "kernel", "pci", "system", "usb" and "local" sources
are covered by the matshers/feature selectors. Thus, the following
features are available for matching with this patch:
- cpu.cpuid:
<cpuid-flag>: <exists/does-not-exist>
- cpu.cstate:
enabled: <bool>
- cpu.pstate:
status: <string>
turbo: <bool>
scaling_governor: <string>
- cpu.rdt:
<rdt-feature>: <exists/does-not-exist>
- cpu.sst:
bf.enabled: <bool>
- cpu.topology:
hardware_multithreading: <bool>
- kernel.config:
<flag-name>: <string>
- kernel.loadedmodule:
<module-name>: <exists/does-not-exist>
- kernel.selinux:
enabled: <bool>
- kernel.version:
major: <int>
minor: <int>
revision: <int>
full: <string>
- system.osrelease:
<key-name>: <string>
VERSION_ID.major: <int>
VERSION_ID.minor: <int>
- system.name:
nodename: <string>
- pci.device:
<device-instance>:
class: <string>
vendor: <string>
device: <string>
subsystem_vendor: <string>
susbystem_device: <string>
sriov_totalvfs: <int>
- usb.device:
<device-instance>:
class: <string>
vendor: <string>
device: <string>
serial: <string>
- local.label:
<label-name>: <string>
The configuration also supports some "shortforms" for convenience:
matchExpressions: [<attr-1>, <attr-2>=<val-2>]
---
matchExpressions:
<attr-3>:
<attr-4>: <val-4>
is equal to:
matchExpressions:
<attr-1>: {op: Exists}
<attr-2>: {op: In, value: [<val-2>]}
---
matchExpressions:
<attr-3>: {op: Exists}
<attr-4>: {op: In, value: [<val-4>]}
In other words:
- feature: kernel.config
matchExpressions: ["X86", "INIT_ENV_ARG_LIMIT=32"]
- feature: pci.device
matchExpressions:
vendor: "8086"
is the same as:
- feature: kernel.config
matchExpressions:
X86: {op: Exists}
INIT_ENV_ARG_LIMIT: {op: In, values: ["32"]}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]
Some configuration examples below. In order to match a CPUID feature the
following snippet can be used:
- name: cpu-test-1
labels:
cpu-custom-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AESNI: {op: Exists}
AVX: {op: Exists}
In order to match against a loaded kernel module and OS version:
- name: kernel-test-1
labels:
kernel-custom-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
e1000: {op: Exists}
- feature: system.osrelease
matchExpressions:
NAME: {op: InRegexp, values: ["^openSUSE"]}
VERSION_ID.major: {op: Gt, values: ["14"]}
In order to require a kernel module and both of two specific PCI devices:
- name: multi-device-test
labels:
multi-device-feature: "true"
matchFeatures:
- feature: kernel.loadedmodule
matchExpressions:
driver-module: {op: Exists}
- pci.device:
vendor: "8086"
device: "1234"
- pci.device:
vendor: "8086"
device: "abcd"
Implement a framework for more flexible rule configuration and matching,
mimicking the MatchExpressions pattern from K8s nodeselector.
The basic building block is MatchExpression which contains an operator
and a list of values. The operator specifies that "function" that is
applied when evaluating a given input agains the list of values.
Available operators are:
- MatchIn
- MatchNotIn
- MatchInRegexp
- MatchExists
- MatchDoesNotExist
- MatchGt
- MatchLt
- MatchIsTrue
- MatchIsFalse
Another building block of the framework is MatchExpressionSet which is a
map of string-MatchExpression pairs. It is a helper for specifying
multiple expressions that can be matched against a set of set of
features.
This patch converts all existing custom rules to utilize the new
expression-based framework.