Clarify that we account, and we can account, only
resources exclusively allocated to Guaranteed QoS pods.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Rename files and parameters. Drop the container security context
parameters from the Helm chart. There should be no reason to run the
nfd-gc with other than the minimal privileges.
Also updates the documentation.
Add a counter for total number of node update/sync requests. In
practice, this counts the number of gRPC requests received if the gRPC
API is in use. If the NodeFeature API is enabled, this counts the
requests initiated by the NFD API controller, i.e. updates triggered by
changes in NodeFeature or NodeFeatureRule objects plus updates initiated
by the controller resync period.
Add a counter for errors encountered when processing NodeFeatureRules.
Another simple counter without any additional prometheus labels -
nfd-master logs can provide further details.
Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.
Expose metrics via prometheus.monitoring.coreos.com/v1
The exposed metrics are
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Time taken to label a node |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX. Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).
In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).
Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where node name does not resolve to the node
ip, e.g. nodename != hostname.
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).
Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).
This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.
Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
Commit bfbc47f55e added a lot of those and
this patch tries to cover all that we missed there. Having .md suffixes
in references to internal files makes it convenient to browse the
document locally, just as text files as the references work correctly.
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
This patch add SEV ASIDs and the related (but distinct) SEV Encrypted State
(SEV-ES) IDs as two quantities to be exposed via extended resources.
In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the
root control group will have a misc.capacity file that shows the number of
available IDs in each category.
The added extended resources are:
- sev.asids
- sev.encrypted_state_ids
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Mark the -resource-labels flag (and the corresponding resourceLabels
config option) as deprecated. We now support managing extended resources
via NodeFeatureRule objects. This kludge deserves to go, eventually.
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.
However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.
Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
Document built-in RDT labels to be deprecated and removed in a future
release. The plan is that the default built-in RDT labels would not be
created anymore, but the RDT features would still be available for
NodeFeatureRules to consume.
The RDT labels are not very useful (they don't e.g indicate if the
features are really enabled in kernel or if the resctrlfs is mounted).
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.
We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:
```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```
The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.
An example of how it ends up being exposed via the NFD:
```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}' | jq | grep tdx.total_keys
"feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Bump cpuid version to v2.2.4 in the go.mod so that WRMSRNS (
Non-Serializing Write to Model Specific Register) and MSRLIST
(Read/Write List of Model Specific Registers) instructions are
detectable.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
Make distroless/base as the base image for the default image,
effectively making the minimal image as the default. Add a new "full"
image variant that corresponds the previous default image. The
"*-minimal" container image tag is provided for backwards compatibility.
The practical user impact of this change is that hook support is limited
to statically linked ELF binaries. Bash or Perl scripts are not
supported by the default image, anymore, but the new "full" image
variant can be used for backwards compatibility.
Bump cpuid to v2.2.3 which adds support for detecting Intel Sierra
Forest instructions like AVXIFMA, AVXNECONVERT, AVXVNNIINT8 and
CMPCCXADD.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
Document the usage of the NodeFeature CRD API. Also re-organize the
documentation a bit, moving the description of NodeFeatureRule
controller from customization guide to nfd-master usage page.
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.
Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.
-enable-nodefeature-api enable NodeFeature CRD API for incoming
feature requests, will disable the gRPC
interface (defaults to false)
It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:
-kubeconfig specifies the kubeconfig to use for
connecting k8s api (defaults to empty which
implies in-cluster config)
-enable-nodefeature-api enable the NodeFeature CRD API for
communicating node features to nfd-master,
will also automatically disable gRPC
(defgault to false)
No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.
Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).
The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also update deployment files and documentation to reflect
this change.
Add a reference to the label rule format in the NodeFeatureRule section.
Also make it explicit in the beginning of Hooks section that hooks are
deprecated.
Add a separate page for describing the custom resources used by NFD.
Simplify the Introduction page by moving the details of
NodeResourceTopology from there. Similarly, drop long
NodeResourceTopology example from the quick-start page, making the page
shorter and simpler.
Drop the following flags that were deprecated already in v0.8.0:
-sleep-interval (replaced by core.sleepInterval config file option)
-label-whitelist (replaced by core.labelWhiteList config file option)
-sources (replaced by -label-sources flag)
Introduce two main sections "Deployment" and "Usage" and move "Developer
guide" to the top level, too. In particular, split the huge
deployment-and-usage file into multiple parts under the new main
sections. Move customization guide from "Advanced" to "Usage".
This patch also renames "Advanced" to "Reference" as only that is left
there is reference documentation.
Set `cpu-security.tdx.enable` to `true` when TDX is avialable and has
been enabled. otherwise it'll be set to `false`.
`/sys/module/kvm_intel/parameters/tdx` presence and content is used to
detect whether a CPU is Intel TDX capable.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In some cases (CI) it is useful to run NFD e2e tests using
ephemeral clusters. To save time and bandwidth, it is also useful
to prime the ephemeral cluster with the images under test.
In these circumstances there is no risk of running a stale image,
and having a `Always` PullPolicy hardcoded actually makes
the whole exercise null.
So we add a new option, disabled by default, to make the e2e
manifest use the `IfNotPresent` pull policy, to effectively
cover this use case.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Move existing security/trusted-execution related features (i.e. SGX and
SE) under the same "security" feature, deprecating the old features. The
motivation for the change is to keep the source code and user interface
more organized as we experience a constant inflow of similar security
related features. This change will affect the user interface so it is
less painful to do it early on.
New feature labels will be:
feature.node.kubernetes.io/cpu-security.se.enabled
feature.node.kubernetes.io/cpu-security.sgx.enabled
and correspondingly new "cpu.security" feature with "se.enabled" and
"sgx.enabled" elements will be available for custom rules, for example:
- name: "sample sgx rule"
labels:
sgx.sample.feature: "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
"sgx.enabled": {op: IsTrue}
At the same time deprecate old labels "cpu-sgx.enabled" and
"cpu-se.enabled" feature labels and the corresponding features for
custom rules. These will be removed in the future causing an effective
change in NFDs user interface.
Update the partial list of x86 cpuid features that are presented in the
NFD documentation. In particular, the following instructions were left
out of the list: AVXSLOW, CETIBT, CETSS, CLDEMOTE, HLE, MPX, RTM,
RTM_ALWAYS_ABORT, SERIALIZE, SHA, TSXLDTRK.
Let the documentation follow the latest release name. Even if it's just
referential here it would look odd in the future if we refer to some
ancient version.
Set `cpu.se-enabled` to `true` when IBM Secure Execution for Linux
(IBM Z & LinuxONE) is available and has been enabled.
Uses `/sys/firmware/uv/prot_virt_host`, which is available in kernels
>=5.12 + backports. For simplicity, skip more complicated facility &
kernel cmdline lookups.
This patch changes a rare corner case of custom label rules with an
empty set of matchexpressions. The patch removes a special case where an
empty match expression set matched everything and returned all feature
elements for templates to consume. With this patch the match expression
set logically evaluates all expressions in the set and returns all
matches - if there are no expressions there are no matches and no
matched features are returned. However, the overall match result
(determining if "non-template" labels will be created) in this special
case will be "true" as before as none of the zero match expressions
failed.
The former behavior was somewhat illogical and counterintuitive: having
1 to N expressions matched and returned 1 to N features (at most), but,
having 0 expressions always matched everything and returned all
features. This was some leftover proof-of-concept functionality (for
some possible future extensions) that should have been removed before
merging.
Change the default K8S_NAMESPACE to node-feature-discovery from
kube-system. The default was changed in the Makefile in commit
5d4484a1d9, but the docs were not updated
to correspond with that.
Discover "iommu/intel-iommu/version" sysfs attribute for pci devices.
This information is available for custom label rules.
An example custom rule:
- name: "iommu version rule"
labels:
iommu.version_1: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu/intel-iommu/version": {op: In, value: ["1:0"]}
Add "iommu_group/type" to the list of PCI device attributes that are
discovered. The value is the raw value from sysfs (i.e DMA, DMA-FQ or
identity).
No built-in (automatic) labels are generated based on this, but, the
attribute is available for custom label rules to use. Examples of custom
rules:
- name: "iommu enabled rule"
labels:
iommu.enabled: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu_group/type": {op: NotIn, value: ["unknown"]}
- name: "iommu passthrough rule"
labels:
iommu.passthrough: "true"
matchFeatures:
- feature: pci.device
matchExpressions:
"iommu_group/type": {op: In, value: ["identity"]}
Add cross-referencing links to the helm deployment and configuration
sections. Use correct names for the tls related helm options
(tls.enabled and tls.certManager).
Add a separate customization guide. Move documentation of the custom and
local sources there. Also, cover the new NodeFeatureRules custom
resource and the new expression-based label rule format.
This patch also simplifies the "Feature labels" page, describing
built-in labels. Reformat the tables describing feature labels.
Change the helm chart so that the NodeFeatureRule controller will be
disabled for other than the default deployment (i.e. all deployments
where master.instance is non-empty), unless explicitly set to true. With
this we try to ensure that there is only on controller instance for the
CR, avoiding contention and conflicts.
Move top-level serviceAccount and rbac fields under master, making the
Helm chart more coherent.
Also, drop unused rbac.serviceAccountName and
rbac.serviceAccountAnnotations from values.yaml.
Implicitly injecting the filename of the hook/featurefile into the name
of the label is confusing, counter-intuitive and unnecessarily complex
to understand. It's much clearer to advertise features and labels as
presented in the feature file / output of the hook.
NOTE: this breaks backwards compatibility with usage scenarios that rely
on prefixing the label with the filename.
Add a configuration option for controlling the enabled "raw" feature
sources. This is useful e.g. in testing and development, plus it also
allows fully shutting down discovery of features that are not needed in
a deployment. Supplements core.labelSources which controls the
enablement of label sources.
Use the single-dash (i.e. '-option' instead of '--option') format
consistently accross log messages and documentation. This is the format
that was mostly used, already, and shown by command line help of the
binaries, for example.
Add a new command line flag for disabling/enabling the controller for
NodeFeatureRule objects. In practice, disabling the controller disables
all labels generated from rules in NodeFeatureRule objects.
Implement a simple controller stub that operates on NodeFeatureRule
objects. The controller does not yet have any functionality other than
logging changes in the (NodeFeatureRule) objecs it is watching.
Also update the documentation on the -no-publish flag to match the new
functionality.
Add a cluster-scoped Custom Resource Definition for specifying labeling
rules. Nodes (node features, node objects) are cluster-level objects and
thus the natural and encouraged setup is to only have one NFD deployment
per cluster - the set of underlying features of the node stays the same
independent of how many parallel NFD deployments you have. Our extension
points (hooks, feature files and now CRs) can be be used by multiple
actors (depending on us) simultaneously. Having the CRD cluster-scoped
hopefully drives deployments in this direction. It also should make
deployment of vendor-specific labeling rules easy as there is no need to
worry about the namespace.
This patch virtually replicates the source.custom.FeatureSpec in a CRD
API (located in the pkg/apis/nfd/v1alpha1 package) with the notable
exception that "MatchOn" legacy rules are not supported. Legacy rules
are left out in order to keep the CRD simple and clean.
The duplicate functionality in source/custom will be dropped by upcoming
patches.
This patch utilizes controller-gen (from sigs.k8s.io/controller-tools)
for generating the CRD and deepcopy methods. Code can be (re-)generated
with "make generate". Install controller-gen with:
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0
Update kustomize and helm deployments to deploy the CRD.