Add new autoDefaultNs (default is "true") config option to nfd-master.
Setting the config option to false stops NFD from automatically adding
the "feature.node.kubernetes.io/" prefix to labels, annotations and
extended resources. Taints are not affected as for them no prefix is
automatically added. The user-visible part of enabling the option change
is that NodeFeatureRules, local feature files, hooks and configuration
of the "custom" may need to be altereda (if the auto-prefixing is
relied on).
For now, the config option defaults to "true", meaning no change in
default behavior. However, the intent is to change the default to
"false" in a future release, deprecating the option and eventually
removing it (forcing it to "false").
The goal of stopping doing "auto-prefixing" is to simplify the operation
(of nfd and users). Make the naming more straightforward and easier to
understand and debug (kind of WYSIWYG), eliminating peculiar corner
cases:
1. Make validation simpler and unambiguous
2. Remove "overloading" of names, i.e. the mapping two values to the
same actual name. E.g. previously something like
labels:
feature.node.kubernetes.io/foo: bar
foo: baz
Could actually result in node label:
feature.node.kubernetes.io/foo: baz
3. Make the processing/usagee of the "rule.matched" and "local.labels"
feature in NodeFeatureRules unambiguous and more understadable. E.g.
previously you could have node label
"feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule
you'd need to use the unprefixed name "local-foo" or the fully
prefixed name, depending on what was specified in the feature file (or
hook) on the node(s).
NOTE: setting autoDefaultNs to false is a breaking change for users who
rely on automatic prefixing with the default feature.node.kubernetes.io/
namespace. NodeFeatureRules, feature files, hooks and custom rules
(configuration of the "custom" source of nfd-worker) will need to be
altered. Unprefixed labels, annoations and extended resources will be
denied by nfd-master.
Implements three metrics for nfd-gc:
- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
deleting NodeFeature and NodeResourceTopology objects.
* Helm - service to be only deployed when needed
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
* Update deployment/helm/node-feature-discovery/templates/service.yaml
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
---------
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Kubernetes 1.23 has introduced native health probes for gRPC which
can replace grpc_health_probe utility. This commit removes baking
in grpc_health_probe binary into the image and updates related
health checks to use k8s native gRPC.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
Rename files and parameters. Drop the container security context
parameters from the Helm chart. There should be no reason to run the
nfd-gc with other than the minimal privileges.
Also updates the documentation.
Rename the old "topology-gc" to just "gc". Simplify the setup a bit by
including the RBAC rules in the "gc" base.
Note: we don't enable nfd-gc in the default overlay, yet, as the
NodeFeature API isn't enabled (gc is not needed).
Expose metrics via prometheus.monitoring.coreos.com/v1
The exposed metrics are
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Time taken to label a node |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX. Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).
In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).
Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where node name does not resolve to the node
ip, e.g. nodename != hostname.
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).
Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).
This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.
Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
Mount kubelet podresources socket on an independent path, not under
with the kubelet state directory. Otherwise container creation may fail
on mount creation if topologyUpdater.kubeletPodResourcesSockPath and/or
topologyUpdater.kubeletConfigPath Helm parameters are specified in a
certain way.
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Update controller-gen tool from sigs.k8s.io/controller-tools to the
latest release.
Also, bump goimports from golang.org/x/tools to the latest version.
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>