1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-05 16:27:05 +00:00
Commit graph

169 commits

Author SHA1 Message Date
Markus Lehtonen
1d012a28cd Option to stop implicitly adding default prefix to names
Add new autoDefaultNs (default is "true") config option to nfd-master.
Setting the config option to false stops NFD from automatically adding
the "feature.node.kubernetes.io/" prefix to labels, annotations and
extended resources. Taints are not affected as for them no prefix is
automatically added. The user-visible part of enabling the option change
is that NodeFeatureRules, local feature files, hooks and configuration
of the "custom" may need to be altereda (if the auto-prefixing is
relied on).

For now, the config option defaults to "true", meaning no change in
default behavior. However, the intent is to change the default to
"false" in a future release, deprecating the option and eventually
removing it (forcing it to "false").

The goal of stopping doing "auto-prefixing" is to simplify the operation
(of nfd and users). Make the naming more straightforward and easier to
understand and debug (kind of WYSIWYG), eliminating peculiar corner
cases:

1. Make validation simpler and unambiguous
2. Remove "overloading" of names, i.e. the mapping two values to the
   same actual name. E.g. previously something like

      labels:
        feature.node.kubernetes.io/foo: bar
        foo: baz

   Could actually result in node label:

     feature.node.kubernetes.io/foo: baz

3. Make the processing/usagee of the "rule.matched" and "local.labels"
   feature in NodeFeatureRules unambiguous and more understadable. E.g.
   previously you could have node label
   "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule
   you'd need to use the unprefixed name "local-foo" or the fully
   prefixed name, depending on what was specified in the feature file (or
   hook) on the node(s).

NOTE: setting autoDefaultNs to false is a breaking change for users who
rely on automatic prefixing with the default feature.node.kubernetes.io/
namespace. NodeFeatureRules, feature files, hooks and custom rules
(configuration of the "custom" source of nfd-worker) will need to be
altered.  Unprefixed labels, annoations and extended resources will be
denied by nfd-master.
2023-11-24 12:48:20 +02:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
AhmedGrati
d27eb0ac6d feat: add parameters in helm to disable/enable nfd-master and nfd-worker
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-10-11 10:50:32 +01:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
Shiva Krishna, Merla
6237b821f6 Fix serviceaccount handling for nfd-gc to be consistent with others
Signed-off-by: Shiva Krishna, Merla <smerla@nvidia.com>
2023-10-04 15:17:32 -07:00
Carlos Eduardo Arango Gutierrez
dd8d7f6725
Helm - service to be only deployed when needed (#1389)
* Helm - service to be only deployed when needed

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* Update deployment/helm/node-feature-discovery/templates/service.yaml

Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>

---------

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-04 16:00:43 +02:00
Carlos Eduardo Arango Gutierrez
3543aa22ce
Helm - Move remaining gPRC related flags to conditional
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-03 07:32:52 +02:00
Muyassarov, Feruzjon
06036a62ce Replace gRPC health probe utility with k8s built-in health probe
Kubernetes 1.23 has introduced native health probes for gRPC which
can replace grpc_health_probe utility. This commit removes baking
in grpc_health_probe binary into the image and updates related
health checks to use k8s native gRPC.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-09-20 12:25:36 +03:00
Kubernetes Prow Robot
8cdedf92fd
Merge pull request #1365 from marquiz/devel/helm-fix-nf-api
deployment/helm: fix handling of enableNodeFeatureApi parameter
2023-09-19 05:33:08 -07:00
Markus Lehtonen
8b207cae1f deployment/helm: fix handling of enableNodeFeatureApi parameter 2023-09-19 14:18:03 +03:00
Markus Lehtonen
759143ea3c deployment/helm: fix namespace of nfd-worker role and rolebinding
Put nfd-worker role and rolebinding in the correct namespace if
namespaceOverride parameter is used.
2023-09-19 13:53:19 +03:00
Kubernetes Prow Robot
2e6a202218
Merge pull request #1331 from andrewjamesbrown/ajb/chart_annotations
Helm: conditionally add annotations if defined
2023-09-07 01:20:59 -07:00
AhmedGrati
a5624cc8ca chore: update config file in helm deployment
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 16:05:02 +01:00
Andrew Brown
b2f292bc7a
Add annotations to all deployments+daemonsets 2023-09-06 09:39:01 -04:00
Andrew Brown
1bb5b87d4a
Conditionally add annotations if defined 2023-09-05 19:37:56 -04:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Markus Lehtonen
ceb672bde0 deployment/helm: support nfd-gc
Rename files and parameters. Drop the container security context
parameters from the Helm chart. There should be no reason to run the
nfd-gc with other than the minimal privileges.

Also updates the documentation.
2023-08-23 10:56:12 +03:00
Markus Lehtonen
6cf29bd8ef deployment/kustomize: support nfd-gc
Rename the old "topology-gc" to just "gc". Simplify the setup a bit by
including the RBAC rules in the "gc" base.

Note: we don't enable nfd-gc in the default overlay, yet, as the
NodeFeature API isn't enabled (gc is not needed).
2023-08-23 10:56:12 +03:00
Markus Lehtonen
01c08d67b6 Rename nfd-topology-gc to nfd-gc
This is preparation for making it a generic garbage collector for all
nfd-managed api objects.
2023-08-21 21:46:11 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
7e375ad1f0 generate: bump tools to their latest versions
Bump tools versions and re-auto-generate files.
2023-07-27 14:29:48 +03:00
Pat Riehecky
0523257d1a Add optional labels to the podmonitor
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
2023-07-21 10:03:50 -05:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
adrianc
904f3739a3
fix typo in helm chart
else statement of crd-controller should
also refer to crd-controller flag.

Signed-off-by: adrianc <adrianc@nvidia.com>
2023-07-02 18:01:31 +03:00
Kubernetes Prow Robot
407a610e0c
Merge pull request #1182 from fmuyassarov/disable-hooks-by-default
hooks: disable hooks by default from v0.14
2023-06-22 04:43:40 -07:00
Dipankar Das
ebac4a25e7
Removal of the bases field as it is deprecated by kustomize
Signed-off-by: Dipankar Das <dipankardas0115@gmail.com>
2023-06-09 12:49:24 +05:30
Muyassarov, Feruzjon
19527be924
hooks: disable hooks by default
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2023-06-07 13:04:23 +03:00
Markus Lehtonen
457fc8483b deployment/kustomize: use a named port for nfd gRPC service 2023-06-06 21:00:42 +03:00
Hairong Chen
e8a00ba7da cpu: Discover TDX guests based on cpuid information
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX.  Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).

In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).

Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-05 11:06:28 +02:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
Kubernetes Prow Robot
70d5ef477f
Merge pull request #1219 from PiotrProkop/leader-elect
Add leader election for nfd-master
2023-05-22 00:36:21 -07:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
1200fd05c5 topology-updater: use node IP in the default configz URI
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where node name does not resolve to the node
ip, e.g. nodename != hostname.
2023-05-05 13:29:51 +03:00
Kubernetes Prow Robot
cd45baef8d
Merge pull request #1211 from marquiz/devel/helm
deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
2023-05-05 00:17:13 -07:00
Kubernetes Prow Robot
68370f861c
Merge pull request #1213 from marquiz/devel/helm-3
deployment/helm: user dedicated serviceaccount for topology-updater
2023-05-05 00:09:20 -07:00
Markus Lehtonen
526aab87cf deployment/helm: user dedicated serviceaccount for topology-updater
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).

Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).

This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
2023-05-05 08:30:21 +03:00
Markus Lehtonen
9c2f268fd2 deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.

Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
2023-05-04 15:01:19 +03:00
Markus Lehtonen
5891df6917 deployment/helm: avoid overlapping mount paths on topology-updater
Mount kubelet podresources socket on an independent path, not under
with the kubelet state directory. Otherwise container creation may fail
on mount creation if topologyUpdater.kubeletPodResourcesSockPath and/or
topologyUpdater.kubeletConfigPath Helm parameters are specified in a
certain way.
2023-05-04 14:17:08 +03:00
Kubernetes Prow Robot
11db6bd37d
Merge pull request #1208 from marquiz/devel/kubelet-mounts
deployment/kustomize: drop pod-resources mount for topology-updater
2023-05-04 02:02:42 -07:00
Markus Lehtonen
efabbe04ae deployment/helm: fix default for kubeletStateDir parameter
This parameter is a path in the host system, not a mount path inside the
container.
2023-05-04 11:48:18 +03:00
Markus Lehtonen
c8a722b7c3 deployment/kustomize: drop pod-resources mount for topology-updater
This mount is redundant as it's already included in the kubelet state
files (/var/lib/kubelet) mount.
2023-05-04 11:06:55 +03:00
Markus Lehtonen
b016def8a3 helm: fix mount for nfd-master config
Volume/mount setup for the ConfigMap was erroneously inside conditionals
so it was not mounted unless TLS was enabled.
2023-05-02 10:06:21 +03:00
Kubernetes Prow Robot
2356223ffc
Merge pull request #1139 from AhmedGrati/feat-configure-master-resync
feat: add master resync period configurability
2023-04-24 03:49:02 -07:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Markus Lehtonen
f4de7ed8ee deployment/kustomize: add master config to prune overlay
Otherwise pods error out with failed mount of nfd-master-conf ConfigMap.
2023-04-20 20:38:36 +03:00
Markus Lehtonen
a5ec646c48 generate: update controller-gen to v0.11.3
Update controller-gen tool from sigs.k8s.io/controller-tools to the
latest release.

Also, bump goimports from golang.org/x/tools to the latest version.
2023-04-19 12:48:12 +03:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00