1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-06 16:57:10 +00:00
Commit graph

242 commits

Author SHA1 Message Date
Markus Lehtonen
457fc8483b deployment/kustomize: use a named port for nfd gRPC service 2023-06-06 21:00:42 +03:00
Hairong Chen
e8a00ba7da cpu: Discover TDX guests based on cpuid information
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX.  Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).

In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).

Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-05 11:06:28 +02:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
Kubernetes Prow Robot
70d5ef477f
Merge pull request #1219 from PiotrProkop/leader-elect
Add leader election for nfd-master
2023-05-22 00:36:21 -07:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
1200fd05c5 topology-updater: use node IP in the default configz URI
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where node name does not resolve to the node
ip, e.g. nodename != hostname.
2023-05-05 13:29:51 +03:00
Kubernetes Prow Robot
cd45baef8d
Merge pull request #1211 from marquiz/devel/helm
deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
2023-05-05 00:17:13 -07:00
Kubernetes Prow Robot
68370f861c
Merge pull request #1213 from marquiz/devel/helm-3
deployment/helm: user dedicated serviceaccount for topology-updater
2023-05-05 00:09:20 -07:00
Markus Lehtonen
526aab87cf deployment/helm: user dedicated serviceaccount for topology-updater
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).

Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).

This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
2023-05-05 08:30:21 +03:00
Markus Lehtonen
9c2f268fd2 deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.

Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
2023-05-04 15:01:19 +03:00
Markus Lehtonen
5891df6917 deployment/helm: avoid overlapping mount paths on topology-updater
Mount kubelet podresources socket on an independent path, not under
with the kubelet state directory. Otherwise container creation may fail
on mount creation if topologyUpdater.kubeletPodResourcesSockPath and/or
topologyUpdater.kubeletConfigPath Helm parameters are specified in a
certain way.
2023-05-04 14:17:08 +03:00
Kubernetes Prow Robot
11db6bd37d
Merge pull request #1208 from marquiz/devel/kubelet-mounts
deployment/kustomize: drop pod-resources mount for topology-updater
2023-05-04 02:02:42 -07:00
Markus Lehtonen
efabbe04ae deployment/helm: fix default for kubeletStateDir parameter
This parameter is a path in the host system, not a mount path inside the
container.
2023-05-04 11:48:18 +03:00
Markus Lehtonen
c8a722b7c3 deployment/kustomize: drop pod-resources mount for topology-updater
This mount is redundant as it's already included in the kubelet state
files (/var/lib/kubelet) mount.
2023-05-04 11:06:55 +03:00
Markus Lehtonen
b016def8a3 helm: fix mount for nfd-master config
Volume/mount setup for the ConfigMap was erroneously inside conditionals
so it was not mounted unless TLS was enabled.
2023-05-02 10:06:21 +03:00
Kubernetes Prow Robot
2356223ffc
Merge pull request #1139 from AhmedGrati/feat-configure-master-resync
feat: add master resync period configurability
2023-04-24 03:49:02 -07:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Markus Lehtonen
f4de7ed8ee deployment/kustomize: add master config to prune overlay
Otherwise pods error out with failed mount of nfd-master-conf ConfigMap.
2023-04-20 20:38:36 +03:00
Markus Lehtonen
a5ec646c48 generate: update controller-gen to v0.11.3
Update controller-gen tool from sigs.k8s.io/controller-tools to the
latest release.

Also, bump goimports from golang.org/x/tools to the latest version.
2023-04-19 12:48:12 +03:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
02b3b7c7e0 feat: add enableTaints to helm chart
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-21 10:49:24 +01:00
Kubernetes Prow Robot
13f92faa77
Merge pull request #1031 from k8stopologyawareschedwg/reactive_updates
topology-updater: reactive updates
2023-03-17 10:13:17 -07:00
Talor Itzhak
91daff3b59 deployment/helm: update helm charts
Adding kubelet state directory mount

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:51:45 +02:00
Carlos Eduardo Arango Gutierrez
355807f98c
kustomize: trim prune overlay
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-03-15 20:36:45 +01:00
Talor Itzhak
8afd819132 deployment/topology-updater: add mount for kubelet state dir
This mount is needed for watching the state files

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:43:13 +02:00
Kubernetes Prow Robot
37504109d6
Merge pull request #1080 from marquiz/devel/deploy-topology-updater
deployment: fixes for mounting kubelet config
2023-03-09 09:50:02 -08:00
Markus Lehtonen
ed8a87b131 helm: fix handling of topologyUpdater.kubeletConfigPath
By default we use the configz API endpoint so no mounts are needed.
2023-03-09 17:49:31 +02:00
Markus Lehtonen
33a1e3d114 kustomize: drop mount for kubelet config in topology-updater
We use the configz endpoint nowadays.
2023-03-09 17:48:56 +02:00
Markus Lehtonen
40644aab60 helm: create topology-updater RBAC rules by default
Create RBAC rules if topology-updater is enabled. Previously installing
with topologyUpdater.enable=true (without
topologyUpdater.rbac.create=true) resulted in a crashloogbackoff as RBAC
was missing.
2023-03-09 16:16:09 +02:00
Markus Lehtonen
40d7139257 helm: fix topology-updater rbac clusterrole
Access to nodes/proxy resource was accidentally given to nfd-master
(which really doesn't need it), not topology-updater.
2023-03-09 16:15:03 +02:00
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
Jose Luis Ojosnegros Manchón
d1d1eda0d2 nrt-api: Update to v0.1.0 to use v1alpha2 2023-02-09 12:03:18 +01:00
Kubernetes Prow Robot
94ab0ddd3d
Merge pull request #1045 from AhmedGrati/feat-disable-service-links-nfd-master
deployment: disable service links in NFD master pod
2023-02-06 08:55:01 -08:00
AhmedGrati
07d5ffe4b8 helm: make master port configurable
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-01 10:03:06 +01:00
AhmedGrati
743c877ad8 deployment: disable service links in NFD master pod
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-01-27 16:55:18 +01:00
Carlos Eduardo Arango Gutierrez
1c095f5e8e
docs: Fix link for Helm docs 2023-01-17 15:23:30 +01:00
PiotrProkop
59afae50ba Add NodeResourceTopology garbage collector
NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes.
As of now node-feature-discovery daemons are used to advertise those
resources but there is no service responsible for removing obsolete
objects(without corresponding Kubernetes node).

This patch adds new daemon called nfd-topology-gc which removes old
NRTs.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:21 +01:00
Markus Lehtonen
dfda9bccad apis/nfd: update auto-generated code 2022-12-22 17:58:20 +02:00
Markus Lehtonen
59a2757115 Use single-dash format for nfd cmdline flags
Use the "single-dash" version of nfd command line flags in deployment
files and e2e-tests. No impact in functionality, just aligns with
documentation and other parts of the codebase.
2022-12-21 15:00:49 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defgault to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
d1c91e129a apis/nfd: update auto-generated code 2022-12-14 07:31:28 +02:00
Markus Lehtonen
59ebff46c9 apis/nfd: add CRD for communicating node features
Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node
features over K8s api objects instead of gRPC. The new resource is
namespaced which will help the management of multiple NodeFeature
objects per node. This aims at enabling 3rd party detectors for custom
features.

In addition to communicating raw features the NodeFeature object also
has a field for directly requesting labels that should be applied on the
node object.

Rename the crd deployment file to nfd-api-crds.yaml so that it matches
the new content of the file. Also, rename the Helm subdir for CRDs to
match the expected chart directory structure.
2022-12-14 07:31:28 +02:00