mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-15 17:50:49 +00:00
Commit graph

239 commits

Author SHA1 Message Date
Markus Lehtonen
8511980bf4 nfd-master: deprecate the -resource-labels flag
Mark the -resource-labels flag (and the corresponding resourceLabels
config option) as deprecated. We now support managing extended resources
via NodeFeatureRule objects. This kludge deserves to go, eventually.
2023-04-13 11:30:58 +03:00
Markus Lehtonen
dcbb3bc450 docs: add missing mentions of extended resources and taints
A small update to fix some missing mentions of extended resources and
taints as assets managed by NFD.
2023-04-11 20:38:21 +03:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable, by using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot
6740224a13
Merge pull request #1100 from PiotrProkop/expose-L3-num-closid
Advertise RDT L3 num_closid
2023-04-07 00:49:14 -07:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
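The prefix rules described in this commit message can be sketched as a small check. This is only an illustration of the stated policy, not nfd-master's actual code: `validateTaintKey` is a hypothetical name, and treating sub-namespaces as allowed only under "feature.node.kubernetes.io" is one reading of the message.

```go
package main

import (
	"fmt"
	"strings"
)

// validateTaintKey is a hypothetical helper that mirrors the policy in the
// commit message above. It is not taken from the NFD source.
func validateTaintKey(key string) error {
	// A prefixed key has the form "<prefix>/<name>".
	ns, name, found := strings.Cut(key, "/")
	if !found || ns == "" || name == "" {
		return fmt.Errorf("taint key %q must have a non-empty prefix and name", key)
	}
	// NFD-specific namespaces under kubernetes.io that the message allows.
	// Assumption: sub-namespaces are accepted only for feature.node.kubernetes.io.
	if ns == "nfd.node.kubernetes.io" ||
		ns == "feature.node.kubernetes.io" ||
		strings.HasSuffix(ns, ".feature.node.kubernetes.io") {
		return nil
	}
	// Everything else under kubernetes.io is reserved.
	if ns == "kubernetes.io" || strings.HasSuffix(ns, ".kubernetes.io") {
		return fmt.Errorf("taint key %q uses the reserved prefix %q", key, ns)
	}
	return nil
}

func main() {
	keys := []string{
		"example.com/gpu",
		"gpu",
		"node.kubernetes.io/unreachable",
		"feature.node.kubernetes.io/special",
	}
	for _, key := range keys {
		if err := validateTaintKey(key); err != nil {
			fmt.Printf("rejected: %v\n", err)
		} else {
			fmt.Printf("accepted: %s\n", key)
		}
	}
}
```

Rejecting unprefixed keys outright, rather than adding a default prefix, keeps the key in the rule identical to the key a workload must tolerate.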
PiotrProkop
0e78eba40e Advertise RDT L3 num_closid
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-04-06 11:22:55 +02:00
Kubernetes Prow Robot
3c0c43b9be
Merge pull request #1114 from marquiz/devel/rdt-deprecate
source/cpu: deprecate cpu-rdt.* labels
2023-04-05 06:21:40 -07:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
Markus Lehtonen
6cb5e99afa source/cpu: deprecate cpu-rdt.* labels
Document built-in RDT labels to be deprecated and removed in a future
release. The plan is that the default built-in RDT labels would not be
created anymore, but the RDT features would still be available for
NodeFeatureRules to consume.

The RDT labels are not very useful (they don't e.g. indicate if the
features are really enabled in kernel or if the resctrlfs is mounted).
2023-04-04 11:54:57 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a JSON or YAML configuration file along with fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
Fabiano Fidêncio
10672e1bba cpu: Expose the total number of keys for TDX
The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:

```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```

The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.

An example of how it ends up being exposed via the NFD:

```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}'  | jq | grep tdx.total_keys
  "feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 09:12:26 +02:00
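As a rough illustration of the step described above, the following sketch parses content in the `misc.capacity` format shown in the commit message and extracts the `tdx` entry. The function name `parseMiscCapacity` is hypothetical and the code is not taken from NFD; it only assumes the "name value" line format shown in the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMiscCapacity is a hypothetical helper that turns "name value" lines,
// as found in /sys/fs/cgroup/misc.capacity, into a map.
func parseMiscCapacity(content string) (map[string]int64, error) {
	caps := make(map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip empty lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", sc.Text())
		}
		v, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid capacity in line %q: %w", sc.Text(), err)
		}
		caps[fields[0]] = v
	}
	return caps, sc.Err()
}

func main() {
	// On a real node the content would come from os.ReadFile on
	// /sys/fs/cgroup/misc.capacity; a literal is used to stay self-contained.
	caps, err := parseMiscCapacity("tdx 31\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if keys, ok := caps["tdx"]; ok {
		fmt.Printf("cpu-security.tdx.total_keys=%d\n", keys)
	}
}
```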
Carlos Eduardo Arango Gutierrez
7171cfd4eb
cpu: expose AMD SEV support
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-03-30 15:19:43 +02:00
AhmedGrati
02b3b7c7e0 feat: add enableTaints to helm chart
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-21 10:49:24 +01:00
Talor Itzhak
5c6be580f4 reactive updates: add an option to disable the feature
Access to the kubelet state directory may raise concerns in some setups, so add an option to disable it.
The feature is enabled by default.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:16 +02:00
Talor Itzhak
727de56191 documentation: document the reactive updates feature
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:12 +02:00
Talor Itzhak
8924213d14 topology-updater: make it possible to disable sleep-interval
Especially convenient for testing purposes and
completely harmless

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:43:17 +02:00
Sajiyah Salat
7082c31d6c
Update worker-configuration-reference.md
2023-03-08 21:33:44 +05:30
Sajiyah Salat
fb2d70a313
Update worker-configuration-reference.md
2023-03-08 21:28:45 +05:30
AhmedGrati
ff2dddd27d docs: fix usage customization guide typos
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-27 10:26:25 +01:00
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
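One way to picture a pod set fingerprint is as a hash over the sorted identities of the pods on a node, so that the result does not depend on listing order. The sketch below is an assumption-laden illustration: the commit message does not say which hash or which pod attributes are used, so SHA-256 over "namespace/name" strings, and the names `podID` and `podSetFingerprint`, are hypothetical choices.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// podID is a hypothetical minimal pod identity used for this sketch.
type podID struct {
	Namespace string
	Name      string
}

// podSetFingerprint returns an order-independent digest of a pod set.
// Assumption: SHA-256 over sorted "namespace/name" keys; the real
// implementation may use a different hash and different attributes.
func podSetFingerprint(pods []podID) string {
	keys := make([]string, 0, len(pods))
	for _, p := range pods {
		keys = append(keys, p.Namespace+"/"+p.Name)
	}
	sort.Strings(keys) // sorting makes the result independent of input order
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator so adjacent keys cannot run together
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := []podID{{"default", "web-0"}, {"kube-system", "dns-1"}}
	b := []podID{{"kube-system", "dns-1"}, {"default", "web-0"}}
	fmt.Println(podSetFingerprint(a) == podSetFingerprint(b))
}
```

A consumer can compare the fingerprint it computes from its own view of the pods with the reported one to detect that the two views differ.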
Kubernetes Prow Robot
69440d7820
Merge pull request #1062 from yanggangtony/fix-doc
docs: describe nfd-topology-gc in introduction.md
2023-02-21 02:17:48 -08:00
Muyassarov, Feruzjon
0e2f2c4587 go.mod: bump cpuid to v2.2.4
Bump cpuid version to v2.2.4 in the go.mod so that WRMSRNS
(Non-Serializing Write to Model Specific Register) and MSRLIST
(Read/Write List of Model Specific Registers) instructions are
detectable.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-02-20 22:58:59 +02:00
yanggang
150d4f4db2
docs: describe nfd-topology-gc in introduction.md
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-02-18 06:12:35 +08:00
Guangwen Feng
8ad6c5b425 Fix some typos
Signed-off-by: Guangwen Feng <fenggw-fnst@fujitsu.com>
2023-02-16 22:08:00 +08:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
Kubernetes Prow Robot
e3b9184354
Merge pull request #1027 from marquiz/devel/image-full
images: base the default image on distroless/base
2023-02-10 08:07:30 -08:00
AhmedGrati
07d5ffe4b8 helm: make master port configurable
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-01 10:03:06 +01:00
Markus Lehtonen
cd62f6566f images: base the default image on distroless/base
Make distroless/base the base image for the default image,
effectively making the minimal image the default. Add a new "full"
image variant that corresponds to the previous default image. The
"*-minimal" container image tag is provided for backwards compatibility.

The practical user impact of this change is that hook support is limited
to statically linked ELF binaries. Bash or Perl scripts are no longer
supported by the default image, but the new "full" image
variant can be used for backwards compatibility.
2023-01-31 11:30:38 +02:00
Chandan Abhyankar
d66096a491 cpu: support for detecting nx-gzip coprocessor feature
Nest accelerator gzip support for IBM Power systems.

Signed-off-by: Chandan Abhyankar <Chandan.Abhyankar@ibm.com>
2023-01-17 23:18:16 -08:00
Hiren Panchasara
bfbc47f55e docs: fix internal cross-page references by injecting .md
2023-01-16 20:53:36 -08:00
PiotrProkop
3143faf0ab Add documentation for topology garbage collector
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:38 +01:00
Kubernetes Prow Robot
0159ab04e7
Merge pull request #1021 from fmuyassarov/docs-taint
Docs: mention tainting in the intro section
2023-01-02 02:19:30 -08:00
Kubernetes Prow Robot
79cd4fc094
Merge pull request #1023 from fmuyassarov/sfr-support
Bump cpuid to v2.2.3
2023-01-02 01:27:31 -08:00
Muyassarov, Feruzjon
d9dc4b09d5 Bump cpuid to v2.2.3
Bump cpuid to v2.2.3 which adds support for detecting Intel Sierra
Forest instructions like AVXIFMA, AVXNECONVERT, AVXVNNIINT8 and
CMPCCXADD.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2022-12-30 11:42:05 +02:00
Muyassarov, Feruzjon
842153a907 Docs: mention tainting in the intro section
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2022-12-28 14:00:04 +02:00
Markus Lehtonen
8c0e38d0c5 docs: fix typo in CRD name
2022-12-21 13:42:10 +02:00
Markus Lehtonen
b91922746a docs: mention NodeFeature as an extension point
In the CRD intro, mention that NodeFeature can be used as an integration
point for 3rd party extensions.
2022-12-21 13:26:31 +02:00
Markus Lehtonen
27c47bd088 docs: better document differences between deployment methods
2022-12-20 16:29:48 +02:00
Markus Lehtonen
3209c14bea docs: document NodeFeature API
Document the usage of the NodeFeature CRD API. Also re-organize the
documentation a bit, moving the description of NodeFeatureRule
controller from customization guide to nfd-master usage page.
2022-12-14 22:33:12 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
5a717c418b docs: small reordering of master cmdline reference
Move documentation of -enable-taints near '-enable-nodefeature-api' and
'-no-publish' as they are related in that they control the enablement of
APIs.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
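The node-targeting behavior described above can be sketched as follows: given a changed object, find every object whose "nfd.node.kubernetes.io/node-name" annotation names the same node, since all of them must be processed together. The `nodeFeature` type and `objectsForSameNode` function are hypothetical stand-ins for illustration and do not reflect the actual NFD API types.

```go
package main

import "fmt"

// nodeNameAnnotation is the annotation key named in the commit message.
const nodeNameAnnotation = "nfd.node.kubernetes.io/node-name"

// nodeFeature is a hypothetical, simplified stand-in for a NodeFeature object.
type nodeFeature struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// objectsForSameNode returns all objects that target the same node as the
// changed object, including the changed object itself if it is in the list.
func objectsForSameNode(changed nodeFeature, all []nodeFeature) []nodeFeature {
	node := changed.Annotations[nodeNameAnnotation]
	if node == "" {
		return nil // the changed object does not target any node
	}
	var out []nodeFeature
	for _, o := range all {
		if o.Annotations[nodeNameAnnotation] == node {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	all := []nodeFeature{
		{"nfd", "worker-a", map[string]string{nodeNameAnnotation: "node-a"}},
		{"vendor", "extra-a", map[string]string{nodeNameAnnotation: "node-a"}},
		{"nfd", "worker-b", map[string]string{nodeNameAnnotation: "node-b"}},
	}
	for _, o := range objectsForSameNode(all[0], all) {
		fmt.Printf("reprocess %s/%s\n", o.Namespace, o.Name)
	}
}
```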
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defaults to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoids awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Kubernetes Prow Robot
776a8c335c
Merge pull request #980 from marquiz/devel/topology-updater
nfd-topology-updater: update NodeResourceTopology objects directly
2022-12-08 01:44:22 -08:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has a connection to the API server for listing
Pods so this is not that dramatic a change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also updates deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Markus Lehtonen
881ee13654 docs: remove non-existent nodeFeatureRule.createCRD parameter
This value was recently dropped.
2022-12-07 16:25:43 +02:00
Markus Lehtonen
0834ec5cbf go.mod: update klauspost/cpuid to v2.2.2
Support detection of Intel TME (Total Memory Encryption) plus AMXFP16
and PREFETCHI.
2022-12-07 13:58:19 +02:00
Feruzjon Muyassarov
984a3de198 Document tainting feature
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:29:10 +02:00