1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-15 17:50:49 +00:00
Commit graph

119 commits

Author SHA1 Message Date
Carlos Eduardo Arango Gutierrez
150c394374
Make mdlint v0.13 happy
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-25 21:21:11 +02:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
7d1df87305 source/custom: drop support for the legacy rule format 2023-10-05 16:15:37 +03:00
AhmedGrati
3130898d58 feat: support raw features
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-10-04 22:37:42 +01:00
AhmedGrati
6c895b496a feat: ignore hidden feature files
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-11 15:11:31 +01:00
Markus Lehtonen
c126764d7a cpu: drop the deprecated sgx and se labels
Drop the deprecated cpu-sgx.enabled and cpu-se.enabled labels and the
corresponding "raw" features. These have been replaced by
cpu-security.sgx.enabled and cpu-security.se.enabled.
2023-09-08 14:28:04 +03:00
Carlos Eduardo Arango Gutierrez
9966d2ae12
Deprecate gRPC API
Now that the NodeFeature API has been set enabled by default, the gRPC
mode will be deprecated and with it all flags and features around it.

For nfd-master, flags
-port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api
are now marked as deprecated.

For nfd-worker flags
-enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override
are now marked as deprecated.

Deprecated flags, as well as gRPC related code will be removed in future
releases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-07 06:48:15 +02:00
AhmedGrati
124dfbf6df docs: add notes in the customization guide about the feature file size limit
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 11:15:45 +01:00
Kubernetes Prow Robot
50dd128b23
Merge pull request #1329 from ArangoGutierrez/1187
Enable NodeFeature API by default
2023-09-05 11:56:51 -07:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot
0b218a1eca
Merge pull request #1285 from AhmedGrati/feat-add-expiry-date-feature-files
Feat: add expiry date for feature files
2023-09-05 02:19:50 -07:00
Markus Lehtonen
cbd2c2f3df docs: demote hooks in the customization guide
Hooks are deprecated so describe feature files first.
2023-09-04 16:06:51 +03:00
Francesco Romani
727875f240 docs: nfd-updater: clarify accounting
Clarify that we account, and we can account, only
resources exclusively allocated to Guaranteed QoS pods.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 08:51:14 +02:00
AhmedGrati
47aec15ea1 test: add unit tests for the expiration date function
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-01 20:04:24 +01:00
Markus Lehtonen
a15b5690b6 docs: update to cover nfd-gc 2023-08-23 10:56:12 +03:00
AhmedGrati
f0edc6532a docs: add the support of the exipration date in the input format of the feature files
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-08-05 20:39:09 +01:00
Markus Lehtonen
4aa7a8f8f8 source/local: support comments in input
Lines starting with '#' are treated as comments and ignored when parsing
feature files and hook output.
2023-08-04 16:46:22 +03:00
Markus Lehtonen
0a8b514d67 docs: unify formatting of NOTEs 2023-08-03 15:36:56 +03:00
Kubernetes Prow Robot
463a737b82
Merge pull request #1277 from marquiz/docs/k8s-compat
docs: describe supported Kubernetes versions
2023-07-25 08:54:06 -07:00
Markus Lehtonen
b1328b3166 docs: describe supported Kubernetes versions 2023-07-25 17:40:06 +03:00
Markus Lehtonen
312ef308d1 docs: remove useless TOCs
Drop table of contents from short pages where it is only cluttering the
page.
2023-07-21 16:35:12 +03:00
Kubernetes Prow Robot
407a610e0c
Merge pull request #1182 from fmuyassarov/disable-hooks-by-default
hooks: disable hooks by default from v0.14
2023-06-22 04:43:40 -07:00
Carlos Eduardo Arango Gutierrez
563cc862de
Docs: Fix typo on customization-guide
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-06-09 10:23:33 +02:00
Muyassarov, Feruzjon
19527be924
hooks: disable hooks by default
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2023-06-07 13:04:23 +03:00
Hairong Chen
e8a00ba7da cpu: Discover TDX guests based on cpuid information
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX.  Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).

In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).

Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-05 11:06:28 +02:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
9685d292a2 docs: add missing .md suffix to internal references
Commit bfbc47f55e added a lot of those and
this patch tries to cover all that we missed there. Having .md suffixes
in references to internal files makes it convenient to browse the
document locally, just as text files as the references work correctly.
2023-04-25 15:28:07 +03:00
Carlos Eduardo Arango Gutierrez
05ef5d4e9d
cpu: expose the total number of AMD SEV ASID and ES
This patch add SEV ASIDs and the related (but distinct) SEV Encrypted State
(SEV-ES) IDs as two quantities to be exposed via extended resources.
In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the
root control group will have a misc.capacity file that shows the number of
available IDs in each category.

The added extended resources are:
- sev.asids
- sev.encrypted_state_ids

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-17 19:34:39 +02:00
Mikko Ylinen
de1b69a8bf cpu: make SGX EPC resource available to NodeFeatureRules
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-04-14 15:31:54 +03:00
Markus Lehtonen
3320c74472 source/cpu: don't create cpu-security.tdx.total_keys label
Just have that as a feature for NodeFeatureRules to consume.
2023-04-14 13:33:13 +03:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot
6740224a13
Merge pull request #1100 from PiotrProkop/expose-L3-num-closid
Advertise RDT L3 num_closid
2023-04-07 00:49:14 -07:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
PiotrProkop
0e78eba40e Advertise RDT L3 num_closid
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-04-06 11:22:55 +02:00
Kubernetes Prow Robot
3c0c43b9be
Merge pull request #1114 from marquiz/devel/rdt-deprecate
source/cpu: deprecate cpu-rdt.* labels
2023-04-05 06:21:40 -07:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
Markus Lehtonen
6cb5e99afa source/cpu: deprecate cpu-rdt.* labels
Document built-in RDT labels to be deprecated and removed in a future
release. The plan is that the default built-in RDT labels would not be
created anymore, but the RDT features would still be available for
NodeFeatureRules to consume.

The RDT labels are not very useful (they don't e.g indicate if the
features are really enabled in kernel or if the resctrlfs is mounted).
2023-04-04 11:54:57 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
Fabiano Fidêncio
10672e1bba cpu: Expose the total number of keys for TDX
The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:

```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```

The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.

An example of how it ends up being exposed via the NFD:

```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}'  | jq | grep tdx.total_keys
  "feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 09:12:26 +02:00
Carlos Eduardo Arango Gutierrez
7171cfd4eb
cpu: expose AMD SEV support
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-03-30 15:19:43 +02:00
Talor Itzhak
727de56191 documentaion: document the reactive updates feature
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:12 +02:00
AhmedGrati
ff2dddd27d docs: fix usage cusomization guide typos
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-27 10:26:25 +01:00
Muyassarov, Feruzjon
0e2f2c4587 go.mod: bump cpuid to v2.2.4
Bump cpuid version to v2.2.4 in the go.mod so that WRMSRNS (
Non-Serializing Write to Model Specific Register) and MSRLIST
(Read/Write List of Model Specific Registers) instructions are
detectable.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-02-20 22:58:59 +02:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
Kubernetes Prow Robot
e3b9184354
Merge pull request #1027 from marquiz/devel/image-full
images: base the default image on distroless/base
2023-02-10 08:07:30 -08:00
Markus Lehtonen
cd62f6566f images: base the default image on distroless/base
Make distroless/base as the base image for the default image,
effectively making the minimal image as the default. Add a new "full"
image variant that corresponds the previous default image. The
"*-minimal" container image tag is provided for backwards compatibility.

The practical user impact of this change is that hook support is limited
to statically linked ELF binaries. Bash or Perl scripts are not
supported by the default image, anymore, but the new "full" image
variant can be used for backwards compatibility.
2023-01-31 11:30:38 +02:00
Chandan Abhyankar
d66096a491 cpu: support for detecting nx-gzip coprocessor feature
Nest accelerator gzip support for IBM Power systems.

Signed-off-by: Chandan Abhyankar <Chandan.Abhyankar@ibm.com>
2023-01-17 23:18:16 -08:00
Hiren Panchasara
bfbc47f55e docs: fix internal cross-page references by injecting .md 2023-01-16 20:53:36 -08:00
PiotrProkop
3143faf0ab Add documentation for topology garbage collector
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:38 +01:00
Kubernetes Prow Robot
79cd4fc094
Merge pull request #1023 from fmuyassarov/sfr-support
Bump cpuid to v2.2.3
2023-01-02 01:27:31 -08:00
Muyassarov, Feruzjon
d9dc4b09d5 Bump cpuid to v2.2.3
Bump cpuid to v2.2.3 which adds support for detecting Intel Sierra
Forest instructions like AVXIFMA, AVXNECONVERT, AVXVNNIINT8 and
CMPCCXADD.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2022-12-30 11:42:05 +02:00
Markus Lehtonen
8c0e38d0c5 docs: fix typo in CRD name 2022-12-21 13:42:10 +02:00
Markus Lehtonen
b91922746a docs: mention NodeFeature as an extension point
In the CRD intro, mention that NodeFeature can be used as an integration
point for 3rd party extensions.
2022-12-21 13:26:31 +02:00
Markus Lehtonen
3209c14bea docs: document NodeFeature API
Document the usage of the NodeFeature CRD API. Also re-organize the
documentation a bit, moving the description of NodeFeatureRule
controller from customization guide to nfd-master usage page.
2022-12-14 22:33:12 +02:00
Kubernetes Prow Robot
776a8c335c
Merge pull request #980 from marquiz/devel/topology-updater
nfd-topology-updater: update NodeResourceTopology objects directly
2022-12-08 01:44:22 -08:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also update deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Markus Lehtonen
0834ec5cbf go.mod: update to klauspost/cpuid to v2.2.2
Support detection of Intel TME (Total Memory Encryption) plus AMXFP16
and PREFETCHI.
2022-12-07 13:58:19 +02:00
Feruzjon Muyassarov
984a3de198 Document tainting feature
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:29:10 +02:00
Markus Lehtonen
32b252147c docs: small update to customization guide
Add a reference to the label rule format in the NodeFeatureRule section.
Also make it explicit in the beginning of Hooks section that hooks are
deprecated.
2022-12-01 18:33:48 +02:00
Markus Lehtonen
8a45384037 docs: simplify quick-start page
Move topology-updater deployment notes to the topology-updater usage
page. Also, rework the plaintext and headings a bit.
2022-12-01 12:22:23 +02:00
Markus Lehtonen
cdc7558f6f docs: better document custom resources
Add a separate page for describing the custom resources used by NFD.
Simplify the Introduction page by moving the details of
NodeResourceTopology from there. Similarly, drop long
NodeResourceTopology example from the quick-start page, making the page
shorter and simpler.
2022-12-01 11:12:59 +02:00
Markus Lehtonen
eb8e29c80a nfd-worker: drop deprecated command line flags
Drop the following flags that were deprecated already in v0.8.0:

-sleep-interval  (replaced by core.sleepInterval config file option)
-label-whitelist (replaced by core.labelWhiteList config file option)
-sources         (replaced by -label-sources flag)
2022-11-23 22:33:51 +02:00
Talor Itzhak
d495376f06 docs: topology-updater: update docs for exclude-list feature
Update the docs with explanations and examples
about the exclude-list feature.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2022-11-21 21:31:51 +02:00
Markus Lehtonen
6171c745a4 docs: restructure docs
Introduce two main sections "Deployment" and "Usage" and move "Developer
guide" to the top level, too. In particular, split the huge
deployment-and-usage file into multiple parts under the new main
sections. Move customization guide from "Advanced" to "Usage".
This patch also renames "Advanced" to "Reference" as only that is left
there is reference documentation.
2022-11-03 10:26:56 +02:00