1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

414 commits

Author SHA1 Message Date
Markus Lehtonen
7d1df87305 source/custom: drop support for the legacy rule format 2023-10-05 16:15:37 +03:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
AhmedGrati
3130898d58 feat: support raw features
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-10-04 22:37:42 +01:00
Markus Lehtonen
6149000637 Build statically linked binaries
Switch to fully statically linked binaries and use scratch as a base
image.

Switching to the virtually empty scratch base image means that the
default/minimal NFD image only supports running hooks that are truly
statically linked (e.g.  normal go binaries that are "almost" statically
linked stop working).  The documentation has been already stating this
(i.e. that only statically-linked binaries are supported) - i.e. we have
had no promise of supporting other than that. Also, hooks are now
deprecated and even disabled by default so the possibility of real user
impact should be small.
2023-09-19 21:59:18 +03:00
AhmedGrati
6c895b496a feat: ignore hidden feature files
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-11 15:11:31 +01:00
Markus Lehtonen
c126764d7a cpu: drop the deprecated sgx and se labels
Drop the deprecated cpu-sgx.enabled and cpu-se.enabled labels and the
corresponding "raw" features. These have been replaced by
cpu-security.sgx.enabled and cpu-security.se.enabled.
2023-09-08 14:28:04 +03:00
Kubernetes Prow Robot
2e6a202218
Merge pull request #1331 from andrewjamesbrown/ajb/chart_annotations
Helm: conditionally add annotations if defined
2023-09-07 01:20:59 -07:00
Kubernetes Prow Robot
c0c1b89a92
Merge pull request #1334 from ArangoGutierrez/grpc_gone_v2
Deprecate gRPC API
2023-09-07 00:38:59 -07:00
Kubernetes Prow Robot
e097c3f8f6
Merge pull request #1338 from AhmedGrati/feat-add-logging-params-config-file
nfd-master: add config file options for logging
2023-09-06 23:58:57 -07:00
Carlos Eduardo Arango Gutierrez
9966d2ae12
Deprecate gRPC API
Now that the NodeFeature API has been set enabled by default, the gRPC
mode will be deprecated and with it all flags and features around it.

For nfd-master, flags
-port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api
are now marked as deprecated.

For nfd-worker flags
-enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override
are now marked as deprecated.

Deprecated flags, as well as gRPC related code will be removed in future
releases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-07 06:48:15 +02:00
AhmedGrati
a6b4a7d6a9 docs: add docs of logging configuration in nfd master
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 15:36:15 +01:00
Andrew Brown
a3d26a0404
Add new helm values to documentation 2023-09-06 09:55:25 -04:00
AhmedGrati
124dfbf6df docs: add notes in the customization guide about the feature file size limit
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 11:15:45 +01:00
Carlos Eduardo Arango Gutierrez
ade5833ee3
tls.md: Add note (#1332)
* tls.md: Add note

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

* Update docs/deployment/tls.md

Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>

---------

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-06 01:06:52 -07:00
Kubernetes Prow Robot
50dd128b23
Merge pull request #1329 from ArangoGutierrez/1187
Enable NodeFeature API by default
2023-09-05 11:56:51 -07:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot
0b218a1eca
Merge pull request #1285 from AhmedGrati/feat-add-expiry-date-feature-files
Feat: add expiry date for feature files
2023-09-05 02:19:50 -07:00
Markus Lehtonen
cbd2c2f3df docs: demote hooks in the customization guide
Hooks are deprecated so describe feature files first.
2023-09-04 16:06:51 +03:00
Kubernetes Prow Robot
9848ef9d43
Merge pull request #1321 from ffromani/nfd-topo-updater-fix-docs
docs: nfd-updater: clarify accounting
2023-09-04 00:33:49 -07:00
Francesco Romani
727875f240 docs: nfd-updater: clarify accounting
Clarify that we account, and we can account, only
resources exclusively allocated to Guaranteed QoS pods.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 08:51:14 +02:00
AhmedGrati
47aec15ea1 test: add unit tests for the expiration date function
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-01 20:04:24 +01:00
Markus Lehtonen
ae1a95f395 docs: update docs build dependencies
Add webrick as that is needed. Also update other deps to their latest
versions.
2023-08-30 19:31:35 +03:00
Kubernetes Prow Robot
e1f90a233b
Merge pull request #1305 from marquiz/devel/nf-gc
Garbage collection of NodeFeature objects
2023-08-28 02:59:42 -07:00
Kubernetes Prow Robot
6d95e59cd0
Merge pull request #1290 from marquiz/devel/metrics-new
metrics: additional metrics for nfd-master
2023-08-28 02:07:42 -07:00
Markus Lehtonen
a15b5690b6 docs: update to cover nfd-gc 2023-08-23 10:56:12 +03:00
Markus Lehtonen
ceb672bde0 deployment/helm: support nfd-gc
Rename files and parameters. Drop the container security context
parameters from the Helm chart. There should be no reason to run the
nfd-gc with other than the minimal privileges.

Also updates the documentation.
2023-08-23 10:56:12 +03:00
Kubernetes Prow Robot
536f9d17d0
Merge pull request #1295 from marquiz/devel/topology-updater-metrics
nfd-topology-updater: add metrics support
2023-08-20 23:25:24 -07:00
Markus Lehtonen
b64ba37377 docs: update github-pages gem to v228
Also update other dependencies.
2023-08-16 13:51:09 +03:00
Markus Lehtonen
5ad2294c14 metrics: add nfd_node_update_requests_total counter
Add a counter for total number of node update/sync requests. In
practice, this counts the number of gRPC requests received if the gRPC
API is in use. If the NodeFeature API is enabled, this counts the
requests initiated by the NFD API controller, i.e. updates triggered by
changes in NodeFeature or NodeFeatureRule objects plus updates initiated
by the controller resync period.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
4b24cc1afa metrics: counters for rejected labels, extended resources and taints
Add counters for labels, extended resources and taints rejected/filtered
out by nfd-master.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
a8a29e6df2 metrics: add nfd_nodefeaturerule_processing_errors_total counter
Add a counter for errors encountered when processing NodeFeatureRules.
Another simple counter without any additional prometheus labels -
nfd-master logs can provide further details.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
b90f2c318e metrics: add nfd_node_update_failures_total counter
Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.
2023-08-07 09:37:27 +03:00
AhmedGrati
f0edc6532a docs: add the support of the exipration date in the input format of the feature files
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-08-05 20:39:09 +01:00
Kubernetes Prow Robot
9ed191808d
Merge pull request #1296 from marquiz/docs/metrics
docs: document -metrics flag in command line reference
2023-08-05 03:06:30 -07:00
Markus Lehtonen
4b7ee47e5f docs: document -metrics flag in command line reference
Document the -metrics command line flag in the command line reference of
nfd-master and nfd-worker.
2023-08-04 16:49:03 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
4aa7a8f8f8 source/local: support comments in input
Lines starting with '#' are treated as comments and ignored when parsing
feature files and hook output.
2023-08-04 16:46:22 +03:00
Markus Lehtonen
0a8b514d67 docs: unify formatting of NOTEs 2023-08-03 15:36:56 +03:00
Markus Lehtonen
a1406767a9 docs: align metrics documentation with latest changes on naming
Also change table formatting and fix one incorrect description.
2023-08-01 15:53:06 +03:00
Kubernetes Prow Robot
65b7216313
Merge pull request #1283 from marquiz/docs/deprecation-policy
docs: deprecation policy for Helm chart params
2023-07-25 10:46:06 -07:00
Kubernetes Prow Robot
463a737b82
Merge pull request #1277 from marquiz/docs/k8s-compat
docs: describe supported Kubernetes versions
2023-07-25 08:54:06 -07:00
Markus Lehtonen
b1328b3166 docs: describe supported Kubernetes versions 2023-07-25 17:40:06 +03:00
Markus Lehtonen
b72b537261 docs: deprecation policy for Helm chart params 2023-07-24 14:06:30 +03:00
Pat Riehecky
0523257d1a Add optional labels to the podmonitor
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
2023-07-21 10:03:50 -05:00
Kubernetes Prow Robot
c9f3550237
Merge pull request #1280 from marquiz/docs/tocs
docs: remove useless TOCs
2023-07-21 06:50:15 -07:00
Kubernetes Prow Robot
ebbea564a8
Merge pull request #1278 from marquiz/docs/fixes
docs: fix toc of topology-updater and topology-gc reference
2023-07-21 06:50:08 -07:00
Markus Lehtonen
312ef308d1 docs: remove useless TOCs
Drop table of contents from short pages where it is only cluttering the
page.
2023-07-21 16:35:12 +03:00
Markus Lehtonen
f825812229 docs: document version and deprecation policy 2023-07-21 16:28:38 +03:00
Markus Lehtonen
d4d6963473 docs: fix toc of topology-updater and topology-gc reference
Exclude the main title from to (with the empty line the "no_toc"
directive took no effect).
2023-07-21 15:41:59 +03:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
Kubernetes Prow Robot
407a610e0c
Merge pull request #1182 from fmuyassarov/disable-hooks-by-default
hooks: disable hooks by default from v0.14
2023-06-22 04:43:40 -07:00
Carlos Eduardo Arango Gutierrez
563cc862de
Docs: Fix typo on customization-guide
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-06-09 10:23:33 +02:00
Muyassarov, Feruzjon
19527be924
hooks: disable hooks by default
We have deprecated hooks in v0.12.0 but kept it enabled by default.
Starting from v0.14 we are starting to disable it by default and
plan to fully remove it in the near future.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2023-06-07 13:04:23 +03:00
Simon Jürgensmeyer
307a865465
Fix missing apostrophe for jq 2023-06-07 09:53:02 +02:00
Hairong Chen
e8a00ba7da cpu: Discover TDX guests based on cpuid information
NFD already has the capability to discover whether baremetal / host
machines support Intel TDX.  Now, the next step is to add support for
discovering whether a node is TDX protected (as in, a virtual machine
started using Intel TDX).

In order to do so, we've decided to go for a new `cpu-security.tdx`
property, called `protected` (`cpu-security.tdx.protected`).

Signed-off-by: Hairong Chen <hairong.chen@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-05 11:06:28 +02:00
Kubernetes Prow Robot
306969a945
Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update
feat: parallelize nodes update
2023-06-02 05:28:57 -07:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
Kubernetes Prow Robot
d28a02c5cd
Merge pull request #1222 from vaibhav2107/kustomize-type
Fixed typo in Header under deployment/kustomize.md
2023-05-22 00:42:21 -07:00
Kubernetes Prow Robot
70d5ef477f
Merge pull request #1219 from PiotrProkop/leader-elect
Add leader election for nfd-master
2023-05-22 00:36:21 -07:00
vaibhav2107
9f7854479f Fixed type in Header under deployment/kustomize.md 2023-05-18 14:59:54 +05:30
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
1200fd05c5 topology-updater: use node IP in the default configz URI
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of NODE_NAME that was previously used).
Also change the kustomize and Helm deployments to set this variable to
node IP address. This should make the default deployment more robust,
making it work in scenarios where node name does not resolve to the node
ip, e.g. nodename != hostname.
2023-05-05 13:29:51 +03:00
Kubernetes Prow Robot
cd45baef8d
Merge pull request #1211 from marquiz/devel/helm
deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
2023-05-05 00:17:13 -07:00
Markus Lehtonen
526aab87cf deployment/helm: user dedicated serviceaccount for topology-updater
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).

Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).

This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
2023-05-05 08:30:21 +03:00
Markus Lehtonen
9c2f268fd2 deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.

Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
2023-05-04 15:01:19 +03:00
Markus Lehtonen
9685d292a2 docs: add missing .md suffix to internal references
Commit bfbc47f55e added a lot of those and
this patch tries to cover all that we missed there. Having .md suffixes
in references to internal files makes it convenient to browse the
document locally, just as text files as the references work correctly.
2023-04-25 15:28:07 +03:00
Kubernetes Prow Robot
2356223ffc
Merge pull request #1139 from AhmedGrati/feat-configure-master-resync
feat: add master resync period configurability
2023-04-24 03:49:02 -07:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Carlos Eduardo Arango Gutierrez
05ef5d4e9d
cpu: expose the total number of AMD SEV ASID and ES
This patch add SEV ASIDs and the related (but distinct) SEV Encrypted State
(SEV-ES) IDs as two quantities to be exposed via extended resources.
In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the
root control group will have a misc.capacity file that shows the number of
available IDs in each category.

The added extended resources are:
- sev.asids
- sev.encrypted_state_ids

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-17 19:34:39 +02:00
Mikko Ylinen
de1b69a8bf cpu: make SGX EPC resource available to NodeFeatureRules
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2023-04-14 15:31:54 +03:00
Markus Lehtonen
3320c74472 source/cpu: don't create cpu-security.tdx.total_keys label
Just have that as a feature for NodeFeatureRules to consume.
2023-04-14 13:33:13 +03:00
Kubernetes Prow Robot
84c348b69f
Merge pull request #1126 from marquiz/devel/er-deprecation
nfd-master: deprecate the -resource-labels flag
2023-04-13 10:52:39 -07:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Markus Lehtonen
8511980bf4 nfd-master: deprecate the -resource-labels flag
Mark the -resource-labels flag (and the corresponding resourceLabels
config option) as deprecated. We now support managing extended resources
via NodeFeatureRule objects. This kludge deserves to go, eventually.
2023-04-13 11:30:58 +03:00
Markus Lehtonen
dcbb3bc450 docs: add missing mentions of extended resources and taints
A small update to fix some missing mentions of extended resources and
taints as assets managed by NFD.
2023-04-11 20:38:21 +03:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot
6740224a13
Merge pull request #1100 from PiotrProkop/expose-L3-num-closid
Advertise RDT L3 num_closid
2023-04-07 00:49:14 -07:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
PiotrProkop
0e78eba40e Advertise RDT L3 num_closid
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-04-06 11:22:55 +02:00
Kubernetes Prow Robot
3c0c43b9be
Merge pull request #1114 from marquiz/devel/rdt-deprecate
source/cpu: deprecate cpu-rdt.* labels
2023-04-05 06:21:40 -07:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
Markus Lehtonen
6cb5e99afa source/cpu: deprecate cpu-rdt.* labels
Document built-in RDT labels to be deprecated and removed in a future
release. The plan is that the default built-in RDT labels would not be
created anymore, but the RDT features would still be available for
NodeFeatureRules to consume.

The RDT labels are not very useful (they don't e.g indicate if the
features are really enabled in kernel or if the resctrlfs is mounted).
2023-04-04 11:54:57 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
Fabiano Fidêncio
10672e1bba cpu: Expose the total number of keys for TDX
The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:

```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```

The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.

An example of how it ends up being exposed via the NFD:

```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}'  | jq | grep tdx.total_keys
  "feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 09:12:26 +02:00
Carlos Eduardo Arango Gutierrez
7171cfd4eb
cpu: expose AMD SEV support
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-03-30 15:19:43 +02:00
AhmedGrati
02b3b7c7e0 feat: add enableTaints to helm chart
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-21 10:49:24 +01:00
Talor Itzhak
5c6be580f4 reactive updates: add an option to disable the feature
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:16 +02:00
Talor Itzhak
727de56191 documentaion: document the reactive updates feature
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:12 +02:00
Talor Itzhak
8924213d14 topology-updater: make it possible to disable sleep-interval
Especially convenient for testing porpuses and
completely harmless

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:43:17 +02:00
Sajiyah Salat
7082c31d6c
Update worker-configuration-reference.md 2023-03-08 21:33:44 +05:30
Sajiyah Salat
fb2d70a313
Update worker-configuration-reference.md 2023-03-08 21:28:45 +05:30
AhmedGrati
ff2dddd27d docs: fix usage cusomization guide typos
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-27 10:26:25 +01:00
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
Kubernetes Prow Robot
69440d7820
Merge pull request #1062 from yanggangtony/fix-doc
docs: describe nfd-topology-gc in introduction.md
2023-02-21 02:17:48 -08:00
Muyassarov, Feruzjon
0e2f2c4587 go.mod: bump cpuid to v2.2.4
Bump cpuid version to v2.2.4 in the go.mod so that WRMSRNS (
Non-Serializing Write to Model Specific Register) and MSRLIST
(Read/Write List of Model Specific Registers) instructions are
detectable.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-02-20 22:58:59 +02:00
yanggang
150d4f4db2
docs: describe nfd-topology-gc in introduction.md
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-02-18 06:12:35 +08:00
Guangwen Feng
8ad6c5b425 Fix some typos
Signed-off-by: Guangwen Feng <fenggw-fnst@fujitsu.com>
2023-02-16 22:08:00 +08:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
Kubernetes Prow Robot
e3b9184354
Merge pull request #1027 from marquiz/devel/image-full
images: base the default image on distroless/base
2023-02-10 08:07:30 -08:00
AhmedGrati
07d5ffe4b8 helm: make master port configurable
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-01 10:03:06 +01:00
Markus Lehtonen
cd62f6566f images: base the default image on distroless/base
Make distroless/base as the base image for the default image,
effectively making the minimal image as the default. Add a new "full"
image variant that corresponds the previous default image. The
"*-minimal" container image tag is provided for backwards compatibility.

The practical user impact of this change is that hook support is limited
to statically linked ELF binaries. Bash or Perl scripts are not
supported by the default image, anymore, but the new "full" image
variant can be used for backwards compatibility.
2023-01-31 11:30:38 +02:00
Chandan Abhyankar
d66096a491 cpu: support for detecting nx-gzip coprocessor feature
Nest accelerator gzip support for IBM Power systems.

Signed-off-by: Chandan Abhyankar <Chandan.Abhyankar@ibm.com>
2023-01-17 23:18:16 -08:00
Hiren Panchasara
bfbc47f55e docs: fix internal cross-page references by injecting .md 2023-01-16 20:53:36 -08:00
PiotrProkop
3143faf0ab Add documentation for topology garbage collector
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:38 +01:00
Kubernetes Prow Robot
0159ab04e7
Merge pull request #1021 from fmuyassarov/docs-taint
Docs: mention tainting in the intro section
2023-01-02 02:19:30 -08:00
Kubernetes Prow Robot
79cd4fc094
Merge pull request #1023 from fmuyassarov/sfr-support
Bump cpuid to v2.2.3
2023-01-02 01:27:31 -08:00
Muyassarov, Feruzjon
d9dc4b09d5 Bump cpuid to v2.2.3
Bump cpuid to v2.2.3 which adds support for detecting Intel Sierra
Forest instructions like AVXIFMA, AVXNECONVERT, AVXVNNIINT8 and
CMPCCXADD.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2022-12-30 11:42:05 +02:00
Muyassarov, Feruzjon
842153a907 Docs: mention tainting in the intro section
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2022-12-28 14:00:04 +02:00
Markus Lehtonen
8c0e38d0c5 docs: fix typo in CRD name 2022-12-21 13:42:10 +02:00
Markus Lehtonen
b91922746a docs: mention NodeFeature as an extension point
In the CRD intro, mention that NodeFeature can be used as an integration
point for 3rd party extensions.
2022-12-21 13:26:31 +02:00
Markus Lehtonen
27c47bd088 docs: better document differences between deployment methods 2022-12-20 16:29:48 +02:00
Markus Lehtonen
3209c14bea docs: document NodeFeature API
Document the usage of the NodeFeature CRD API. Also re-organize the
documentation a bit, moving the description of NodeFeatureRule
controller from customization guide to nfd-master usage page.
2022-12-14 22:33:12 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
5a717c418b docs: small reordering of master cmdline reference
Move documentation of -enable-taints near '-enable-nodefeature-api' and
'-no-publish' as they are related in that they control the enablement of
APIs.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defgault to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Kubernetes Prow Robot
776a8c335c
Merge pull request #980 from marquiz/devel/topology-updater
nfd-topology-updater: update NodeResourceTopology objects directly
2022-12-08 01:44:22 -08:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also update deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Markus Lehtonen
881ee13654 docs: remove non-existent nodeFeatureRule.createCRD parameter
This value was recently dropped.
2022-12-07 16:25:43 +02:00
Markus Lehtonen
0834ec5cbf go.mod: update to klauspost/cpuid to v2.2.2
Support detection of Intel TME (Total Memory Encryption) plus AMXFP16
and PREFETCHI.
2022-12-07 13:58:19 +02:00
Feruzjon Muyassarov
984a3de198 Document tainting feature
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:29:10 +02:00
Kubernetes Prow Robot
f740f084e0
Merge pull request #976 from marquiz/docs/customization-guide
docs: small update to customization guide
2022-12-01 12:51:55 -08:00
Markus Lehtonen
72e523f277 scripts/mdlint: update mdlint to v0.12.0 2022-12-01 20:57:21 +02:00
Markus Lehtonen
32b252147c docs: small update to customization guide
Add a reference to the label rule format in the NodeFeatureRule section.
Also make it explicit in the beginning of Hooks section that hooks are
deprecated.
2022-12-01 18:33:48 +02:00
Markus Lehtonen
8a45384037 docs: simplify quick-start page
Move topology-updater deployment notes to the topology-updater usage
page. Also, rework the plaintext and headings a bit.
2022-12-01 12:22:23 +02:00
Markus Lehtonen
cdc7558f6f docs: better document custom resources
Add a separate page for describing the custom resources used by NFD.
Simplify the Introduction page by moving the details of
NodeResourceTopology from there. Similarly, drop long
NodeResourceTopology example from the quick-start page, making the page
shorter and simpler.
2022-12-01 11:12:59 +02:00
Kubernetes Prow Robot
efc833d1c7
Merge pull request #970 from marquiz/docs/worker-helm-sa-params
docs: document helm chart params related to worker serviceaccount
2022-11-28 08:36:08 -08:00
Markus Lehtonen
d0a4cf7564 docs: document helm chart params related to worker serviceaccount 2022-11-28 18:07:17 +02:00
Markus Lehtonen
c1fa8b2f28 docs: revise topology-updater helm chart rbac parameters 2022-11-28 17:49:19 +02:00
Markus Lehtonen
eb8e29c80a nfd-worker: drop deprecated command line flags
Drop the following flags that were deprecated already in v0.8.0:

-sleep-interval  (replaced by core.sleepInterval config file option)
-label-whitelist (replaced by core.labelWhiteList config file option)
-sources         (replaced by -label-sources flag)
2022-11-23 22:33:51 +02:00
Talor Itzhak
d495376f06 docs: topology-updater: update docs for exclude-list feature
Update the docs with explanations and examples
about the exclude-list feature.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2022-11-21 21:31:51 +02:00
Markus Lehtonen
6f49421c0e docs: update github-pages gem to v227 2022-11-16 21:08:13 +02:00
Garrybest
3ec1b94020 get kubelet config from configz
Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-11-08 23:52:35 +08:00
Markus Lehtonen
6171c745a4 docs: restructure docs
Introduce two main sections "Deployment" and "Usage" and move "Developer
guide" to the top level, too. In particular, split the huge
deployment-and-usage file into multiple parts under the new main
sections. Move customization guide from "Advanced" to "Usage".
This patch also renames "Advanced" to "Reference" as only that is left
there is reference documentation.
2022-11-03 10:26:56 +02:00
Markus Lehtonen
3a279ce751 docs: update the name of the base image 2022-11-02 15:10:46 +02:00
Kubernetes Prow Robot
e5c8180558
Merge pull request #937 from pacoxu/master
Stop using the beta.kubernetes.io/os and arch labels
2022-10-27 05:36:32 -07:00
Paco Xu
4e12ed8aac Stop using the beta.kubernetes.io/os and arch labels 2022-10-27 11:03:14 +08:00
Fabiano Fidêncio
d5db1cf907 cpu: Discover Intel TDX
Set `cpu-security.tdx.enable` to `true` when TDX is avialable and has
been enabled. otherwise it'll be set to `false`.

`/sys/module/kvm_intel/parameters/tdx` presence and content is used to
detect whether a CPU is Intel TDX capable.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-10-03 09:56:24 +02:00
Kubernetes Prow Robot
8662d17530
Merge pull request #871 from fmuyassarov/disable-hook
Config option to disable hooks
2022-09-26 10:40:08 -07:00
Markus Lehtonen
db7dd93a64 docs: fix incorrect shell snippet for removing labels 2022-09-15 16:18:09 +03:00
Markus Lehtonen
f21315d85f Update kubernetes registry to registry.k8s.io
Update registry location for non-nfd images.
2022-09-12 11:23:04 +03:00
Markus Lehtonen
4f34451db8 Update NFD registry to registry.k8s.io
Kubernetes has moved to a new container image registry:
https://groups.google.com/a/kubernetes.io/g/dev/c/DYZYNQ_A6_c/m/FpHqeVR2BAAJ
2022-09-12 11:21:12 +03:00
Kubernetes Prow Robot
77af16fe9d
Merge pull request #880 from fmuyassarov/add-tiltfile/feruz
Add Tilt option for developing NFD
2022-09-06 12:06:23 -07:00
Kubernetes Prow Robot
81da164b7f
Merge pull request #833 from marquiz/devel/security-refactor
cpu: re-organize security features
2022-09-01 05:29:06 -07:00
Feruzjon Muyassarov
e7af8d068f Update documentation about hooks depreciation
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-09-01 10:58:35 +03:00
Feruzjon Muyassarov
a675fd93fd Don't advertise BASE_IMAGE_FULL and BASE_IMAGE_MINIMAL
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-08-30 17:37:01 +03:00