1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

55 commits

Author SHA1 Message Date
Markus Lehtonen
fc103a6028 Cleanup for NodeFeature API being GA
Drop references to the gRPC API and don't suggest that NodeFeatureAPI
could be disabled.

Also update the developer guide for instructions running nfd components
outside the cluster.
2024-12-13 15:40:46 +02:00
Markus Lehtonen
45f49d574a nfd-master: drop resourceLabels
Drop the resourceLabels config file option and the corresponding
-resource-labels command line flag. They were deprecated in NFD v0.13 so
it's time to let them go. NodeFeatureRule(s) should be used to manage
ERs, instead.
2024-11-07 15:16:52 +02:00
Kubernetes Prow Robot
fd2893e2a5
Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions
feat/nfd-master: configure CR restrictions
2024-10-24 12:20:54 +01:00
AhmedGrati
28b40c90b8 deploy: add CR restrictions to the helm config
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
2024-09-16 16:02:42 +02:00
Kubernetes Prow Robot
0fd0934de2
Merge pull request #1847 from marquiz/devel/drop-fsnotify
Drop dynamic run-time reconfiguration
2024-09-03 09:15:16 +01:00
Markus Lehtonen
94bda00c75 test/e2e: drop the pod security admission hack
The Kubernetes e2e framework now supports setting the pod security
level.
2024-08-26 16:10:49 +03:00
Markus Lehtonen
02b6b7395c Drop dynamic run-time reconfiguration
Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuosly touches the (every minute) mounted file (configmap) on the
filesystem.

Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slght downside of this is the name of the
config map(s) depends on the content, so every time a user customizes
the config data, the old unused configmap will be left and must be
garbage-collected manually.
2024-08-21 12:46:36 +03:00
AhmedGrati
7bad0d583c feat/nfd-master: support CR restrictions
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2024-08-10 22:39:10 +02:00
Markus Lehtonen
25e827a4c8 feature-gates: mark NodeFeatureAPI as GA
The feature gate is locked to true. That is, it is not possible to revert
back to the gPRC-based communication which makes the gRPC API ready for
removal.
2024-07-16 13:53:31 +03:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Markus Lehtonen
6f891ce1d2 Remove references to -enable-nodefeature-api flag
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Markus Lehtonen
53003cbf69 pkg/utils: move JsonPatch from pkg/apihelper 2024-01-25 17:23:14 +02:00
Markus Lehtonen
fe412a54b9 apis/nfd: add matchName field in feature matcher terms
Extend the format of feature matcher terms (the elements of the
arrayspecified under under matchFeatures field) with new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).

The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names are present.

If both matchExpressions and matchName for certain feature matcher term
is specified, they both must match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.

An example of creating an "avx512" label if any AVX512* CPUID feature is
present:

  - name: "avx wildcard rule"
    labels:
        avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}

An example of a template rule creating a dynamic set of labels  based on
the existence of certain kconfig options.

  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}

NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
2023-12-15 11:32:23 +02:00
Kubernetes Prow Robot
794630f7df
Merge pull request #1489 from ArangoGutierrez/ginkofocus
Makefile: add env var controls to make test targets configurable
2023-12-08 18:24:20 +01:00
Carlos Eduardo Arango Gutierrez
f9195ef6a4
Makefile: add env var controls to make test targets configurable
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-08 17:48:00 +01:00
Markus Lehtonen
c5174a835d test/e2e: fix broken test
Missed one check of NodeFeature API is enablement.
2023-12-08 18:35:37 +02:00
Markus Lehtonen
c02e05245b test/e2e: test NodeFeature owner reference
Add tests for verifying that the automatic garbage-colletion of
NodeFeature objects created by nfd-worker works as expected.
2023-12-08 16:04:03 +02:00
Markus Lehtonen
18fada0cfb test/e2e: increase timeout for waiting node status
In some occasions the node status (capacity) takes a lot of time to
update. Increase the timeout on extended resource tests. Revert the default
timeout back to 10s.
2023-11-16 13:26:09 +02:00
Markus Lehtonen
7015dae352 test/e2e: cleanup feature annotations
Delete NFD-managed feature annotations at test setup and teardown
2023-10-27 15:17:54 +03:00
Kubernetes Prow Robot
6b90401950
Merge pull request #1440 from marquiz/devel/e2e-fix
test/e2e: fix broken feature-annotations test
2023-10-27 11:16:02 +02:00
Markus Lehtonen
f732342a2a test/e2e: improved test logging 2023-10-27 10:21:11 +03:00
Markus Lehtonen
0fa330f2d4 test/e2e: fix log messages
Fix some typos and improve log messages a bit.
2023-10-26 23:01:08 +03:00
Markus Lehtonen
0d766a0fde test/e2e: fix broken feature-annotations test 2023-10-26 22:56:14 +03:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
ddec2e27cb test/e2e: drop the unused ignoreUnexpected arg from custom matcher 2023-10-20 18:04:18 +03:00
Markus Lehtonen
28132fb274 test/e2e: stricter validation of node annotations
Now that the hard-to-predict version annotations are gone we can do
strict validation of nfd-generated node annotations.
2023-10-20 18:04:18 +03:00
Markus Lehtonen
a9849f20ff nfd-master: fix retry of node updates
This patch addresses issues with slow node status (extended resources)
updates. Previously we did just a few retries in quick succession which
could result in the node update failing, just because node status was
updated slower than our retry window. The patch mitigates the issue by
increasing the number of tries to 15. In addition, it creates a
ratelimiter with a longer per-item (per-node) base delay.

The patch also fixes the e2e-tests to expose the issue.
2023-10-20 17:24:01 +03:00
Markus Lehtonen
d7a91b818e test/e2e: fix source/custom nodename test
We dropped the legacy rule format so we need to convert the e2e test
rules to the new format, accordingly.
2023-10-20 12:12:45 +03:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
guoguangwu
29118f67bb fix: Drop the e2elog instead
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-25 09:44:08 +08:00
guoguangwu
92482e45d8 node_feature_discovery_test.go rm pkg imported twice
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:55:25 +08:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
Muyassarov, Feruzjon
cfb8530083 e2e: delete CRs only if found
Delete NodeFeatureRule and NodeFeature CRs only if found.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-05-08 13:46:29 +03:00
Markus Lehtonen
2d9db2ccec test/e2e: rework taints matching
Add new MatchTaints matcher replacing the old waitForNfdNodeTaints
helper function. Also, drop the now-unused simplePoll() helper function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
f93ab9d423 test/e2e: rework node capacity matching
Add new MatchCapacity matcher replacing the old waitForCapacity helper
function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
a85e396200 test/e2e: rework annotations matcher
Add new MatchAnnotations Gomega matcher and drop the old
waitForNfdNodeAnnotations helper function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
2330896620 test/e2e: refactor matching of node properties
Implement a new generic type nodeListPropertyMatcher, a generic Gomega
matcher for matching basically any property of a set of node objects. We
will be using it for verifying labels, annotations, extended resources
and taints for now. This moves the tests in a more Gomega'ish direction,
leveraging code re-use and providing way more informative error messages
in case of test failures.

The patch adds a new eventuallyNonControlPlaneNodes helper assertion for
asserting all (non-control-plane) nodes in the cluster, intended to
replace the ugly simplePoll() helper function.

This patch implements a matcher for node labels and converts tests to
use it instead of the old checkForNodeLabels helper function.
2023-05-03 08:44:03 +03:00
AhmedGrati
87c2d7e184 nfd-master: fix resync period config option
This PR fixes the resync-period configuration option of the nfd-master.
In fact, previously, changes were not reflected in the nfd-master at
runtime. e2e tests are also implemented to make sure that the fix is
already working as expected.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-02 13:17:01 +02:00
Markus Lehtonen
87371e2df0 test/e2e: adapt tests to updates in k8s e2e-framework
Add context to functions that now require it. Also, replace the
deprecated wait.Poll* calls with wait.PollUntilContextTimeout.
2023-04-18 23:04:34 +03:00
Markus Lehtonen
ad8bd057b7 test/e2e: use proper context
Eliminate all context.TODO() from the e2e tests and use ginkgo context
instead. This ensures that calls involving context are properly
cancelled and return fast in case the tests get aborted.
2023-04-18 14:55:09 +03:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
Markus Lehtonen
2e85b8a914 test/e2e: refactor nfd pod configuration
Make the default master pod run with no special options. Move the
customizations of the master pod to the setup functions of the tests
that actually need it.

Also, cleanup the configuration of nfd-worker of some tests.
2023-04-05 21:51:25 +03:00
Markus Lehtonen
68c3bf317b test/e2e: fix node cleanup function
The node cleanup function was not removing all NFD-labels. It omitted
NFD-originated labels that used a non-default label namespace. This
patch fixes the issue by getting all NFD-managed labels from the special
annotation (nfd.node.kubernetes.io/feature-labels).

The patch also adds the ability to cleanup extended resources in a
similar way. This will be needed by future work.

Also changes the order of cleaning up CRs and the node. It is the right
order as cleaning up the CRs may still update the node.
2023-04-05 15:09:25 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
16abfd7b0e test: implement e2e test of the deny-label-ns flag
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-10 11:11:36 +01:00
Kubernetes Prow Robot
2b865759fd
Merge pull request #1073 from marquiz/devel/e2e-worker-wait
test/e2e: reduce worker wait-for-ready period to 2s
2023-03-07 04:18:18 -08:00