1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

155 commits

Author SHA1 Message Date
Markus Lehtonen
53003cbf69 pkg/utils: move JsonPatch from pkg/apihelper 2024-01-25 17:23:14 +02:00
Markus Lehtonen
cd18fe8970 source/cpu: drop deprecated cpu-rdt labels
Drop RDT labels that were deprecated in NFD v0.13. The RDT features
remain available for NodeFeatureRules to serve custom labeling.
2023-12-22 17:29:00 +02:00
Markus Lehtonen
ea68017af6 test/e2e: replace k8s.io/utils/pointer package
Use "k8s.io/utils/ptr" instead of the deprecated "k8s.io/utils/pointer".
2023-12-20 15:12:11 +02:00
Markus Lehtonen
fe412a54b9 apis/nfd: add matchName field in feature matcher terms
Extend the format of feature matcher terms (the elements of the
arrayspecified under under matchFeatures field) with new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).

The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names are present.

If both matchExpressions and matchName for certain feature matcher term
is specified, they both must match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.

An example of creating an "avx512" label if any AVX512* CPUID feature is
present:

  - name: "avx wildcard rule"
    labels:
        avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}

An example of a template rule creating a dynamic set of labels  based on
the existence of certain kconfig options.

  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}

NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
2023-12-15 11:32:23 +02:00
Markus Lehtonen
8e477cdfa4 Use non-exp maps package
The maps package became available as a standard non-experimental package
in Go 1.21.
2023-12-12 17:31:25 +02:00
Kubernetes Prow Robot
794630f7df
Merge pull request #1489 from ArangoGutierrez/ginkofocus
Makefile: add env var controls to make test targets configurable
2023-12-08 18:24:20 +01:00
Carlos Eduardo Arango Gutierrez
f9195ef6a4
Makefile: add env var controls to make test targets configurable
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-08 17:48:00 +01:00
Markus Lehtonen
c5174a835d test/e2e: fix broken test
Missed one check of NodeFeature API is enablement.
2023-12-08 18:35:37 +02:00
Markus Lehtonen
c02e05245b test/e2e: test NodeFeature owner reference
Add tests for verifying that the automatic garbage-colletion of
NodeFeature objects created by nfd-worker works as expected.
2023-12-08 16:04:03 +02:00
Markus Lehtonen
dc5af8be04 nfd-master: predictable handling of unprefixed names
Make the handling of unprefixed names (of labels, annotations and
extended resources) well-defined and predictable. Previously the
resulting output was not predictable in case the same name was coming in
both the unprefixed and prefixed form, say unprefixed "foo=bar" coming from
one source (be it nfd-worker or NodeFeature(Rule)) and
"feature.node.kubernetes.io/foo=baz" from a NodeFeature(Rule).
Previously the output value was randomly either "bar" or "baz".

This patch adds prefixes to all names early in the processing
"pipeline", preventing random name clashes later on.
2023-11-23 22:16:04 +02:00
Markus Lehtonen
18fada0cfb test/e2e: increase timeout for waiting node status
In some occasions the node status (capacity) takes a lot of time to
update. Increase the timeout on extended resource tests. Revert the default
timeout back to 10s.
2023-11-16 13:26:09 +02:00
Markus Lehtonen
7015dae352 test/e2e: cleanup feature annotations
Delete NFD-managed feature annotations at test setup and teardown
2023-10-27 15:17:54 +03:00
Kubernetes Prow Robot
6b90401950
Merge pull request #1440 from marquiz/devel/e2e-fix
test/e2e: fix broken feature-annotations test
2023-10-27 11:16:02 +02:00
Markus Lehtonen
f732342a2a test/e2e: improved test logging 2023-10-27 10:21:11 +03:00
Markus Lehtonen
0fa330f2d4 test/e2e: fix log messages
Fix some typos and improve log messages a bit.
2023-10-26 23:01:08 +03:00
Markus Lehtonen
0d766a0fde test/e2e: fix broken feature-annotations test 2023-10-26 22:56:14 +03:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
ddec2e27cb test/e2e: drop the unused ignoreUnexpected arg from custom matcher 2023-10-20 18:04:18 +03:00
Markus Lehtonen
28132fb274 test/e2e: stricter validation of node annotations
Now that the hard-to-predict version annotations are gone we can do
strict validation of nfd-generated node annotations.
2023-10-20 18:04:18 +03:00
Markus Lehtonen
a9849f20ff nfd-master: fix retry of node updates
This patch addresses issues with slow node status (extended resources)
updates. Previously we did just a few retries in quick succession which
could result in the node update failing, just because node status was
updated slower than our retry window. The patch mitigates the issue by
increasing the number of tries to 15. In addition, it creates a
ratelimiter with a longer per-item (per-node) base delay.

The patch also fixes the e2e-tests to expose the issue.
2023-10-20 17:24:01 +03:00
Kubernetes Prow Robot
b6231b60fc
Merge pull request #1418 from ArangoGutierrez/test-utils-deplo
Fix pkg name for test/utils/deployment
2023-10-20 13:44:32 +02:00
Markus Lehtonen
d7a91b818e test/e2e: fix source/custom nodename test
We dropped the legacy rule format so we need to convert the e2e test
rules to the new format, accordingly.
2023-10-20 12:12:45 +03:00
Carlos Eduardo Arango Gutierrez
251f0d8a7e
Fix pkg name for test/utils/deployment
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-16 16:11:20 +02:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
Carlos Eduardo Arango Gutierrez
30b8751515
nfd_gc_test.go: fix multiple import of same pkg
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-09-06 09:47:15 +02:00
Kubernetes Prow Robot
50dd128b23
Merge pull request #1329 from ArangoGutierrez/1187
Enable NodeFeature API by default
2023-09-05 11:56:51 -07:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Markus Lehtonen
f8162a0106 e2e/test: make the nfd-gc test pass on one-node cluster
Also remove some leftover debug print.
2023-09-05 14:16:50 +03:00
Francesco Romani
000c919071 nfd-updater: events: enable timer-only flow
The nfd-topology-updater has state-directories notification mechanism
enabled by default.
In theory, we can have only timer-based updates, but if the option
is given to disable the state-directories event source, then all
the update mechanism is mistakenly disabled, including the
timer-based updates.

The two updaters mechanism should be decoupled.
So this PR changes this to make sure we can enable just and only
the timer-based updates.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 13:05:50 +02:00
Markus Lehtonen
f9fadd2102 test/e2e: add e2e test for nfd-gc 2023-08-22 21:24:26 +03:00
Markus Lehtonen
2e79a015f5 test/e2e: align with latest kubernetes code base 2023-08-16 12:43:52 +03:00
guoguangwu
29118f67bb fix: Drop the e2elog instead
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-25 09:44:08 +08:00
guoguangwu
92482e45d8 node_feature_discovery_test.go rm pkg imported twice
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:55:25 +08:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
Muyassarov, Feruzjon
cfb8530083 e2e: delete CRs only if found
Delete NodeFeatureRule and NodeFeature CRs only if found.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-05-08 13:46:29 +03:00
Markus Lehtonen
2d9db2ccec test/e2e: rework taints matching
Add new MatchTaints matcher replacing the old waitForNfdNodeTaints
helper function. Also, drop the now-unused simplePoll() helper function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
f93ab9d423 test/e2e: rework node capacity matching
Add new MatchCapacity matcher replacing the old waitForCapacity helper
function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
a85e396200 test/e2e: rework annotations matcher
Add new MatchAnnotations Gomega matcher and drop the old
waitForNfdNodeAnnotations helper function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
2330896620 test/e2e: refactor matching of node properties
Implement a new generic type nodeListPropertyMatcher, a generic Gomega
matcher for matching basically any property of a set of node objects. We
will be using it for verifying labels, annotations, extended resources
and taints for now. This moves the tests in a more Gomega'ish direction,
leveraging code re-use and providing way more informative error messages
in case of test failures.

The patch adds a new eventuallyNonControlPlaneNodes helper assertion for
asserting all (non-control-plane) nodes in the cluster, intended to
replace the ugly simplePoll() helper function.

This patch implements a matcher for node labels and converts tests to
use it instead of the old checkForNodeLabels helper function.
2023-05-03 08:44:03 +03:00
AhmedGrati
87c2d7e184 nfd-master: fix resync period config option
This PR fixes the resync-period configuration option of the nfd-master.
In fact, previously, changes were not reflected in the nfd-master at
runtime. e2e tests are also implemented to make sure that the fix is
already working as expected.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-02 13:17:01 +02:00
Markus Lehtonen
87371e2df0 test/e2e: adapt tests to updates in k8s e2e-framework
Add context to functions that now require it. Also, replace the
deprecated wait.Poll* calls with wait.PollUntilContextTimeout.
2023-04-18 23:04:34 +03:00
Markus Lehtonen
ad8bd057b7 test/e2e: use proper context
Eliminate all context.TODO() from the e2e tests and use ginkgo context
instead. This ensures that calls involving context are properly
cancelled and return fast in case the tests get aborted.
2023-04-18 14:55:09 +03:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
Markus Lehtonen
2e85b8a914 test/e2e: refactor nfd pod configuration
Make the default master pod run with no special options. Move the
customizations of the master pod to the setup functions of the tests
that actually need it.

Also, cleanup the configuration of nfd-worker of some tests.
2023-04-05 21:51:25 +03:00
Kubernetes Prow Robot
60f052f086
Merge pull request #1116 from marquiz/devel/e2e-crd-deletion
test/e2e: wait for CRD deletion to complete
2023-04-05 07:31:40 -07:00
Markus Lehtonen
5793207cf2 test/e2e: wait for CRD deletion to complete
Wait for the deletion of NFD CRDs to complete before trying to re-create
them. Prevents errors in case CRDs already exist on the cluster when
e2e-tests are launched.
2023-04-05 15:56:26 +03:00
Markus Lehtonen
68c3bf317b test/e2e: fix node cleanup function
The node cleanup function was not removing all NFD-labels. It omitted
NFD-originated labels that used a non-default label namespace. This
patch fixes the issue by getting all NFD-managed labels from the special
annotation (nfd.node.kubernetes.io/feature-labels).

The patch also adds the ability to cleanup extended resources in a
similar way. This will be needed by future work.

Also changes the order of cleaning up CRs and the node. It is the right
order as cleaning up the CRs may still update the node.
2023-04-05 15:09:25 +03:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00