mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

167 commits

Author SHA1 Message Date
Markus Lehtonen
02b6b7395c Drop dynamic run-time reconfiguration
Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuously (every minute) touches the mounted file (configmap) on the
filesystem.

Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slight downside of this is that the name of
the config map(s) depends on the content, so every time a user
customizes the config data, the old unused configmap will be left behind
and must be garbage-collected manually.
2024-08-21 12:46:36 +03:00
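To illustrate why a content-dependent configmap name forces a pod restart and leaves the old object behind (02b6b7395c above), here is a minimal Go sketch of content-hashed naming. It does not reproduce kustomize's actual hashing scheme; `hashedName`, the `nfd-worker-conf` base name and the config snippets are assumptions made purely for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashedName appends a short digest of the config data to the base name.
// Illustration only: kustomize uses its own hashing scheme.
func hashedName(base, data string) string {
	sum := sha256.Sum256([]byte(data))
	return base + "-" + hex.EncodeToString(sum[:])[:10]
}

func main() {
	// Hypothetical config contents before and after a user customization.
	before := hashedName("nfd-worker-conf", "core:\n  sleepInterval: 60s\n")
	after := hashedName("nfd-worker-conf", "core:\n  sleepInterval: 30s\n")

	fmt.Println(before)
	fmt.Println(after)
	// A changed name means the pod spec references a new configmap, so the
	// pods are re-created, while the object with the old name stays behind.
	fmt.Println("name changed:", before != after)
}
```

Because the name is derived from the data, identical content always maps to the same name, and any edit yields a new object rather than an in-place update.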
Markus Lehtonen
3e1c43dc6f test/e2e: simplify TestMain
Drop unneeded bits.
2024-08-12 14:17:27 +03:00
Markus Lehtonen
25e827a4c8 feature-gates: mark NodeFeatureAPI as GA
The feature gate is locked to true. That is, it is not possible to revert
back to the gRPC-based communication, which makes the gRPC API ready for
removal.
2024-07-16 13:53:31 +03:00
Markus Lehtonen
5aeea28957 test/e2e: specify -sleep-interval in topology-updater exclude-memory test
Make the test finish considerably faster.
2024-07-16 12:47:08 +03:00
Markus Lehtonen
5a81f748bf test/e2e: set topology-updater sleep-interval in podfingerprint test
Run topology-updater with short sleep-interval to try to eliminate
flakiness in CI.
2024-07-16 10:22:46 +03:00
Carlos Eduardo Arango Gutierrez
e33e68ad5b
Add optionable arguments to NewWorker
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 15:08:26 +02:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Markus Lehtonen
d2652cebb7 test/e2e: stop importing kubernetes test/e2e
Don't import the kubernetes test/e2e "root" package (we still use the
test/e2e/framework). We only used the simple e2e runner function from
there so copy that over to the nfd test/e2e package. This change removes
a lot of dependencies speeding up builds.
2024-04-26 09:34:34 +03:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Kubernetes Prow Robot
7df0f17f68
Merge pull request #1602 from ozhuraki/nrt-owner-ref
Add owner reference to NRT object
2024-03-19 01:12:59 -07:00
Markus Lehtonen
6f891ce1d2 Remove references to -enable-nodefeature-api flag
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Oleg Zhurakivskyy
c662265a47 topology-updater: Add owner reference to NRT object
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-15 16:36:27 +02:00
Markus Lehtonen
53003cbf69 pkg/utils: move JsonPatch from pkg/apihelper
2024-01-25 17:23:14 +02:00
Markus Lehtonen
cd18fe8970 source/cpu: drop deprecated cpu-rdt labels
Drop RDT labels that were deprecated in NFD v0.13. The RDT features
remain available for NodeFeatureRules to serve custom labeling.
2023-12-22 17:29:00 +02:00
Markus Lehtonen
ea68017af6 test/e2e: replace k8s.io/utils/pointer package
Use "k8s.io/utils/ptr" instead of the deprecated "k8s.io/utils/pointer".
2023-12-20 15:12:11 +02:00
Markus Lehtonen
fe412a54b9 apis/nfd: add matchName field in feature matcher terms
Extend the format of feature matcher terms (the elements of the
array specified under the matchFeatures field) with a new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).

The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names is present.

If both matchExpressions and matchName are specified for a certain feature
matcher term, they both must match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.

An example of creating an "avx512" label if any AVX512* CPUID feature is
present:

  - name: "avx wildcard rule"
    labels:
        avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}

An example of a template rule creating a dynamic set of labels based on
the existence of certain kconfig options:

  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}

NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
2023-12-15 11:32:23 +02:00
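The name-based matching described in fe412a54b9 above can be sketched in Go as follows. This is a simplified illustration and not the NFD implementation: `matchNames` is a hypothetical helper, only the `In` and `InRegexp` operators from the examples are handled, and feature elements are reduced to a plain list of names.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchNames returns the element names selected by a matchName-style
// expression. "In" requires an exact match with one of the values, while
// "InRegexp" requires a match with one of the regular expressions.
func matchNames(op string, values, names []string) ([]string, error) {
	var matched []string
	switch op {
	case "In":
		for _, n := range names {
			for _, v := range values {
				if n == v {
					matched = append(matched, n)
					break
				}
			}
		}
	case "InRegexp":
		patterns := make([]*regexp.Regexp, 0, len(values))
		for _, v := range values {
			re, err := regexp.Compile(v)
			if err != nil {
				return nil, err
			}
			patterns = append(patterns, re)
		}
		for _, n := range names {
			for _, re := range patterns {
				if re.MatchString(n) {
					matched = append(matched, n)
					break
				}
			}
		}
	default:
		return nil, fmt.Errorf("unsupported operator %q", op)
	}
	return matched, nil
}

func main() {
	// Hypothetical set of cpu.cpuid element names on a node.
	cpuid := []string{"AVX", "AVX2", "AVX512F", "AVX512VL", "SSE4"}

	matched, err := matchNames("InRegexp", []string{"^AVX512"}, cpuid)
	if err != nil {
		panic(err)
	}
	// A non-empty result corresponds to the rule matching, and the matched
	// names are what a template rule would iterate over.
	fmt.Println(len(matched) > 0, matched)
}
```

Note that the expression is evaluated against element names only; the values of the elements play no role here, which is the difference from matchExpressions.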
Markus Lehtonen
8e477cdfa4 Use non-exp maps package
The maps package became available as a standard non-experimental package
in Go 1.21.
2023-12-12 17:31:25 +02:00
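For reference, a small sketch of the standard-library maps package that 8e477cdfa4 switches to, using maps.Clone and maps.Equal (available since Go 1.21). The helper `withExtra` and the label keys are made up for illustration and are not taken from the NFD code.

```go
package main

import (
	"fmt"
	"maps"
)

// withExtra returns a copy of labels with one extra key added, leaving the
// original map untouched.
func withExtra(labels map[string]string, key, value string) map[string]string {
	out := maps.Clone(labels)
	if out == nil {
		// maps.Clone preserves nil, so allocate before writing.
		out = map[string]string{}
	}
	out[key] = value
	return out
}

func main() {
	labels := map[string]string{"example.com/foo": "bar"}
	extended := withExtra(labels, "example.com/extra", "1")

	fmt.Println(len(labels), len(extended))
	fmt.Println(maps.Equal(labels, extended))
}
```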
Kubernetes Prow Robot
794630f7df
Merge pull request #1489 from ArangoGutierrez/ginkofocus
Makefile: add env var controls to make test targets configurable
2023-12-08 18:24:20 +01:00
Carlos Eduardo Arango Gutierrez
f9195ef6a4
Makefile: add env var controls to make test targets configurable
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-08 17:48:00 +01:00
Markus Lehtonen
c5174a835d test/e2e: fix broken test
Missed one check of NodeFeature API enablement.
2023-12-08 18:35:37 +02:00
Markus Lehtonen
c02e05245b test/e2e: test NodeFeature owner reference
Add tests for verifying that the automatic garbage-collection of
NodeFeature objects created by nfd-worker works as expected.
2023-12-08 16:04:03 +02:00
Markus Lehtonen
dc5af8be04 nfd-master: predictable handling of unprefixed names
Make the handling of unprefixed names (of labels, annotations and
extended resources) well-defined and predictable. Previously the
resulting output was not predictable in case the same name was coming in
both the unprefixed and prefixed form, say unprefixed "foo=bar" coming from
one source (be it nfd-worker or NodeFeature(Rule)) and
"feature.node.kubernetes.io/foo=baz" from a NodeFeature(Rule).
Previously the output value was randomly either "bar" or "baz".

This patch adds prefixes to all names early in the processing
"pipeline", preventing random name clashes later on.
2023-11-23 22:16:04 +02:00
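The idea in dc5af8be04 above, prefixing every name early so that the unprefixed and prefixed forms end up as the same key, can be sketched as below. This is a minimal illustration, not the nfd-master code: `addNs` and `merge` are hypothetical helpers, and the rule that a later source overrides an earlier one is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultNs = "feature.node.kubernetes.io"

// addNs prefixes a name with the default namespace unless it already
// carries a namespace of its own.
func addNs(name string) string {
	if strings.Contains(name, "/") {
		return name
	}
	return defaultNs + "/" + name
}

// merge normalizes all names first and then merges the sources in the
// given order. Assumption of this sketch: a later source wins.
func merge(sources ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, src := range sources {
		for name, value := range src {
			out[addNs(name)] = value
		}
	}
	return out
}

func main() {
	fromWorker := map[string]string{"foo": "bar"}
	fromRule := map[string]string{"feature.node.kubernetes.io/foo": "baz"}

	// Both forms collapse into one key, so the outcome depends only on the
	// order of the sources and not on map iteration order.
	fmt.Println(merge(fromWorker, fromRule))
}
```

Without the early normalization, "foo" and "feature.node.kubernetes.io/foo" would live as two separate keys until late in processing, which is where the random outcome came from.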
Markus Lehtonen
18fada0cfb test/e2e: increase timeout for waiting node status
On some occasions the node status (capacity) takes a lot of time to
update. Increase the timeout on extended resource tests. Revert the default
timeout back to 10s.
2023-11-16 13:26:09 +02:00
Markus Lehtonen
7015dae352 test/e2e: cleanup feature annotations
Delete NFD-managed feature annotations at test setup and teardown
2023-10-27 15:17:54 +03:00
Kubernetes Prow Robot
6b90401950
Merge pull request #1440 from marquiz/devel/e2e-fix
test/e2e: fix broken feature-annotations test
2023-10-27 11:16:02 +02:00
Markus Lehtonen
f732342a2a test/e2e: improved test logging
2023-10-27 10:21:11 +03:00
Markus Lehtonen
0fa330f2d4 test/e2e: fix log messages
Fix some typos and improve log messages a bit.
2023-10-26 23:01:08 +03:00
Markus Lehtonen
0d766a0fde test/e2e: fix broken feature-annotations test
2023-10-26 22:56:14 +03:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
ddec2e27cb test/e2e: drop the unused ignoreUnexpected arg from custom matcher
2023-10-20 18:04:18 +03:00
Markus Lehtonen
28132fb274 test/e2e: stricter validation of node annotations
Now that the hard-to-predict version annotations are gone we can do
strict validation of nfd-generated node annotations.
2023-10-20 18:04:18 +03:00
Markus Lehtonen
a9849f20ff nfd-master: fix retry of node updates
This patch addresses issues with slow node status (extended resources)
updates. Previously we did just a few retries in quick succession which
could result in the node update failing, just because node status was
updated slower than our retry window. The patch mitigates the issue by
increasing the number of tries to 15. In addition, it creates a
ratelimiter with a longer per-item (per-node) base delay.

The patch also fixes the e2e-tests to expose the issue.
2023-10-20 17:24:01 +03:00
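The retry behaviour described in a9849f20ff above (more attempts, spread over a longer window) can be approximated with a simple backoff loop. This is a stand-in sketch: the actual patch uses a rate limiter with a per-node base delay, which is not reproduced here, and `retry` and `failNTimes` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls fn until it succeeds or maxTries attempts have been made,
// doubling the delay after every failed attempt. It returns the number of
// attempts made.
func retry(maxTries int, baseDelay time.Duration, fn func() error) (int, error) {
	delay := baseDelay
	var err error
	for try := 1; try <= maxTries; try++ {
		if err = fn(); err == nil {
			return try, nil
		}
		if try < maxTries {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return maxTries, fmt.Errorf("giving up after %d tries: %w", maxTries, err)
}

// failNTimes simulates an update that fails n times (for example because
// the node status has not caught up yet) and succeeds afterwards.
func failNTimes(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return errors.New("node status not updated yet")
		}
		return nil
	}
}

func main() {
	// 15 tries as in the commit message; the base delay is shortened so the
	// example finishes quickly.
	tries, err := retry(15, time.Millisecond, failNTimes(3))
	fmt.Println(tries, err)
}
```

With only a handful of quick retries, a slow node status update would exhaust the attempts before the status settles; a longer retry window avoids that.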
Kubernetes Prow Robot
b6231b60fc
Merge pull request #1418 from ArangoGutierrez/test-utils-deplo
Fix pkg name for test/utils/deployment
2023-10-20 13:44:32 +02:00
Markus Lehtonen
d7a91b818e test/e2e: fix source/custom nodename test
We dropped the legacy rule format so we need to convert the e2e test
rules to the new format, accordingly.
2023-10-20 12:12:45 +03:00
Carlos Eduardo Arango Gutierrez
251f0d8a7e
Fix pkg name for test/utils/deployment
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-16 16:11:20 +02:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations was also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on a node other than the one
where it was previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
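A namespace suffix check of the kind mentioned in b09ce75c8e above might look like the following sketch. It does not reproduce the actual bug or its fix; `isAllowedNs` is a hypothetical helper, and accepting the bare `feature.node.kubernetes.io` namespace alongside its sub-namespaces is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

const featureNs = "feature.node.kubernetes.io"

// isAllowedNs reports whether ns is the feature namespace itself or one of
// its sub-namespaces. The suffix includes the leading dot so that a name
// merely ending with the same characters is rejected.
func isAllowedNs(ns string) bool {
	return ns == featureNs || strings.HasSuffix(ns, "."+featureNs)
}

func main() {
	for _, ns := range []string{
		"feature.node.kubernetes.io",
		"vendor.feature.node.kubernetes.io",
		"evilfeature.node.kubernetes.io",
		"example.com",
	} {
		fmt.Printf("%-36s allowed=%v\n", ns, isAllowedNs(ns))
	}
}
```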
Carlos Eduardo Arango Gutierrez
30b8751515
nfd_gc_test.go: fix multiple import of same pkg
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-09-06 09:47:15 +02:00
Kubernetes Prow Robot
50dd128b23
Merge pull request #1329 from ArangoGutierrez/1187
Enable NodeFeature API by default
2023-09-05 11:56:51 -07:00
Carlos Eduardo Arango Gutierrez
04e954a7c3
Enable NodeFeature API by default
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-09-05 20:21:31 +02:00
Markus Lehtonen
f8162a0106 e2e/test: make the nfd-gc test pass on one-node cluster
Also remove some leftover debug print.
2023-09-05 14:16:50 +03:00
Francesco Romani
000c919071 nfd-updater: events: enable timer-only flow
The nfd-topology-updater has the state-directories notification
mechanism enabled by default.
In theory, we can have only timer-based updates, but if the option
is given to disable the state-directories event source, then the
whole update mechanism is mistakenly disabled, including the
timer-based updates.

The two update mechanisms should be decoupled.
So this PR makes sure we can enable only the timer-based updates.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 13:05:50 +02:00
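One way to decouple a timer from an optional event source, as 000c919071 above asks for, is to select over both and pass a nil channel when the event source is disabled. This is a sketch of that Go idiom, not the nfd-topology-updater code; `runUpdater` and the string-typed events are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// runUpdater waits for n update triggers and records where each came from.
// A nil events channel never becomes ready in a select, so disabling the
// event source leaves the timer-based updates fully functional.
func runUpdater(interval time.Duration, events <-chan string, n int) []string {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var triggers []string
	for len(triggers) < n {
		select {
		case <-ticker.C:
			triggers = append(triggers, "timer")
		case ev := <-events:
			triggers = append(triggers, "event:"+ev)
		}
	}
	return triggers
}

func main() {
	// Event source disabled: only the timer drives the updates.
	fmt.Println(runUpdater(10*time.Millisecond, nil, 3))
}
```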
Markus Lehtonen
f9fadd2102 test/e2e: add e2e test for nfd-gc
2023-08-22 21:24:26 +03:00
Markus Lehtonen
2e79a015f5 test/e2e: align with latest kubernetes code base
2023-08-16 12:43:52 +03:00
guoguangwu
29118f67bb fix: Drop the e2elog instead
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-25 09:44:08 +08:00
guoguangwu
92482e45d8 node_feature_discovery_test.go rm pkg imported twice
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:55:25 +08:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR adds support for dynamic label values in the NodeFeatureRule
CRD, offering users more flexible labeling. To achieve this, we check
whether the label value starts with "@" and, if so, look up the feature
value and use it as the value of the label.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
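The "@" convention from 08b9c3486e above can be sketched as follows. This is a simplified illustration rather than the NFD implementation: `resolveLabelValue` is a hypothetical helper, and the flat map keyed by strings such as `kernel.config.LSM` is an assumption about how a feature could be addressed.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLabelValue returns value unchanged unless it starts with "@", in
// which case the remainder is looked up in features and the feature value
// is returned instead.
func resolveLabelValue(value string, features map[string]string) (string, error) {
	if !strings.HasPrefix(value, "@") {
		return value, nil
	}
	ref := strings.TrimPrefix(value, "@")
	v, ok := features[ref]
	if !ok {
		return "", fmt.Errorf("feature %q not found", ref)
	}
	return v, nil
}

func main() {
	// Hypothetical discovered feature values.
	features := map[string]string{"kernel.config.LSM": "apparmor"}

	for _, raw := range []string{"true", "@kernel.config.LSM", "@kernel.config.MISSING"} {
		v, err := resolveLabelValue(raw, features)
		fmt.Printf("%q -> %q, err=%v\n", raw, v, err)
	}
}
```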
Muyassarov, Feruzjon
cfb8530083 e2e: delete CRs only if found
Delete NodeFeatureRule and NodeFeature CRs only if found.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-05-08 13:46:29 +03:00
Markus Lehtonen
2d9db2ccec test/e2e: rework taints matching
Add new MatchTaints matcher replacing the old waitForNfdNodeTaints
helper function. Also, drop the now-unused simplePoll() helper function.
2023-05-03 08:44:03 +03:00
Markus Lehtonen
f93ab9d423 test/e2e: rework node capacity matching
Add new MatchCapacity matcher replacing the old waitForCapacity helper
function.
2023-05-03 08:44:03 +03:00