1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

176 commits

Author SHA1 Message Date
Markus Lehtonen
fc103a6028 Cleanup for NodeFeature API being GA
Drop references to the gRPC API and don't suggest that NodeFeatureAPI
could be disabled.

Also update the developer guide for instructions running nfd components
outside the cluster.
2024-12-13 15:40:46 +02:00
Markus Lehtonen
45f49d574a nfd-master: drop resourceLabels
Drop the resourceLabels config file option and the corresponding
-resource-labels command line flag. They were deprecated in NFD v0.13 so
it's time to let them go. NodeFeatureRule(s) should be used to manage
ERs, instead.
2024-11-07 15:16:52 +02:00
Kubernetes Prow Robot
b997ade5b3
Merge pull request #1942 from marquiz/devel/drop-grpc
nfd-master: drop stale unreachable deprecation notices
2024-11-04 11:16:31 +01:00
Carlos Eduardo Arango Gutierrez
0bd82cf82a
Drop NFD gRPC API
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-29 15:15:18 +01:00
Kubernetes Prow Robot
fd2893e2a5
Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions
feat/nfd-master: configure CR restrictions
2024-10-24 12:20:54 +01:00
Kubernetes Prow Robot
4dbe9c8951
Merge pull request #1836 from marquiz/devel/e2e-ptr
test/e2e: use ptr.To to get pointer to bool
2024-09-27 18:54:01 +01:00
AhmedGrati
28b40c90b8 deploy: add CR restrictions to the helm config
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
2024-09-16 16:02:42 +02:00
Kubernetes Prow Robot
0fd0934de2
Merge pull request #1847 from marquiz/devel/drop-fsnotify
Drop dynamic run-time reconfiguration
2024-09-03 09:15:16 +01:00
Markus Lehtonen
91d6793787 test/e2e: simplify, remove unnecessary check
Make golangci-lint happy.
2024-08-30 16:42:45 +03:00
Markus Lehtonen
94bda00c75 test/e2e: drop the pod security admission hack
The Kubernetes e2e framework now supports setting the pod security
level.
2024-08-26 16:10:49 +03:00
Markus Lehtonen
02b6b7395c Drop dynamic run-time reconfiguration
Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuosly touches the (every minute) mounted file (configmap) on the
filesystem.

Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slght downside of this is the name of the
config map(s) depends on the content, so every time a user customizes
the config data, the old unused configmap will be left and must be
garbage-collected manually.
2024-08-21 12:46:36 +03:00
Markus Lehtonen
8a6853f138 test/e2e: use ptr.To to get pointer to bool 2024-08-14 13:16:03 +03:00
Markus Lehtonen
3e1c43dc6f test/e2e: simplify TestMain
Drop unneeded bits.
2024-08-12 14:17:27 +03:00
AhmedGrati
7bad0d583c feat/nfd-master: support CR restrictions
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2024-08-10 22:39:10 +02:00
Markus Lehtonen
25e827a4c8 feature-gates: mark NodeFeatureAPI as GA
The feature gate is locked to true. That is, it is not possible to revert
back to the gPRC-based communication which makes the gRPC API ready for
removal.
2024-07-16 13:53:31 +03:00
Markus Lehtonen
5aeea28957 test/e2e: specify -sleep-interval in topology-updater exclude-memory test
Make the test finish considerably faster.
2024-07-16 12:47:08 +03:00
Markus Lehtonen
5a81f748bf test/e2e: set topology-updater sleep-interval in podfingerprint test
Run topology-updater with short sleep-interval to try to eliminate
flakiness in CI.
2024-07-16 10:22:46 +03:00
Carlos Eduardo Arango Gutierrez
e33e68ad5b
Add optionable arguments to NewWorker
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 15:08:26 +02:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Markus Lehtonen
d2652cebb7 test/e2e: stop importing kubernetes test/e2e
Don't import the kubernetes tests/e2e "root" package (we still use the
test/e2e/framework). We only used the simple e2e runner function from
there so copy that over to the nfd test/e2e package. This change removes
a lot of dependencies speeding up builds.
2024-04-26 09:34:34 +03:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Kubernetes Prow Robot
7df0f17f68
Merge pull request #1602 from ozhuraki/nrt-owner-ref
Add owner reference to NRT object
2024-03-19 01:12:59 -07:00
Markus Lehtonen
6f891ce1d2 Remove references to -enable-nodefeature-api flag
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Oleg Zhurakivskyy
c662265a47 topology-updater: Add owner reference to NRT object
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-15 16:36:27 +02:00
Markus Lehtonen
53003cbf69 pkg/utils: move JsonPatch from pkg/apihelper 2024-01-25 17:23:14 +02:00
Markus Lehtonen
cd18fe8970 source/cpu: drop deprecated cpu-rdt labels
Drop RDT labels that were deprecated in NFD v0.13. The RDT features
remain available for NodeFeatureRules to serve custom labeling.
2023-12-22 17:29:00 +02:00
Markus Lehtonen
ea68017af6 test/e2e: replace k8s.io/utils/pointer package
Use "k8s.io/utils/ptr" instead of the deprecated "k8s.io/utils/pointer".
2023-12-20 15:12:11 +02:00
Markus Lehtonen
fe412a54b9 apis/nfd: add matchName field in feature matcher terms
Extend the format of feature matcher terms (the elements of the
arrayspecified under under matchFeatures field) with new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).

The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names are present.

If both matchExpressions and matchName for certain feature matcher term
is specified, they both must match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.

An example of creating an "avx512" label if any AVX512* CPUID feature is
present:

  - name: "avx wildcard rule"
    labels:
        avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}

An example of a template rule creating a dynamic set of labels  based on
the existence of certain kconfig options.

  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}

NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
2023-12-15 11:32:23 +02:00
Markus Lehtonen
8e477cdfa4 Use non-exp maps package
The maps package became available as a standard non-experimental package
in Go 1.21.
2023-12-12 17:31:25 +02:00
Kubernetes Prow Robot
794630f7df
Merge pull request #1489 from ArangoGutierrez/ginkofocus
Makefile: add env var controls to make test targets configurable
2023-12-08 18:24:20 +01:00
Carlos Eduardo Arango Gutierrez
f9195ef6a4
Makefile: add env var controls to make test targets configurable
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-08 17:48:00 +01:00
Markus Lehtonen
c5174a835d test/e2e: fix broken test
Missed one check of NodeFeature API is enablement.
2023-12-08 18:35:37 +02:00
Markus Lehtonen
c02e05245b test/e2e: test NodeFeature owner reference
Add tests for verifying that the automatic garbage-colletion of
NodeFeature objects created by nfd-worker works as expected.
2023-12-08 16:04:03 +02:00
Markus Lehtonen
dc5af8be04 nfd-master: predictable handling of unprefixed names
Make the handling of unprefixed names (of labels, annotations and
extended resources) well-defined and predictable. Previously the
resulting output was not predictable in case the same name was coming in
both the unprefixed and prefixed form, say unprefixed "foo=bar" coming from
one source (be it nfd-worker or NodeFeature(Rule)) and
"feature.node.kubernetes.io/foo=baz" from a NodeFeature(Rule).
Previously the output value was randomly either "bar" or "baz".

This patch adds prefixes to all names early in the processing
"pipeline", preventing random name clashes later on.
2023-11-23 22:16:04 +02:00
Markus Lehtonen
18fada0cfb test/e2e: increase timeout for waiting node status
In some occasions the node status (capacity) takes a lot of time to
update. Increase the timeout on extended resource tests. Revert the default
timeout back to 10s.
2023-11-16 13:26:09 +02:00
Markus Lehtonen
7015dae352 test/e2e: cleanup feature annotations
Delete NFD-managed feature annotations at test setup and teardown
2023-10-27 15:17:54 +03:00
Kubernetes Prow Robot
6b90401950
Merge pull request #1440 from marquiz/devel/e2e-fix
test/e2e: fix broken feature-annotations test
2023-10-27 11:16:02 +02:00
Markus Lehtonen
f732342a2a test/e2e: improved test logging 2023-10-27 10:21:11 +03:00
Markus Lehtonen
0fa330f2d4 test/e2e: fix log messages
Fix some typos and improve log messages a bit.
2023-10-26 23:01:08 +03:00
Markus Lehtonen
0d766a0fde test/e2e: fix broken feature-annotations test 2023-10-26 22:56:14 +03:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
ddec2e27cb test/e2e: drop the unused ignoreUnexpected arg from custom matcher 2023-10-20 18:04:18 +03:00
Markus Lehtonen
28132fb274 test/e2e: stricter validation of node annotations
Now that the hard-to-predict version annotations are gone we can do
strict validation of nfd-generated node annotations.
2023-10-20 18:04:18 +03:00
Markus Lehtonen
a9849f20ff nfd-master: fix retry of node updates
This patch addresses issues with slow node status (extended resources)
updates. Previously we did just a few retries in quick succession which
could result in the node update failing, just because node status was
updated slower than our retry window. The patch mitigates the issue by
increasing the number of tries to 15. In addition, it creates a
ratelimiter with a longer per-item (per-node) base delay.

The patch also fixes the e2e-tests to expose the issue.
2023-10-20 17:24:01 +03:00
Kubernetes Prow Robot
b6231b60fc
Merge pull request #1418 from ArangoGutierrez/test-utils-deplo
Fix pkg name for test/utils/deployment
2023-10-20 13:44:32 +02:00
Markus Lehtonen
d7a91b818e test/e2e: fix source/custom nodename test
We dropped the legacy rule format so we need to convert the e2e test
rules to the new format, accordingly.
2023-10-20 12:12:45 +03:00
Carlos Eduardo Arango Gutierrez
251f0d8a7e
Fix pkg name for test/utils/deployment
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-10-16 16:11:20 +02:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
Carlos Eduardo Arango Gutierrez
30b8751515
nfd_gc_test.go: fix multiple import of same pkg
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-09-06 09:47:15 +02:00