1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

481 commits

Author SHA1 Message Date
Markus Lehtonen
fcb8d3cda4 nfd-master: implement opts for modifying NfdMaster instance
This provides a more controlled way for setting up the NfdMaster
instance for testing.
2024-04-05 20:21:19 +03:00
Kubernetes Prow Robot
199d665046
Merge pull request #1656 from marquiz/devel/channel-simplify
Tidy up usage of channels for signaling
2024-04-05 07:51:34 -07:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Kubernetes Prow Robot
cb24f7c234
Merge pull request #1657 from marquiz/devel/master-label-whitelist
nfd-master: prevent crash on empty config struct
2024-04-05 05:36:52 -07:00
Markus Lehtonen
26a80cf142 Tidy up usage of channels for signaling
This started as a small effort to simplify the usage of "ready" channel
in nfd-master. It extended into a wider simplification/unification of
the channel usage.
2024-04-05 14:39:58 +03:00
Markus Lehtonen
b27676451a nfd-master: prevent crash on empty config struct
Change the handling of LabelWhiteList config option to use a pointer to
detect when the option is unset. This doesn't fix any detected crash but
is merely general improvement and stabilization, serving easier testing.

Also, use the regexp type from the core libs for the config struct -
dropping the unmasrhalling code for our custom regexp type - as the core
regexp now implements unmarshaller itself.
2024-04-05 14:19:44 +03:00
Kubernetes Prow Robot
ad96c301a4
Merge pull request #1642 from marquiz/devel/master-updater-pool-lock
nfd-master: protect node updater pool queueing with a lock
2024-04-05 03:31:10 -07:00
Markus Lehtonen
44a5a5b4a8 nfd-master: get node object only once when updating node
Prevent excess queries of node objects from the Kubernetes apiserver.
This significantly speeds up node updates (and reduces the load on the
apiserver) as the client-side throttling (which is good) does not bite
us that hard.
2024-04-04 14:44:52 +03:00
Oleg Zhurakivskyy
f2e9557a2d nfd-topology-updater: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-04-03 13:15:54 +03:00
Markus Lehtonen
bce446c5b6 nfd-master: protect node updater pool queueing with a lock
Prevents races when (re-)starting the queue. There are no reports on
issues related to this (and I haven't come up with any actual failure
path in the current code) but better to be safe and follow the best
practices.
2024-03-27 16:53:34 +02:00
Markus Lehtonen
c4e010eafd nfd-master: do nfd API scheme registration in an init function
Prevents (rare) races on nfd-master reconfigurartion. Previously the
scheme was registered at nfd API controller creation/startup time. This
caused a race with some lister/informer goroutines of the previous
(stoppped) controller still running and accessing (reading) the sceme
while we were updating (writing) it.
2024-03-27 15:26:16 +02:00
Oleg Zhurakivskyy
7bd27c757a topology-updater: Set APIVersion, Kind in the OwnerReference explicitly
APIVersion and Kind are empty in the returned namespace object
and need to be set explicitly.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-20 20:09:06 +02:00
Kubernetes Prow Robot
0ad5e50f24
Merge pull request #1609 from ozhuraki/worker-health
nfd-worker: Add liveness probe
2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy
8b63d17af7 nfd-worker: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot
c4ff25de52
Merge pull request #1596 from marquiz/devel/master-infinite-retry
nfd-master: retry node updates indefinitely
2024-03-19 04:00:50 -07:00
Kubernetes Prow Robot
7df0f17f68
Merge pull request #1602 from ozhuraki/nrt-owner-ref
Add owner reference to NRT object
2024-03-19 01:12:59 -07:00
Markus Lehtonen
e7f87de6df nfd-master: retry node updates indefinitely
Treat node updates like a reconciliation loop. Keep trying on node
update as long as it fails. Node update permafailing likely indicates a
bug in the nfd code (there should be no reason for it to fail forever)
and it's better to clearly see it in the logs/metrics rather than giving
up after a few retries.
2024-03-18 18:14:24 +02:00
Kubernetes Prow Robot
4790962123
Merge pull request #1595 from marquiz/devel/master-check-node-existence
nfd-master: check if node exists before trying update
2024-03-18 04:19:57 -07:00
Kubernetes Prow Robot
797fada92e
Merge pull request #1585 from kannon92/add-swap-support
add swap support in nfd
2024-03-18 04:19:48 -07:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy
c662265a47 topology-updater: Add owner reference to NRT object
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-15 16:36:27 +02:00
Kubernetes Prow Robot
52d4337004
Merge pull request #1615 from marquiz/devel/master-mem-leak
nfd-master: fix memory leak in nfd api-controller
2024-03-14 08:21:33 -07:00
Carlos Eduardo Arango Gutierrez
69dbfdfbc0
Use close to signal stop channedl in worker and topology-updater
Fix stop channel management on Worker and T-updater in case of multiple callers

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-14 15:28:39 +01:00
Markus Lehtonen
70fd3757c4 nfd-master: fix memory leak in nfd api-controller
Fixes a memory leak that happened when stopping (and then re-starting)
the nfd api controller. The stop channel was not used properly which
caused the underlying informer to keep on running.
2024-03-14 15:39:10 +02:00
Kubernetes Prow Robot
b2ba043fae
Merge pull request #1605 from ArangoGutierrez/codegen
Update generate scripts to use latest code_gen functions
2024-03-12 05:18:25 -07:00
Kubernetes Prow Robot
ebd19fe692
Merge pull request #1590 from marquiz/devel/validation-assert
apis/nfd/validate: use testify/assert for checking test results
2024-03-12 04:44:09 -07:00
Carlos Eduardo Arango Gutierrez
dd24e8bc97
Update generate scripts to use latest code_gen functions
Rewrite the generate.sh into update_codegen inspired in
k8s.io/code-generator documentation.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-12 11:35:47 +01:00
Markus Lehtonen
a562a6188a Update auto-generated code 2024-03-11 12:18:32 +02:00
Markus Lehtonen
a7bd22a75b nfd-master: check if node exists before trying update
Make the node-updater-pool worker fail fast (and not retry updates) if
a node does not exist.
2024-02-20 11:04:46 +02:00
Kevin Hannon
187f65f94e Add swap support in nfd 2024-02-19 10:20:56 -05:00
Markus Lehtonen
044fd4a3fd nfd-master: log errors on node update retries 2024-02-16 15:51:04 +02:00
Markus Lehtonen
1f3b9ccb97 apis/nfd/validate: use testify/assert for checking test results 2024-02-16 10:05:16 +02:00
Markus Lehtonen
2382c34697 nfd-master: fix node status patching
Correctly patch the "status" subresource. This got broken when
refactoring the code in 7a050e7cf9 and
wasn't even catched by the unit tests as the fake kubernetes client
doesn't handle subresources as the real apiserver does.
2024-01-26 22:00:13 +02:00
Markus Lehtonen
8a6a731eb0 Drop pkg/apihelper
The code is now unused.
2024-01-26 18:50:31 +02:00
Kubernetes Prow Robot
33858b7502
Merge pull request #1567 from marquiz/devel/apihelper-refactor-3
topology-updater: ditch apihelper
2024-01-26 17:07:13 +01:00
Markus Lehtonen
7a050e7cf9 nfd-master: ditch apihelper
Implement some of frequently used helper functions inpackage.

This patch also contains big changes to the nfd-master unit tests. Much
of this is about migrating from the mocked apihelper interface to fake
kubernetes client that provides a bit more apiserver'ish functionality.
At the same time there is quite a bit of renaming in the tests,
shortening and unifying naming and getting rid of the extensive usage of
"mock" everywhere.
2024-01-26 16:09:22 +02:00
Markus Lehtonen
c581a25a39 topology-updater: ditch apihelper
Stop using pkg/apihelper for accessing the Kubernetes API. Modify unit
tests to use the fake kubernetes client instead of mocked apihelper
interface.
2024-01-25 22:15:20 +02:00
Markus Lehtonen
53003cbf69 pkg/utils: move JsonPatch from pkg/apihelper 2024-01-25 17:23:14 +02:00
Markus Lehtonen
2326459d05 topology-updater: get topology api client directly
Stop using apihelper for getting the noderesourcetopology-api client.
2024-01-25 16:33:34 +02:00
Markus Lehtonen
acf815fb10 pkg/utils: move GetKubeconfig from pkg/apihelper here
This change is part of an effort to remove the pkg/apihelper package.
GetKubeconfig is useful helper functionality shared accross the codebase
so move it into a "safe" location.
2024-01-24 16:10:02 +02:00
Markus Lehtonen
57b7a3c6a8 Wrap nested errors 2024-01-22 22:45:15 +02:00
Markus Lehtonen
b452ab6a5c topology-updater: initialize properly with -no-publish
We need to parse kubeconfig (and initialize the apihelper) even with
-no-publish as the PodResourcesScanner accesses the k8s API even if
we're not publishing/updating NRTs.
2024-01-22 14:15:12 +02:00
Kubernetes Prow Robot
3667a4d073
Merge pull request #1537 from ozhuraki/apis-nfd-test
apis/nfd: Trivial typo fix in tests
2024-01-19 15:25:41 +01:00
Markus Lehtonen
58ae81804c go.mod: update dependencies 2024-01-15 21:29:32 +02:00
Oleg Zhurakivskyy
eec05e1c7a apis/nfd: Trivial typo fix in tests
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-01-15 18:06:58 +02:00
Markus Lehtonen
a053efda64 nfd-master: run a separate gRPC health server
This patch separates the gRPC health server from the deprecated gRPC
server (disabled by default, replaced by the NodeFeature CRD API) used
for node labeling requests. The new health server runs on hardcoded TCP
port number 8082.

The main motivation for this change is to make the Kubernetes' built-in
gRPC liveness probes to function if TLS is enabled (as they don't
support TLS).

The health server itself is a naive implementation (as it was before),
basically only checking that nfd-master has started and hasn't crashed.
The patch adds a TODO note to improve the functionality.
2024-01-04 13:58:26 +02:00
Carlos Eduardo Arango Gutierrez
57b6035b71
Add kubectl-nfd
kubectl-nfd is a kubectl plugin for debbuging NodeFeatureRules

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-21 16:00:19 +01:00
Markus Lehtonen
97bf841140 apis/nfd: split rule processing into a separate package
This patch tidies up the nfdv1alpha1 API package by refactoring out the
implementation of (NodeFeature)Rule evaluation into a separate package.
2023-12-20 12:52:15 +02:00
Gyuho Lee
ed0418b81c
chore(nfd-worker): fix minor typo in wrong label value format error
Signed-off-by: Gyuho Lee <gyuho@lepton.ai>
2023-12-19 02:29:37 +08:00
Markus Lehtonen
b28d5c1557 apis/nfd: drop unused validate function 2023-12-18 15:19:19 +02:00
Markus Lehtonen
74bc3bb2a8 apis/nfd: drop custom unmarshaller functions
Not needed in the external API.
2023-12-18 15:19:19 +02:00
Kubernetes Prow Robot
884edc67eb
Merge pull request #1477 from marquiz/devel/api-cleanup
apis/nfd: drop the private template caching fields
2023-12-15 15:42:31 +01:00
Markus Lehtonen
912c7dcf2c apis/nfd: fix an error in auto-generated code
Work around a bug in k8s deepcopy-gen.
2023-12-15 11:32:23 +02:00
Markus Lehtonen
fe412a54b9 apis/nfd: add matchName field in feature matcher terms
Extend the format of feature matcher terms (the elements of the
arrayspecified under under matchFeatures field) with new matchName
field. The value of this field is an expression that is evaluated
against the names of feature elements instead of their values (values
are matched with the matchExpressions field, instead).

The matchName field is useful e.g. in template rules for creating
per-feature-element labels based on feature names (instead of values)
and in non-template rules for checking if (at least) one of certain
feature element names are present.

If both matchExpressions and matchName for certain feature matcher term
is specified, they both must match in order to get an overall match.
Also, in this case the list of matched features (used in templating) is
the union of the results from matchExpressions and matchName.

An example of creating an "avx512" label if any AVX512* CPUID feature is
present:

  - name: "avx wildcard rule"
    labels:
        avx512: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchName: {op: InRegexp, value: ["^AVX512"]}

An example of a template rule creating a dynamic set of labels  based on
the existence of certain kconfig options.

  - name: "kconfig template rule"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchName: {op: In, value: ["SWAP", "X86", "ARM"]}

NOTE: this patch changes the corner case of nil/null match expressions
with instance features (i.e. "matchExpressions: null"). Previously, we
returned all instances for templating but now a nil match expression is
not evaluated and no instances for templating are returned.
2023-12-15 11:32:23 +02:00
Markus Lehtonen
b2d9e15a00 apis/nfd: drop the private template caching fields
Drop the private fields – that were supposed to be used for caching parsed
templates – from the Rule type. Keep the API typedefs cleaner and
simpler. Moreover, the caching was not even used in practice,
effectively complicating code without any benefit: the way the types
are used in nfd-master creates a local copy of Rule type storing the
cached template in the copy, wasting it from any future users.

There are also other possible caveats in caching like we tried to do it.
For example the objects returned by the api lister are supposed to be
treated as read-only - in particular if we would be to modify them there
should at least be proper locking in place as nfd-master potentially
processes the same rule (the same Go object) in parallel for multiple
nodes. If any optimization like this will be pursued it should be done
properly, probably with private type(s) at the consumer's end, not
contaminating the API types.
2023-12-15 10:48:07 +02:00
Markus Lehtonen
0bc1b6c28f apis/nfd: drop creation helper functions
Drop the creation helper functions as one step in an effort to tidy up
the api package. These functions were not much used outside unit tests
anyway, the static rules of the nfd-worker custom feature source being
the only exception (and if those happened to be invalid we'd catch that
e.g. in the e2e-tests).
2023-12-14 15:54:51 +02:00
Kubernetes Prow Robot
3ce5a1b218
Merge pull request #1482 from marquiz/devel/api-cleanup-2
apis/nfd: drop the private regexp caching field
2023-12-14 12:08:58 +01:00
Markus Lehtonen
cb0a46ec0e Use generics for maps and slices 2023-12-13 12:09:53 +02:00
Markus Lehtonen
a77983556f nfd-master: remove default denied ns from config
These are now handled by the validate package. If we have them here in
nfd-master, the default namespace (feature.node.kubernetes.io) gets
denied.
2023-12-12 16:12:53 +02:00
Kubernetes Prow Robot
efe5c03071
Merge pull request #1455 from ArangoGutierrez/validation
Create a Validate pkg
2023-12-12 11:04:06 +01:00
Carlos Eduardo Arango Gutierrez
affb93ea50
Create a Validate pkg
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-12-11 16:54:22 +01:00
Markus Lehtonen
34574f4211 nfd-worker: set owner reference in NodeFeature objects
This patch creates a owner-dependent relationship between the
nfd-worker pod and the NodeFeature object that it creates. With this
change the orphaned NodeFeature object(s) gets automatically
garbage-collected when the nfd-worker pod goes away, without the need
for manual clean-up actions.
2023-12-08 14:57:31 +02:00
Markus Lehtonen
8d40524b88 apis/nfd: drop the private regexp caching field
Drop the private field for caching parsed regexp from the
MatchExpression type. This tidies up the API type definition and not so
tied with particular implementation details. The change also elimiates
potential concurrency problems as no locking is in place in the API
types.

If caching will be desired in the future, it's better to do it properly
in a separate package, not directly in the API types.
2023-12-01 15:28:55 +02:00
Markus Lehtonen
b988139094 apis/nfd: validate input when matching expression
Don't assume that the fields are correct.
2023-12-01 09:22:32 +02:00
Markus Lehtonen
94bffbf645 generate: update kube code-gen to v1.28.4 2023-11-29 18:37:19 +02:00
Kubernetes Prow Robot
dfef0ebe4a
Merge pull request #1472 from marquiz/devel/typo-fix
nfd-worker: fix typo in log message
2023-11-24 16:53:49 +01:00
Markus Lehtonen
f266533a7d nfd-worker: fix typo in log message 2023-11-24 17:17:42 +02:00
Markus Lehtonen
f6c360188e Use T.Run in expression unit tests
The "better way" of running test cases, get e.g. better output in case
of errors.

Also drop some unneeded type definitions from the tests.
2023-11-24 17:14:12 +02:00
Markus Lehtonen
f489ca98b5 Reproducible output from expression matching
Fix flakyness of unit tests by adding back the sorting of matched
feature elements that was unadvisedly removed in
63c22551df. This might help debugging some
corner cases in real-life scenarios (when using templating), too.
2023-11-24 16:27:38 +02:00
Kubernetes Prow Robot
ed8898de6a
Merge pull request #1461 from marquiz/devel/no-implicit-ns
Option to stop implicitly adding default prefix to names
2023-11-24 14:53:09 +01:00
Kubernetes Prow Robot
7154458524
Merge pull request #1468 from marquiz/devel/nfr-template-fix
apis/nfd: fix multiple matcher terms targeting the same feature
2023-11-24 13:20:49 +01:00
Markus Lehtonen
1d012a28cd Option to stop implicitly adding default prefix to names
Add new autoDefaultNs (default is "true") config option to nfd-master.
Setting the config option to false stops NFD from automatically adding
the "feature.node.kubernetes.io/" prefix to labels, annotations and
extended resources. Taints are not affected as for them no prefix is
automatically added. The user-visible part of enabling the option change
is that NodeFeatureRules, local feature files, hooks and configuration
of the "custom" may need to be altereda (if the auto-prefixing is
relied on).

For now, the config option defaults to "true", meaning no change in
default behavior. However, the intent is to change the default to
"false" in a future release, deprecating the option and eventually
removing it (forcing it to "false").

The goal of stopping doing "auto-prefixing" is to simplify the operation
(of nfd and users). Make the naming more straightforward and easier to
understand and debug (kind of WYSIWYG), eliminating peculiar corner
cases:

1. Make validation simpler and unambiguous
2. Remove "overloading" of names, i.e. the mapping two values to the
   same actual name. E.g. previously something like

      labels:
        feature.node.kubernetes.io/foo: bar
        foo: baz

   Could actually result in node label:

     feature.node.kubernetes.io/foo: baz

3. Make the processing/usagee of the "rule.matched" and "local.labels"
   feature in NodeFeatureRules unambiguous and more understadable. E.g.
   previously you could have node label
   "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule
   you'd need to use the unprefixed name "local-foo" or the fully
   prefixed name, depending on what was specified in the feature file (or
   hook) on the node(s).

NOTE: setting autoDefaultNs to false is a breaking change for users who
rely on automatic prefixing with the default feature.node.kubernetes.io/
namespace. NodeFeatureRules, feature files, hooks and custom rules
(configuration of the "custom" source of nfd-worker) will need to be
altered.  Unprefixed labels, annoations and extended resources will be
denied by nfd-master.
2023-11-24 12:48:20 +02:00
Markus Lehtonen
dc5af8be04 nfd-master: predictable handling of unprefixed names
Make the handling of unprefixed names (of labels, annotations and
extended resources) well-defined and predictable. Previously the
resulting output was not predictable in case the same name was coming in
both the unprefixed and prefixed form, say unprefixed "foo=bar" coming from
one source (be it nfd-worker or NodeFeature(Rule)) and
"feature.node.kubernetes.io/foo=baz" from a NodeFeature(Rule).
Previously the output value was randomly either "bar" or "baz".

This patch adds prefixes to all names early in the processing
"pipeline", preventing random name clashes later on.
2023-11-23 22:16:04 +02:00
Markus Lehtonen
678d7e89cb nfd-master: drop stale variables
Remove some stale variables that were leftover from the recent removal
of nfd version annotations.
2023-11-23 19:01:22 +02:00
Markus Lehtonen
63c22551df apis/nfd: fix multiple matcher terms targeting the same feature
Fix NodeFeatureRule templating in cases where multiple matchFeatures
terms are targeting the same feature. Previously, only matched feature
elements from the last matcher terms were used as the input to the
template. However, the input should contain all matched elements from
all matcher terms.

For example, consider the example rule snippet below:

  ...
  labelsTemplate: |
    {{ range .pci.device }}vendor.io/pci-device.{{ .class }}-{{ .device }}=exists
    {{ end }}
  matchFeatures:
    - feature: pci.device
      matchExpressions:
        class: {op: InRegexp, value: ["^03"]}
        vendor: {op: In, value: ["1234"]}
    - feature: pci.device
      matchExpressions:
        class: {op: InRegexp, value: ["^12"]}

This rule matches if both a pci device of class 03 from vendor 1234
exists and a pci device of class 12 (from any vendor) exists.
Previously, the template would only generate labels from the devices in
class 12 (as that's the last term). With this patch the template creates
device labels from devices in both classes 03 and 12.
2023-11-22 10:43:52 +02:00
Kubernetes Prow Robot
371ed3ff21
Merge pull request #1458 from marquiz/devel/logging-fix
apis/nfd: fix logging of rule expression processing
2023-11-21 12:04:59 +01:00
Markus Lehtonen
9cbe742bfb apis/nfd: fix incorrect comments of matching functions
This patch updates the comments to correspond to the actual behavior
which was changed back in 36341bf4c7.
2023-11-20 10:11:35 +02:00
Markus Lehtonen
8ec55fe8db apis/nfd: fix logging of rule expression processing 2023-11-10 09:40:54 +02:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
a9849f20ff nfd-master: fix retry of node updates
This patch addresses issues with slow node status (extended resources)
updates. Previously we did just a few retries in quick succession which
could result in the node update failing, just because node status was
updated slower than our retry window. The patch mitigates the issue by
increasing the number of tries to 15. In addition, it creates a
ratelimiter with a longer per-item (per-node) base delay.

The patch also fixes the e2e-tests to expose the issue.
2023-10-20 17:24:01 +03:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
Markus Lehtonen
f5c6ce2843 nfd-gc: simplify initialization 2023-10-09 11:48:49 +03:00
Markus Lehtonen
5171ae0f90 Refactor metrics
Move common boilerplate code under pkg/utils.
2023-10-09 10:49:12 +03:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
9ea0a1b420 nfd-master: correctly clean up annotations
Delete correct annotations if -instance is specified.
2023-10-05 11:10:06 +03:00
Markus Lehtonen
dbf00dcda6 apis/nfd: drop one stale comment line
Drop a leftover "docstring" comment that wasn't removed with the type it
refers to.
2023-09-27 14:23:12 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
AhmedGrati
7ab6314bdc chore: introduce a commong klog handling for cmd/nfd-*
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-07 22:38:15 +01:00
AhmedGrati
b0be40aa09 feat: add logging parameters in configuration file for nfd master
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 15:27:27 +01:00
Kubernetes Prow Robot
19520c079c
Merge pull request #1325 from ffromani/nfd-updater-fix-events
nfd-updater: events: enable timer-only flow
2023-09-04 05:47:49 -07:00
Francesco Romani
000c919071 nfd-updater: events: enable timer-only flow
The nfd-topology-updater has state-directories notification mechanism
enabled by default.
In theory, we can have only timer-based updates, but if the option
is given to disable the state-directories event source, then all
the update mechanism is mistakenly disabled, including the
timer-based updates.

The two updaters mechanism should be decoupled.
So this PR changes this to make sure we can enable just and only
the timer-based updates.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 13:05:50 +02:00
Kubernetes Prow Robot
f852c32a55
Merge pull request #1252 from AhmedGrati/test-add-updater-pool-unit-tests
test: add node updater pool unit tests
2023-09-01 07:34:32 -07:00
Kubernetes Prow Robot
e1f90a233b
Merge pull request #1305 from marquiz/devel/nf-gc
Garbage collection of NodeFeature objects
2023-08-28 02:59:42 -07:00
Kubernetes Prow Robot
6d95e59cd0
Merge pull request #1290 from marquiz/devel/metrics-new
metrics: additional metrics for nfd-master
2023-08-28 02:07:42 -07:00
Markus Lehtonen
e3415ec484 nfd-gc: support garbage collection of NodeFeatures
Hook into the same logic already exercised for NodeResourceTopology
objects: GC watches for node delete events and immediately drops stale
objects (NRT and now also NF). In addition there is a periodic resync to
catch any missed node deletes, once every hour by default.
2023-08-22 21:24:26 +03:00
Markus Lehtonen
01c08d67b6 Rename nfd-topology-gc to nfd-gc
This is preparation for making it a generic garbage collector for all
nfd-managed api objects.
2023-08-21 21:46:11 +03:00
Kubernetes Prow Robot
e0c477090b
Merge pull request #1311 from marquiz/devel/refactor-gc-5
topology-gc: simplify listing of node objects
2023-08-21 11:40:05 -07:00
Markus Lehtonen
f05b0e26ea topology-gc: move initial GC out of startNodeInformer()
Small refactor. Contextually this feels more like under periodicGC().
2023-08-21 10:11:46 +03:00
Kubernetes Prow Robot
a60502a313
Merge pull request #1307 from marquiz/devel/refactor-gc
topology-gc: refactor unit tests
2023-08-21 00:09:23 -07:00
Kubernetes Prow Robot
536f9d17d0
Merge pull request #1295 from marquiz/devel/topology-updater-metrics
nfd-topology-updater: add metrics support
2023-08-20 23:25:24 -07:00