1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

82 commits

Author SHA1 Message Date
Markus Lehtonen
02b6b7395c Drop dynamic run-time reconfiguration
Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuosly touches the (every minute) mounted file (configmap) on the
filesystem.

Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slght downside of this is the name of the
config map(s) depends on the content, so every time a user customizes
the config data, the old unused configmap will be left and must be
garbage-collected manually.
2024-08-21 12:46:36 +03:00
Markus Lehtonen
25e827a4c8 feature-gates: mark NodeFeatureAPI as GA
The feature gate is locked to true. That is, it is not possible to revert
back to the gPRC-based communication which makes the gRPC API ready for
removal.
2024-07-16 13:53:31 +03:00
Markus Lehtonen
522b87e325 nfd-worker: change TestRun to use NodeFeature API
Run nfd-worker with NodeFeature API enabled (against a fake apiserver)
instead of using the deprecated gRPC (against a nfd-master instance).

Expand the test to verify the features and labels that are advertised as
a NodeFeature object.
2024-07-12 09:50:09 +03:00
Markus Lehtonen
a269bf4d25 Drop the -enable-nodefeature-api flag
Was marked to be removed in v0.17.
2024-07-10 15:20:07 +03:00
Carlos Eduardo Arango Gutierrez
e33e68ad5b
Add optionable arguments to NewWorker
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 15:08:26 +02:00
Carlos Eduardo Arango Gutierrez
5d3ee1c51f
Use worker DS OwnerReference for NF's
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-04 13:53:24 +02:00
Kubernetes Prow Robot
5fe3433e35
Merge pull request #1713 from marquiz/devel/worker-log-fix
nfd-worker: improved log when creating NodeFeature object
2024-05-24 02:15:30 -07:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Markus Lehtonen
649036977e nfd-worker: improved log when creating NodeFeature object
Don't log an empty NodeFeature object.
2024-05-23 14:37:26 +03:00
Markus Lehtonen
560bd11d85 Re-add -enable-nodefeature-api cmdline flag
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.

The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.

This patch selectively reverts parts of
06c4733bc5.
2024-05-16 10:53:49 +03:00
Kubernetes Prow Robot
6b80f654d4
Merge pull request #1600 from ArangoGutierrez/e2e-not-k8s
Move NFD api to a separate go mod
2024-04-09 02:06:06 -07:00
Markus Lehtonen
8709cccf71 nfd-master: parse kubeconfig even with NoPublish set
Don't try to be too smart when kubeconfig is needed. In practice, the
nfd-master really doesn't work anymore (with the NodeFeature API
enabled) without a kubeconfig set. This patch fixes crashes happening
when NoPublish is enabled, e.g. in listing all nodes in the nfd api
handler and in getting single node objects in the node updater pool.

This patch changes the kubeconfig parsing to happen at the creation of
the nfd-master instance. We don't need to do that at reconfigure time as
none of the dynamic config options affect it. Unit tests are adjusted,
accordingly.
2024-04-08 14:25:27 +03:00
Markus Lehtonen
fcb8d3cda4 nfd-master: implement opts for modifying NfdMaster instance
This provides a more controlled way for setting up the NfdMaster
instance for testing.
2024-04-05 20:21:19 +03:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Markus Lehtonen
26a80cf142 Tidy up usage of channels for signaling
This started as a small effort to simplify the usage of "ready" channel
in nfd-master. It extended into a wider simplification/unification of
the channel usage.
2024-04-05 14:39:58 +03:00
Oleg Zhurakivskyy
8b63d17af7 nfd-worker: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Carlos Eduardo Arango Gutierrez
69dbfdfbc0
Use close to signal stop channedl in worker and topology-updater
Fix stop channel management on Worker and T-updater in case of multiple callers

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-14 15:28:39 +01:00
Markus Lehtonen
acf815fb10 pkg/utils: move GetKubeconfig from pkg/apihelper here
This change is part of an effort to remove the pkg/apihelper package.
GetKubeconfig is useful helper functionality shared accross the codebase
so move it into a "safe" location.
2024-01-24 16:10:02 +02:00
Gyuho Lee
ed0418b81c
chore(nfd-worker): fix minor typo in wrong label value format error
Signed-off-by: Gyuho Lee <gyuho@lepton.ai>
2023-12-19 02:29:37 +08:00
Markus Lehtonen
cb0a46ec0e Use generics for maps and slices 2023-12-13 12:09:53 +02:00
Markus Lehtonen
34574f4211 nfd-worker: set owner reference in NodeFeature objects
This patch creates a owner-dependent relationship between the
nfd-worker pod and the NodeFeature object that it creates. With this
change the orphaned NodeFeature object(s) gets automatically
garbage-collected when the nfd-worker pod goes away, without the need
for manual clean-up actions.
2023-12-08 14:57:31 +02:00
Markus Lehtonen
f266533a7d nfd-worker: fix typo in log message 2023-11-24 17:17:42 +02:00
Markus Lehtonen
1d012a28cd Option to stop implicitly adding default prefix to names
Add new autoDefaultNs (default is "true") config option to nfd-master.
Setting the config option to false stops NFD from automatically adding
the "feature.node.kubernetes.io/" prefix to labels, annotations and
extended resources. Taints are not affected as for them no prefix is
automatically added. The user-visible part of enabling the option change
is that NodeFeatureRules, local feature files, hooks and configuration
of the "custom" may need to be altereda (if the auto-prefixing is
relied on).

For now, the config option defaults to "true", meaning no change in
default behavior. However, the intent is to change the default to
"false" in a future release, deprecating the option and eventually
removing it (forcing it to "false").

The goal of stopping doing "auto-prefixing" is to simplify the operation
(of nfd and users). Make the naming more straightforward and easier to
understand and debug (kind of WYSIWYG), eliminating peculiar corner
cases:

1. Make validation simpler and unambiguous
2. Remove "overloading" of names, i.e. the mapping two values to the
   same actual name. E.g. previously something like

      labels:
        feature.node.kubernetes.io/foo: bar
        foo: baz

   Could actually result in node label:

     feature.node.kubernetes.io/foo: baz

3. Make the processing/usagee of the "rule.matched" and "local.labels"
   feature in NodeFeatureRules unambiguous and more understadable. E.g.
   previously you could have node label
   "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule
   you'd need to use the unprefixed name "local-foo" or the fully
   prefixed name, depending on what was specified in the feature file (or
   hook) on the node(s).

NOTE: setting autoDefaultNs to false is a breaking change for users who
rely on automatic prefixing with the default feature.node.kubernetes.io/
namespace. NodeFeatureRules, feature files, hooks and custom rules
(configuration of the "custom" source of nfd-worker) will need to be
altered.  Unprefixed labels, annoations and extended resources will be
denied by nfd-master.
2023-11-24 12:48:20 +02:00
Markus Lehtonen
5171ae0f90 Refactor metrics
Move common boilerplate code under pkg/utils.
2023-10-09 10:49:12 +03:00
AhmedGrati
7ab6314bdc chore: introduce a commong klog handling for cmd/nfd-*
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-07 22:38:15 +01:00
Markus Lehtonen
5091fef84b metrics: improve feature discovery duration metric
Rename the "NodeName" prometheus label to  "node", aligning  with
common prometheus/kubernetes conventions. Also reconfigure the
prometheus histogram buckets (now 10ms to 1s) to better match the
expected sample range.
2023-07-31 19:45:22 +03:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
Markus Lehtonen
bf670de68d pkg/utils: migrate KlogDump to structured logging
Drop the KlogDump helper in favor of klog.InfoS. However, that patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of object unless really evaluated by the logging function.
2023-05-31 14:43:08 +03:00
Markus Lehtonen
7be08f9e7f nfd-worker: migrate to structured logging 2023-05-31 14:43:08 +03:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
d0a6289c0f chore: add debug dump of nfd worker configuration
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-18 00:49:07 +01:00
Ville Pihlava
b1c6b229fe Add discovery duration logging. 2023-02-13 12:55:57 +02:00
Ville Pihlava
2101cb20e4 Change nfd-worker to use Ticker instead of After. 2023-02-09 17:14:39 +02:00
Markus Lehtonen
1026d91d12 worker: move code
Simplify code bu dropping the unnecessary base client package.
2022-12-23 11:38:21 +02:00
Markus Lehtonen
112744bc50 nfd-worker: split out gRPC connection handling
Refactor the worker code and split out gRPC client connection handling
into a separate base type. The intent is to promote re-usability of code
for other NFD clients, too.
2021-08-20 15:29:27 +03:00
Kubernetes Prow Robot
c0e1000a7d
Merge pull request #474 from marquiz/devel/worker-log-verbosity
nfd-worker: don't log labels returned by sources by default
2021-03-15 12:52:34 -07:00
Markus Lehtonen
6c6249a599 nfd-worker: don't log labels returned by sources by default
Reduce default log verbosity. Only print out labels if log verbosity is
1 or higher ('core.klog.v: 1' config file option or '-v 1' on command
line). Also, dump the labels in a reproducible (sorted) format.
2021-03-15 21:42:33 +02:00
Markus Lehtonen
2d20a2ff7c nfd-worker: support certificate rotation
Watch for changes in TLS files and re-connect to nfd-master in the event
of changes.
2021-03-09 14:40:51 +02:00
Markus Lehtonen
dfc2596a22 pkg/utils: generalize file watcher
Add the capability to watch multiple files. Move it to a separate
package in order to make it reusable.
2021-03-09 14:20:34 +02:00
Markus Lehtonen
dd7691c486 nfd-worker: improve log messages of config handling 2021-03-02 18:49:58 +02:00
Carlos Eduardo Arango Gutierrez
389a8f87cf
logging: start log messages with lower case
Standarize logs to be lower case.

Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-03-01 10:07:21 -05:00
Markus Lehtonen
5e6f0779e9 nfd-worker: stop masking crashes in feature discovery
The code should be stable enough. If there are fatal bugs causing the
discovery to panic/segfault that should be made visible instead of
semi-siently hiding it. Also, this caused one (negative) test case to
fail undetected which is now fixed.
2021-03-01 09:14:19 +02:00
Markus Lehtonen
3f18e880b4 nfd-worker: dynamic configuration of klog
Make it possible to dynamically (at run-time) alter most of the logging
configuration from the config file.
2021-02-25 16:10:43 +02:00
Markus Lehtonen
7da7fde8f6 nfd-worker: switch to klog
Greatly expands logging capabilities and flexibility with verbosity
options, among other things.
2021-02-25 16:10:43 +02:00
Markus Lehtonen
3fd61eacdb nfd-worker: switch to flag in command line parsing 2021-02-24 12:06:16 +02:00
Markus Lehtonen
47033db9c1 nfd-master: use flag for command line parsing 2021-02-24 12:06:16 +02:00
Markus Lehtonen
6b744d4179 nfd-worker: extend unit test coverage of config handling
Add test cases for verifying the core config.

Also, add asynchronous tests for basic verification of dynamic config
file updates.
2021-02-17 21:52:25 +02:00