node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	02b6b7395c	Drop dynamic run-time reconfiguration Simplify the code and reduce possible error scenarios by dropping fsnotify-based reconfiguration from nfd-master and nfd-worker. Also eliminates repeated re-configuration in scenarios where kubelet continuosly touches the (every minute) mounted file (configmap) on the filesystem. Also modifies the Helm and kustomize deployments so that nfd-master, nfd-worker and nfd-topology-updater pods are restarted on configmap updates. In kustomize, the slght downside of this is the name of the config map(s) depends on the content, so every time a user customizes the config data, the old unused configmap will be left and must be garbage-collected manually.	2024-08-21 12:46:36 +03:00
Markus Lehtonen	25e827a4c8	feature-gates: mark NodeFeatureAPI as GA The feature gate is locked to true. That is, it is not possible to revert back to the gPRC-based communication which makes the gRPC API ready for removal.	2024-07-16 13:53:31 +03:00
Markus Lehtonen	522b87e325	nfd-worker: change TestRun to use NodeFeature API Run nfd-worker with NodeFeature API enabled (against a fake apiserver) instead of using the deprecated gRPC (against a nfd-master instance). Expand the test to verify the features and labels that are advertised as a NodeFeature object.	2024-07-12 09:50:09 +03:00
Markus Lehtonen	a269bf4d25	Drop the -enable-nodefeature-api flag Was marked to be removed in v0.17.	2024-07-10 15:20:07 +03:00
Carlos Eduardo Arango Gutierrez	e33e68ad5b	Add optionable arguments to NewWorker Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-07-09 15:08:26 +02:00
Carlos Eduardo Arango Gutierrez	5d3ee1c51f	Use worker DS OwnerReference for NF's Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-07-04 13:53:24 +02:00
Kubernetes Prow Robot	5fe3433e35	Merge pull request #1713 from marquiz/devel/worker-log-fix nfd-worker: improved log when creating NodeFeature object	2024-05-24 02:15:30 -07:00
Carlos Eduardo Arango Gutierrez	47c054e1db	Add NodeFeatureGroup CRD The NodeFeatureGroup is an NFD-specific custom resource that is designed for grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup objects in the cluster and updates the status of the NodeFeatureGroup object with the list of nodes that match the feature group rules. The NodeFeatureGroup rules follow the same syntax as the NodeFeatureRule rules. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-05-23 16:34:08 +02:00
Markus Lehtonen	649036977e	nfd-worker: improved log when creating NodeFeature object Don't log an empty NodeFeature object.	2024-05-23 14:37:26 +03:00
Markus Lehtonen	560bd11d85	Re-add -enable-nodefeature-api cmdline flag Bring back the -enable-nodefeature-api command line flag and the corresponding enableNodeFeatureApi helm config value that were removed without deprecation when the NodeFeatureAPI feature gate was introduced. The thinking behind this change is to not break existing users (without warning) unless totally unavoidable. Now the -enable-nodefeature-api flag is marked as deprecated and slated for removal in NFD v0.17. The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag work together so that the NodeFeature API is disabled (gRPC is enabled, instead) if either of them is set to false. This patch selectively reverts parts of `06c4733bc5`.	2024-05-16 10:53:49 +03:00
Kubernetes Prow Robot	6b80f654d4	Merge pull request #1600 from ArangoGutierrez/e2e-not-k8s Move NFD api to a separate go mod	2024-04-09 02:06:06 -07:00
Markus Lehtonen	8709cccf71	nfd-master: parse kubeconfig even with NoPublish set Don't try to be too smart when kubeconfig is needed. In practice, the nfd-master really doesn't work anymore (with the NodeFeature API enabled) without a kubeconfig set. This patch fixes crashes happening when NoPublish is enabled, e.g. in listing all nodes in the nfd api handler and in getting single node objects in the node updater pool. This patch changes the kubeconfig parsing to happen at the creation of the nfd-master instance. We don't need to do that at reconfigure time as none of the dynamic config options affect it. Unit tests are adjusted, accordingly.	2024-04-08 14:25:27 +03:00
Markus Lehtonen	fcb8d3cda4	nfd-master: implement opts for modifying NfdMaster instance This provides a more controlled way for setting up the NfdMaster instance for testing.	2024-04-05 20:21:19 +03:00
Carlos Eduardo Arango Gutierrez	3434557d7c	Move NFD api to a separate go mod Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-04-05 16:35:47 +02:00
Markus Lehtonen	26a80cf142	Tidy up usage of channels for signaling This started as a small effort to simplify the usage of "ready" channel in nfd-master. It extended into a wider simplification/unification of the channel usage.	2024-04-05 14:39:58 +03:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Carlos Eduardo Arango Gutierrez	69dbfdfbc0	Use close to signal stop channedl in worker and topology-updater Fix stop channel management on Worker and T-updater in case of multiple callers Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-14 15:28:39 +01:00
Markus Lehtonen	acf815fb10	pkg/utils: move GetKubeconfig from pkg/apihelper here This change is part of an effort to remove the pkg/apihelper package. GetKubeconfig is useful helper functionality shared accross the codebase so move it into a "safe" location.	2024-01-24 16:10:02 +02:00
Gyuho Lee	ed0418b81c	chore(nfd-worker): fix minor typo in wrong label value format error Signed-off-by: Gyuho Lee <gyuho@lepton.ai>	2023-12-19 02:29:37 +08:00
Markus Lehtonen	cb0a46ec0e	Use generics for maps and slices	2023-12-13 12:09:53 +02:00
Markus Lehtonen	34574f4211	nfd-worker: set owner reference in NodeFeature objects This patch creates a owner-dependent relationship between the nfd-worker pod and the NodeFeature object that it creates. With this change the orphaned NodeFeature object(s) gets automatically garbage-collected when the nfd-worker pod goes away, without the need for manual clean-up actions.	2023-12-08 14:57:31 +02:00
Markus Lehtonen	f266533a7d	nfd-worker: fix typo in log message	2023-11-24 17:17:42 +02:00
Markus Lehtonen	1d012a28cd	Option to stop implicitly adding default prefix to names Add new autoDefaultNs (default is "true") config option to nfd-master. Setting the config option to false stops NFD from automatically adding the "feature.node.kubernetes.io/" prefix to labels, annotations and extended resources. Taints are not affected as for them no prefix is automatically added. The user-visible part of enabling the option change is that NodeFeatureRules, local feature files, hooks and configuration of the "custom" may need to be altereda (if the auto-prefixing is relied on). For now, the config option defaults to "true", meaning no change in default behavior. However, the intent is to change the default to "false" in a future release, deprecating the option and eventually removing it (forcing it to "false"). The goal of stopping doing "auto-prefixing" is to simplify the operation (of nfd and users). Make the naming more straightforward and easier to understand and debug (kind of WYSIWYG), eliminating peculiar corner cases: 1. Make validation simpler and unambiguous 2. Remove "overloading" of names, i.e. the mapping two values to the same actual name. E.g. previously something like labels: feature.node.kubernetes.io/foo: bar foo: baz Could actually result in node label: feature.node.kubernetes.io/foo: baz 3. Make the processing/usagee of the "rule.matched" and "local.labels" feature in NodeFeatureRules unambiguous and more understadable. E.g. previously you could have node label "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule you'd need to use the unprefixed name "local-foo" or the fully prefixed name, depending on what was specified in the feature file (or hook) on the node(s). NOTE: setting autoDefaultNs to false is a breaking change for users who rely on automatic prefixing with the default feature.node.kubernetes.io/ namespace. NodeFeatureRules, feature files, hooks and custom rules (configuration of the "custom" source of nfd-worker) will need to be altered. Unprefixed labels, annoations and extended resources will be denied by nfd-master.	2023-11-24 12:48:20 +02:00
Markus Lehtonen	5171ae0f90	Refactor metrics Move common boilerplate code under pkg/utils.	2023-10-09 10:49:12 +03:00
AhmedGrati	7ab6314bdc	chore: introduce a commong klog handling for cmd/nfd-* Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-07 22:38:15 +01:00
Markus Lehtonen	5091fef84b	metrics: improve feature discovery duration metric Rename the "NodeName" prometheus label to "node", aligning with common prometheus/kubernetes conventions. Also reconfigure the prometheus histogram buckets (now 10ms to 1s) to better match the expected sample range.	2023-07-31 19:45:22 +03:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are \| Metric \| Type \| Meaning \| \| --------------- \| ---------------- \| ---------------- \| \| `nfd_master_build_info` \| Gauge \| Version from which nfd-master was built. \| \| `nfd_worker_build_info` \| Gauge \| Version from which nfd-worker was built. \| \| `nfd_updated_nodes` \| Counter \| Time taken to label a node \| \| `nfd_crd_processing_time` \| Gauge \| Time taken to process a NodeFeatureRule CRD \| \| `nfd_feature_discovery_duration_seconds` \| HistogramVec \| Time taken to discover features on a node \| Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
Markus Lehtonen	bf670de68d	pkg/utils: migrate KlogDump to structured logging Drop the KlogDump helper in favor of klog.InfoS. However, that patch introduces a new DelayedDumper() helper to avoid processing (marshalling) of object unless really evaluated by the logging function.	2023-05-31 14:43:08 +03:00
Markus Lehtonen	7be08f9e7f	nfd-worker: migrate to structured logging	2023-05-31 14:43:08 +03:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
AhmedGrati	d0a6289c0f	chore: add debug dump of nfd worker configuration Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-03-18 00:49:07 +01:00
Ville Pihlava	b1c6b229fe	Add discovery duration logging.	2023-02-13 12:55:57 +02:00
Ville Pihlava	2101cb20e4	Change nfd-worker to use Ticker instead of After.	2023-02-09 17:14:39 +02:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code bu dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	112744bc50	nfd-worker: split out gRPC connection handling Refactor the worker code and split out gRPC client connection handling into a separate base type. The intent is to promote re-usability of code for other NFD clients, too.	2021-08-20 15:29:27 +03:00
Kubernetes Prow Robot	c0e1000a7d	Merge pull request #474 from marquiz/devel/worker-log-verbosity nfd-worker: don't log labels returned by sources by default	2021-03-15 12:52:34 -07:00
Markus Lehtonen	6c6249a599	nfd-worker: don't log labels returned by sources by default Reduce default log verbosity. Only print out labels if log verbosity is 1 or higher ('core.klog.v: 1' config file option or '-v 1' on command line). Also, dump the labels in a reproducible (sorted) format.	2021-03-15 21:42:33 +02:00
Markus Lehtonen	2d20a2ff7c	nfd-worker: support certificate rotation Watch for changes in TLS files and re-connect to nfd-master in the event of changes.	2021-03-09 14:40:51 +02:00
Markus Lehtonen	dfc2596a22	pkg/utils: generalize file watcher Add the capability to watch multiple files. Move it to a separate package in order to make it reusable.	2021-03-09 14:20:34 +02:00
Markus Lehtonen	dd7691c486	nfd-worker: improve log messages of config handling	2021-03-02 18:49:58 +02:00
Carlos Eduardo Arango Gutierrez	389a8f87cf	logging: start log messages with lower case Standarize logs to be lower case. Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>	2021-03-01 10:07:21 -05:00
Markus Lehtonen	5e6f0779e9	nfd-worker: stop masking crashes in feature discovery The code should be stable enough. If there are fatal bugs causing the discovery to panic/segfault that should be made visible instead of semi-siently hiding it. Also, this caused one (negative) test case to fail undetected which is now fixed.	2021-03-01 09:14:19 +02:00
Markus Lehtonen	3f18e880b4	nfd-worker: dynamic configuration of klog Make it possible to dynamically (at run-time) alter most of the logging configuration from the config file.	2021-02-25 16:10:43 +02:00
Markus Lehtonen	7da7fde8f6	nfd-worker: switch to klog Greatly expands logging capabilities and flexibility with verbosity options, among other things.	2021-02-25 16:10:43 +02:00
Markus Lehtonen	3fd61eacdb	nfd-worker: switch to flag in command line parsing	2021-02-24 12:06:16 +02:00
Markus Lehtonen	47033db9c1	nfd-master: use flag for command line parsing	2021-02-24 12:06:16 +02:00
Markus Lehtonen	6b744d4179	nfd-worker: extend unit test coverage of config handling Add test cases for verifying the core config. Also, add asynchronous tests for basic verification of dynamic config file updates.	2021-02-17 21:52:25 +02:00

1 2

82 commits