node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-06 00:37:01 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	6b80f654d4	Merge pull request #1600 from ArangoGutierrez/e2e-not-k8s Move NFD api to a separate go mod	2024-04-09 02:06:06 -07:00
Markus Lehtonen	8709cccf71	nfd-master: parse kubeconfig even with NoPublish set Don't try to be too smart when kubeconfig is needed. In practice, the nfd-master really doesn't work anymore (with the NodeFeature API enabled) without a kubeconfig set. This patch fixes crashes happening when NoPublish is enabled, e.g. in listing all nodes in the nfd api handler and in getting single node objects in the node updater pool. This patch changes the kubeconfig parsing to happen at the creation of the nfd-master instance. We don't need to do that at reconfigure time as none of the dynamic config options affect it. Unit tests are adjusted, accordingly.	2024-04-08 14:25:27 +03:00
Markus Lehtonen	fcb8d3cda4	nfd-master: implement opts for modifying NfdMaster instance This provides a more controlled way for setting up the NfdMaster instance for testing.	2024-04-05 20:21:19 +03:00
Carlos Eduardo Arango Gutierrez	3434557d7c	Move NFD api to a separate go mod Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-04-05 16:35:47 +02:00
Markus Lehtonen	26a80cf142	Tidy up usage of channels for signaling This started as a small effort to simplify the usage of "ready" channel in nfd-master. It extended into a wider simplification/unification of the channel usage.	2024-04-05 14:39:58 +03:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Carlos Eduardo Arango Gutierrez	69dbfdfbc0	Use close to signal stop channel in worker and topology-updater Fix stop channel management on Worker and T-updater in case of multiple callers Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-14 15:28:39 +01:00
Markus Lehtonen	acf815fb10	pkg/utils: move GetKubeconfig from pkg/apihelper here This change is part of an effort to remove the pkg/apihelper package. GetKubeconfig is useful helper functionality shared across the codebase so move it into a "safe" location.	2024-01-24 16:10:02 +02:00
Gyuho Lee	ed0418b81c	chore(nfd-worker): fix minor typo in wrong label value format error Signed-off-by: Gyuho Lee <gyuho@lepton.ai>	2023-12-19 02:29:37 +08:00
Markus Lehtonen	cb0a46ec0e	Use generics for maps and slices	2023-12-13 12:09:53 +02:00
Markus Lehtonen	34574f4211	nfd-worker: set owner reference in NodeFeature objects This patch creates an owner-dependent relationship between the nfd-worker pod and the NodeFeature object that it creates. With this change the orphaned NodeFeature object(s) get automatically garbage-collected when the nfd-worker pod goes away, without the need for manual clean-up actions.	2023-12-08 14:57:31 +02:00
Markus Lehtonen	f266533a7d	nfd-worker: fix typo in log message	2023-11-24 17:17:42 +02:00
Markus Lehtonen	1d012a28cd	Option to stop implicitly adding default prefix to names Add new autoDefaultNs (default is "true") config option to nfd-master. Setting the config option to false stops NFD from automatically adding the "feature.node.kubernetes.io/" prefix to labels, annotations and extended resources. Taints are not affected as for them no prefix is automatically added. The user-visible part of changing the option is that NodeFeatureRules, local feature files, hooks and configuration of the "custom" source may need to be altered (if the auto-prefixing is relied on). For now, the config option defaults to "true", meaning no change in default behavior. However, the intent is to change the default to "false" in a future release, deprecating the option and eventually removing it (forcing it to "false"). The goal of stopping "auto-prefixing" is to simplify the operation (of nfd and users). Make the naming more straightforward and easier to understand and debug (kind of WYSIWYG), eliminating peculiar corner cases: 1. Make validation simpler and unambiguous 2. Remove "overloading" of names, i.e. the mapping of two values to the same actual name. E.g. previously something like labels: feature.node.kubernetes.io/foo: bar foo: baz Could actually result in node label: feature.node.kubernetes.io/foo: baz 3. Make the processing/usage of the "rule.matched" and "local.labels" feature in NodeFeatureRules unambiguous and more understandable. E.g. previously you could have node label "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule you'd need to use the unprefixed name "local-foo" or the fully prefixed name, depending on what was specified in the feature file (or hook) on the node(s). NOTE: setting autoDefaultNs to false is a breaking change for users who rely on automatic prefixing with the default feature.node.kubernetes.io/ namespace.
NodeFeatureRules, feature files, hooks and custom rules (configuration of the "custom" source of nfd-worker) will need to be altered. Unprefixed labels, annotations and extended resources will be denied by nfd-master.	2023-11-24 12:48:20 +02:00
Markus Lehtonen	5171ae0f90	Refactor metrics Move common boilerplate code under pkg/utils.	2023-10-09 10:49:12 +03:00
AhmedGrati	7ab6314bdc	chore: introduce a common klog handling for cmd/nfd-* Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-07 22:38:15 +01:00
Markus Lehtonen	5091fef84b	metrics: improve feature discovery duration metric Rename the "NodeName" prometheus label to "node", aligning with common prometheus/kubernetes conventions. Also reconfigure the prometheus histogram buckets (now 10ms to 1s) to better match the expected sample range.	2023-07-31 19:45:22 +03:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are (Metric, Type, Meaning): `nfd_master_build_info`, Gauge, Version from which nfd-master was built; `nfd_worker_build_info`, Gauge, Version from which nfd-worker was built; `nfd_updated_nodes`, Counter, Time taken to label a node; `nfd_crd_processing_time`, Gauge, Time taken to process a NodeFeatureRule CRD; `nfd_feature_discovery_duration_seconds`, HistogramVec, Time taken to discover features on a node. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
Markus Lehtonen	bf670de68d	pkg/utils: migrate KlogDump to structured logging Drop the KlogDump helper in favor of klog.InfoS. However, that patch introduces a new DelayedDumper() helper to avoid processing (marshalling) of object unless really evaluated by the logging function.	2023-05-31 14:43:08 +03:00
Markus Lehtonen	7be08f9e7f	nfd-worker: migrate to structured logging	2023-05-31 14:43:08 +03:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
AhmedGrati	d0a6289c0f	chore: add debug dump of nfd worker configuration Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-03-18 00:49:07 +01:00
Ville Pihlava	b1c6b229fe	Add discovery duration logging.	2023-02-13 12:55:57 +02:00
Ville Pihlava	2101cb20e4	Change nfd-worker to use Ticker instead of After.	2023-02-09 17:14:39 +02:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code by dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	112744bc50	nfd-worker: split out gRPC connection handling Refactor the worker code and split out gRPC client connection handling into a separate base type. The intent is to promote re-usability of code for other NFD clients, too.	2021-08-20 15:29:27 +03:00
Kubernetes Prow Robot	c0e1000a7d	Merge pull request #474 from marquiz/devel/worker-log-verbosity nfd-worker: don't log labels returned by sources by default	2021-03-15 12:52:34 -07:00
Markus Lehtonen	6c6249a599	nfd-worker: don't log labels returned by sources by default Reduce default log verbosity. Only print out labels if log verbosity is 1 or higher ('core.klog.v: 1' config file option or '-v 1' on command line). Also, dump the labels in a reproducible (sorted) format.	2021-03-15 21:42:33 +02:00
Markus Lehtonen	2d20a2ff7c	nfd-worker: support certificate rotation Watch for changes in TLS files and re-connect to nfd-master in the event of changes.	2021-03-09 14:40:51 +02:00
Markus Lehtonen	dfc2596a22	pkg/utils: generalize file watcher Add the capability to watch multiple files. Move it to a separate package in order to make it reusable.	2021-03-09 14:20:34 +02:00
Markus Lehtonen	dd7691c486	nfd-worker: improve log messages of config handling	2021-03-02 18:49:58 +02:00
Carlos Eduardo Arango Gutierrez	389a8f87cf	logging: start log messages with lower case Standardize logs to be lower case. Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>	2021-03-01 10:07:21 -05:00
Markus Lehtonen	5e6f0779e9	nfd-worker: stop masking crashes in feature discovery The code should be stable enough. If there are fatal bugs causing the discovery to panic/segfault that should be made visible instead of semi-silently hiding it. Also, this caused one (negative) test case to fail undetected which is now fixed.	2021-03-01 09:14:19 +02:00
Markus Lehtonen	3f18e880b4	nfd-worker: dynamic configuration of klog Make it possible to dynamically (at run-time) alter most of the logging configuration from the config file.	2021-02-25 16:10:43 +02:00
Markus Lehtonen	7da7fde8f6	nfd-worker: switch to klog Greatly expands logging capabilities and flexibility with verbosity options, among other things.	2021-02-25 16:10:43 +02:00
Markus Lehtonen	3fd61eacdb	nfd-worker: switch to flag in command line parsing	2021-02-24 12:06:16 +02:00
Markus Lehtonen	47033db9c1	nfd-master: use flag for command line parsing	2021-02-24 12:06:16 +02:00
Markus Lehtonen	6b744d4179	nfd-worker: extend unit test coverage of config handling Add test cases for verifying the core config. Also, add asynchronous tests for basic verification of dynamic config file updates.	2021-02-17 21:52:25 +02:00
Markus Lehtonen	2b24ed2c18	nfd-worker: implement Stop() method	2021-02-17 21:50:58 +02:00
Markus Lehtonen	278ccdb997	source/fake: make the fake source configurable Enables more flexible testing.	2021-02-17 21:50:58 +02:00
Markus Lehtonen	c2c9dff724	nfd-worker: bail out on invalid config file Changes the behaviour so that if the specified configuration file exists it must be valid. Error out at startup if the config is invalid. Similarly, exit with an error at runtime if the config file becomes invalid. Bailing out, instead of just printing an error, was a deliberate choice in order to make configuration mistakes evident. Having no configuration file is tolerated, however. If the specified configuration file does not exist, nfd-worker resorts to default settings.	2021-02-17 21:42:50 +02:00
Markus Lehtonen	7e88f00e05	nfd-worker: add core.sources config option Add a config file option for controlling the enabled feature sources, aimed at replacing the --sources command line flag which is now marked as deprecated. The command line flag takes precedence over the config file option.	2021-02-17 21:36:20 +02:00
Markus Lehtonen	ed177350fc	nfd-worker: add core.labelWhiteList config option Add a config file option for label whitelisting. Deprecate the --label-whitelist command line flag. Note that the command line flag has higher priority than the config file option.	2021-02-17 21:35:44 +02:00
Markus Lehtonen	d1d8de944e	nfd-worker: add core.sleepInterval config option Add a new config file option for (dynamically) controlling the sleep interval. At the same time, deprecate the --sleep-interval command line flag. The command line flag takes precedence over the config file option.	2021-02-17 21:35:13 +02:00
Markus Lehtonen	e6bdc17d8c	nfd-worker: add core config Allows dynamic (re-)configuration of most nfd-worker options. The goal is to have most configuration parameters specified in the configuration file and deprecate most of the command line flags. The priority is intended to be such that command line flags override whatever is specified in the configuration file. Thus, specifying something on the command line effectively disables dynamic configurability of that parameter. This patch adds core.noPublish config file option to demonstrate how the new mechanism is supposed to work. The --no-publish command line flag takes precedence over this config file option.	2021-02-17 21:35:12 +02:00
Markus Lehtonen	29910464a0	nfd-worker: always re-label after a re-config event Always do re-discovery and re-labeling after a configuration file change. This way the new config comes into effect immediately, even if the sleep interval is long (or infinite).	2021-02-10 22:09:27 +02:00
Markus Lehtonen	b6ff514853	nfd-worker: use fsnotify for watching for config file changes Add support for detecting configuration file changes via file system notifications (fsnotify). Watches are added for the whole directory chain (up to root directory) so that all changes (even directory renames) affecting the given configuration file path are captured. Previously dynamic (re-)configuration of nfd-worker was implemented by (re-)reading the configuration file on every labeling pass. This was simple and effective, even if a bit wasteful. However, it didn't provide asynchronous configuration updates that will be required for e.g. controlling the "sleep-interval" parameter dynamically which will be implemented by later patches.	2021-02-10 22:09:27 +02:00
Markus Lehtonen	6958a6677f	nfd-worker: use timer channel for sleep interval	2021-02-10 22:09:27 +02:00
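
The name "overloading" described in commit 1d012a28cd can be illustrated with a minimal Go sketch. This is not NFD's actual code: `normalize`, `apply` and the `label` type are hypothetical names introduced only for illustration, and the sketch assumes labels are applied in order with the later entry winning. With auto-prefixing on, `foo` and `feature.node.kubernetes.io/foo` collapse into one node label; with it off, the unprefixed name is rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultNs is the prefix named in the autoDefaultNs commit message.
const defaultNs = "feature.node.kubernetes.io/"

// label is a hypothetical name/value pair; a slice keeps the order explicit.
type label struct{ name, value string }

// normalize is a hypothetical helper (not NFD's implementation). With
// autoPrefix set, names without a "/" get the default prefix; without it,
// unprefixed names are denied.
func normalize(name string, autoPrefix bool) (string, error) {
	if strings.Contains(name, "/") {
		return name, nil
	}
	if autoPrefix {
		return defaultNs + name, nil
	}
	return "", fmt.Errorf("unprefixed name %q denied", name)
}

// apply builds the resulting node labels; later entries overwrite earlier
// ones that normalize to the same name. Denied names are collected as errors.
func apply(in []label, autoPrefix bool) (map[string]string, []error) {
	out := map[string]string{}
	var errs []error
	for _, l := range in {
		n, err := normalize(l.name, autoPrefix)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		out[n] = l.value
	}
	return out, errs
}

func main() {
	in := []label{
		{"feature.node.kubernetes.io/foo", "bar"},
		{"foo", "baz"},
	}
	on, _ := apply(in, true)      // both names map to the same label
	off, errs := apply(in, false) // "foo" is denied instead
	fmt.Println(on, off, len(errs))
}
```

With auto-prefixing enabled the single resulting label carries the value `baz`, matching the example in the commit message; with it disabled the prefixed label keeps `bar` and one error is reported for the unprefixed name.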

72 commits