node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-09 10:17:04 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	45f49d574a	nfd-master: drop resourceLabels Drop the resourceLabels config file option and the corresponding -resource-labels command line flag. They were deprecated in NFD v0.13 so it's time to let them go. NodeFeatureRule(s) should be used to manage ERs, instead.	2024-11-07 15:16:52 +02:00
Carlos Eduardo Arango Gutierrez	0bd82cf82a	Drop NFD gRPC API Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-10-29 15:15:18 +01:00
Kubernetes Prow Robot	fd2893e2a5	Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions feat/nfd-master: configure CR restrictions	2024-10-24 12:20:54 +01:00
AhmedGrati	28b40c90b8	deploy: add CR restrictions to the helm config Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com> Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>	2024-09-16 16:02:42 +02:00
Markus Lehtonen	02b6b7395c	Drop dynamic run-time reconfiguration Simplify the code and reduce possible error scenarios by dropping fsnotify-based reconfiguration from nfd-master and nfd-worker. Also eliminates repeated re-configuration in scenarios where kubelet continuously touches (every minute) the mounted file (configmap) on the filesystem. Also modifies the Helm and kustomize deployments so that nfd-master, nfd-worker and nfd-topology-updater pods are restarted on configmap updates. In kustomize, the slight downside of this is that the name of the config map(s) depends on the content, so every time a user customizes the config data, the old unused configmap will be left behind and must be garbage-collected manually.	2024-08-21 12:46:36 +03:00
AhmedGrati	7bad0d583c	feat/nfd-master: support CR restrictions Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2024-08-10 22:39:10 +02:00
Carlos Eduardo Arango Gutierrez	47c054e1db	Add NodeFeatureGroup CRD The NodeFeatureGroup is an NFD-specific custom resource that is designed for grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup objects in the cluster and updates the status of the NodeFeatureGroup object with the list of nodes that match the feature group rules. The NodeFeatureGroup rules follow the same syntax as the NodeFeatureRule rules. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-05-23 16:34:08 +02:00
Markus Lehtonen	121345472d	nfd-master: add DisableAutoPrefix feature gate Now that we have support for feature gates, deprecate the autoDefaultNs config option of nfd-master and replace it with a new alpha feature gate DisableAutoPrefix (defaults to false). Using a feature gate to handle and communicate this kind of change, where the default behavior is intended to be changed in a future release, feels much more natural than using random flags/options. The combined logic of the feature gate and the config option is a logical OR over disabling auto-prefixing. That is, auto-prefixing is disabled if either the feature gate or the config option is set to disable it. Auto-prefixing is ON only when autoDefaultNs (config opt) is true and DisableAutoPrefix (feature gate) is false; it is OFF in the other three combinations.	2024-05-15 17:01:16 +03:00
TessaIO	de50ac8800	chore/nfd-master: remove warnings in nfd-master unit tests file Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-04-22 22:27:15 +02:00
Kubernetes Prow Robot	91d3d5a7b0	Merge pull request #1653 from marquiz/devel/master-multiple-k8sclients nfd-master: use separate k8s api clients for each updater	2024-04-15 09:18:51 -07:00
Markus Lehtonen	8ad6210d5c	nfd-master: use separate k8s api clients for each updater Sharing the same client between updater threads virtually serializes access, in practice making the effective parallelism close to 1. With this patch, in my bench cluster of 300 nodes, the time taken by updating all nodes drops from ~2 minutes to ~12 seconds (with the default parallelism of 10 node updater threads). This demonstrates the 10-fold increased parallelism from ~1 to 10. There might be other solutions that could be explored, e.g. caching nodes with an indexer/lister, but on the other hand nfd doesn't necessarily need/want to watch every little change in each node. We only need to get the node when something in our own CRDs changes (we don't react to any changes in the node object itself). Using multiple clients was the most obvious choice to solve the problem for now.	2024-04-15 19:00:30 +03:00
Kubernetes Prow Robot	6b80f654d4	Merge pull request #1600 from ArangoGutierrez/e2e-not-k8s Move NFD api to a separate go mod	2024-04-09 02:06:06 -07:00
Markus Lehtonen	8709cccf71	nfd-master: parse kubeconfig even with NoPublish set Don't try to be too smart when kubeconfig is needed. In practice, the nfd-master really doesn't work anymore (with the NodeFeature API enabled) without a kubeconfig set. This patch fixes crashes happening when NoPublish is enabled, e.g. in listing all nodes in the nfd api handler and in getting single node objects in the node updater pool. This patch changes the kubeconfig parsing to happen at the creation of the nfd-master instance. We don't need to do that at reconfigure time as none of the dynamic config options affect it. Unit tests are adjusted, accordingly.	2024-04-08 14:25:27 +03:00
Markus Lehtonen	fcb8d3cda4	nfd-master: implement opts for modifying NfdMaster instance This provides a more controlled way for setting up the NfdMaster instance for testing.	2024-04-05 20:21:19 +03:00
Kubernetes Prow Robot	199d665046	Merge pull request #1656 from marquiz/devel/channel-simplify Tidy up usage of channels for signaling	2024-04-05 07:51:34 -07:00
Carlos Eduardo Arango Gutierrez	3434557d7c	Move NFD api to a separate go mod Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-04-05 16:35:47 +02:00
Markus Lehtonen	26a80cf142	Tidy up usage of channels for signaling This started as a small effort to simplify the usage of "ready" channel in nfd-master. It extended into a wider simplification/unification of the channel usage.	2024-04-05 14:39:58 +03:00
Markus Lehtonen	b27676451a	nfd-master: prevent crash on empty config struct Change the handling of LabelWhiteList config option to use a pointer to detect when the option is unset. This doesn't fix any detected crash but is merely a general improvement and stabilization, serving easier testing. Also, use the regexp type from the core libs for the config struct - dropping the unmarshalling code for our custom regexp type - as the core regexp now implements unmarshaller itself.	2024-04-05 14:19:44 +03:00
Markus Lehtonen	44a5a5b4a8	nfd-master: get node object only once when updating node Prevent excess queries of node objects from the Kubernetes apiserver. This significantly speeds up node updates (and reduces the load on the apiserver) as the client-side throttling (which is good) does not bite us that hard.	2024-04-04 14:44:52 +03:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Markus Lehtonen	7a050e7cf9	nfd-master: ditch apihelper Implement some of the frequently used helper functions in-package. This patch also contains big changes to the nfd-master unit tests. Much of this is about migrating from the mocked apihelper interface to fake kubernetes client that provides a bit more apiserver'ish functionality. At the same time there is quite a bit of renaming in the tests, shortening and unifying naming and getting rid of the extensive usage of "mock" everywhere.	2024-01-26 16:09:22 +02:00
Markus Lehtonen	53003cbf69	pkg/utils: move JsonPatch from pkg/apihelper	2024-01-25 17:23:14 +02:00
Markus Lehtonen	58ae81804c	go.mod: update dependencies	2024-01-15 21:29:32 +02:00
Carlos Eduardo Arango Gutierrez	affb93ea50	Create a Validate pkg Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-12-11 16:54:22 +01:00
Markus Lehtonen	1d012a28cd	Option to stop implicitly adding default prefix to names Add new autoDefaultNs (default is "true") config option to nfd-master. Setting the config option to false stops NFD from automatically adding the "feature.node.kubernetes.io/" prefix to labels, annotations and extended resources. Taints are not affected as for them no prefix is automatically added. The user-visible effect of changing the option is that NodeFeatureRules, local feature files, hooks and configuration of the "custom" source may need to be altered (if the auto-prefixing is relied on). For now, the config option defaults to "true", meaning no change in default behavior. However, the intent is to change the default to "false" in a future release, deprecating the option and eventually removing it (forcing it to "false"). The goal of stopping "auto-prefixing" is to simplify the operation (of nfd and users). Make the naming more straightforward and easier to understand and debug (kind of WYSIWYG), eliminating peculiar corner cases: 1. Make validation simpler and unambiguous 2. Remove "overloading" of names, i.e. the mapping of two values to the same actual name. E.g. previously something like labels: feature.node.kubernetes.io/foo: bar foo: baz Could actually result in node label: feature.node.kubernetes.io/foo: baz 3. Make the processing/usage of the "rule.matched" and "local.labels" feature in NodeFeatureRules unambiguous and more understandable. E.g. previously you could have node label "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule you'd need to use the unprefixed name "local-foo" or the fully prefixed name, depending on what was specified in the feature file (or hook) on the node(s). NOTE: setting autoDefaultNs to false is a breaking change for users who rely on automatic prefixing with the default feature.node.kubernetes.io/ namespace. 
NodeFeatureRules, feature files, hooks and custom rules (configuration of the "custom" source of nfd-worker) will need to be altered. Unprefixed labels, annotations and extended resources will be denied by nfd-master.	2023-11-24 12:48:20 +02:00
Markus Lehtonen	dc5af8be04	nfd-master: predictable handling of unprefixed names Make the handling of unprefixed names (of labels, annotations and extended resources) well-defined and predictable. Previously the resulting output was not predictable in case the same name was coming in both the unprefixed and prefixed form, say unprefixed "foo=bar" coming from one source (be it nfd-worker or NodeFeature(Rule)) and "feature.node.kubernetes.io/foo=baz" from a NodeFeature(Rule). Previously the output value was randomly either "bar" or "baz". This patch adds prefixes to all names early in the processing "pipeline", preventing random name clashes later on.	2023-11-23 22:16:04 +02:00
Markus Lehtonen	678d7e89cb	nfd-master: drop stale variables Remove some stale variables that were leftover from the recent removal of nfd version annotations.	2023-11-23 19:01:22 +02:00
Carlos Eduardo Arango Gutierrez	c0063be4f4	Discover node features as annotations Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: bebc <mchf1990212@gmail.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-10-25 19:58:58 +02:00
Markus Lehtonen	1d8a83b045	nfd-master: stop creating NFD version annotations We now have metrics for getting detailed information about the NFD instances running. There should be no need to pollute the node object with NFD version annotations. One problem with the annotations was also that they were incomplete in the sense that they only covered nfd-master and nfd-worker but not nfd-topology-updater or nfd-gc. Also, there was a problem with stale annotations, giving misleading information. E.g. there was no way to remove old/stale master.version annotations if nfd-master was scheduled on a node other than the one where it was previously running.	2023-10-05 14:53:29 +03:00
guoguangwu	b946bcc0f5	nfd-master-internal_test.go rm pkg imported twice Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>	2023-06-21 16:53:55 +08:00
Kubernetes Prow Robot	306969a945	Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update feat: parallelize nodes update	2023-06-02 05:28:57 -07:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spinning a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding node names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
AhmedGrati	08b9c3486e	feat: support dynamic values for labels in the NodeFeatureRule This PR aims to support dynamic values for labels in the NodeFeatureRule CRD, offering more flexible labeling for users. To achieve this, we check whether the label value starts with "@", and if so, we look up the referenced feature value and use it as the value of the label. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-31 23:30:26 +01:00
Markus Lehtonen	2a3c7e4c93	nfd-master: add validation of label names and values Validate labels before trying to update the node. Makes us fail early and prevents useless retries in case invalid labels are tried.	2023-05-29 16:54:14 +03:00
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Kubernetes Prow Robot	2356223ffc	Merge pull request #1139 from AhmedGrati/feat-configure-master-resync feat: add master resync period configurability	2023-04-24 03:49:02 -07:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Markus Lehtonen	37306662fe	nfd-master: don't create empty annotations Make the nfd.node.kubernetes.io/feature-labels and nfd.node.kubernetes.io/extended-resources annotations behave similarly to the taints annotation: only create the annotations if some labels or extended resources are created.	2023-04-21 16:14:17 +03:00
Kubernetes Prow Robot	ad07829d0a	Merge pull request #1099 from ArangoGutierrez/extended_resources_v2 Create extended resources with NodeFeatureRule	2023-04-07 08:09:15 -07:00
Fabiano Fidêncio	250aea4741	Create extended resources with NodeFeatureRule Add support for management of Extended Resources via the NodeFeatureRule CRD API. There are usage scenarios where users want to advertise features as extended resources instead of labels (or annotations). This patch enables the discovery of extended resources through the NodeFeatureRule API, via an annotation and a patch of node.status.capacity and node.status.allocatable. Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-07 16:14:56 +02:00
Markus Lehtonen	f64c23968a	nfd-master: fix node update Update node status before node metadata. This fixes a problem where we lose track of NFD-managed extended resources in case patching node status fails. Previously we removed all labels and annotations (including the one listing our ERs) and only after that updated node status. If node status update failed we had lost the annotation but extended resources were still there, leaving them orphaned.	2023-04-06 22:04:35 +03:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
Markus Lehtonen	6ddd87e465	nfd-master: support NodeFeature objects Add initial support for handling NodeFeature objects. With this patch nfd-master watches NodeFeature objects in all namespaces and reacts to changes in any of these. The node which a certain NodeFeature object affects is determined by the "nfd.node.kubernetes.io/node-name" annotation of the object. When a NodeFeature object targeting certain node is changed, nfd-master needs to process all other objects targeting the same node, too, because there may be dependencies between them. Add a new command line flag for selecting between gRPC and NodeFeature CRD API as the source of feature requests. Enabling NodeFeature API disables the gRPC interface. -enable-nodefeature-api enable NodeFeature CRD API for incoming feature requests, will disable the gRPC interface (defaults to false) It is not possible to serve gRPC and watch NodeFeature objects at the same time. This is deliberate to avoid labeling races e.g. by nfd-worker sending gRPC requests but NodeFeature objects in the cluster "overriding" those changes (labels from the gRPC requests will get overridden when NodeFeature objects are processed).	2022-12-14 07:31:28 +02:00
Feruzjon Muyassarov	7ea0e0b0a7	Add argument to updateNodeFeatures method to pass client from caller This commit adds an argument to updateNodeFeatures method for receiving client argument, which currently gets initialized within the method itself. This is a minor improvement for https://github.com/kubernetes-sigs/node-feature-discovery/pull/910. Ref:https://github.com/kubernetes-sigs/node-feature-discovery/pull/910#discussion_r1012703631 Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-11-06 22:37:11 +02:00
Feruzjon Muyassarov	71434a1392	Standardize "k8s.io/api/core/v1" package short name Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-10-15 02:22:41 +03:00
Markus Lehtonen	c1e6b41e56	apis/nfd: move annotation and label consts from nfd-master Move consts related to NFD annotations and labels from nfd-master to the api. Makes them more logically accessible for clients.	2022-10-06 11:23:56 +03:00
Markus Lehtonen	389a3d4e2e	nfd-master: drop cleanup of ancient incubator labels Remove the cleanup code that removes ancient NFD labels with the node.alpha.kubernetes-incubator.io/ prefix. This label namespace was deprecated/dropped already in v0.4.0 so it should be safe to drop this code.	2022-09-20 19:56:58 +03:00
Markus Lehtonen	a57a25f63c	Use single-dash format of cmdline flags Use the single-dash (i.e. '-option' instead of '--option') format consistently across log messages and documentation. This is the format that was mostly used, already, and shown by command line help of the binaries, for example.	2021-11-25 18:03:54 +02:00
Markus Lehtonen	0b386981a6	pkg/nfd-master: fix linter errors in tests	2021-10-04 09:51:38 +03:00
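
Commits 121345472d and dc5af8be04 above describe how auto-prefixing is decided and applied. The following is a minimal Go sketch of that logic, not code taken from the NFD source. The function names autoPrefixEnabled and addDefaultNs are hypothetical. Treating any name that contains "/" as already prefixed is an assumption. Only the "feature.node.kubernetes.io/" prefix and the ON/OFF combinations come from the commit messages.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultNs is the prefix named in the commit messages.
const defaultNs = "feature.node.kubernetes.io/"

// autoPrefixEnabled (hypothetical name) combines the autoDefaultNs config
// option with the DisableAutoPrefix feature gate. Disabling is a logical OR:
// auto-prefixing stays on only if the option is true AND the gate is false.
func autoPrefixEnabled(autoDefaultNs, disableAutoPrefix bool) bool {
	return autoDefaultNs && !disableAutoPrefix
}

// addDefaultNs (hypothetical name) adds the default prefix to an unprefixed
// name when auto-prefixing is enabled. Assumption: a name containing "/"
// already carries a prefix and is returned unchanged.
func addDefaultNs(name string, enabled bool) string {
	if enabled && !strings.Contains(name, "/") {
		return defaultNs + name
	}
	return name
}

func main() {
	// Walk through all four combinations of option and gate.
	for _, opt := range []bool{true, false} {
		for _, gate := range []bool{false, true} {
			on := autoPrefixEnabled(opt, gate)
			fmt.Printf("autoDefaultNs=%v DisableAutoPrefix=%v -> enabled=%v, foo becomes %s\n",
				opt, gate, on, addDefaultNs("foo", on))
		}
	}
}
```

Applying the prefix early, as dc5af8be04 describes, means "foo" and "feature.node.kubernetes.io/foo" collapse to one key before any later processing, which is what removes the random outcome when both forms are present.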


73 commits
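
Commit 08b9c3486e in the list above describes dynamic label values: a label value that starts with "@" is replaced by the value of the feature it refers to. A minimal Go sketch of that check follows. It is an illustration only: the function name resolveLabelValue, the flat map of feature values, and the key "kernel.version.major" are all assumptions, and the real NFD data structures may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLabelValue (hypothetical name) returns the value to publish for a
// label. A value starting with "@" is treated as a reference into features;
// any other value is returned as-is. The flat map keyed by the text after
// "@" is an assumption made for this sketch.
func resolveLabelValue(value string, features map[string]string) (string, error) {
	if !strings.HasPrefix(value, "@") {
		return value, nil // static value, nothing to resolve
	}
	ref := strings.TrimPrefix(value, "@")
	v, ok := features[ref]
	if !ok {
		return "", fmt.Errorf("feature %q not found", ref)
	}
	return v, nil
}

func main() {
	// Hypothetical discovered feature values.
	features := map[string]string{"kernel.version.major": "6"}

	for _, in := range []string{"true", "@kernel.version.major", "@no.such.feature"} {
		out, err := resolveLabelValue(in, features)
		if err != nil {
			fmt.Printf("%q: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%q resolves to %q\n", in, out)
	}
}
```

Returning an error for an unknown reference is a design choice of this sketch; the commit message does not say how a missing feature is handled.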