node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	638e7744f1	nfd-master: mark the -crd-controller flag as deprecated Plan the removal of the -crd-controller flag along with the gRPC API. This flag does not make much sense after that as all communication with nfd-worker is based on CRDs - with the CRD controller disabled nfd-master is virtually a functionless stub.	2024-03-13 15:10:35 +02:00
leemingeer	b6d8ce7a5a	nfd-topology-updater add pods fingerprint by default	2024-01-26 17:55:34 +08:00
Markus Lehtonen	d7ec0bf674	topology-updater: document the -no-publish flag correctly	2024-01-22 14:21:02 +02:00
Markus Lehtonen	a053efda64	nfd-master: run a separate gRPC health server This patch separates the gRPC health server from the deprecated gRPC server (disabled by default, replaced by the NodeFeature CRD API) used for node labeling requests. The new health server runs on hardcoded TCP port number 8082. The main motivation for this change is to make the Kubernetes' built-in gRPC liveness probes to function if TLS is enabled (as they don't support TLS). The health server itself is a naive implementation (as it was before), basically only checking that nfd-master has started and hasn't crashed. The patch adds a TODO note to improve the functionality.	2024-01-04 13:58:26 +02:00
Carlos Eduardo Arango Gutierrez	57b6035b71	Add kubectl-nfd kubectl-nfd is a kubectl plugin for debbuging NodeFeatureRules Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-12-21 16:00:19 +01:00
Markus Lehtonen	98c3b0750d	nfd-gc: add metrics Implements three metrics for nfd-gc: - nfd_gc_build_info: version information of nfd-gc. - nfd_gc_objects_deleted_total: total number of NodeFeature and NodeResourceTopology objects deleted by nfd-gc. - nfd_gc_object_delete_failures_total: number of errors encountered when deleting NodeFeature and NodeResourceTopology objects.	2023-10-09 13:39:28 +00:00
AhmedGrati	7ab6314bdc	chore: introduce a commong klog handling for cmd/nfd-* Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-07 22:38:15 +01:00
Kubernetes Prow Robot	c0c1b89a92	Merge pull request #1334 from ArangoGutierrez/grpc_gone_v2 Deprecate gRPC API	2023-09-07 00:38:59 -07:00
Carlos Eduardo Arango Gutierrez	9966d2ae12	Deprecate gRPC API Now that the NodeFeature API has been set enabled by default, the gRPC mode will be deprecated and with it all flags and features around it. For nfd-master, flags -port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api are now marked as deprecated. For nfd-worker flags -enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override are now marked as deprecated. Deprecated flags, as well as gRPC related code will be removed in future releases. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-09-07 06:48:15 +02:00
AhmedGrati	b0be40aa09	feat: add logging parameters in configuration file for nfd master Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-06 15:27:27 +01:00
Carlos Eduardo Arango Gutierrez	04e954a7c3	Enable NodeFeature API by default Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot	a658c54de3	Merge pull request #1297 from marquiz/devel/topology-updater-version topology-updater: make -version always runnable	2023-08-28 04:05:43 -07:00
Markus Lehtonen	01c08d67b6	Rename nfd-topology-gc to nfd-gc This is preparation for making it a generic garbage collector for all nfd-managed api objects.	2023-08-21 21:46:11 +03:00
Markus Lehtonen	5ba8d14b86	topology-updater: make -version always runnable Make it possible to run -version in an environment whithout the NODE_ADDRESS environment variable set.	2023-08-07 11:56:58 +03:00
Markus Lehtonen	06b333db1e	nfd-topology-updater: add metrics support For now, add only one metric, a counter for the errors occurring while scanning pod resources on the node.	2023-08-04 16:48:37 +03:00
Kubernetes Prow Robot	e0f10a81de	Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment Fix Topology Manager policy and scope not being updated after NRT creation	2023-07-28 00:25:54 -07:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are \| Metric \| Type \| Meaning \| \| --------------- \| ---------------- \| ---------------- \| \| `nfd_master_build_info` \| Gauge \| Version from which nfd-master was built. \| \| `nfd_worker_build_info` \| Gauge \| Version from which nfd-worker was built. \| \| `nfd_updated_nodes` \| Counter \| Time taken to label a node \| \| `nfd_crd_processing_time` \| Gauge \| Time taken to process a NodeFeatureRule CRD \| \| `nfd_feature_discovery_duration_seconds` \| HistogramVec \| Time taken to discover features on a node \| Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
pprokop	6d98b6150b	Fix Topology Manager policy and scope not being updated properly NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist. This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated even if kubelet config was changed to use other TopologyManager policy and scope. Signed-off-by: pprokop <pprokop@nvidia.com>	2023-07-20 16:31:12 +02:00
Carlos Eduardo Arango Gutierrez	c02c3d83ed	Fix a typo on nfd-master cmd Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-06-06 20:05:07 +02:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spininng a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding nodes names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
Markus Lehtonen	6e3b181ab4	topology-updater: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	7be08f9e7f	nfd-worker: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	8113d651c2	nfd-master: migrate to structured logging	2023-05-31 14:43:05 +03:00
Kubernetes Prow Robot	70d5ef477f	Merge pull request #1219 from PiotrProkop/leader-elect Add leader election for nfd-master	2023-05-22 00:36:21 -07:00
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Markus Lehtonen	1200fd05c5	topology-updater: use node IP in the default configz URI Use a separate NODE_ADDRESS environment variable in the default value of -kubelet-config-uri (instead of NODE_NAME that was previously used). Also change the kustomize and Helm deployments to set this variable to node IP address. This should make the default deployment more robust, making it work in scenarios where node name does not resolve to the node ip, e.g. nodename != hostname.	2023-05-05 13:29:51 +03:00
AhmedGrati	87c2d7e184	nfd-master: fix resync period config option This PR fixes the resync-period configuration option of the nfd-master. In fact, previously, changes were not reflected in the nfd-master at runtime. e2e tests are also implemented to make sure that the fix is already working as expected. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-02 13:17:01 +02:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Markus Lehtonen	8511980bf4	nfd-master: deprecate the -resource-labels flag Mark the -resource-labels flag (and the corresponding resourceLabels config option) as deprecated. We now support managing extended resources via NodeFeatureRule objects. This kludge deserves to go, eventually.	2023-04-13 11:30:58 +03:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
Talor Itzhak	8924213d14	topology-updater: make it possible to disable sleep-interval Especially convenient for testing porpuses and completely harmless Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:43:17 +02:00
Talor Itzhak	7b248ecae2	topology-updater: update CRs when notified When a message received via the channel, the main loop updates the `NodeResourceTopology` objects. The notifier will send a message via the channel if: 1. It reached the sleep timeout. 2. It detected a change in Kubelet state files Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	175e0c81aa	topology-updater: add kubelet-state-dir flag On different Kubernetes flavors like OpenShift for exmaple, the Kubelet state directory path is different. make it configurable for maximum flexability. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater:compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in NRT object.	2023-02-22 10:22:50 +01:00
Kubernetes Prow Robot	a92614c292	Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard feat: add deny-label-ns flag which supports wildcard	2023-02-15 03:42:25 -08:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
pprokop	5484babcb1	Advertise TopologyManger policy and scope as Attributes Signed-off-by: pprokop <pprokop@nvidia.com>	2023-02-10 12:03:11 +01:00
PiotrProkop	59afae50ba	Add NodeResourceTopology garbage collector NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes. As of now node-feature-discovery daemons are used to advertise those resources but there is no service responsible for removing obsolete objects(without corresponding Kubernetes node). This patch adds new daemon called nfd-topology-gc which removes old NRTs. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-11 10:15:21 +01:00
Kubernetes Prow Robot	8eb6640754	Merge pull request #1020 from marquiz/devel/worker-refactor worker: move code	2022-12-27 00:45:34 -08:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code bu dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	0283f68702	topology-updater: move code Move and rename the Go package. It has nothing to do with NFD gRPC client anymore so move it out of the nfd-client package.	2022-12-23 11:37:46 +02:00
Markus Lehtonen	aa97105854	Add common utility function for getting node name	2022-12-23 09:50:15 +02:00
Kubernetes Prow Robot	e10957009b	Merge pull request #992 from marquiz/fixes/enable-nodefeature-flag nfd-master: fix creation of the -enable-nodefeature-api flag	2022-12-14 03:37:34 -08:00
Markus Lehtonen	81b0945ced	nfd-master: fix creation of the -enable-nodefeature-api flag Extra dash caused a panic when trying to run the binary.	2022-12-14 12:51:14 +02:00
Markus Lehtonen	9f0806593d	nfd-master: rename -featurerules-controller flag to -crd-controller Deprecate the '-featurerules-controller' command line flag as the name does not describe the functionality anymore: in practice it controls the CRD controller handling both NodeFeature and NodeFeatureRule objects. The patch introduces a duplicate, more generally named, flag '-crd-controller'. A warning is printed in the log if '-featurerules-controller' flag is encountered.	2022-12-14 10:23:45 +02:00
Markus Lehtonen	6ddd87e465	nfd-master: support NodeFeature objects Add initial support for handling NodeFeature objects. With this patch nfd-master watches NodeFeature objects in all namespaces and reacts to changes in any of these. The node which a certain NodeFeature object affects is determined by the "nfd.node.kubernetes.io/node-name" annotation of the object. When a NodeFeature object targeting certain node is changed, nfd-master needs to process all other objects targeting the same node, too, because there may be dependencies between them. Add a new command line flag for selecting between gRPC and NodeFeature CRD API as the source of feature requests. Enabling NodeFeature API disables the gRPC interface. -enable-nodefeature-api enable NodeFeature CRD API for incoming feature requests, will disable the gRPC interface (defaults to false) It is not possible to serve gRPC and watch NodeFeature objects at the same time. This is deliberate to avoid labeling races e.g. by nfd-worker sending gRPC requests but NodeFeature objects in the cluster "overriding" those changes (labels from the gRPC requests will get overridden when NodeFeature objects are processed).	2022-12-14 07:31:28 +02:00
Markus Lehtonen	237494463b	nfd-worker: support creating NodeFeatures object Support the new NodeFeatures object of the NFD CRD api. Add two new command line options to nfd-worker: -kubeconfig specifies the kubeconfig to use for connecting k8s api (defaults to empty which implies in-cluster config) -enable-nodefeature-api enable the NodeFeature CRD API for communicating node features to nfd-master, will also automatically disable gRPC (defgault to false) No config file option for selecting the API is available as there should be no need for dynamically selecting between gRPC and CRD. The nfd-master configuration must be changed in tandem and it is safer (and avoid awkward configuration races) to configure the whole NFD deployment at once. Default behavior of nfd-worker is not changed i.e. NodeFeatures object creation is not enabled by default (but must be enabled with the command line flag). The patch also updates the kustomize and Helm deployment, adding RBAC rules for nfd-worker and updating the example worker configuration.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	f13ed2d91c	nfd-topology-updater: update NodeResourceTopology objects directly Drop the gRPC communication to nfd-master and connect to the Kubernetes API server directly when updating NodeResourceTopology objects. Topology-updater already has connection to the API server for listing Pods so this is not that dramatic change. It also simplifies the code a lot as there is no need for the NFD gRPC client and no need for managing TLS certs/keys. This change aligns nfd-topology-updater with the future direction of nfd-worker where the gRPC API is being dropped and replaced by a CRD-based API. This patch also update deployment files and documentation to reflect this change.	2022-12-08 11:03:22 +02:00
Feruzjon Muyassarov	2bdf427b89	nfd-master logic update for setting node taints This commits extends NFD master code to support adding node taints from NodeFeatureRule CR. We also introduce a new annotation for taints which helps to identify if the taint set on node is owned by NFD or not. When user deletes the taint entry from NodeFeatureRule CR, NFD will remove the taint from the node. But to avoid accidental deletion of taints not owned by the NFD, it needs to know the owner. Keeping track of NFD set taints in the annotation can be used during the filtering of the owner. Also enable-taints flag is added to allow users opt in/out for node tainting feature. The flag takes precedence over taints defined in NodeFeatureRule CR. In other words, if enbale-taints is set to false(disabled) and user still defines taints on the CR, NFD will ignore those taints and skip them from setting on the node. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-02 17:25:00 +02:00

1 2 3

106 commits