node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Oleg Zhurakivskyy	f2e9557a2d	nfd-topology-updater: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-04-03 13:15:54 +03:00
Kubernetes Prow Robot	0ad5e50f24	Merge pull request #1609 from ozhuraki/worker-health nfd-worker: Add liveness probe	2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Markus Lehtonen	6f891ce1d2	Remove references to -enable-nodefeature-api flag Fix documentation, code and e2e-tests.	2024-03-18 16:06:25 +02:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired on https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Markus Lehtonen	638e7744f1	nfd-master: mark the -crd-controller flag as deprecated Plan the removal of the -crd-controller flag along with the gRPC API. This flag does not make much sense after that as all communication with nfd-worker is based on CRDs - with the CRD controller disabled nfd-master is virtually a functionless stub.	2024-03-13 15:10:35 +02:00
leemingeer	b6d8ce7a5a	nfd-topology-updater add pods fingerprint by default	2024-01-26 17:55:34 +08:00
Markus Lehtonen	d7ec0bf674	topology-updater: document the -no-publish flag correctly	2024-01-22 14:21:02 +02:00
Markus Lehtonen	a053efda64	nfd-master: run a separate gRPC health server This patch separates the gRPC health server from the deprecated gRPC server (disabled by default, replaced by the NodeFeature CRD API) used for node labeling requests. The new health server runs on hardcoded TCP port number 8082. The main motivation for this change is to make the Kubernetes' built-in gRPC liveness probes to function if TLS is enabled (as they don't support TLS). The health server itself is a naive implementation (as it was before), basically only checking that nfd-master has started and hasn't crashed. The patch adds a TODO note to improve the functionality.	2024-01-04 13:58:26 +02:00
Carlos Eduardo Arango Gutierrez	57b6035b71	Add kubectl-nfd kubectl-nfd is a kubectl plugin for debugging NodeFeatureRules Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-12-21 16:00:19 +01:00
Markus Lehtonen	98c3b0750d	nfd-gc: add metrics Implements three metrics for nfd-gc: - nfd_gc_build_info: version information of nfd-gc. - nfd_gc_objects_deleted_total: total number of NodeFeature and NodeResourceTopology objects deleted by nfd-gc. - nfd_gc_object_delete_failures_total: number of errors encountered when deleting NodeFeature and NodeResourceTopology objects.	2023-10-09 13:39:28 +00:00
AhmedGrati	7ab6314bdc	chore: introduce a common klog handling for cmd/nfd-* Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-07 22:38:15 +01:00
Kubernetes Prow Robot	c0c1b89a92	Merge pull request #1334 from ArangoGutierrez/grpc_gone_v2 Deprecate gRPC API	2023-09-07 00:38:59 -07:00
Carlos Eduardo Arango Gutierrez	9966d2ae12	Deprecate gRPC API Now that the NodeFeature API has been set enabled by default, the gRPC mode will be deprecated and with it all flags and features around it. For nfd-master, flags -port, -key-file, -ca-file, -cert-file, -verify-node-name, -enable-nodefeature-api are now marked as deprecated. For nfd-worker flags -enable-nodefeature-api, -ca-file, -cert-file, -key-file, -server, -server-name-override are now marked as deprecated. Deprecated flags, as well as gRPC related code will be removed in future releases. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-09-07 06:48:15 +02:00
AhmedGrati	b0be40aa09	feat: add logging parameters in configuration file for nfd master Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-06 15:27:27 +01:00
Carlos Eduardo Arango Gutierrez	04e954a7c3	Enable NodeFeature API by default Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-09-05 20:21:31 +02:00
Kubernetes Prow Robot	a658c54de3	Merge pull request #1297 from marquiz/devel/topology-updater-version topology-updater: make -version always runnable	2023-08-28 04:05:43 -07:00
Markus Lehtonen	01c08d67b6	Rename nfd-topology-gc to nfd-gc This is preparation for making it a generic garbage collector for all nfd-managed api objects.	2023-08-21 21:46:11 +03:00
Markus Lehtonen	5ba8d14b86	topology-updater: make -version always runnable Make it possible to run -version in an environment without the NODE_ADDRESS environment variable set.	2023-08-07 11:56:58 +03:00
Markus Lehtonen	06b333db1e	nfd-topology-updater: add metrics support For now, add only one metric, a counter for the errors occurring while scanning pod resources on the node.	2023-08-04 16:48:37 +03:00
Kubernetes Prow Robot	e0f10a81de	Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment Fix Topology Manager policy and scope not being updated after NRT creation	2023-07-28 00:25:54 -07:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are \| Metric \| Type \| Meaning \| \| --------------- \| ---------------- \| ---------------- \| \| `nfd_master_build_info` \| Gauge \| Version from which nfd-master was built. \| \| `nfd_worker_build_info` \| Gauge \| Version from which nfd-worker was built. \| \| `nfd_updated_nodes` \| Counter \| Time taken to label a node \| \| `nfd_crd_processing_time` \| Gauge \| Time taken to process a NodeFeatureRule CRD \| \| `nfd_feature_discovery_duration_seconds` \| HistogramVec \| Time taken to discover features on a node \| Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
pprokop	6d98b6150b	Fix Topology Manager policy and scope not being updated properly NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist. This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated even if kubelet config was changed to use other TopologyManager policy and scope. Signed-off-by: pprokop <pprokop@nvidia.com>	2023-07-20 16:31:12 +02:00
Carlos Eduardo Arango Gutierrez	c02c3d83ed	Fix a typo on nfd-master cmd Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-06-06 20:05:07 +02:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spinning a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding node names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
Markus Lehtonen	6e3b181ab4	topology-updater: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	7be08f9e7f	nfd-worker: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	8113d651c2	nfd-master: migrate to structured logging	2023-05-31 14:43:05 +03:00
Kubernetes Prow Robot	70d5ef477f	Merge pull request #1219 from PiotrProkop/leader-elect Add leader election for nfd-master	2023-05-22 00:36:21 -07:00
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Markus Lehtonen	1200fd05c5	topology-updater: use node IP in the default configz URI Use a separate NODE_ADDRESS environment variable in the default value of -kubelet-config-uri (instead of NODE_NAME that was previously used). Also change the kustomize and Helm deployments to set this variable to node IP address. This should make the default deployment more robust, making it work in scenarios where node name does not resolve to the node ip, e.g. nodename != hostname.	2023-05-05 13:29:51 +03:00
AhmedGrati	87c2d7e184	nfd-master: fix resync period config option This PR fixes the resync-period configuration option of the nfd-master. In fact, previously, changes were not reflected in the nfd-master at runtime. e2e tests are also implemented to make sure that the fix is already working as expected. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-02 13:17:01 +02:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Markus Lehtonen	8511980bf4	nfd-master: deprecate the -resource-labels flag Mark the -resource-labels flag (and the corresponding resourceLabels config option) as deprecated. We now support managing extended resources via NodeFeatureRule objects. This kludge deserves to go, eventually.	2023-04-13 11:30:58 +03:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
Talor Itzhak	8924213d14	topology-updater: make it possible to disable sleep-interval Especially convenient for testing purposes and completely harmless Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:43:17 +02:00
Talor Itzhak	7b248ecae2	topology-updater: update CRs when notified When a message received via the channel, the main loop updates the `NodeResourceTopology` objects. The notifier will send a message via the channel if: 1. It reached the sleep timeout. 2. It detected a change in Kubelet state files Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	175e0c81aa	topology-updater: add kubelet-state-dir flag On different Kubernetes flavors, like OpenShift for example, the Kubelet state directory path is different. Make it configurable for maximum flexibility. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater:compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in NRT object.	2023-02-22 10:22:50 +01:00
Kubernetes Prow Robot	a92614c292	Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard feat: add deny-label-ns flag which supports wildcard	2023-02-15 03:42:25 -08:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
pprokop	5484babcb1	Advertise TopologyManager policy and scope as Attributes Signed-off-by: pprokop <pprokop@nvidia.com>	2023-02-10 12:03:11 +01:00
PiotrProkop	59afae50ba	Add NodeResourceTopology garbage collector NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes. As of now node-feature-discovery daemons are used to advertise those resources but there is no service responsible for removing obsolete objects(without corresponding Kubernetes node). This patch adds new daemon called nfd-topology-gc which removes old NRTs. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-11 10:15:21 +01:00
Kubernetes Prow Robot	8eb6640754	Merge pull request #1020 from marquiz/devel/worker-refactor worker: move code	2022-12-27 00:45:34 -08:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code by dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	0283f68702	topology-updater: move code Move and rename the Go package. It has nothing to do with NFD gRPC client anymore so move it out of the nfd-client package.	2022-12-23 11:37:46 +02:00
Markus Lehtonen	aa97105854	Add common utility function for getting node name	2022-12-23 09:50:15 +02:00
Kubernetes Prow Robot	e10957009b	Merge pull request #992 from marquiz/fixes/enable-nodefeature-flag nfd-master: fix creation of the -enable-nodefeature-api flag	2022-12-14 03:37:34 -08:00
Markus Lehtonen	81b0945ced	nfd-master: fix creation of the -enable-nodefeature-api flag Extra dash caused a panic when trying to run the binary.	2022-12-14 12:51:14 +02:00


111 commits