node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-06 16:57:10 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	feea0e328e	Merge pull request #2010 from mfranczy/image-compatibility-nfr Bugfixes for image compatibility feature	2025-01-09 23:58:31 -08:00
Marcin Franczyk	8db03fe0f8	Add unit tests for invalid feature in the compatibility spec Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2025-01-09 10:16:34 +01:00
Kubernetes Prow Robot	3bedeaf546	Merge pull request #2006 from adrianchiris/fix-worker-role Add support running with OwnerReferencesPermissionEnforcement	2025-01-08 05:58:30 -08:00
Marcin Franczyk	241c886bf9	Sort the list of compatibility artifacts in desc order Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2025-01-08 14:00:09 +01:00
Marcin Franczyk	75ed142298	Fix image compatibility processing panic in case of a nil pointer Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2025-01-08 13:59:08 +01:00
Marcin Franczyk	60b8a2136a	Allow for rule processing in case of a missing feature Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2025-01-08 13:58:03 +01:00
adrianc	3f012c2d5a	Add support running with OwnerReferencesPermissionEnforcement When the OwnerReferencesPermissionEnforcement validating webhook is enabled, additional permissions are required to set/update the owner ref field. NFD worker sets/updates the NodeFeature owner ref field to the worker pod and the owning daemonset. The owner reference can only be updated if the worker has delete permissions for NodeFeatures. If the owner reference has blockOwnerDeletion set (as is the case for the daemonset owner reference), it requires update permissions to the finalizers of the owner. To avoid this, we set blockOwnerDeletion to false for all owners referenced from the NFD worker pod when setting/updating the NodeFeature owner ref. Signed-off-by: adrianc <adrianc@nvidia.com>	2025-01-08 13:44:30 +02:00
Markus Lehtonen	98cd96312e	Drop setup of grpc logging	2025-01-07 16:13:54 +02:00
Markus Lehtonen	97345a4a96	Merge branch 'master' into feat/skip-nodes	2024-12-20 10:38:44 +02:00
Kavin	6b0352a190	Remove error logs for nodes without nodefeatures	2024-12-18 23:27:26 +05:30
Marcin Franczyk	efc299ecf6	Introduce nfd client tool with a subset of image compatibility commands Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2024-12-18 10:49:02 +01:00
Marcin Franczyk	51bbbe202d	Extend NFR code with MatchStatus and introduce failFast strategy. MatchStatus provides details about successful expressions and their results, which are the matched host features. Additionally, a new flag controls rule processing behavior: it can either stop at the first error or continue processing all expressions and rules. Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>	2024-12-18 10:48:14 +01:00
Kubernetes Prow Robot	3e87c97ac2	Merge pull request #1976 from marquiz/devel/grpc-api-cleanup Cleanup for NodeFeature API being GA	2024-12-13 15:14:26 +01:00
Markus Lehtonen	fc103a6028	Cleanup for NodeFeature API being GA Drop references to the gRPC API and don't suggest that NodeFeatureAPI could be disabled. Also update the developer guide for instructions running nfd components outside the cluster.	2024-12-13 15:40:46 +02:00
Kubernetes Prow Robot	caaac59eba	Merge pull request #1860 from ozhuraki/no-owner-refs nfd-worker: Add an option to disable setting the owner references	2024-12-13 13:12:26 +01:00
Oleg Zhurakivskyy	f13ccb1fb5	nfd-master: check that namespace informer cache sync succeeded Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-12-02 12:46:25 +02:00
Oleg Zhurakivskyy	20ef877ab1	nfd-worker: Add an option to disable setting the owner references In some cases it's desirable to control automatic garbage collection of NodeFeature object. Add an option to disable setting the owner references to Pod for NodeFeature object. Closes: 1817 Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-11-28 16:50:10 +02:00
Kubernetes Prow Robot	443913e019	Merge pull request #1956 from googs1025/chore/add_metrics_prefix chore: add metrics system prefix	2024-11-28 09:00:59 +00:00
googs1025	e631a52374	chore: add metrics system prefix	2024-11-28 09:57:40 +08:00
Markus Lehtonen	2220b99621	pkg/utils: drop fswatcher Dead code.	2024-11-26 14:40:53 +02:00
Markus Lehtonen	45f49d574a	nfd-master: drop resourceLabels Drop the resourceLabels config file option and the corresponding -resource-labels command line flag. They were deprecated in NFD v0.13 so it's time to let them go. NodeFeatureRule(s) should be used to manage ERs, instead.	2024-11-07 15:16:52 +02:00
Carlos Eduardo Arango Gutierrez	62f4eddce6	Drop support for hooks Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-11-04 14:50:07 +01:00
Kubernetes Prow Robot	b997ade5b3	Merge pull request #1942 from marquiz/devel/drop-grpc nfd-master: drop stale unreachable deprecation notices	2024-11-04 11:16:31 +01:00
Kubernetes Prow Robot	1c6ce897f2	Merge pull request #1816 from marquiz/devel/gc-test-assert-msg tests: better assertion message in nfd-gc unit tests	2024-10-31 19:33:27 +00:00
Markus Lehtonen	ca85075972	nfd-master: use Typed* workqueue types Drop the usage of deprecated functions and types, makes linters happy.	2024-10-30 12:25:16 +02:00
Carlos Eduardo Arango Gutierrez	0bd82cf82a	Drop NFD gRPC API Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-10-29 15:15:18 +01:00
Kubernetes Prow Robot	fd2893e2a5	Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions feat/nfd-master: configure CR restrictions	2024-10-24 12:20:54 +01:00
Markus Lehtonen	db07fe1ff4	nfd-gc: drop one duplicate import from tests	2024-09-27 15:26:18 +03:00
AhmedGrati	28b40c90b8	deploy: add CR restrictions to the helm config Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com> Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>	2024-09-16 16:02:42 +02:00
Markus Lehtonen	9fad67ee39	nfd-master: cleanup updater-pool method args We store the work queues in the updater pool struct so we don't need to pass those as function arguments.	2024-09-16 14:50:08 +03:00
Markus Lehtonen	02b6b7395c	Drop dynamic run-time reconfiguration Simplify the code and reduce possible error scenarios by dropping fsnotify-based reconfiguration from nfd-master and nfd-worker. Also eliminates repeated re-configuration in scenarios where kubelet continuously (every minute) touches the mounted file (configmap) on the filesystem. Also modifies the Helm and kustomize deployments so that nfd-master, nfd-worker and nfd-topology-updater pods are restarted on configmap updates. In kustomize, the slight downside of this is that the name of the config map(s) depends on the content, so every time a user customizes the config data, the old unused configmap will be left behind and must be garbage-collected manually.	2024-08-21 12:46:36 +03:00
Markus Lehtonen	2bb8a72532	nfd-master: proper shutdown of nfd api informers Stop blocking on event channels when the api controller is stopped. Ensures that the nfd API informer factory is properly shut down and all resources released when stop() is called. This eliminates a memory leak on re-configure events when leader election is enabled.	2024-08-20 12:44:08 +03:00
Kubernetes Prow Robot	5a5b9e3c19	Merge pull request #1843 from marquiz/devel/master-chan nfd-master: use only unbuffered chans in the nfd api-controller	2024-08-19 07:23:12 -07:00
Markus Lehtonen	bf6ffadf36	nfd-master: use only unbuffered chans in the nfd api-controller There's no reason why the "update all" chans should be buffered (while the other are not).	2024-08-19 14:02:13 +03:00
Markus Lehtonen	0d3c1ac75b	nfd-master: explicit state variable for the node updater pool	2024-08-19 13:27:56 +03:00
AhmedGrati	7bad0d583c	feat/nfd-master: support CR restrictions Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2024-08-10 22:39:10 +02:00
Markus Lehtonen	d6c1a7e44f	tests: better assertion message in nfd-gc unit tests	2024-08-02 08:23:54 +03:00
Markus Lehtonen	45164f580a	nfd-gc: use paging when listing CRs List NodeFeature and NodeResourceTopology objects in pages of 200 items. This reduces memory consumption and eliminates timeouts (on the apiserver side) in big clusters of thousands of nodes.	2024-08-02 08:20:17 +03:00
Kubernetes Prow Robot	57f1b79856	Merge pull request #1813 from marquiz/devel/gc-metalister nfd-gc: only fetch object metadata	2024-08-01 12:53:33 -07:00
Markus Lehtonen	54befffa94	nfd-gc: only fetch object metadata Significantly reduce the apiserver and network load by only listing/getting the object metadata.	2024-07-30 16:01:04 +03:00
Kubernetes Prow Robot	2d24a4bee4	Merge pull request #1811 from marquiz/devel/informer-listopts nfd-master: tweak list options for NodeFeature informer	2024-07-30 03:56:04 -07:00
Markus Lehtonen	454d443b72	nfd-gc: check that node informer cache sync succeeded	2024-07-26 10:29:15 +03:00
Markus Lehtonen	a2068f7ce3	nfd-master: tweak list options for NodeFeature informer Fix cache syncing problems on big clusters with thousands of NodeFeature objects. On the initial list (sync) the client-go cache reflector sets the ResourceVersion to "0" (instead of leaving it empty). This causes problems in the api server with (apiserver) logs like: E writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout E status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout On the nfd-master side we see corresponding log snippets like: W reflector.go:547] failed to list v1alpha1.NodeFeature: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1521; INTERNAL_ERROR; received from peer I trace.go:236] "Reflector ListAndWatch" name: () (total time: 61126ms): ---"Objects listed" error:stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1521; INTERNAL_ERROR; received from peer 61126ms (**) Decreasing the page size (opts.Limits) does not have any effect on the timeouts. However, setting ResourceVersion to an empty value seems to get the paging on its tracks, eliminating the timeouts. TODO: investigate in Kubernetes upstream the root cause of the timeouts with ResourceVersion="0".	2024-07-25 16:29:05 +03:00
Markus Lehtonen	ea3243fb00	nfd-master: check nfd api informer cache sync result Bail out if there were errors in syncing the cache of any resource.	2024-07-25 09:58:40 +03:00
Markus Lehtonen	25e827a4c8	feature-gates: mark NodeFeatureAPI as GA The feature gate is locked to true. That is, it is not possible to revert back to the gRPC-based communication, which makes the gRPC API ready for removal.	2024-07-16 13:53:31 +03:00
Markus Lehtonen	522b87e325	nfd-worker: change TestRun to use NodeFeature API Run nfd-worker with NodeFeature API enabled (against a fake apiserver) instead of using the deprecated gRPC (against a nfd-master instance). Expand the test to verify the features and labels that are advertised as a NodeFeature object.	2024-07-12 09:50:09 +03:00
Markus Lehtonen	a269bf4d25	Drop the -enable-nodefeature-api flag Was marked to be removed in v0.17.	2024-07-10 15:20:07 +03:00
Kubernetes Prow Robot	393af96a88	Merge pull request #1755 from ArangoGutierrez/1752 Use worker DS OwnerReference for NF's	2024-07-09 06:33:07 -07:00
Carlos Eduardo Arango Gutierrez	e33e68ad5b	Add optionable arguments to NewWorker Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-07-09 15:08:26 +02:00
Kubernetes Prow Robot	3bb7a1caff	Merge pull request #1766 from marquiz/devel/simplify Simplify code	2024-07-09 00:19:28 -07:00

511 commits