node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	5171ae0f90	Refactor metrics Move common boilerplate code under pkg/utils.	2023-10-09 10:49:12 +03:00
Francesco Romani	000c919071	nfd-updater: events: enable timer-only flow The nfd-topology-updater has the state-directories notification mechanism enabled by default. In theory, we can have only timer-based updates, but if the option is given to disable the state-directories event source, then the whole update mechanism is mistakenly disabled, including the timer-based updates. The two update mechanisms should be decoupled. So this PR changes this to make sure we can enable only the timer-based updates. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-09-04 13:05:50 +02:00
Markus Lehtonen	06b333db1e	nfd-topology-updater: add metrics support For now, add only one metric, a counter for the errors occurring while scanning pod resources on the node.	2023-08-04 16:48:37 +03:00
pprokop	6d98b6150b	Fix Topology Manager policy and scope not being updated properly NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist. This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated even if kubelet config was changed to use other TopologyManager policy and scope. Signed-off-by: pprokop <pprokop@nvidia.com>	2023-07-20 16:31:12 +02:00
hang.jiang	698031fc2d	Stop ticker in time to avoid memory leak Failing to stop the ticker when the function has completed causes a memory leak. Signed-off-by: hang.jiang <hang.jiang@daocloud.io>	2023-07-05 18:35:01 +08:00
Markus Lehtonen	bf670de68d	pkg/utils: migrate KlogDump to structured logging Drop the KlogDump helper in favor of klog.InfoS. However, that patch introduces a new DelayedDumper() helper to avoid processing (marshalling) of object unless really evaluated by the logging function.	2023-05-31 14:43:08 +03:00
Markus Lehtonen	6e3b181ab4	topology-updater: migrate to structured logging	2023-05-31 14:43:08 +03:00
pprokop	5a9a12151c	nfd-topology-updater: fix kubelet state file notifier - kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins not /var/lib/kubelet; fsWatcher doesn't watch dirs recursively - e.Name returned from fsWatcher events is a full path not a basename Signed-off-by: pprokop <pprokop@nvidia.com>	2023-04-24 13:21:56 +02:00
Talor Itzhak	5c6be580f4	reactive updates: add an option to disable the feature Access to the kubelet state directory may raise concerns in some setups, added an option to disable it. The feature is enabled by default. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-16 11:53:16 +02:00
Talor Itzhak	8924213d14	topology-updater: make it possible to disable sleep-interval Especially convenient for testing purposes and completely harmless Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:43:17 +02:00
Talor Itzhak	1c12876815	topology-updater: log event type that triggered update Specify the event type as part of the log message. In order to reduce the log volume, make it V4 Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	7b248ecae2	topology-updater: update CRs when notified When a message is received via the channel, the main loop updates the `NodeResourceTopology` objects. The notifier will send a message via the channel if: 1. It reached the sleep timeout. 2. It detected a change in Kubelet state files. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	175e0c81aa	topology-updater: add kubelet-state-dir flag On different Kubernetes flavors, like OpenShift for example, the Kubelet state directory path is different. Make it configurable for maximum flexibility. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	0f65b87329	kubeletnotifier: introduce kubeletnotifier package Enabling reactive update for nfd-topology-updater by detecting changes in Kubelet state/checkpoint files, and signaling to the main loop to update the NodeResourceTopology objects. This has high value when scaling is an issue. Having multiple pods deployed in between single update instance might reflect incorrect resource accounting in the NRT CRs. Example: Time Interval = 5s t0 - New update sent to NRT CRs t1 - Schedule guaranteed podA t2 - Schedule guaranteed podB time elapsed between t0-t2 < 5 seconds, IOW the update on t0 is the most recent update. At t2 the resource accounting reflected by NRT is not aligned with the actual accounting because the NRT CRs don't reflect the change that happened at t1. With this reactive update feature we expect an update to be triggered between t1 and t2 so the NRT objects will reflect a more accurate picture. There still might be a scenario when the updates aren't fast enough, but this is an additional future planned optimization. The notifier has two event types: 1. Time based - keeping the old behavior, trigger an update per interval. 2. FS event - trigger an update when Kubelet state/checkpoint files are modified. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater:compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in NRT object.	2023-02-22 10:22:50 +01:00
Jose Luis Ojosnegros Manchón	1a687cb286	topology-updater: Refactor Scan to expand response We are gonna add new data to Scan response so better introduce a new ScanResponse struct as Scan return value to make it easier.	2023-02-22 09:56:28 +01:00
pprokop	5484babcb1	Advertise TopologyManager policy and scope as Attributes Signed-off-by: pprokop <pprokop@nvidia.com>	2023-02-10 12:03:11 +01:00
Jose Luis Ojosnegros Manchón	2967f3307a	nrt-api: move from v1alpha1 to v1alpha2	2023-02-09 12:29:54 +01:00
Markus Lehtonen	0283f68702	topology-updater: move code Move and rename the Go package. It has nothing to do with NFD gRPC client anymore so move it out of the nfd-client package.	2022-12-23 11:37:46 +02:00

19 commits