The nfd-topology-updater has state-directories notification mechanism
enabled by default.
In theory, we can have only timer-based updates, but if the option
is given to disable the state-directories event source, then all
the update mechanism is mistakenly disabled, including the
timer-based updates.
The two updaters mechanism should be decoupled.
So this PR changes this to make sure we can enable just and only
the timer-based updates.
Signed-off-by: Francesco Romani <fromani@redhat.com>
NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist.
This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if kubelet config was changed to use other TopologyManager policy and scope.
Signed-off-by: pprokop <pprokop@nvidia.com>
Drop the KlogDump helper in favor of klog.InfoS. However, that patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of object unless really evaluated by the logging function.
- kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins not /var/lib/kubelet
fsWatcher doesn't watch dirs recursively
- e.Name returned from fsWatcher events is a full path not a basename
Signed-off-by: pprokop <pprokop@nvidia.com>
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
When a message received via the channel,
the main loop updates the `NodeResourceTopology` objects.
The notifier will send a message via the channel if:
1. It reached the sleep timeout.
2. It detected a change in Kubelet state files
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
On different Kubernetes flavors like OpenShift for exmaple,
the Kubelet state directory path is different. make it configurable
for maximum flexability.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Enabling reactive update for nfd-topology-updater
by detecting changes in Kubelet state/checkpoint files,
and signaling to the main loop to update the NodeResourceTopology
objects.
This has high value when scaling is an issue.
Having multiple pods deployed in between single update instance
might reflect incorrect resource accounting in the NRT CRs.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
time elapsed between t0-t2 < 5 seconds,
IOW the update on t0 is the recent update.
In t2 the resource accounting reflected by NRT
is not aligned with the actual accounting because
NRT CRs doesn't reflect the change happened in t1.
With this reactive update feature we expect an update to be trigger
between t1 and t2 so the NRT objects will reflect more accurate
picture.
There still might be a scenario when the updates
aren't fast enough, but this is an additional
future planned optimization.
The notifier has two event types:
1. Time based - keeping the old behavior, trigger
an update per interval.
2. FS event - trigger an update when Kubelet state/checkpoint files modified.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>