The total amount of keys that can be used on a specific TDX system is
exposed via the cgroups misc.capacity. See:
```
$ cat /sys/fs/cgroup/misc.capacity
tdx 31
```
The first step to properly manage the amount of keys present in a node
is exposing it via the NFD, and that's exactly what this commit does.
An example of how it ends up being exposed via the NFD:
```
$ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}' | jq | grep tdx.total_keys
"feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31",
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
We don't necessarily need to keep the codecov coverage report on the
git. As such, adding it to the gitignore to avoid it from accidental
commiting.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
When a message received via the channel,
the main loop updates the `NodeResourceTopology` objects.
The notifier will send a message via the channel if:
1. It reached the sleep timeout.
2. It detected a change in Kubelet state files
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
On different Kubernetes flavors like OpenShift for exmaple,
the Kubelet state directory path is different. make it configurable
for maximum flexability.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Enabling reactive update for nfd-topology-updater
by detecting changes in Kubelet state/checkpoint files,
and signaling to the main loop to update the NodeResourceTopology
objects.
This has high value when scaling is an issue.
Having multiple pods deployed in between single update instance
might reflect incorrect resource accounting in the NRT CRs.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
time elapsed between t0-t2 < 5 seconds,
IOW the update on t0 is the recent update.
In t2 the resource accounting reflected by NRT
is not aligned with the actual accounting because
NRT CRs doesn't reflect the change happened in t1.
With this reactive update feature we expect an update to be trigger
between t1 and t2 so the NRT objects will reflect more accurate
picture.
There still might be a scenario when the updates
aren't fast enough, but this is an additional
future planned optimization.
The notifier has two event types:
1. Time based - keeping the old behavior, trigger
an update per interval.
2. FS event - trigger an update when Kubelet state/checkpoint files modified.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Omit go version control information (buildvcs), otherwise
go command fails to obtain vcs status as shown below:
error obtaining VCS status: exit status 128
Use -buildvcs=false to disable VCS stamping.
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
Create RBAC rules if topology-updater is enabled. Previously installing
with topologyUpdater.enable=true (without
topologyUpdater.rbac.create=true) resulted in a crashloogbackoff as RBAC
was missing.
Reduce the wait time of nfd-worker pods to be in ready-state (before
proceeding with tests) from five to two seconds. Make tests faster to
run. Two seconds should be enough for nfd-workers to do their job and
get nodes labeled.