Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has a connection to the API server for listing
Pods, so this is not such a dramatic change. It also simplifies the code
a lot, as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
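For illustration, a minimal sketch of the direct API-server path using
client-go's dynamic client (only the GVR is taken from the
NodeResourceTopology CRD; the object contents and function shape are
illustrative, not this patch's actual code):
```
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func createNRT(ctx context.Context, nodeName string) error {
	// Reuse the same in-cluster credentials already used for listing Pods.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}
	gvr := schema.GroupVersionResource{
		Group:    "topology.node.k8s.io",
		Version:  "v1alpha1",
		Resource: "noderesourcetopologies",
	}
	// Minimal object; the real updater fills in zones from system data.
	nrt := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "topology.node.k8s.io/v1alpha1",
		"kind":       "NodeResourceTopology",
		"metadata":   map[string]interface{}{"name": nodeName},
	}}
	_, err = client.Resource(gvr).Create(ctx, nrt, metav1.CreateOptions{})
	return err
}
```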
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also updates deployment files and documentation to reflect
this change.
Fixes a stricter API check on the DaemonSet pod spec that started to cause
e2e test failures. The RestartPolicyNever that we previously set (by default)
isn't compatible with DaemonSets.
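For reference, `RestartPolicyAlways` is the only restart policy a DaemonSet
pod spec admits, so the fix boils down to something like this (variable name
illustrative):
```
// DaemonSet pods only admit RestartPolicyAlways; don't inherit
// RestartPolicyNever from shared pod spec helpers.
spec.RestartPolicy = corev1.RestartPolicyAlways
```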
The new package should provide pod-related utilities,
hence let's move all the daemonset-related utilities
to their own package as well.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
By moving those utils into a separate package,
we can make the function names shorter and clearer.
For example, instead of:
```
testutils.NFDWorkerPod(opts...)
testutils.NFDMasterPod(opts...)
testutils.SpecWithContainerImage(...)
```
we'll have:
```
testpod.NFDWorker(opts...)
testpod.NFDMaster(opts...)
testpod.SpecWithContainerImage(...)
```
It will also make the package more isolated and portable.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The master pod needs these `SecurityContext` configurations
in order to run inside a namespace with a restricted policy.
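For reference, these are roughly the settings the "restricted" Pod Security
Standard demands; a sketch, with a helper name that is not from this patch:
```
import corev1 "k8s.io/api/core/v1"

// restrictedSecurityContext returns a container SecurityContext that
// satisfies the "restricted" Pod Security Standard.
func restrictedSecurityContext() *corev1.SecurityContext {
	runAsNonRoot := true
	allowPrivilegeEscalation := false
	return &corev1.SecurityContext{
		RunAsNonRoot:             &runAsNonRoot,
		AllowPrivilegeEscalation: &allowPrivilegeEscalation,
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"},
		},
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
}
```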
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Change the pod spec generator functions to accept parameterization in
the form of more generic "mutator functions". This makes the addition of
new test specific pod spec customizations a lot cleaner. Plus, hopefully
makes the code a bit more readable as well.
Also, slightly simplify SpecWithConfigMap() by dropping one
redundant argument.
Inspired by latest contributions by Talor Itzhak (titzhak@redhat.com).
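A minimal sketch of the mutator-function pattern (names simplified relative
to the actual helpers):
```
import corev1 "k8s.io/api/core/v1"

// PodSpecMutator tweaks a generated pod spec; generator functions accept
// any number of them, so each test adds only the customizations it needs.
type PodSpecMutator func(spec *corev1.PodSpec)

// SpecWithContainerImage overrides the image of the first container.
func SpecWithContainerImage(image string) PodSpecMutator {
	return func(spec *corev1.PodSpec) {
		spec.Containers[0].Image = image
	}
}

// NFDWorker builds the worker pod and applies the mutators in order.
func NFDWorker(mutators ...PodSpecMutator) *corev1.Pod {
	pod := &corev1.Pod{Spec: corev1.PodSpec{
		Containers: []corev1.Container{{Name: "nfd-worker"}}, // baseline spec elided
	}}
	for _, m := range mutators {
		m(&pod.Spec)
	}
	return pod
}
```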
Different tests require different configurations
of the topology-updater DaemonSet.
Here, we decouple the configuration from the creation part
using `JustBeforeEach`, so that each test container
will have its own configuration.
Additional reading:
https://onsi.github.io/ginkgo/#separating-creation-and-configuration-justbeforeeach
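A rough sketch of the split (the DaemonSet helper name is assumed):
```
var _ = Describe("topology-updater", func() {
	var extraArgs []string

	BeforeEach(func() {
		// Each test container only sets configuration here...
		extraArgs = nil
	})

	JustBeforeEach(func() {
		// ...while creation happens here, after all BeforeEach blocks
		// have run, picking up whatever configuration was set.
		createTopologyUpdaterDaemonSet(extraArgs)
	})

	When("watching a specific namespace", func() {
		BeforeEach(func() {
			extraArgs = append(extraArgs, "-watch-namespace=test")
		})
		It("accounts only for pods in that namespace", func() {
			// ...
		})
	})
})
```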
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
It might take time for the CRD to get deleted,
which might cause some flakiness in the tests.
Now, before we create the CRD, we make sure to delete
the old object, wait for its deletion to complete,
and only then create a new CRD object.
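A sketch of the delete-wait-create sequence using the apiextensions
clientset (function shape assumed):
```
import (
	"context"
	"time"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	extclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

func recreateCRD(ctx context.Context, cli extclient.Interface, crd *apiextensionsv1.CustomResourceDefinition) error {
	// Delete any old object; ignore NotFound so first runs work, too.
	err := cli.ApiextensionsV1().CustomResourceDefinitions().Delete(ctx, crd.Name, metav1.DeleteOptions{})
	if err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// Wait for the deletion to complete before re-creating.
	err = wait.PollImmediate(time.Second, time.Minute, func() (bool, error) {
		_, err := cli.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, crd.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil
		}
		return false, err
	})
	if err != nil {
		return err
	}
	_, err = cli.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
	return err
}
```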
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
We might not get the most up-to-date node topology
resource on the first `GET` call.
Hence, put the whole check inside `Eventually`
and check the most up-to-date node topology resource on every
iteration.
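Sketched with Gomega (helper names assumed):
```
// Re-fetch the NodeResourceTopology on every iteration so the assertion
// always sees the latest object instead of a stale first GET.
Eventually(func() bool {
	nrt := getNodeTopology(topologyClient, nodeName)
	return isValidNodeTopology(nrt, kubeletConfig)
}, time.Minute, 5*time.Second).Should(BeTrue(),
	"node topology resource never reached the expected state")
```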
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The tested pods have a somewhat lax spec wrt security,
hence a restricted podSecurity namespace won't allow running those pods.
In the topology-updater tests, the topology-updater pod
needs to run its container as root,
so change the namespace podSecurity from restricted to privileged.
In the node-feature-discovery tests, we don't need root access,
so add the required security context configuration instead.
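For reference, flipping the namespace policy is a matter of one well-known
label; a sketch (client and namespace variables assumed):
```
// Enforce the "privileged" Pod Security Standard on the test namespace
// so the topology-updater pod may run its container as root.
ns.Labels["pod-security.kubernetes.io/enforce"] = "privileged"
_, err := cs.CoreV1().Namespaces().Update(ctx, ns, metav1.UpdateOptions{})
```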
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Error strings should not be capitalized (ST1005); also remove the
redundant element types from array, slice and map composite literals.
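An illustrative before/after (not actual lines from the patch):
```
// Before: capitalized error string (ST1005) and a redundant element
// type inside the composite literal:
//   err := errors.New("Failed to detect features")
//   volumes := []corev1.Volume{corev1.Volume{Name: "host-sys"}}

// After:
err := errors.New("failed to detect features")
volumes := []corev1.Volume{{Name: "host-sys"}}
```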
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Add tests covering the basic functionality of NodeFeatureRule objects,
covering different feature types ("flag features", "attribute features"
and "instance features") as well as backreferencing (using the output of
previously run rules) and templating. The test relies on the "fake"
feature source and its default configuration.
We need this fix https://github.com/kubernetes/kubernetes/pull/110875
to have reliable tests, but until we can bump the k/k deps to 1.25+,
we can't consume it.
So borrow it from the k/k repo for the time being.
Signed-off-by: Francesco Romani <fromani@redhat.com>
In some cases (CI) it is useful to run NFD e2e tests using
ephemeral clusters. To save time and bandwidth, it is also useful
to prime the ephemeral cluster with the images under test.
In these circumstances there is no risk of running a stale image,
and having a hardcoded `Always` PullPolicy actually defeats
the whole exercise.
So we add a new option, disabled by default, to make the e2e
manifest use the `IfNotPresent` pull policy, to effectively
cover this use case.
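A sketch of how such an option can be wired up (flag name assumed):
```
import (
	"flag"

	corev1 "k8s.io/api/core/v1"
)

// Disabled by default, so the existing Always behavior is preserved.
var pullIfNotPresent = flag.Bool("nfd.pull-if-not-present", false,
	"use the IfNotPresent pull policy when images are pre-loaded")

// pullPolicy returns the image pull policy to use in e2e manifests.
func pullPolicy() corev1.PullPolicy {
	if *pullIfNotPresent {
		return corev1.PullIfNotPresent
	}
	return corev1.PullAlways
}
```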
Signed-off-by: Francesco Romani <fromani@redhat.com>
Change the part of the e2e-test configuration that contains
node-specific expected labels and annotations to a list, instead of a
map. This makes the parsing order deterministic and makes it possible to
e.g. have a default at the end of the list that captures "all the rest".
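A sketch of the list shape (field names assumed, not the actual config
schema):
```
// Entries are evaluated in order, so a catch-all default can sit at
// the end of the list.
type nodeExpectations struct {
	NodenameRegexp      string            `yaml:"nodenameRegexp"`
	ExpectedLabelKeys   []string          `yaml:"expectedLabelKeys"`
	ExpectedAnnotations map[string]string `yaml:"expectedAnnotations"`
}

var expected = []nodeExpectations{
	{NodenameRegexp: "^worker-.*", ExpectedLabelKeys: []string{"feature.node.kubernetes.io/cpu-cpuid.AVX"}},
	{NodenameRegexp: ".*"}, // default: captures "all the rest"
}
```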
The test was broken twofold: Firstly, the annotation was not checked at
all because the name of the node where nfd-master is running was not
set. Secondly, the annotation prefix was used incorrectly.
Lift the restriction to run custom rule tests on a non-master node. Try to
find one, but do not fail if that fails. This makes the end-to-end tests
runnable on single-node clusters such as simple minikube deployments.
- This patch allows exposing Resource Hardware Topology information
through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
nfd-topology-updater in addition to the already existing software
components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
over gRPC followed by creation of CRs corresponding to the nodes
in the cluster exposing resource hardware topology information
of that node.
- Pin the kubernetes dependency to one that includes the pod resource implementation
- This code is responsible for obtaining hardware information from the system
as well as pod resource information from the Pod Resources API in order to
determine the allocatable resource information for each NUMA zone. This
information, along with costs for NUMA zones (obtained by reading NUMA
distances), is gathered by nfd-topology-updater running on all the nodes
of the cluster, which propagates the NUMA zone costs to the master in order
to populate that information in the CRs corresponding to the nodes
(a sketch of the allocatable computation follows this list).
- We use GHW facilities for obtaining system information like CPUs, topology,
NUMA distances etc.
- This also includes updates made to Makefile and Dockerfile and Manifests for
deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
the configured Topology Manager scope in addition to the Topology Manager
policy. Based on the value of both attributes, a single string will be
populated to the CRD. The string value will be one of the following:
{SingleNUMANodeContainerLevel, SingleNUMANodePodLevel, BestEffort,
Restricted, None}.
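A minimal sketch of the per-zone allocatable computation (shapes
simplified; the real code works with Pod Resources API types):
```
// zoneAllocatable computes per-NUMA-zone allocatable amounts as zone
// capacity minus what is already assigned to running containers,
// as reported by the Pod Resources API.
func zoneAllocatable(capacity, assigned map[string]int64) map[string]int64 {
	allocatable := make(map[string]int64, len(capacity))
	for zone, c := range capacity {
		allocatable[zone] = c - assigned[zone]
	}
	return allocatable
}
```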
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Mount /usr/lib and /usr/src as /host-usr/lib and /host-usr/src inside the pod
to allow NFD to search for the kernel configuration file inside /usr.
This solves the problem of the kernel config file not being present in /boot
on s390x RHCOS.
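In pod spec terms the mounts look roughly like this (Go shape illustrative;
the paths are the ones from the patch):
```
volumes := []corev1.Volume{
	{Name: "host-usr-lib", VolumeSource: corev1.VolumeSource{
		HostPath: &corev1.HostPathVolumeSource{Path: "/usr/lib"}}},
	{Name: "host-usr-src", VolumeSource: corev1.VolumeSource{
		HostPath: &corev1.HostPathVolumeSource{Path: "/usr/src"}}},
}
mounts := []corev1.VolumeMount{
	{Name: "host-usr-lib", MountPath: "/host-usr/lib", ReadOnly: true},
	{Name: "host-usr-src", MountPath: "/host-usr/src", ReadOnly: true},
}
```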
Signed-off-by: Jan Schintag <jan.schintag@de.ibm.com>
There are cases when the only available metadata for discovering
features is the node's name. The "nodename" rule extends the custom
source and matches when the node's name matches one of the given
nodename regexp patterns.
It is also possible now to set an optional "value" on custom rules,
which overrides the default "true" label value in case the rule matches.
In order to allow more dynamic configurations without having to modify
the complete worker configuration, custom rules are additionally read
from a "custom.d" directory now. Typically that directory will be filled
by mounting one or more ConfigMaps.
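The matching itself can be sketched as (function shape assumed):
```
import "regexp"

// matchNodename reports whether the node's name matches any of the
// configured regexp patterns.
func matchNodename(patterns []string, nodeName string) (bool, error) {
	for _, p := range patterns {
		matched, err := regexp.MatchString(p, nodeName)
		if err != nil {
			return false, err
		}
		if matched {
			return true, nil
		}
	}
	return false, nil
}
```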
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
This can be used to help run multiple parallel NFD deployments in
the same cluster. The flag changes the node annotation namespace to
<instance>.nfd.node.kubernetes.io, allowing different nfd-master instances
to store metadata in separate annotations.
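A sketch of the prefixing logic (helper name illustrative):
```
// annotationNamespace returns the annotation namespace for this
// nfd-master instance; an empty instance keeps the default namespace.
func annotationNamespace(instance string) string {
	if instance == "" {
		return "nfd.node.kubernetes.io"
	}
	return instance + ".nfd.node.kubernetes.io"
}
```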