The nfd-topology-updater has a state-directories notification mechanism
enabled by default.
In theory, we can have only timer-based updates, but if the option
to disable the state-directories event source is given, the whole
update mechanism is mistakenly disabled, including the
timer-based updates.
The two update mechanisms should be decoupled.
This PR makes sure we can enable only
the timer-based updates.
Signed-off-by: Francesco Romani <fromani@redhat.com>
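To illustrate the decoupling of timer-based and notification-based updates described above, here is a minimal sketch. It is not the actual nfd-topology-updater code; `waitForTrigger` and its parameters are hypothetical names. It relies on the Go rule that receiving from a nil channel blocks forever, so a disabled source can be passed as nil without silencing the other one.

```go
package main

import (
	"fmt"
	"time"
)

// waitForTrigger blocks until one of the two update sources fires or the
// timeout expires. Receiving from a nil channel blocks forever, so passing
// nil for a source disables it without affecting the other source.
// (Hypothetical helper, for illustration only.)
func waitForTrigger(tick <-chan time.Time, notify <-chan struct{}, timeout time.Duration) string {
	select {
	case <-tick:
		return "timer"
	case <-notify:
		return "notification"
	case <-time.After(timeout):
		return "none"
	}
}

func main() {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	// The notification source is disabled (nil); the timer still drives updates.
	fmt.Println(waitForTrigger(ticker.C, nil, time.Second)) // prints "timer"
}
```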
This PR adds support for dynamic label values in the
NodeFeatureRule CRD, offering more flexible labeling for users.
To achieve this, we check whether the label value starts with "@". If
it does, we look up the value of the referenced feature and use it as
the value of the label.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
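The "@" handling described above can be sketched roughly as follows. This is only an illustration: `resolveLabelValue`, the flat feature map and the `cpu.model.family` key are assumptions, not the actual nfd-master data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLabelValue returns the final value of a label. A value starting
// with "@" is treated as a reference to a feature, whose value is looked
// up in features; any other value is returned unchanged.
// (Hypothetical helper; the flat map is a simplification.)
func resolveLabelValue(value string, features map[string]string) (string, error) {
	if !strings.HasPrefix(value, "@") {
		return value, nil
	}
	name := strings.TrimPrefix(value, "@")
	resolved, ok := features[name]
	if !ok {
		return "", fmt.Errorf("feature %q not found", name)
	}
	return resolved, nil
}

func main() {
	// Hypothetical feature name and value.
	features := map[string]string{"cpu.model.family": "6"}

	v, err := resolveLabelValue("@cpu.model.family", features)
	fmt.Println(v, err) // prints "6 <nil>"
}
```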
Implement a new generic type nodeListPropertyMatcher, a generic Gomega
matcher for matching basically any property of a set of node objects. We
will be using it for verifying labels, annotations, extended resources
and taints for now. This moves the tests in a more Gomega'ish direction,
leveraging code re-use and providing way more informative error messages
in case of test failures.
The patch adds a new eventuallyNonControlPlaneNodes helper assertion for
asserting all (non-control-plane) nodes in the cluster, intended to
replace the ugly simplePoll() helper function.
This patch implements a matcher for node labels and converts tests to
use it instead of the old checkForNodeLabels helper function.
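As a rough idea of what a generic per-node property matcher can look like, the sketch below mirrors the method set of Gomega's matcher interface (Match, FailureMessage, NegatedFailureMessage) using only the standard library. The `node` struct and `propertyMatcher` type are simplified, hypothetical stand-ins and not the nodeListPropertyMatcher from the patch.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// node is a simplified stand-in for a Kubernetes node object (assumption).
type node struct {
	Name   string
	Labels map[string]string
}

// propertyMatcher compares one property of every node against per-node
// expectations and collects a readable message for each mismatch.
type propertyMatcher[T any] struct {
	name     string         // property name used in messages
	get      func(n node) T // extracts the property from a node
	expected map[string]T   // expected value, keyed by node name
	failures []string
}

func (m *propertyMatcher[T]) Match(actual interface{}) (bool, error) {
	nodes, ok := actual.([]node)
	if !ok {
		return false, fmt.Errorf("expected []node, got %T", actual)
	}
	m.failures = nil
	for _, n := range nodes {
		want, ok := m.expected[n.Name]
		if !ok {
			m.failures = append(m.failures, fmt.Sprintf("node %q: no expectation defined", n.Name))
			continue
		}
		if got := m.get(n); !reflect.DeepEqual(got, want) {
			m.failures = append(m.failures,
				fmt.Sprintf("node %q: %s are %v, expected %v", n.Name, m.name, got, want))
		}
	}
	return len(m.failures) == 0, nil
}

func (m *propertyMatcher[T]) FailureMessage(actual interface{}) string {
	return "node " + m.name + " mismatch:\n  " + strings.Join(m.failures, "\n  ")
}

func (m *propertyMatcher[T]) NegatedFailureMessage(actual interface{}) string {
	return "expected node " + m.name + " not to match"
}

func main() {
	m := &propertyMatcher[map[string]string]{
		name:     "labels",
		get:      func(n node) map[string]string { return n.Labels },
		expected: map[string]map[string]string{"worker-1": {"example.com/foo": "bar"}},
	}
	nodes := []node{{Name: "worker-1", Labels: map[string]string{"example.com/foo": "baz"}}}
	if ok, _ := m.Match(nodes); !ok {
		fmt.Println(m.FailureMessage(nodes))
	}
}
```

Swapping the `get` function is all that is needed to reuse the same type for annotations, extended resources or taints.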
This PR fixes the resync-period configuration option of nfd-master.
Previously, changes to the option were not reflected in nfd-master at
runtime. e2e tests are also added to make sure that the fix
works as expected.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Eliminate all context.TODO() from the e2e tests and use ginkgo context
instead. This ensures that calls involving context are properly
cancelled and return fast in case the tests get aborted.
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the discovery of extended resources through the
NodeFeatureRule API, via annotation and patching of
node.status.capacity and node.status.allocatable.
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.
However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.
Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
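The taint key rules above can be sketched as a small validation function. This is one possible reading of the text, not the actual nfd-master code: `validateTaintKey` is a hypothetical name, and the sketch assumes that both the "nfd.node.kubernetes.io" prefix and the "feature.node.kubernetes.io" namespace (with its sub-namespaces) are allowed.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateTaintKey applies the rules described above (hypothetical helper):
//   - unprefixed keys are rejected,
//   - the NFD-specific namespaces are allowed (assumed: both of them),
//   - any other "kubernetes.io" or "*.kubernetes.io" prefix is rejected.
func validateTaintKey(key string) error {
	i := strings.Index(key, "/")
	if i < 0 {
		return errors.New("unprefixed taint keys are not allowed")
	}
	ns := key[:i]
	switch {
	case ns == "nfd.node.kubernetes.io",
		ns == "feature.node.kubernetes.io",
		strings.HasSuffix(ns, ".feature.node.kubernetes.io"):
		return nil
	case ns == "kubernetes.io", strings.HasSuffix(ns, ".kubernetes.io"):
		return fmt.Errorf("taint key prefix %q is reserved", ns)
	}
	return nil
}

func main() {
	keys := []string{
		"feature.node.kubernetes.io/gpu",
		"node.kubernetes.io/foo",
		"example.com/foo",
		"foo",
	}
	for _, key := range keys {
		fmt.Printf("%-32s %v\n", key, validateTaintKey(key))
	}
}
```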
Make the default master pod run with no special options. Move the
customizations of the master pod to the setup functions of the tests
that actually need it.
Also, clean up the nfd-worker configuration of some tests.
Wait for the deletion of NFD CRDs to complete before trying to re-create
them. Prevents errors in case CRDs already exist on the cluster when
e2e-tests are launched.
The node cleanup function was not removing all NFD-labels. It omitted
NFD-originated labels that used a non-default label namespace. This
patch fixes the issue by getting all NFD-managed labels from the special
annotation (nfd.node.kubernetes.io/feature-labels).
The patch also adds the ability to clean up extended resources in a
similar way. This will be needed by future work.
Also change the order of cleaning up CRs and the node. The new order is
the right one, as cleaning up the CRs may still update the node.
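A minimal sketch of annotation-driven label cleanup follows. It is not the test helper from the patch: `managedLabelNames` and `cleanupLabels` are hypothetical names, and the sketch assumes that the annotation holds a comma-separated list of label names in which unprefixed names belong to the default "feature.node.kubernetes.io" namespace.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	featureLabelsAnnotation = "nfd.node.kubernetes.io/feature-labels"
	// Assumption: names without a prefix live in this default namespace.
	defaultLabelNs = "feature.node.kubernetes.io"
)

// managedLabelNames expands the annotation into fully qualified label names.
func managedLabelNames(annotations map[string]string) []string {
	var names []string
	for _, name := range strings.Split(annotations[featureLabelsAnnotation], ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if !strings.Contains(name, "/") {
			name = defaultLabelNs + "/" + name
		}
		names = append(names, name)
	}
	return names
}

// cleanupLabels removes every NFD-managed label, whatever its namespace.
func cleanupLabels(labels, annotations map[string]string) {
	for _, name := range managedLabelNames(annotations) {
		delete(labels, name)
	}
}

func main() {
	labels := map[string]string{
		"feature.node.kubernetes.io/cpu-foo": "true",
		"example.com/bar":                    "1",
		"kubernetes.io/hostname":             "node-1",
	}
	annotations := map[string]string{featureLabelsAnnotation: "cpu-foo,example.com/bar"}

	cleanupLabels(labels, annotations)
	fmt.Println(labels) // prints "map[kubernetes.io/hostname:node-1]"
}
```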
Similar to nfd-worker, this PR adds support for dynamic run-time
configurability of nfd-master through a config file.
We use a JSON or YAML configuration file along with fsnotify to
watch for changes in the config file. As a result, we
allow dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Reduce the wait time for nfd-worker pods to be in ready-state (before
proceeding with tests) from five to two seconds. This makes the tests
faster to run. Two seconds should be enough for nfd-workers to do their
job and get nodes labeled.
The Docker image used during the e2e tests is
composed of the repo and tag flags that are
passed to the test itself.
The problem is that the image was initialized
before the flags were parsed. Hence, it always contained
the default flag values.
Moving the variable into a separate function fixes the issue.
Also, move the global variables to `e2e_test.go` since
they are commonly used by all tests.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
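The initialization-order problem described above can be reproduced with a few lines of standard-library Go. The flag names, default values and the `dockerImage` helper are hypothetical and only serve the illustration: a package-level variable is evaluated before any flag is parsed, while a function reads the flag values at call time.

```go
package main

import (
	"flag"
	"fmt"
)

var (
	// A dedicated FlagSet keeps the sketch independent of the global flags.
	fs   = flag.NewFlagSet("e2e", flag.ContinueOnError)
	repo = fs.String("nfd.repo", "example.com/node-feature-discovery", "image repository")
	tag  = fs.String("nfd.tag", "latest", "image tag")

	// staleImage reproduces the bug: it is computed during package
	// initialization, i.e. before the flags are parsed.
	staleImage = fmt.Sprintf("%s:%s", *repo, *tag)
)

// dockerImage is the fix: the image name is built on demand, after parsing.
func dockerImage() string {
	return fmt.Sprintf("%s:%s", *repo, *tag)
}

func main() {
	if err := fs.Parse([]string{"-nfd.tag=v0.12.0"}); err != nil {
		panic(err)
	}
	fmt.Println(staleImage)    // prints "example.com/node-feature-discovery:latest"
	fmt.Println(dockerImage()) // prints "example.com/node-feature-discovery:v0.12.0"
}
```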
Use the "single-dash" version of nfd command line flags in deployment
files and e2e-tests. No impact in functionality, just aligns with
documentation and other parts of the codebase.
Add an initial test set for the NodeFeature API. This is done simply by
running a second pass of the tests but with -enable-nodefeature-api
(i.e. NodeFeature API enabled and gRPC disabled). This should give basic
confidence that the API actually works and form a basis for further
improvements on testing the new CRD API.
Drop the pod-security.kubernetes.io/enforce label from the test
namespace, i.e. remove pod security admission enforcement. NFD-worker
uses restricted host mounts (/sys), etc., so pod creation fails even in
privileged mode if pod security admission enforcement is enabled.
Only generate CRDs once in the beginning of the test run. Use the "Ordered"
option for the test container so that we can utilize ginkgo.BeforeAll to
only do stuff once before the first test. Changing from unordered to
ordered shouldn't make a big difference here.
Add a cleanup function to remove stale NodeFeatureRule objects that are
cluster-scoped and not deleted with the test namespace.
Use RuntimeDefault seccomp profile in nfd worker and topology
updater pod spec similar to nfd master.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
After introducing NodeFeatureRule we packed two CRD definitions in one
yaml file. Our e2e-tests were not prepared for that and the file itself
was also renamed so it couldn't even be read by the test suite.
With this change the e2e-tests start to create the NodeFeature CRD in the
test cluster, preparing for the addition of e2e-tests for the NodeFeature
API.
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic a change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also updates deployment files and documentation to reflect
this change.
Fixes e2e test failures caused by a stricter API check on the DaemonSet
pod spec. RestartPolicyNever that we previously set (by default)
isn't compatible with DaemonSets.
The new package should provide pod-related utilities,
hence let's move all the daemonset-related utilities
to their own package as well.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
By moving those utils into a separate package,
we can make the function names shorter and clearer.
For example, instead of:
```
testutils.NFDWorkerPod(opts...)
testutils.NFDMasterPod(opts...)
testutils.SpecWithContainerImage(...)
```
we'll have:
```
testpod.NFDWorker(opts...)
testpod.NFDMaster(opts...)
testpod.SpecWithContainerImage(...)
```
It will also make the package more isolated and portable.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The master pod needs these `SecurityContext` configurations
in order to run inside a namespace with a restricted policy.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Change the pod spec generator functions to accept parameterization in
the form of more generic "mutator functions". This makes the addition of
new test specific pod spec customizations a lot cleaner. Plus, hopefully
makes the code a bit more readable as well.
Also, slightly simplify the SpecWithConfigMap() by dropping one
redundant argument.
Inspired by latest contributions by Talor Itzhak (titzhak@redhat.com).
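The "mutator function" pattern mentioned above is essentially Go's functional options idiom. The sketch below uses a simplified `podSpec` struct and hypothetical option names (`withImage`, `withArgs`, `withNonRoot`); the real helpers operate on Kubernetes pod specs and may be shaped differently.

```go
package main

import "fmt"

// podSpec is a simplified stand-in for a Kubernetes pod spec (assumption).
type podSpec struct {
	Image        string
	Args         []string
	RunAsNonRoot bool
}

// specOption mutates a pod spec in place.
type specOption func(*podSpec)

func withImage(image string) specOption {
	return func(s *podSpec) { s.Image = image }
}

func withArgs(args ...string) specOption {
	return func(s *podSpec) { s.Args = append(s.Args, args...) }
}

func withNonRoot() specOption {
	return func(s *podSpec) { s.RunAsNonRoot = true }
}

// newWorkerSpec builds a default spec and then applies the mutators in
// order, so each test only states what differs from the default.
func newWorkerSpec(opts ...specOption) *podSpec {
	spec := &podSpec{Image: "example.com/nfd:latest"} // hypothetical default
	for _, opt := range opts {
		opt(spec)
	}
	return spec
}

func main() {
	spec := newWorkerSpec(withImage("example.com/nfd:dev"), withArgs("-oneshot"), withNonRoot())
	fmt.Printf("%+v\n", *spec)
}
```

A new test-specific customization then only requires one more small option function rather than another parameter on the generator.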
Different tests require different configurations
of the topology-updater DaemonSet.
Here, we decouple the configuration from the creation part
using `JustBeforeEach` so that each test container
has its own configuration.
Additional reading:
https://onsi.github.io/ginkgo/#separating-creation-and-configuration-justbeforeeach
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
It might take time for the CRD to get deleted
and it might cause some flakiness in the tests.
Now before we create the CRD, we make sure to delete
the old object, wait for its deletion to complete
and only then create a new CRD object.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
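The delete, wait, then create sequence above can be sketched with a plain polling loop. This uses only the standard library instead of the Kubernetes client, and every name here (`recreateCRD`, `waitForDeletion`, `errNotFound`) is hypothetical; the callbacks stand in for the real API calls.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for the API server's "not found" error (assumption).
var errNotFound = errors.New("not found")

// waitForDeletion polls get until it reports errNotFound or ctx expires.
func waitForDeletion(ctx context.Context, interval time.Duration, get func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		err := get()
		if errors.Is(err, errNotFound) {
			return nil // the object is gone
		}
		if err != nil {
			return err // unexpected error, give up
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// recreateCRD deletes the old object, waits until it is really gone and
// only then creates the new one.
func recreateCRD(del, get, create func() error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := del(); err != nil && !errors.Is(err, errNotFound) {
		return err
	}
	if err := waitForDeletion(ctx, 10*time.Millisecond, get); err != nil {
		return err
	}
	return create()
}

func main() {
	remaining := 2 // the fake object survives two polls before disappearing
	get := func() error {
		if remaining > 0 {
			remaining--
			return nil
		}
		return errNotFound
	}
	del := func() error { return nil }
	create := func() error { fmt.Println("CRD created"); return nil }

	if err := recreateCRD(del, get, create, time.Second); err != nil {
		fmt.Println("error:", err)
	}
}
```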
We might not get the most updated node topology
resource on the first `GET` call.
Hence, put the whole check inside `Eventually`,
and check for the most updated node topology resource on every
iteration.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The tested pods have a somewhat lax spec wrt security,
hence a restricted podSecurity namespace won't allow running those pods.
In topology-updater tests, the topology-updater pod
needs to run the container as root
so change the namespace podSecurity from restricted to privileged.
In node-feature-discovery tests, we don't need root access,
so add the required security context configuration.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Error strings should not be capitalized (ST1005). Also remove the
redundant types from array, slice or map composite literals.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Add tests covering the basic functionality of NodeFeatureRule objects,
covering different feature types ("flag features", "attribute features"
and "instance features") as well as backreferencing (using the output of
previously run rules) and templating. The test relies on the "fake"
feature source and its default configuration.
We need this fix https://github.com/kubernetes/kubernetes/pull/110875
to have reliable tests, but up until we can bump the k/k deps to 1.25+,
we can't consume it.
So borrow it from k/k repo for the time being.
Signed-off-by: Francesco Romani <fromani@redhat.com>
In some cases (CI) it is useful to run NFD e2e tests using
ephemeral clusters. To save time and bandwidth, it is also useful
to prime the ephemeral cluster with the images under test.
In these circumstances there is no risk of running a stale image,
and having an `Always` PullPolicy hardcoded actually makes
the whole exercise pointless.
So we add a new option, disabled by default, to make the e2e
manifest use the `IfNotPresent` pull policy, to effectively
cover this use case.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Change the part of the e2e-test configuration that contains
node-specific expected labels and annotations to a list, instead of a
map. This makes the parsing order deterministic and makes it possible to
e.g. have a default at the end of the list that captures "all the rest".
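The benefit of a list over a map can be shown with a small first-match-wins lookup. The `nodeExpectation` type, its field names and the node name patterns are hypothetical; the actual e2e configuration schema may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodeExpectation pairs a node name pattern with the labels expected on
// matching nodes (hypothetical, simplified config entry).
type nodeExpectation struct {
	NodeNameRegexp string
	ExpectedLabels map[string]string
}

// expectationFor returns the first entry whose pattern matches nodeName.
// Because the config is a list, evaluation order is deterministic and a
// catch-all entry can safely be placed last.
func expectationFor(nodeName string, cfg []nodeExpectation) (*nodeExpectation, error) {
	for i := range cfg {
		ok, err := regexp.MatchString(cfg[i].NodeNameRegexp, nodeName)
		if err != nil {
			return nil, err
		}
		if ok {
			return &cfg[i], nil
		}
	}
	return nil, nil // no entry matched
}

func main() {
	cfg := []nodeExpectation{
		{NodeNameRegexp: "^gpu-", ExpectedLabels: map[string]string{"example.com/gpu": "true"}},
		{NodeNameRegexp: ".*", ExpectedLabels: map[string]string{}}, // default, must be last
	}
	for _, name := range []string{"gpu-node-1", "worker-7"} {
		e, _ := expectationFor(name, cfg)
		fmt.Printf("%s -> %q\n", name, e.NodeNameRegexp)
	}
}
```

With a map, the catch-all pattern could be visited first in some runs, which is exactly the non-determinism the change removes.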
The test was broken twofold: Firstly, the annotation was not checked at
all because the name of the node where nfd-master is running was not
set. Secondly, the annotation prefix was used incorrectly.
Lift the restriction to run custom rule tests on non-master node. Try to
find one but do not fail if that fails. Makes the end-to-end tests
runnable on single-node clusters such as simple minikube deployments.
- This patch allows exposing Resource Hardware Topology information
through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
nfd-topology-updater in addition to the already existing software
components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
over gRPC followed by creation of CRs corresponding to the nodes
in the cluster exposing resource hardware topology information
of that node.
- Pin the kubernetes dependency to a version that includes the pod resource implementation
- This code is responsible for obtaining hardware information from the system
as well as pod resource information from the Pod Resource API in order to
determine the allocatable resource information for each NUMA zone. This
information, along with costs for NUMA zones (obtained by reading NUMA distances),
is gathered by nfd-topology-updater running on all the nodes
of the cluster, which propagates NUMA zone costs to master in order to populate
that information in the CRs corresponding to the nodes.
- We use GHW facilities for obtaining system information like CPUs, topology,
NUMA distances etc.
- This also includes updates made to Makefile and Dockerfile and Manifests for
deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
the configured Topology manager scope in addition to the Topology manager policy.
Based on the value of both attributes, a single string is populated in the CRD.
The string value will be one of the following: {SingleNUMANodeContainerLevel,
SingleNUMANodePodLevel, BestEffort, Restricted, None}
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
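The combination of Topology Manager policy and scope into a single string, as described in the last bullet above, might look like the following sketch. The function name `topologyPolicyName` is hypothetical. The sketch assumes the inputs are the kubelet configuration values ("none", "best-effort", "restricted", "single-numa-node" for the policy and "container", "pod" for the scope), and that unknown policies fall back to "None".

```go
package main

import "fmt"

// topologyPolicyName maps the kubelet Topology Manager policy and scope to
// one of the strings listed above. The scope only matters for the
// single-numa-node policy; "container" is the kubelet default scope.
// (Hypothetical helper; the fallback to "None" is an assumption.)
func topologyPolicyName(policy, scope string) string {
	switch policy {
	case "single-numa-node":
		if scope == "pod" {
			return "SingleNUMANodePodLevel"
		}
		return "SingleNUMANodeContainerLevel"
	case "best-effort":
		return "BestEffort"
	case "restricted":
		return "Restricted"
	default:
		return "None"
	}
}

func main() {
	fmt.Println(topologyPolicyName("single-numa-node", "pod")) // prints "SingleNUMANodePodLevel"
	fmt.Println(topologyPolicyName("best-effort", "container")) // prints "BestEffort"
}
```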