The Kuberenetes pod resource API now exposing the memory and hugepages information
for guaranteed pods. We can use this information to update NodeResourceTopology
resource with memory and hugepages data.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The methods are used during calculation of reserved memory for system workloads.
The calcualation is `resourceCapacity - resourceAllocatable`.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Use 'go generate' for auto-generating code. Drop the old 'mock' and
'apigen' makefile targets. Those are replaced with a single
make generate
which (re-)generates everything.
There have been recent changes made to the noderesourcetopology API
storing the proto file generated using go-to-protobuf tool and
this code inports the proto generated in the API in the topology-updater.proto
The PRs corresponding to the changes are as follows:
https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/9https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/13
Commands used to generate topology-updater.pb.go file:
go install github.com/golang/protobuf/protoc-gen-go@v1.4.3
go mod vendor
protoc --go_opt=paths=source_relative --go_out=plugins=grpc:. pkg/topologyupdater/topology-updater.proto -I. -Ivendor
As part of implmentation of this patch, reserved (non-allocatable) CPUs
are evaluated by performing a difference between all the CPUs on a system
(determined by using ghw) and allocatable CPUs (determined by querying
GetAllocatableResources podResource API endpoint).
When aggregator creates the NUMA zones, it will skip the zone creation if
there are no allocatable resources. In this update we creates those missing
zone with zero allocatable/available resources so we won't have holes in the
array of reported zones.
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
For accounting we should consider all guaranteed pods with
integral CPU requests and all the pods with device requests
This patch ensures that pods are only considered
for accounting disregarding non-guranteed pods without any
device request.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- Files obtained after running make mock
- Run `go get github.com/vektra/mockery` and make sure that
mockery is in your $PATH
- run `make mock`
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- This patch allows to expose Resource Hardware Topology information
through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
nfd-topology-updater in addition to the already existing software
components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
over gRPC followed by creation of CRs corresponding to the nodes
in the cluster exposing resource hardware topology information
of that node.
- Pin kubernetes dependency to one that include pod resource implementation
- This code is responsible for obtaining hardware information from the system
as well as pod resource information from the Pod Resource API in order to
determine the allocatable resource information for each NUMA zone. This
information along with Costs for NUMA zones (obtained by reading NUMA distances)
is gathered by nfd-topology-updater running on all the nodes
of the cluster and propagate NUMA zone costs to master in order to populate
that information in the CRs corresponding to the nodes.
- We use GHW facilities for obtaining system information like CPUs, topology,
NUMA distances etc.
- This also includes updates made to Makefile and Dockerfile and Manifests for
deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
the configured Topology manager scope in addition to the Topology manager policy.
Based on the value of both attribues a single string will be populated to the CRD.
The string value will be on of the following {SingleNUMANodeContainerLevel,
SingleNUMANodePodLevel, BestEffort, Restricted, None}
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Setup the topologyupdater API for gRPC communication of
nfd-topology-updater with master
We generate pb.go file to reflect latest dependency changes
using github.com/golang/protobuf/protoc-gen-go and generate
grpc files via:
`protoc pkg/topologyupdater/topology-updater.proto --go_out=plugins=grpc:.`
Please refer to: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/blob/master/pkg/apis/topology/v1alpha1/types.go
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Specify a new interface for managing "raw" feature data. This is the
first step to separate raw feature data from node labels. None of the
feature sources implement this interface, yet.
This patch unifies the data format of "raw" features by dividing them
into three different basic types.
- keys, a set of names without any associated values, e.g. CPUID flags
or loaded kernel modules
- values, a map of key-value pairs, for features with a single value,
e.g. kernel config flags or os version
- instances, a list of instances each of which has multiple attributes
(key-value pairs of their own), e.g. PCI or USB devices
The new feature data types are defined in a new "pkg/api/feature"
package, catering decoupling and re-usability of code e.g. within future
extentions of the NFD gRPC API.
Rename the Discover() method of LabelSource interface to GetLabels().
Implement new registration infrastructure under the "source" package.
This change loosens the coupling between label sources and the
nfd-worker, making it easier to refactor and move the code around.
Also, create a separate interface (ConfigurableSource) for configurable
feature sources in order to eliminate boilerplate code.
Add safety checks to the sources that they actually implement the
interfaces they should.
In sake of consistency and predictability (of behavior) change all
methods of the sources to use pointer receivers.
Add simple unit tests for the new functionality and include source/...
into make test target.
Refactor the worker code and split out gRPC client connection handling
into a separate base type. The intent is to promote re-usability of code
for other NFD clients, too.
Add a separate label namespace for profile labels, intended for
user-specified higher level "meta features". Also sub-namespaces of this
(i.e. <sub-ns>.profile.node.kubernetes.io) are allowed.
Allow <sub-ns>.feature.node.kubernetes.io label namespaces. Makes it
possible to have e.g. vendor specific label ns without the need to user
-extra-label-ns.
For auto-generating api(s).
Also, re-generate/refresh the gRPC with `make apigen` (with protoc
v3.17.3 and protoc-gen-go from github.com/golang/protobuf v1.5.2) to
sync up things.
Reduce default log verbosity. Only print out labels if log verbosity is
1 or higher ('core.klog.v: 1' config file option or '-v 1' on command
line). Also, dump the labels in a reproducible (sorted) format.
The code should be stable enough. If there are fatal bugs causing the
discovery to panic/segfault that should be made visible instead of
semi-siently hiding it. Also, this caused one (negative) test case to
fail undetected which is now fixed.
Changes the behaviour so that if the specified configuration file exists
it must be valid. Error out at startup if the config is invalid.
Similarly, exit with an error at runtime if the config file becomes
invalid. Bailing out, instead of just printing an error, was a
deliberate choice in order to make configuration mistakes evident.
Having no configuration file is tolerated, however. If the specified
configuration file does not exists nfd-worker resorts to default
settings.
Add a config file option for controlling the enabled feature sources,
aimed at replacing the --sources command line flag which is now marked
as deprecated. The command line flag takes precedence over the config
file option.
Add a config file option for label whitelisting. Deprecate the
--label-whitelist command line flag. Note that the command line flag has
higher priority than the config file option.
Add a new config file option for (dynamically) controlling the sleep
interval. At the same time, deprecate the --sleep-interval command line
flag. The command line flag takes precedence over the config file option.
Allows dynamic (re-)configuration of most nfd-worker options. The goal
is to have most configuration parameters specified in the configuration
file and deprecate most of the command line flags. The priority is
intended to be such that command line flags override whatever is
specified in the configuration file. Thus, specifying something on the
command line effectively disables dynamic configurability of that
parameter.
This patch adds core.noPublish config file option to demonstrate how the
new mechanism is supposed to work. The --no-publish command line flag
takes precedence over this config file option.
Always do re-discovery and re-labeling after a configuration file
change. his way the new config comes into effect immediately, even if
the sleep interval is long (or infinite) # Please enter the commit
message for your changes. Lines starting
Add support for detecting configuration file changes via file system
notifications (fsnotify). Watches are added for the whole directory
chain (up to root directory) so that all changes (even directory
renames) affecting the given configuration file path are captured.
Previously dynamic (re-)configuration of nfd-worker was implemented by
(re-)reading the configuration file on every labeling pass. This was
simple and effective, even if a bit wasteful. However, it didn't provide
asynchronous configuration updates that will be required for e.g.
controlling the "sleep-interval" parameter dynamically which will be
implemented by later patches.
This can be used to help running multiple parallel NFD deployments in
the same cluster. The flag changes the node annotation namespace to
<instance>.nfd.node.kubernetes.io allowing different nfd-master intances
to store metadata in separate annotations.
Handle both creation and parsing of the "feature-labels" and
"extended-resources" annotations in the function. I think this is more
logical to keep them together.
When updating node labels and annotations use JSON patches instead of
doing a read-modify-write on the whole node object. Patching is already
being used in managing extended resources so some of the existing code
was re-usable.
This patch should mitigate the problem of node update failures caused by
race conditions (a change in the node object between our read and write)
resulting e.g. in errors/restarts in nfd worker pods.
For historical reasons the labels in the default nfd namespace have been
internally represented without the namespace part. I.e. instead of
"feature.node.kubernetes.io/foo" we just use "foo". NFD worker uses this
representation, too, both internally and over the gRPC requests. The
same scheme has been used for annotations.
This patch changes NFD master to use fully namespaced label and
annotation names internally. This hopefully makes the code a bit more
understandable. It also addresses some corner cases making the handling
of label names consistent, making it possible to use both "truncated"
and fully namespaced names over the gRPC interface (and in the
annotations).
A new special value 'all' is a shortcut for enabling all feature
sources. It should be the only name specified -- if any other names are
specified 'all' does not take effect, but, we only enable the listed
feature sources. E.g.
--sources=all enables all sources, but
--sources=all,cpu only enables the cpu source
Also, print a warning if unknown sources are specified.
A new sub-command like flag for cleaning up a cluster. When --prune is
specified nfd-master removes all NFD related labels, annotations and
extended resources from all nodes of the cluster and exits.
This should help undeployment of NFD and be useful for development.
Dumb re-read/re-parse of the configuration file on every round of
discoery. Probably not the most elegant solution to watch for config
file changes, but, it works and doesn't cost much overhead.
Extend the FeatureSource interface with new methods for configuration
handling. This enables easier on-the fly reconfiguration of the
feature sources. Further, it simplifies adding config support to feature
sources in the future. Stub methods are added to sources that do not
currently have any configurability.
The patch fixes some (corner) cases with the overrides (--options)
handling, too:
- Overrides were not applied if config file was missing or its parsing
failed
- Overrides for a certain source did not have effect if an empty config
for the source was specified in the config file. This was caused by
the first pass of parsing (config file) setting a nil pointer to the
source-specific config, effectively detaching it from the main config.
The second pass would then create a new instance of the source
specific config, but, this was not visible in the feature source, of
course.
Make the list of enabled sources and the label whitelist regexp members
of the nfdWorker instance. Get rid of the not-that-well-defined
configureParameters() function.
Unify handling of --label-whitelist in nfd-worker and nfd-master. That is,
in nfd-worker, apply the regexp filter on non-namespaced part of the
label name.
Brief history:
1. Originally the whitelist regexp was applied on the full namespaced
label name (that would be e.g.
'feature.node.kubernetes.io/cpu-cpuid.AVX' in the current nfd version)
2. Commit 81752b2d changed the behavior so that the regexp was applied
on the non-namespaced part (that would be `cpu-cpuid.AVX`)
3. Commit 40918827 added support for custom label namespaces. With this
change, the label whitelist handling diverged between nfd-worker and
nfd-master. In nfd-master the whitelist regexp is always applied on
the non-namespaced label name. However, in nfd-worker the whitelist
handling is two-fold (and inconsistent): for labels in the standard
nfd namespace regexp is applied on the non-namespaced part (e.g.
`cpu-cpuid.AVX`, but, for labels in custom namespaces the regexp is
applied on the full name (e.g. `example.com/my-feature`).
This patch changes nfd-worker to behave similarly to nfd-master. The
namespace part is now always omitted, which should be easier for the
users to comprehend.
Also, fixes a bug in the label name prefixing so that the name of the
feature source is not prefixed into labels with custom label namespace
(effectively mangling the intended namespace). For example, previously a
'example.com/feature' label from the 'custom' feature source would be
prefixed with the source name, mangling it to
'custom-example.com/feature'.
This builds on the PCI support to enable the discovery of USB devices.
This is primarily intended to be used for the discovery of Edge-based
heterogeneous accelerators that are connected via USB, such as the Coral
USB Accelerator and the Intel NCS2 - our main motivation for adding this
capability to NFD, and as part of our work in the SODALITE H2020
project.
USB devices may define their base class at either the device or
interface levels. In the case where no device class is set, the
per-device interfaces are enumerated instead. USB devices may
furthermore have multiple interfaces, which may or may not use the
identical class across each interface. We therefore report device
existence for each unique class definition to enable more fine-grained
labelling and node selection.
The default labelling format includes the class, vendor and device
(product) IDs, as follows:
feature.node.kubernetes.io/usb-fe_1a6e_089a.present=true
As with PCI, a subset of device classes are whitelisted for matching.
By default, there are only a subset of device classes under which
accelerators tend to be mapped, which is used as the basis for
the whitelist. These are:
- Video
- Miscellaneous
- Application Specific
- Vendor Specific
For those interested in matching other classes, this may be extended
by using the UsbId rule provided through the custom source. A full
list of class codes is provided by the USB-IF at:
https://www.usb.org/defined-class-codes
For the moment, owing to a lack of a demonstrable use case, neither
the subclass nor the protocol information are exposed. If this
becomes necessary, support for these attributes can be trivially
added.
Signed-off-by: Paul Mundt <paul.mundt@adaptant.io>
Just print a warning instead of exiting with an error if no version has
been specified at build-time. This was pointless and just annoying at
development time when doing builds with go directly.
This adds support for making selected labels extended resources.
Labels which have integer values, can be promoted to Kubernetes extended
resources by listing them to the added command line flag
`--resource-labels`. These labels won't then show in the node label
section, they will appear only as extended resources.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Add 'cpuid/attributeBlacklist' and 'cpuid/attributeWhitelist' config
options for the cpu feature source. These can be used to filter the set
of cpuid capabilities that get published. The intention is to reduce
clutter in the NFD label space, getting rid of "obvious" or misleading
cpuid labels. Whitelisting has higher priority, i.e. only whitelist
takes effect if both attributeWhitelist and attributeBlacklist are
specified.