A new sub-command like flag for cleaning up a cluster. When --prune is
specified nfd-master removes all NFD related labels, annotations and
extended resources from all nodes of the cluster and exits.
This should help undeployment of NFD and be useful for development.
Dumb re-read/re-parse of the configuration file on every round of
discoery. Probably not the most elegant solution to watch for config
file changes, but, it works and doesn't cost much overhead.
Extend the FeatureSource interface with new methods for configuration
handling. This enables easier on-the fly reconfiguration of the
feature sources. Further, it simplifies adding config support to feature
sources in the future. Stub methods are added to sources that do not
currently have any configurability.
The patch fixes some (corner) cases with the overrides (--options)
handling, too:
- Overrides were not applied if config file was missing or its parsing
failed
- Overrides for a certain source did not have effect if an empty config
for the source was specified in the config file. This was caused by
the first pass of parsing (config file) setting a nil pointer to the
source-specific config, effectively detaching it from the main config.
The second pass would then create a new instance of the source
specific config, but, this was not visible in the feature source, of
course.
Make the list of enabled sources and the label whitelist regexp members
of the nfdWorker instance. Get rid of the not-that-well-defined
configureParameters() function.
Unify handling of --label-whitelist in nfd-worker and nfd-master. That is,
in nfd-worker, apply the regexp filter on non-namespaced part of the
label name.
Brief history:
1. Originally the whitelist regexp was applied on the full namespaced
label name (that would be e.g.
'feature.node.kubernetes.io/cpu-cpuid.AVX' in the current nfd version)
2. Commit 81752b2d changed the behavior so that the regexp was applied
on the non-namespaced part (that would be `cpu-cpuid.AVX`)
3. Commit 40918827 added support for custom label namespaces. With this
change, the label whitelist handling diverged between nfd-worker and
nfd-master. In nfd-master the whitelist regexp is always applied on
the non-namespaced label name. However, in nfd-worker the whitelist
handling is two-fold (and inconsistent): for labels in the standard
nfd namespace regexp is applied on the non-namespaced part (e.g.
`cpu-cpuid.AVX`, but, for labels in custom namespaces the regexp is
applied on the full name (e.g. `example.com/my-feature`).
This patch changes nfd-worker to behave similarly to nfd-master. The
namespace part is now always omitted, which should be easier for the
users to comprehend.
Also, fixes a bug in the label name prefixing so that the name of the
feature source is not prefixed into labels with custom label namespace
(effectively mangling the intended namespace). For example, previously a
'example.com/feature' label from the 'custom' feature source would be
prefixed with the source name, mangling it to
'custom-example.com/feature'.
This builds on the PCI support to enable the discovery of USB devices.
This is primarily intended to be used for the discovery of Edge-based
heterogeneous accelerators that are connected via USB, such as the Coral
USB Accelerator and the Intel NCS2 - our main motivation for adding this
capability to NFD, and as part of our work in the SODALITE H2020
project.
USB devices may define their base class at either the device or
interface levels. In the case where no device class is set, the
per-device interfaces are enumerated instead. USB devices may
furthermore have multiple interfaces, which may or may not use the
identical class across each interface. We therefore report device
existence for each unique class definition to enable more fine-grained
labelling and node selection.
The default labelling format includes the class, vendor and device
(product) IDs, as follows:
feature.node.kubernetes.io/usb-fe_1a6e_089a.present=true
As with PCI, a subset of device classes are whitelisted for matching.
By default, there are only a subset of device classes under which
accelerators tend to be mapped, which is used as the basis for
the whitelist. These are:
- Video
- Miscellaneous
- Application Specific
- Vendor Specific
For those interested in matching other classes, this may be extended
by using the UsbId rule provided through the custom source. A full
list of class codes is provided by the USB-IF at:
https://www.usb.org/defined-class-codes
For the moment, owing to a lack of a demonstrable use case, neither
the subclass nor the protocol information are exposed. If this
becomes necessary, support for these attributes can be trivially
added.
Signed-off-by: Paul Mundt <paul.mundt@adaptant.io>
Just print a warning instead of exiting with an error if no version has
been specified at build-time. This was pointless and just annoying at
development time when doing builds with go directly.
This adds support for making selected labels extended resources.
Labels which have integer values, can be promoted to Kubernetes extended
resources by listing them to the added command line flag
`--resource-labels`. These labels won't then show in the node label
section, they will appear only as extended resources.
Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
Add 'cpuid/attributeBlacklist' and 'cpuid/attributeWhitelist' config
options for the cpu feature source. These can be used to filter the set
of cpuid capabilities that get published. The intention is to reduce
clutter in the NFD label space, getting rid of "obvious" or misleading
cpuid labels. Whitelisting has higher priority, i.e. only whitelist
takes effect if both attributeWhitelist and attributeBlacklist are
specified.
Remove 'cpuid', 'pstate' and 'rdt' feature sources and move their
functionality under the 'cpu' source. The goal is to have a more
systematic organization of feature sources and labels. After this change
we now basically have one source per type of hw, one for kernel and one
for userspace sw.
Related feature labels are changed, correspondingly, new labels being:
feature.node.k8s.io/cpu-cpuid.<cpuid flag>
feature.node.k8s.io/cpu-pstate.turbo
feature.node.k8s.io/cpu-rdt.<rdt feature>
Also, adds new method WaitForReady() into NfdMaster.
In practice, this quite widely tests nfd-master, too, as the tests
create an instance of NfdMaster and verify that the communication
between master and worker works.
Move most of the code under cmd/nfd-master and cmd/nfd-worker into new
packages pkg/nfd-master and pk/nfd-worker, respectively. Makes extending
unit tests to "main" functions easier.
Refactor NFD into a simple server-client system. Labeling is now done by
a separate 'nfd-master' server. It is a simple service with small
codebase, designed for easy isolation. The feature discovery part is
implemented in a 'nfd-worker' client which sends labeling requests to
nfd-server, thus, requiring no access/permissions to the Kubernetes API
itself.
Client-server communication is implemented by using gRPC. The protocol
currently consists of only one request, i.e. the labeling request.
The spec templates are converted to the new scheme. The nfd-master
server can be deployed using the nfd-master.yaml.template which now also
contains the necessary RBAC configuration. NFD workers can be deployed
by using the nfd-worker-daemonset.yaml.template or
nfd-worker-job.yaml.template (most easily used with the label-nodes.sh
script).
Only nfd-worker currently support config file or options. The (default)
NFD config file is renamed to nfd-worker.conf.