1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

467 commits

Author SHA1 Message Date
Markus Lehtonen
7da7fde8f6 nfd-worker: switch to klog
Greatly expands logging capabilities and flexibility with verbosity
options, among other things.
2021-02-25 16:10:43 +02:00
Markus Lehtonen
3ffb7b8fc5 nfd-master: switch to klog 2021-02-25 07:50:37 +02:00
Markus Lehtonen
3fd61eacdb nfd-worker: switch to flag in command line parsing 2021-02-24 12:06:16 +02:00
Markus Lehtonen
47033db9c1 nfd-master: use flag for command line parsing 2021-02-24 12:06:16 +02:00
Markus Lehtonen
6b744d4179 nfd-worker: extend unit test coverage of config handling
Add test cases for verifying the core config.

Also, add asynchronous tests for basic verification of dynamic config
file updates.
2021-02-17 21:52:25 +02:00
Markus Lehtonen
2b24ed2c18 nfd-worker: implement Stop() method 2021-02-17 21:50:58 +02:00
Markus Lehtonen
278ccdb997 source/fake: make the fake source configurable
Enables more flexible testing.
2021-02-17 21:50:58 +02:00
Markus Lehtonen
c2c9dff724 nfd-worker: bail out on invalid config file
Changes the behaviour so that if the specified configuration file exists
it must be valid. Error out at startup if the config is invalid.
Similarly, exit with an error at runtime if the config file becomes
invalid. Bailing out, instead of just printing an error, was a
deliberate choice in order to make configuration mistakes evident.

Having no configuration file is tolerated, however. If the specified
configuration file does not exists nfd-worker resorts to default
settings.
2021-02-17 21:42:50 +02:00
Markus Lehtonen
7e88f00e05 nfd-worker: add core.sources config option
Add a config file option for controlling the enabled feature sources,
aimed at replacing the --sources command line flag which is now marked
as deprecated. The command line flag takes precedence over the config
file option.
2021-02-17 21:36:20 +02:00
Markus Lehtonen
ed177350fc nfd-worker: add core.labelWhiteList config option
Add a config file option for label whitelisting. Deprecate the
--label-whitelist command line flag. Note that the command line flag has
higher priority than the config file option.
2021-02-17 21:35:44 +02:00
Markus Lehtonen
d1d8de944e nfd-worker: add core.sleepInterval config option
Add a new config file option for (dynamically) controlling the sleep
interval. At the same time, deprecate the --sleep-interval command line
flag. The command line flag takes precedence over the config file option.
2021-02-17 21:35:13 +02:00
Markus Lehtonen
e6bdc17d8c nfd-worker: add core config
Allows dynamic (re-)configuration of most nfd-worker options. The goal
is to have most configuration parameters specified in the configuration
file and deprecate most of the command line flags. The priority is
intended to be such that command line flags override whatever is
specified in the configuration file. Thus, specifying something on the
command line effectively disables dynamic configurability of that
parameter.

This patch adds core.noPublish config file option to demonstrate how the
new mechanism is supposed to work. The --no-publish command line flag
takes precedence over this config file option.
2021-02-17 21:35:12 +02:00
Kubernetes Prow Robot
85bde7f749
Merge pull request #431 from marquiz/devel/master-instance-flag
nfd-master: implement --instance flag
2021-02-11 02:40:15 -08:00
Markus Lehtonen
29910464a0 nfd-worker: always re-label after a re-config event
Always do re-discovery and re-labeling after a configuration file
change. his way the new config comes into effect immediately, even if
the sleep interval is long (or infinite) # Please enter the commit
message for your changes. Lines starting
2021-02-10 22:09:27 +02:00
Markus Lehtonen
b6ff514853 nfd-worker: use fsnotify for watching for config file changes
Add support for detecting configuration file changes via file system
notifications (fsnotify). Watches are added for the whole directory
chain (up to root directory) so that all changes (even directory
renames) affecting the given configuration file path are captured.

Previously dynamic (re-)configuration of nfd-worker was implemented by
(re-)reading the configuration file on every labeling pass. This was
simple and effective, even if a bit wasteful. However, it didn't provide
asynchronous configuration updates that will be required for e.g.
controlling the "sleep-interval" parameter dynamically which will be
implemented by later patches.
2021-02-10 22:09:27 +02:00
Markus Lehtonen
6958a6677f nfd-worker: use timer channel for sleep interval 2021-02-10 22:09:27 +02:00
Markus Lehtonen
e52ec3480f nfd-master: implement --instance flag
This can be used to help running multiple parallel NFD deployments in
the same cluster. The flag changes the node annotation namespace to
<instance>.nfd.node.kubernetes.io allowing different nfd-master intances
to store metadata in separate annotations.
2021-02-10 13:48:31 +02:00
Markus Lehtonen
705687192d nfd-master: make updateNodeFeatures a method of nfdMaster 2021-02-10 13:46:59 +02:00
Markus Lehtonen
cdca6d667a nfd-master: make nodeName non-global 2021-02-10 13:46:59 +02:00
Markus Lehtonen
b146508e64 nfd-master: drop separate labelerServer type
Simplify code by changing nfdMaster to implement LabelerServer interface
by itself.
2021-02-10 13:46:59 +02:00
Markus Lehtonen
76b95b6c55 Replace improper usage of filepath.Join with path.Join
In JSON and kubernetes API object names we want to use slashes instead
of the OS dependent file path separator.
2021-02-10 12:54:31 +02:00
Markus Lehtonen
19b8f2cd3d nfd-master: more detailed unit testing of extended resources 2020-11-24 12:45:06 +02:00
Markus Lehtonen
d17743a0b9 nfd-master: handle label annotations in the same func
Handle both creation and parsing of the "feature-labels" and
"extended-resources" annotations in the function. I think this is more
logical to keep them together.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
95ff300d74 nfd-master: patch node object instead of rewriting it
When updating node labels and annotations use JSON patches instead of
doing a read-modify-write on the whole node object. Patching is already
being used in managing extended resources so some of the existing code
was re-usable.

This patch should mitigate the problem of node update failures caused by
race conditions (a change in the node object between our read and write)
resulting e.g. in errors/restarts in nfd worker pods.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
1ea301d272 nfd-master: change statusOp to a more generalized JSON patch
Generalize and rename 'statusOp' type to a more flexible 'JsonPatch'.
Move it to the apihelper package.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
bb1e4c60fb nfd-master: use namespaced label and annotation names internally
For historical reasons the labels in the default nfd namespace have been
internally represented without the namespace part. I.e. instead of
"feature.node.kubernetes.io/foo" we just use "foo". NFD worker uses this
representation, too, both internally and over the gRPC requests. The
same scheme has been used for annotations.

This patch changes NFD master to use fully namespaced label and
annotation names internally. This hopefully makes the code a bit more
understandable. It also addresses some corner cases making the handling
of label names consistent, making it possible to use both "truncated"
and fully namespaced names over the gRPC interface (and in the
annotations).
2020-11-24 12:45:06 +02:00
Markus Lehtonen
29cbb2429c nfd-worker: add special handling for --sources=all
A new special value 'all' is a shortcut for enabling all feature
sources. It should be the only name specified -- if any other names are
specified 'all' does not take effect, but, we only enable the listed
feature sources. E.g.
  --sources=all enables all sources, but
  --sources=all,cpu only enables the cpu source

Also, print a warning if unknown sources are specified.
2020-11-20 16:23:53 +02:00
Artyom Lukianov
f363ba0e92 Update e2e test to work with updated dependencies
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2020-11-18 13:09:13 +02:00
Markus Lehtonen
458dd8dc58 nfd-master: add --kubeconfig flag
Useful with --prune and for development purposes.
2020-09-07 07:51:42 +03:00
Markus Lehtonen
4669770020 nfd-master: implement --prune flag
A new sub-command like flag for cleaning up a cluster. When --prune is
specified nfd-master removes all NFD related labels, annotations and
extended resources from all nodes of the cluster and exits.

This should help undeployment of NFD and be useful for development.
2020-09-07 07:51:42 +03:00
Markus Lehtonen
6869a99ceb nfd-master: fix one docstring 2020-09-07 07:51:42 +03:00
Markus Lehtonen
9e813a559c nfd-worker: reload config on each re-discovery pass
Dumb re-read/re-parse of the configuration file on every round of
discoery. Probably not the most elegant solution to watch for config
file changes, but, it works and doesn't cost much overhead.
2020-05-21 00:59:39 +03:00
Markus Lehtonen
a2b9df5cd3 nfd-worker: rework configuration handling
Extend the FeatureSource interface with new methods for configuration
handling. This enables easier on-the fly reconfiguration of the
feature sources. Further, it simplifies adding config support to feature
sources in the future. Stub methods are added to sources that do not
currently have any configurability.

The patch fixes some (corner) cases with the overrides (--options)
handling, too:
- Overrides were not applied if config file was missing or its parsing
  failed
- Overrides for a certain source did not have effect if an empty config
  for the source was specified in the config file. This was caused by
  the first pass of parsing (config file) setting a nil pointer to the
  source-specific config, effectively detaching it from the main config.
  The second pass would then create a new instance of the source
  specific config, but, this was not visible in the feature source, of
  course.
2020-05-21 00:59:37 +03:00
Markus Lehtonen
c95ad3198c nfd-worker: refactor handling of enabled sources and labels
Make the list of enabled sources and the label whitelist regexp members
of the nfdWorker instance. Get rid of the not-that-well-defined
configureParameters() function.
2020-05-21 00:48:21 +03:00
Markus Lehtonen
818fc4cc70 nfd-worker: fix --label-whitelist
Unify handling of --label-whitelist in nfd-worker and nfd-master. That is,
in nfd-worker, apply the regexp filter on non-namespaced part of the
label name.

Brief history:
1. Originally the whitelist regexp was applied on the full namespaced
   label name (that would be e.g.
   'feature.node.kubernetes.io/cpu-cpuid.AVX' in the current nfd version)

2. Commit 81752b2d changed the behavior so that the regexp was applied
   on the non-namespaced part (that would be `cpu-cpuid.AVX`)

3. Commit 40918827 added support for custom label namespaces. With this
   change, the label whitelist handling diverged between nfd-worker and
   nfd-master. In nfd-master the whitelist regexp is always applied on
   the non-namespaced label name. However, in nfd-worker the whitelist
   handling is two-fold (and inconsistent): for labels in the standard
   nfd namespace regexp is applied on the non-namespaced part (e.g.
   `cpu-cpuid.AVX`, but, for labels in custom namespaces the regexp is
   applied on the full name (e.g. `example.com/my-feature`).

This patch changes nfd-worker to behave similarly to nfd-master. The
namespace part is now always omitted, which should be easier for the
users to comprehend.

Also, fixes a bug in the label name prefixing so that the name of the
feature source is not prefixed into labels with custom label namespace
(effectively mangling the intended namespace). For example, previously a
'example.com/feature' label from the 'custom' feature source would be
prefixed with the source name, mangling it to
'custom-example.com/feature'.
2020-05-20 23:07:13 +03:00
Markus Lehtonen
a65d05bd9c source/panic_fake: rename module to make lint happy 2020-05-20 21:48:06 +03:00
Markus Lehtonen
853609f721 nfd-master: lint fixes 2020-05-20 21:48:06 +03:00
Markus Lehtonen
523aa894a3 pkg/cpuid: lint fixes 2020-05-20 21:48:06 +03:00
Markus Lehtonen
c7b1d67b6b nfd-worker: drop deprecated grpc.WithTimeout 2020-05-20 21:48:06 +03:00
Markus Lehtonen
91f3ddcc45 nfd-worker: lint fixes 2020-05-20 21:48:06 +03:00
Paul Mundt
c0ea69411b usb: Add support for USB device discovery
This builds on the PCI support to enable the discovery of USB devices.

This is primarily intended to be used for the discovery of Edge-based
heterogeneous accelerators that are connected via USB, such as the Coral
USB Accelerator and the Intel NCS2 - our main motivation for adding this
capability to NFD, and as part of our work in the SODALITE H2020
project.

USB devices may define their base class at either the device or
interface levels. In the case where no device class is set, the
per-device interfaces are enumerated instead. USB devices may
furthermore have multiple interfaces, which may or may not use the
identical class across each interface. We therefore report device
existence for each unique class definition to enable more fine-grained
labelling and node selection.

The default labelling format includes the class, vendor and device
(product) IDs, as follows:

	feature.node.kubernetes.io/usb-fe_1a6e_089a.present=true

As with PCI, a subset of device classes are whitelisted for matching.
By default, there are only a subset of device classes under which
accelerators tend to be mapped, which is used as the basis for
the whitelist. These are:

	- Video
	- Miscellaneous
	- Application Specific
	- Vendor Specific

For those interested in matching other classes, this may be extended
by using the UsbId rule provided through the custom source. A full
list of class codes is provided by the USB-IF at:

	https://www.usb.org/defined-class-codes

For the moment, owing to a lack of a demonstrable use case, neither
the subclass nor the protocol information are exposed. If this
becomes necessary, support for these attributes can be trivially
added.

Signed-off-by: Paul Mundt <paul.mundt@adaptant.io>
2020-05-20 16:18:39 +02:00
Markus Lehtonen
409dc11389 Switch to sigs.k8s.io/yaml
Replace github.com/ghodss/yaml.
2020-04-23 16:54:14 +03:00
Kubernetes Prow Robot
6d1aa73ca1
Merge pull request #298 from marquiz/devel/version
version: allow undefined version
2020-03-24 09:46:48 -07:00
Kubernetes Prow Robot
7c4ff52a3c
Merge pull request #290 from adrianchiris/custom_features
Support custom features
2020-03-24 08:26:48 -07:00
Markus Lehtonen
8c964b9daf version: allow undefined version
Just print a warning instead of exiting with an error if no version has
been specified at build-time. This was pointless and just annoying at
development time when doing builds with go directly.
2020-03-20 07:21:43 +02:00
Ukri Niemimuukko
903a939836 nfd-master: add extended resource support
This adds support for making selected labels extended resources.

Labels which have integer values, can be promoted to Kubernetes extended
resources by listing them to the added command line flag
`--resource-labels`. These labels won't then show in the node label
section, they will appear only as extended resources.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-03-19 13:19:22 +02:00
Adrian Chiris
192b3d7bdd Add 'custom' feature Source to nfd-worker 2020-03-19 09:32:07 +02:00
Markus Lehtonen
54eaf16871 nfd-master: export label and annotation prefixes
In order to be able to use the constants in end-to-end tests.
2020-02-27 14:21:00 +02:00
Markus Lehtonen
500a9e9b1a apihelpers: use Clientset.CoreV1()
Instead of the deprecated Clientset.Core().
2020-02-05 16:25:57 +02:00
Kubernetes Prow Robot
b4e1885064
Merge pull request #265 from marquiz/devel/worker
nfd-worker: don't connect to master when --no-publish is used
2019-09-03 10:58:59 -07:00
Markus Lehtonen
9c7edd24ca nfd-worker: fix typo and wording in log message 2019-09-03 14:22:52 +03:00
Markus Lehtonen
bd5aaaac78 nfd-worker: don't connect to master when --no-publish is used
Prevent worker from trying to connect to the master when the
--no-publish flag is specified.
2019-08-29 15:49:58 +03:00
Markus Lehtonen
4fb8bd8efc nfd-worker: try connecting to nfd-master for 60s
Instead of erroring out right away, try to connect to nfd-master for 60
seconds until giving up.
2019-07-02 16:42:52 +03:00
Kubernetes Prow Robot
8996be7e09
Merge pull request #231 from Ethyling/change-label-prefix
Allow to change labels namespace
2019-05-14 07:43:13 -07:00
Markus Lehtonen
7c5f7d600e source/cpu: make cpuid configurable
Add 'cpuid/attributeBlacklist' and 'cpuid/attributeWhitelist' config
options for the cpu feature source. These can be used to filter the set
of cpuid capabilities that get published. The intention is to reduce
clutter in the NFD label space, getting rid of "obvious" or misleading
cpuid labels. Whitelisting has higher priority, i.e.  only whitelist
takes effect if both attributeWhitelist and attributeBlacklist are
specified.
2019-05-13 17:17:02 +03:00
Jordan Jacobelli
40918827f6
Allow to change labels namespace
The aim here is to allow to override the default namespace
of NFD. The allowed namespaces are whitelisted.
See https://github.com/kubernetes-sigs/node-feature-discovery/issues/227

Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
2019-05-09 13:17:52 -07:00
Markus Lehtonen
655f5c5555 sources: move all cpu related features under the cpu source
Remove 'cpuid', 'pstate' and 'rdt' feature sources and move their
functionality under the 'cpu' source. The goal is to have a more
systematic organization of feature sources and labels. After this change
we now basically have one source per type of hw, one for kernel and one
for userspace sw.

Related feature labels are changed, correspondingly, new labels being:
    feature.node.k8s.io/cpu-cpuid.<cpuid flag>
    feature.node.k8s.io/cpu-pstate.turbo
    feature.node.k8s.io/cpu-rdt.<rdt feature>
2019-05-09 20:18:36 +03:00
Markus Lehtonen
470cf8dff2 nfd-master: correct a mistake in unit tests
Annotations were not correctly checked when testing
mockServer.updateNodeFeatures().
2019-05-08 23:07:52 +03:00
Markus Lehtonen
7f43a3db4e nfd-master: fix --label-whitelist
Make the --label-whitelist effective. Previously, it was unused and had
no effect. Also, add simple unit test for that.
2019-05-08 23:07:52 +03:00
Markus Lehtonen
5553259062 apihelpers: drop unused fields from K8sHelpers 2019-05-07 10:37:16 +03:00
Markus Lehtonen
75a8f0c146 Refactor APIHelpers
Remove functionality that was not interacting with Kubernetes API.
Makes the architecture a bit simpler and simplifies testing.
2019-05-06 16:26:41 +03:00
Markus Lehtonen
35d26001e4 nfd-worker: extend unit test to cover 'main'
Also, adds new method WaitForReady() into NfdMaster.

In practice, this quite widely tests nfd-master, too, as the tests
create an instance of NfdMaster and verify that the communication
between master and worker works.
2019-05-06 16:26:41 +03:00
Markus Lehtonen
2de0a019a3 Move most of functionality in cmd/ to pkg/
Move most of the code under cmd/nfd-master and cmd/nfd-worker into new
packages pkg/nfd-master and pk/nfd-worker, respectively. Makes extending
unit tests to "main" functions easier.
2019-05-06 16:26:41 +03:00
Markus Lehtonen
86382afe56 Re-factor cpuid functionality out of source/rdt
Move the cpuid functionality into a separate library package so that it
can be easily re-used by other sources.
2019-04-12 14:36:08 +03:00
Markus Lehtonen
f8bc07952f Fix unit tests after master-worker split
Refactor old tests and add tests for new functions. Add 'test' target in
Makefile.
2019-04-04 22:40:24 +03:00
Markus Lehtonen
39be798472 Split NFD into client and server
Refactor NFD into a simple server-client system. Labeling is now done by
a separate 'nfd-master' server. It is a simple service with small
codebase, designed for easy isolation. The feature discovery part is
implemented in a 'nfd-worker' client which sends labeling requests to
nfd-server, thus, requiring no access/permissions to the Kubernetes API
itself.

Client-server communication is implemented by using gRPC. The protocol
currently consists of only one request, i.e. the labeling request.

The spec templates are converted to the new scheme. The nfd-master
server can be deployed using the nfd-master.yaml.template which now also
contains the necessary RBAC configuration. NFD workers can be deployed
by using the nfd-worker-daemonset.yaml.template or
nfd-worker-job.yaml.template (most easily used with the label-nodes.sh
script).

Only nfd-worker currently support config file or options. The (default)
NFD config file is renamed to nfd-worker.conf.
2019-04-04 22:40:24 +03:00
Markus Lehtonen
c1377589b3 Move version information into a separate module 2019-04-04 22:40:24 +03:00