Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.
The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (and gRPC is
enabled instead) if either of them is set to false.
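A minimal sketch of the combination rule described above. The function
name and the printed mode strings are hypothetical and only illustrate
the logic, not the actual nfd-master code:

```go
package main

import "fmt"

// nodeFeatureAPIEnabled is a hypothetical helper: the NodeFeature API is
// used only when both the feature gate and the deprecated flag are true.
// If either one is false, gRPC is used instead.
func nodeFeatureAPIEnabled(featureGate, deprecatedFlag bool) bool {
	return featureGate && deprecatedFlag
}

func main() {
	for _, gate := range []bool{true, false} {
		for _, flag := range []bool{true, false} {
			mode := "gRPC"
			if nodeFeatureAPIEnabled(gate, flag) {
				mode = "NodeFeature API"
			}
			fmt.Printf("gate=%t flag=%t -> %s\n", gate, flag, mode)
		}
	}
}
```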
This patch selectively reverts parts of
06c4733bc5.
This started as a small effort to simplify the usage of the "ready" channel
in nfd-master. It extended into a wider simplification/unification of
the channel usage.
This change is part of an effort to remove the pkg/apihelper package.
GetKubeconfig is a useful helper shared across the codebase, so move it
into a "safe" location.
This patch creates an owner-dependent relationship between the
nfd-worker pod and the NodeFeature object that it creates. With this
change the orphaned NodeFeature object(s) get automatically
garbage-collected when the nfd-worker pod goes away, without the need
for manual clean-up actions.
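As a rough illustration, the dependent object carries a reference to
its owner pod. The sketch below uses a simplified local struct that
mirrors a few fields of the Kubernetes metav1.OwnerReference type; the
helper name, pod name and UID are made up for the example:

```go
package main

import "fmt"

// ownerReference is a simplified local stand-in for the Kubernetes
// metav1.OwnerReference type (only a few of its fields are mirrored).
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// podOwnerRef (hypothetical helper) builds the reference that a
// NodeFeature object would carry to point back at the nfd-worker pod.
func podOwnerRef(podName, podUID string) ownerReference {
	return ownerReference{APIVersion: "v1", Kind: "Pod", Name: podName, UID: podUID}
}

func main() {
	// Example values only; a real UID comes from the pod's metadata.
	ref := podOwnerRef("nfd-worker-abcde", "11111111-2222-3333-4444-555555555555")
	fmt.Printf("%+v\n", ref)
}
```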
Add a new autoDefaultNs (default is "true") config option to nfd-master.
Setting the config option to false stops NFD from automatically adding
the "feature.node.kubernetes.io/" prefix to labels, annotations and
extended resources. Taints are not affected as for them no prefix is
automatically added. The user-visible effect of setting the option to
false is that NodeFeatureRules, local feature files, hooks and the
configuration of the "custom" source may need to be altered (if
auto-prefixing is relied on).
For now, the config option defaults to "true", meaning no change in
default behavior. However, the intent is to change the default to
"false" in a future release, deprecating the option and eventually
removing it (forcing it to "false").
The goal of dropping "auto-prefixing" is to simplify operation (for both
nfd and its users): make the naming more straightforward and easier to
understand and debug (kind of WYSIWYG), eliminating peculiar corner
cases:
1. Make validation simpler and unambiguous
2. Remove "overloading" of names, i.e. the mapping of two values to the
   same actual name. E.g. previously something like
     labels:
       feature.node.kubernetes.io/foo: bar
       foo: baz
   could actually result in the node label:
     feature.node.kubernetes.io/foo: baz
3. Make the processing/usage of the "rule.matched" and "local.labels"
   features in NodeFeatureRules unambiguous and more understandable. E.g.
   previously you could have the node label
   "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule
   you'd need to use the unprefixed name "local-foo" or the fully
   prefixed name, depending on what was specified in the feature file (or
   hook) on the node(s).
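The "overloading" corner case from the list above can be reproduced
with a small sketch. The function names are hypothetical and the logic
is a simplification (it does not model nfd-master denying unprefixed
names when auto-prefixing is off):

```go
package main

import (
	"fmt"
	"strings"
)

const defaultNs = "feature.node.kubernetes.io/"

// resolveName returns the final name. With autoDefaultNs enabled, a name
// without any "/" gets the default namespace prepended.
func resolveName(name string, autoDefaultNs bool) string {
	if autoDefaultNs && !strings.Contains(name, "/") {
		return defaultNs + name
	}
	return name
}

// resolveLabels resolves every key in the given order (the explicit key
// order keeps the example deterministic). Later keys overwrite earlier
// ones when they resolve to the same final name.
func resolveLabels(keys []string, values map[string]string, autoDefaultNs bool) map[string]string {
	out := map[string]string{}
	for _, k := range keys {
		out[resolveName(k, autoDefaultNs)] = values[k]
	}
	return out
}

func main() {
	values := map[string]string{
		"feature.node.kubernetes.io/foo": "bar",
		"foo":                            "baz",
	}
	keys := []string{"feature.node.kubernetes.io/foo", "foo"}

	// Two input names collapse into one label when auto-prefixing is on.
	fmt.Println(resolveLabels(keys, values, true))
	// Names are kept exactly as written when auto-prefixing is off.
	fmt.Println(resolveLabels(keys, values, false))
}
```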
NOTE: setting autoDefaultNs to false is a breaking change for users who
rely on automatic prefixing with the default feature.node.kubernetes.io/
namespace. NodeFeatureRules, feature files, hooks and custom rules
(configuration of the "custom" source of nfd-worker) will need to be
altered. Unprefixed labels, annotations and extended resources will be
denied by nfd-master.
Expose metrics via prometheus.monitoring.coreos.com/v1
The exposed metrics are
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Number of nodes updated |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Drop the KlogDump helper in favor of klog.InfoS. However, this patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of objects unless actually evaluated by the logging
function.
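The idea of delaying the marshalling can be sketched with a type that
implements fmt.Stringer, so the work only happens if something actually
formats the value. This is an illustration under assumptions: the
lazyDumper and logV names are hypothetical, JSON is used as the
serialization format for simplicity, and the call counter exists only
to make the laziness observable:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lazyDumper is a hypothetical stand-in for a delayed dumper. It wraps
// an object and marshals it only when String() is called.
type lazyDumper struct {
	obj   interface{}
	calls *int // optional counter, used here to observe laziness
}

// String marshals the wrapped object on demand.
func (d lazyDumper) String() string {
	if d.calls != nil {
		*d.calls++
	}
	b, err := json.Marshal(d.obj)
	if err != nil {
		return fmt.Sprintf("<!marshal error: %v>", err)
	}
	return string(b)
}

// logV mimics a verbosity-gated logger: the argument is formatted only
// when the log level is enabled.
func logV(enabled bool, msg string, arg fmt.Stringer) string {
	if !enabled {
		return ""
	}
	return fmt.Sprintf("%s %s", msg, arg)
}

func main() {
	calls := 0
	d := lazyDumper{obj: map[string]string{"cpu-cpuid.AVX": "true"}, calls: &calls}

	logV(false, "labels:", d) // disabled: nothing is marshalled
	fmt.Println("marshal calls after disabled log:", calls)

	fmt.Println(logV(true, "labels:", d))
	fmt.Println("marshal calls after enabled log:", calls)
}
```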
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Refactor the worker code and split out gRPC client connection handling
into a separate base type. The intent is to promote re-usability of code
for other NFD clients, too.
Reduce default log verbosity. Only print out labels if log verbosity is
1 or higher ('core.klog.v: 1' config file option or '-v 1' on command
line). Also, dump the labels in a reproducible (sorted) format.
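A sketch of a reproducible (sorted) label dump gated on verbosity. The
function names and the key=value output format are assumptions made
for the example:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatLabels renders labels one per line, sorted by key, so that the
// output is stable across runs (Go map iteration order is random).
func formatLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%s=%s\n", k, labels[k])
	}
	return sb.String()
}

// maybeDump returns the dump only if verbosity is 1 or higher.
func maybeDump(verbosity int, labels map[string]string) string {
	if verbosity < 1 {
		return ""
	}
	return formatLabels(labels)
}

func main() {
	labels := map[string]string{"kernel-version.major": "5", "cpu-cpuid.AVX": "true"}
	fmt.Print(maybeDump(0, labels)) // prints nothing
	fmt.Print(maybeDump(1, labels))
}
```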
The code should be stable enough. If there are fatal bugs causing the
discovery to panic/segfault, they should be made visible instead of
being semi-silently hidden. Also, this caused one (negative) test case
to fail undetected, which is now fixed.
Changes the behaviour so that if the specified configuration file exists
it must be valid. Error out at startup if the config is invalid.
Similarly, exit with an error at runtime if the config file becomes
invalid. Bailing out, instead of just printing an error, was a
deliberate choice in order to make configuration mistakes evident.
Having no configuration file is tolerated, however. If the specified
configuration file does not exist, nfd-worker resorts to default
settings.
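The policy above (missing file is fine, invalid file is an error) could
be sketched as follows. This is an illustration only: the workerConfig
type, its single field, the default value and the use of JSON as the
file format are all assumptions made to keep the example self-contained:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// workerConfig is a hypothetical, minimal config structure.
type workerConfig struct {
	SleepInterval string `json:"sleepInterval"`
}

// defaultConfig returns the settings used when no config file exists.
func defaultConfig() workerConfig {
	return workerConfig{SleepInterval: "60s"}
}

// parseConfig overlays the file content on top of the defaults and
// reports an error if the content is not valid.
func parseConfig(data []byte) (workerConfig, error) {
	cfg := defaultConfig()
	if err := json.Unmarshal(data, &cfg); err != nil {
		return workerConfig{}, fmt.Errorf("invalid config: %w", err)
	}
	return cfg, nil
}

// loadConfig tolerates a missing file but not an invalid one.
func loadConfig(path string) (workerConfig, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return defaultConfig(), nil
	}
	if err != nil {
		return workerConfig{}, fmt.Errorf("reading %q: %w", path, err)
	}
	return parseConfig(data)
}

func main() {
	cfg, err := loadConfig("/nonexistent/nfd-worker.conf")
	fmt.Println(cfg, err)

	// A real program would exit with an error here instead of printing.
	if _, err := parseConfig([]byte("{not valid")); err != nil {
		fmt.Println("error:", err)
	}
}
```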
Add a config file option for controlling the enabled feature sources,
aimed at replacing the --sources command line flag which is now marked
as deprecated. The command line flag takes precedence over the config
file option.
Add a config file option for label whitelisting. Deprecate the
--label-whitelist command line flag. Note that the command line flag has
higher priority than the config file option.
Add a new config file option for (dynamically) controlling the sleep
interval. At the same time, deprecate the --sleep-interval command line
flag. The command line flag takes precedence over the config file option.
Allows dynamic (re-)configuration of most nfd-worker options. The goal
is to have most configuration parameters specified in the configuration
file and deprecate most of the command line flags. The priority is
intended to be such that command line flags override whatever is
specified in the configuration file. Thus, specifying something on the
command line effectively disables dynamic configurability of that
parameter.
This patch adds core.noPublish config file option to demonstrate how the
new mechanism is supposed to work. The --no-publish command line flag
takes precedence over this config file option.
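One way to implement this kind of precedence is to apply only the flags
that were explicitly given on the command line, using flag.Visit from
the Go standard library. The sketch below is not the actual nfd-worker
code; the settings type and the resolve function are hypothetical:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// settings is a hypothetical subset of the worker options.
type settings struct {
	NoPublish     bool
	SleepInterval time.Duration
}

// resolve starts from the config file values and overrides them with
// any flag that was explicitly set on the command line.
func resolve(fileCfg settings, args []string) (settings, error) {
	fs := flag.NewFlagSet("nfd-worker", flag.ContinueOnError)
	noPublish := fs.Bool("no-publish", false, "do not publish labels")
	sleep := fs.Duration("sleep-interval", 60*time.Second, "time between passes")
	if err := fs.Parse(args); err != nil {
		return settings{}, err
	}

	out := fileCfg
	// Visit only calls the function for flags that were actually set,
	// so flag defaults never clobber values from the config file.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "no-publish":
			out.NoPublish = *noPublish
		case "sleep-interval":
			out.SleepInterval = *sleep
		}
	})
	return out, nil
}

func main() {
	fileCfg := settings{NoPublish: false, SleepInterval: 30 * time.Second}

	out, err := resolve(fileCfg, []string{"--no-publish"})
	fmt.Println(out, err)

	out, err = resolve(fileCfg, nil)
	fmt.Println(out, err)
}
```

A parameter overridden this way stays fixed for the lifetime of the
process, which matches the note above that a command line flag
effectively disables dynamic configurability of that parameter.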
Always do re-discovery and re-labeling after a configuration file
change. This way the new config comes into effect immediately, even if
the sleep interval is long (or infinite).
Add support for detecting configuration file changes via file system
notifications (fsnotify). Watches are added for the whole directory
chain (up to root directory) so that all changes (even directory
renames) affecting the given configuration file path are captured.
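Computing the directory chain to watch can be sketched with the
standard library alone. The dirChain name is hypothetical and the
registration of the watches (e.g. one watcher.Add(dir) call per
directory with the fsnotify package) is left out; the sketch also
assumes an absolute, Unix-style path:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dirChain returns the directory of the given file followed by all of
// its parent directories, ending at the root directory.
func dirChain(path string) []string {
	var dirs []string
	dir := filepath.Dir(filepath.Clean(path))
	for {
		dirs = append(dirs, dir)
		parent := filepath.Dir(dir)
		if parent == dir {
			// Reached the root: its parent is itself.
			break
		}
		dir = parent
	}
	return dirs
}

func main() {
	for _, d := range dirChain("/etc/kubernetes/node-feature-discovery/nfd-worker.conf") {
		fmt.Println("watch:", d)
	}
}
```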
Previously dynamic (re-)configuration of nfd-worker was implemented by
(re-)reading the configuration file on every labeling pass. This was
simple and effective, even if a bit wasteful. However, it didn't provide
asynchronous configuration updates that will be required for e.g.
controlling the "sleep-interval" parameter dynamically which will be
implemented by later patches.
A new special value 'all' is a shortcut for enabling all feature
sources. It should be the only name specified: if any other names are
specified, 'all' does not take effect and only the listed feature
sources are enabled. E.g.
  --sources=all enables all sources, but
  --sources=all,cpu only enables the cpu source
Also, print a warning if unknown sources are specified.
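The handling of 'all' described above could look roughly like this.
The function names are hypothetical and the list of known sources is
just an illustrative subset:

```go
package main

import (
	"fmt"
	"strings"
)

// knownSources is an illustrative subset of feature source names.
var knownSources = []string{"cpu", "kernel", "pci"}

func isKnown(name string) bool {
	for _, s := range knownSources {
		if s == name {
			return true
		}
	}
	return false
}

// enabledSources parses a comma-separated --sources value. 'all' only
// takes effect when it is the sole name; otherwise just the listed
// sources are enabled. Unrecognized names are returned separately so
// that the caller can print a warning.
func enabledSources(arg string) (enabled, unknown []string) {
	names := strings.Split(arg, ",")
	if len(names) == 1 && strings.TrimSpace(names[0]) == "all" {
		return append([]string{}, knownSources...), nil
	}
	for _, n := range names {
		n = strings.TrimSpace(n)
		switch {
		case n == "all" || n == "":
			// 'all' has no effect when combined with other names.
		case isKnown(n):
			enabled = append(enabled, n)
		default:
			unknown = append(unknown, n)
		}
	}
	return enabled, unknown
}

func main() {
	for _, arg := range []string{"all", "all,cpu", "cpu,bogus"} {
		enabled, unknown := enabledSources(arg)
		fmt.Printf("--sources=%s -> enabled=%v\n", arg, enabled)
		for _, u := range unknown {
			fmt.Printf("WARNING: unknown source %q\n", u)
		}
	}
}
```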
Dumb re-read/re-parse of the configuration file on every round of
discovery. Probably not the most elegant solution to watch for config
file changes, but it works and doesn't cost much overhead.
Extend the FeatureSource interface with new methods for configuration
handling. This enables easier on-the-fly reconfiguration of the
feature sources. Further, it simplifies adding config support to feature
sources in the future. Stub methods are added to sources that do not
currently have any configurability.
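A sketch of what such an interface extension might look like. The
method names (NewConfig, GetConfig, SetConfig) and all types below are
assumptions made for illustration, not a copy of the real FeatureSource
definition:

```go
package main

import "fmt"

// Config is an opaque, source-specific configuration.
type Config interface{}

// FeatureSource is a hypothetical interface with config handling added.
type FeatureSource interface {
	Name() string
	NewConfig() Config // returns a new default config
	GetConfig() Config // returns the currently active config
	SetConfig(Config)  // replaces the active config
}

// stubSource has no configurability, so its config methods are stubs.
type stubSource struct{}

func (stubSource) Name() string      { return "stub" }
func (stubSource) NewConfig() Config { return nil }
func (stubSource) GetConfig() Config { return nil }
func (stubSource) SetConfig(Config)  {}

// demoConfig and demoSource show a configurable source.
type demoConfig struct {
	Blacklist []string
}

type demoSource struct {
	config *demoConfig
}

func (s *demoSource) Name() string      { return "demo" }
func (s *demoSource) NewConfig() Config { return &demoConfig{} }
func (s *demoSource) GetConfig() Config { return s.config }
func (s *demoSource) SetConfig(c Config) {
	// Ignore configs of an unexpected type.
	if v, ok := c.(*demoConfig); ok {
		s.config = v
	}
}

func main() {
	sources := []FeatureSource{stubSource{}, &demoSource{}}
	for _, s := range sources {
		// Reconfigure on the fly: create defaults, then apply them.
		s.SetConfig(s.NewConfig())
		fmt.Printf("%s: %+v\n", s.Name(), s.GetConfig())
	}
}
```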
The patch fixes some (corner) cases with the overrides (--options)
handling, too:
- Overrides were not applied if config file was missing or its parsing
failed
- Overrides for a certain source did not have effect if an empty config
for the source was specified in the config file. This was caused by
the first pass of parsing (config file) setting a nil pointer to the
source-specific config, effectively detaching it from the main config.
The second pass would then create a new instance of the source
specific config, but, this was not visible in the feature source, of
course.
Make the list of enabled sources and the label whitelist regexp members
of the nfdWorker instance. Get rid of the not-that-well-defined
configureParameters() function.
Unify handling of --label-whitelist in nfd-worker and nfd-master. That is,
in nfd-worker, apply the regexp filter on non-namespaced part of the
label name.
Brief history:
1. Originally the whitelist regexp was applied on the full namespaced
label name (that would be e.g.
'feature.node.kubernetes.io/cpu-cpuid.AVX' in the current nfd version)
2. Commit 81752b2d changed the behavior so that the regexp was applied
on the non-namespaced part (that would be `cpu-cpuid.AVX`)
3. Commit 40918827 added support for custom label namespaces. With this
change, the label whitelist handling diverged between nfd-worker and
nfd-master. In nfd-master the whitelist regexp is always applied on
the non-namespaced label name. However, in nfd-worker the whitelist
handling is two-fold (and inconsistent): for labels in the standard
nfd namespace the regexp is applied on the non-namespaced part (e.g.
`cpu-cpuid.AVX`), but for labels in custom namespaces the regexp is
applied on the full name (e.g. `example.com/my-feature`).
This patch changes nfd-worker to behave similarly to nfd-master. The
namespace part is now always omitted, which should be easier for the
users to comprehend.
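The unified behavior can be sketched as splitting off the namespace and
matching the whitelist regexp against the remainder only. The helper
names are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitNs splits a label into its namespace and non-namespaced name.
// A label without "/" has an empty namespace.
func splitNs(label string) (ns, name string) {
	if i := strings.LastIndex(label, "/"); i >= 0 {
		return label[:i], label[i+1:]
	}
	return "", label
}

// allowedByPattern applies the whitelist pattern on the non-namespaced
// part of the label name, regardless of the namespace.
func allowedByPattern(pattern, label string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	_, name := splitNs(label)
	return re.MatchString(name), nil
}

func main() {
	labels := []string{
		"feature.node.kubernetes.io/cpu-cpuid.AVX",
		"example.com/my-feature",
	}
	for _, l := range labels {
		ok, err := allowedByPattern("^my-", l)
		fmt.Printf("%s allowed=%t err=%v\n", l, ok, err)
	}
}
```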
Also, fixes a bug in the label name prefixing so that the name of the
feature source is not prefixed into labels with custom label namespace
(effectively mangling the intended namespace). For example, previously a
'example.com/feature' label from the 'custom' feature source would be
prefixed with the source name, mangling it to
'custom-example.com/feature'.
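A sketch of the corrected prefixing rule, assuming that the presence of
a "/" in the name indicates a label namespace. The labelName function
is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// labelName prefixes the feature source name only to labels that do not
// carry a namespace of their own. Names containing "/" are kept as-is.
func labelName(source, name string) string {
	if strings.Contains(name, "/") {
		return name
	}
	return source + "-" + name
}

func main() {
	fmt.Println(labelName("custom", "example.com/feature"))
	fmt.Println(labelName("cpu", "cpuid.AVX"))
}
```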