Add a configuration option for controlling the enabled "raw" feature
sources. This is useful e.g. in testing and development, plus it also
allows fully shutting down discovery of features that are not needed in
a deployment. Supplements core.labelSources which controls the
enablement of label sources.
Provide backwards compatibility via a deprecated 'core.sources' config
file option. This will override 'core.labelSources'. A warning is
printed in the log if this option is detected.
The goal is to make the name more descriptive. Also keeping in mind a
possible future addition a 'featureSources' option (or similar) for
controlling the feature discovery.
Use the single-dash (i.e. '-option' instead of '--option') format
consistently accross log messages and documentation. This is the format
that was mostly used, already, and shown by command line help of the
binaries, for example.
Support templating of var names in a similar manner as labels. Add
support for a new 'varsTemplate' field to the feature rule spec which is
treated similarly to the 'labelsTemplate' field. The value of the field
is processed through the golang "text/template" template engine and the
expanded value must contain variables in <key>=<value> format, separated
by newlines i.e.:
- name: <rule-name>
varsTemplate: |
<label-1>=<value-1>
<label-2>=<value-2>
...
Similar rules as for 'labelsTemplate' apply, i.e.
1. In case of matchAny is specified, the template is executed separately
against each individual matchFeatures matcher.
2. 'vars' field has priority over 'varsTemplate'
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).
If referencing rules accross multiple configs/CRDs care must be taken
with the ordering. Processing order of rules in nfd-worker:
1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
in alphabetical order. Subdirectories are processed by reading their
files in alphabetical order.
3. Custom rules from main nfd-worker.conf
In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).
This patch also adds new 'vars' fields to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rules.matched' feature. This may by desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.
An example setup:
- name: "kernel feature"
labels:
kernel-feature:
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Gt, value: ["4"]}
- name: "intermediate var feature"
vars:
nolabel-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]}
device: {op: In, value: ["1234", "1235"]}
- name: top-level-feature
matchFeatures:
- feature: rule.matched
matchExpressions:
kernel-feature: "true"
nolabel-feature: "true"
Require that the expanded LabelsTemplate has values. That is, the
(expanded) template must consist of key=value pairs separated by
newlines. No default value will be assigned and we now return an error
if a (non-empty) line not conforming with the key=value format is
encountered.
Commit c8d73666d described that the value defaults to "true" if not
specified. That was not the case and we defaulted to an empty string,
instead.
An example:
- name: "my rule"
labelsTemplate: |
my.label.1=foo
my.label.2=
Would create these labels:
"my.label.1": "foo"
"my.label.2": ""
Further, the following:
- name: "my failing rule"
labelsTemplate: |
my.label.3
will cause an error in the rule processing.
Support templating of label names in feature rules. It is available both
in NodeFeatureRule CRs and in custom rule configuration of nfd-worker.
This patch adds a new 'labelsTemplate' field to the rule spec, making it
possible to dynamically generate multiple labels per rule based on the
matched features. The feature relies on the golang "text/template"
package. When expanded, the template must contain labels in a raw
<key>[=<value>] format (where 'value' defaults to "true"), separated by
newlines i.e.:
- name: <rule-name>
labelsTemplate: |
<label-1>[=<value-1>]
<label-2>[=<value-2>]
...
All the matched features of 'matchFeatures' directives are available for
templating engine in a nested data structure that can be described in
yaml as:
.
<domain-1>:
<key-feature-1>:
- Name: <matched-key>
- ...
<value-feature-1:
- Name: <matched-key>
Value: <matched-value>
- ...
<instance-feature-1>:
- <attribute-1-name>: <attribute-1-value>
<attribute-2-name>: <attribute-2-value>
...
- ...
<domain-2>:
...
That is, the per-feature data available for matching depends on the type
of feature that was matched:
- "key features": only 'Name' is available
- "value features": 'Name' and 'Value' can be used
- "instance features": all attributes of the matched instance are
available
NOTE: In case of matchAny is specified, the template is executed
separately against each individual matchFeatures matcher and the
eventual set of labels is a superset of all these expansions. Consider
the following:
- name: <name>
labelsTemplate: <template>
matchAny:
- matchFeatures: <matcher#1>
- matchFeatures: <matcher#2>
matchFeatures: <matcher#3>
In the example above (assuming the overall result is a match) the
template would be executed on matcher#1 and/or matcher#2 (depending on
whether both or only one of them match), and finally on matcher#3, and
all the labels from these separate expansions would be created (i.e. the
end result would be a union of all the individual expansions).
NOTE 2: The 'labels' field has priority over 'labelsTemplate', i.e.
labels specified in the 'labels' field will override any labels
originating from the 'labelsTemplate' field.
A special case of an empty match expression set matches everything (i.e.
matches/returns all existing keys/values). This makes it simpler to
write templates that run over all values. Also, makes it possible to
later implement support for templates that run over all _keys_ of a
feature.
Some example configurations:
- name: "my-pci-template-features"
labelsTemplate: |
{{ range .pci.device }}intel-{{ .class }}-{{ .device }}=present
{{ end }}
matchFeatures:
- feature: pci.device
matchExpressions:
class: {op: InRegexp, value: ["^06"]}
vendor: ["8086"]
- name: "my-system-template-features"
labelsTemplate: |
{{ range .system.osrelease }}system-{{ .Name }}={{ .Value }}
{{ end }}
matchFeatures:
- feature: system.osRelease
matchExpressions:
ID: {op: Exists}
VERSION_ID.major: {op: Exists}
Imaginative template pipelines are possible, of course, but care must be
taken in order to produce understandable and maintainable rule sets.
Implement a private helper type (nameTemplateHelper) for handling
(executing and caching) of templated names. DeepCopy methods are
manually implemented as controller-gen is not able to help with that.
Enable Custom Resource based label creation in nfd-master. This extends
the previously implemented controller stub for watching NodeFeatureRule
objects. NFD-master watches NodeFeatureRule objects in the cluster and
processes the rules on every incoming labeling request from workers.
The functionality relies on the "raw features" (identical to how
nfd-worker handles custom rules) submitted by nfd-worker, making it
independent of the label source configuration of the worker. This means
that the labeling functions as expected even if all sources in the
worker are disabled.
NOTE: nfd-master is stateless and re-labeling only happens on the
reception of SetLabelsRequest from the worker – i.e. on intervals
specified by the core.sleepInterval configuration option (or
-sleep-interval cmdline flag) of each nfd-worker instance. This means
that modification/creation of NodeFeatureRule objects does not
automatically update the node labels. Instead, the changes only come
visible when workers send their labeling requests.
Add a new command line flag for disabling/enabling the controller for
NodeFeatureRule objects. In practice, disabling the controller disables
all labels generated from rules in NodeFeatureRule objects.
Implement a simple controller stub that operates on NodeFeatureRule
objects. The controller does not yet have any functionality other than
logging changes in the (NodeFeatureRule) objecs it is watching.
Also update the documentation on the -no-publish flag to match the new
functionality.
Add auto-generated code for interfacing our CRD API. On top of this, a
CR controller can be implemented. This patch uses k8s/code-generator
for code generation. Run "make generate" in order to (re-)generate
everything. Path to the code-generator repository may need to be
specified:
K8S_CODE_GENERATOR=path/to/code-generator make apigen
Code-generator version 0.20.7 was used to create this patch. Install
k8s code-generator tools and clone the repo with:
git clone https://github.com/kubernetes/code-generator -b v0.20.7 <path/to/code-generator>
go install k8s.io/code-generator/cmd/...(at)v0.20.7
Move the rule processing of matchFeatures and matchAny from
source/custom package over to pkg/apis/nfd, aiming for better integrity
and re-usability of the code. Does not change the CRD API as such, just
adds more supportive functions.
Having a dedicated type makes it possible to specify deepcopy functions
for it. We need to do this manually as deepcopy-gen doesn't know how to
create copies of regexps.
Add a cluster-scoped Custom Resource Definition for specifying labeling
rules. Nodes (node features, node objects) are cluster-level objects and
thus the natural and encouraged setup is to only have one NFD deployment
per cluster - the set of underlying features of the node stays the same
independent of how many parallel NFD deployments you have. Our extension
points (hooks, feature files and now CRs) can be be used by multiple
actors (depending on us) simultaneously. Having the CRD cluster-scoped
hopefully drives deployments in this direction. It also should make
deployment of vendor-specific labeling rules easy as there is no need to
worry about the namespace.
This patch virtually replicates the source.custom.FeatureSpec in a CRD
API (located in the pkg/apis/nfd/v1alpha1 package) with the notable
exception that "MatchOn" legacy rules are not supported. Legacy rules
are left out in order to keep the CRD simple and clean.
The duplicate functionality in source/custom will be dropped by upcoming
patches.
This patch utilizes controller-gen (from sigs.k8s.io/controller-tools)
for generating the CRD and deepcopy methods. Code can be (re-)generated
with "make generate". Install controller-gen with:
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0
Update kustomize and helm deployments to deploy the CRD.
Create a new package pkg/apis/nfd/v1alpha1 and migrate the custom rule
expressions over there. This is the first step in creating a new CRD API
for custom rules.
Enable transmitting the discovered "raw" features over the gRPC API.
Extend pkg/api/feature with protobuf and gRPC code. In this, utilize
go-to-protobuf from k8s code-generator for auto-generating the gRPC
interface from golang code. The tool can be Installed with:
go install k8s.io/code-generator/cmd/go-to-protobuf@v0.20.7
The auto-generated code is (re-)generated/updated with "make apigen".
The NodeResourceTopology API has been made cluster
scoped as in the current context a CR corresponds to
a Node and since Node is a cluster scoped resource it
makes sense to make NRT cluster scoped as well.
Ref: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/18
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
The Kuberenetes pod resource API now exposing the memory and hugepages information
for guaranteed pods. We can use this information to update NodeResourceTopology
resource with memory and hugepages data.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The methods are used during calculation of reserved memory for system workloads.
The calcualation is `resourceCapacity - resourceAllocatable`.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Use 'go generate' for auto-generating code. Drop the old 'mock' and
'apigen' makefile targets. Those are replaced with a single
make generate
which (re-)generates everything.
There have been recent changes made to the noderesourcetopology API
storing the proto file generated using go-to-protobuf tool and
this code inports the proto generated in the API in the topology-updater.proto
The PRs corresponding to the changes are as follows:
https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/9https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/13
Commands used to generate topology-updater.pb.go file:
go install github.com/golang/protobuf/protoc-gen-go@v1.4.3
go mod vendor
protoc --go_opt=paths=source_relative --go_out=plugins=grpc:. pkg/topologyupdater/topology-updater.proto -I. -Ivendor
As part of implmentation of this patch, reserved (non-allocatable) CPUs
are evaluated by performing a difference between all the CPUs on a system
(determined by using ghw) and allocatable CPUs (determined by querying
GetAllocatableResources podResource API endpoint).
When aggregator creates the NUMA zones, it will skip the zone creation if
there are no allocatable resources. In this update we creates those missing
zone with zero allocatable/available resources so we won't have holes in the
array of reported zones.
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
For accounting we should consider all guaranteed pods with
integral CPU requests and all the pods with device requests
This patch ensures that pods are only considered
for accounting disregarding non-guranteed pods without any
device request.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- Files obtained after running make mock
- Run `go get github.com/vektra/mockery` and make sure that
mockery is in your $PATH
- run `make mock`
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- This patch allows to expose Resource Hardware Topology information
through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
nfd-topology-updater in addition to the already existing software
components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
over gRPC followed by creation of CRs corresponding to the nodes
in the cluster exposing resource hardware topology information
of that node.
- Pin kubernetes dependency to one that include pod resource implementation
- This code is responsible for obtaining hardware information from the system
as well as pod resource information from the Pod Resource API in order to
determine the allocatable resource information for each NUMA zone. This
information along with Costs for NUMA zones (obtained by reading NUMA distances)
is gathered by nfd-topology-updater running on all the nodes
of the cluster and propagate NUMA zone costs to master in order to populate
that information in the CRs corresponding to the nodes.
- We use GHW facilities for obtaining system information like CPUs, topology,
NUMA distances etc.
- This also includes updates made to Makefile and Dockerfile and Manifests for
deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
the configured Topology manager scope in addition to the Topology manager policy.
Based on the value of both attribues a single string will be populated to the CRD.
The string value will be on of the following {SingleNUMANodeContainerLevel,
SingleNUMANodePodLevel, BestEffort, Restricted, None}
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Setup the topologyupdater API for gRPC communication of
nfd-topology-updater with master
We generate pb.go file to reflect latest dependency changes
using github.com/golang/protobuf/protoc-gen-go and generate
grpc files via:
`protoc pkg/topologyupdater/topology-updater.proto --go_out=plugins=grpc:.`
Please refer to: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/blob/master/pkg/apis/topology/v1alpha1/types.go
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Specify a new interface for managing "raw" feature data. This is the
first step to separate raw feature data from node labels. None of the
feature sources implement this interface, yet.
This patch unifies the data format of "raw" features by dividing them
into three different basic types.
- keys, a set of names without any associated values, e.g. CPUID flags
or loaded kernel modules
- values, a map of key-value pairs, for features with a single value,
e.g. kernel config flags or os version
- instances, a list of instances each of which has multiple attributes
(key-value pairs of their own), e.g. PCI or USB devices
The new feature data types are defined in a new "pkg/api/feature"
package, catering decoupling and re-usability of code e.g. within future
extentions of the NFD gRPC API.
Rename the Discover() method of LabelSource interface to GetLabels().