1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

51 commits

Author SHA1 Message Date
Markus Lehtonen
9624d182ab deployment/kustomize: drop nfd-master service
Not needed anymore as we're not relying on gRPC anymore.
2023-12-08 14:53:23 +02:00
Carlos Eduardo Arango Gutierrez
c0063be4f4
Discover node features as annotations
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: bebc <mchf1990212@gmail.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-10-25 19:58:58 +02:00
Markus Lehtonen
98c3b0750d nfd-gc: add metrics
Implements three metrics for nfd-gc:

- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
  NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
  deleting NodeFeature and NodeResourceTopology objects.
2023-10-09 13:39:28 +00:00
Muyassarov, Feruzjon
06036a62ce Replace gRPC health probe utility with k8s built-in health probe
Kubernetes 1.23 has introduced native health probes for gRPC which
can replace grpc_health_probe utility. This commit removes baking
in grpc_health_probe binary into the image and updates related
health checks to use k8s native gRPC.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-09-20 12:25:36 +03:00
Markus Lehtonen
6cf29bd8ef deployment/kustomize: support nfd-gc
Rename the old "topology-gc" to just "gc". Simplify the setup a bit by
including the RBAC rules in the "gc" base.

Note: we don't enable nfd-gc in the default overlay, yet, as the
NodeFeature API isn't enabled (gc is not needed).
2023-08-23 10:56:12 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
7e375ad1f0 generate: bump tools to their latest versions
Bump tools versions and re-auto-generate files.
2023-07-27 14:29:48 +03:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
Markus Lehtonen
457fc8483b deployment/kustomize: use a named port for nfd gRPC service 2023-06-06 21:00:42 +03:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Markus Lehtonen
a5ec646c48 generate: update controller-gen to v0.11.3
Update controller-gen tool from sigs.k8s.io/controller-tools to the
latest release.

Also, bump goimports from golang.org/x/tools to the latest version.
2023-04-19 12:48:12 +03:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
Jose Luis Ojosnegros Manchón
d1d1eda0d2 nrt-api: Update to v0.1.0 to use v1alpha2 2023-02-09 12:03:18 +01:00
AhmedGrati
743c877ad8 deployment: disable service links in NFD master pod
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-01-27 16:55:18 +01:00
PiotrProkop
59afae50ba Add NodeResourceTopology garbage collector
NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes.
As of now node-feature-discovery daemons are used to advertise those
resources but there is no service responsible for removing obsolete
objects(without corresponding Kubernetes node).

This patch adds new daemon called nfd-topology-gc which removes old
NRTs.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:21 +01:00
Markus Lehtonen
dfda9bccad apis/nfd: update auto-generated code 2022-12-22 17:58:20 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defgault to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
d1c91e129a apis/nfd: update auto-generated code 2022-12-14 07:31:28 +02:00
Markus Lehtonen
59ebff46c9 apis/nfd: add CRD for communicating node features
Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node
features over K8s api objects instead of gRPC. The new resource is
namespaced which will help the management of multiple NodeFeature
objects per node. This aims at enabling 3rd party detectors for custom
features.

In addition to communicating raw features the NodeFeature object also
has a field for directly requesting labels that should be applied on the
node object.

Rename the crd deployment file to nfd-api-crds.yaml so that it matches
the new content of the file. Also, rename the Helm subdir for CRDs to
match the expected chart directory structure.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also update deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Feruzjon Muyassarov
984a3de198 Document tainting feature
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:29:10 +02:00
Feruzjon Muyassarov
532e1193ce Add taints field to NodeFeatureRule CR spec
Extend NodeFeatureRule Spec with taints field to allow users to
specify the list of the taints they want to be set on the node if
rule matches.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:25:00 +02:00
Markus Lehtonen
37d51c96f1 deployment: drop stale nfd-api-crds.yaml
Remove a stale unused file that was accidentally committed from an
experimental work.
2022-11-29 13:46:30 +02:00
Garrybest
3ec1b94020 get kubelet config from configz
Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-11-08 23:52:35 +08:00
Markus Lehtonen
9ea787bc99 apis/nfd: update auto-generated code
Re-generate after the latest API change. Involves renaming the crd spec
files.
2022-10-18 18:41:53 +03:00
Feruzjon Muyassarov
60f270d40d Set shortName for NodeFeatureRule CRD
This patch adds a kubebuilder marker to add a short name nfr for
NodeFeatureRule CRD.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-09-28 12:18:49 +03:00
Markus Lehtonen
98228d2069 Update auto-generated artefacts
Latest gofmt changes and update to go v1.19 induce some changes in the
generated files.
2022-09-08 12:45:20 +03:00
Markus Lehtonen
38e763e36c Refresh auto-generated files 2022-08-10 14:24:33 +03:00
Markus Lehtonen
b648d005e1 pkg/apis/nfd: support templating of "vars"
Support templating of var names in a similar manner as labels. Add
support for a new 'varsTemplate' field to the feature rule spec which is
treated similarly to the 'labelsTemplate' field. The value of the field
is processed through the golang "text/template" template engine and the
expanded value must contain variables in <key>=<value> format, separated
by newlines i.e.:

  - name: <rule-name>
    varsTemplate: |
      <label-1>=<value-1>
      <label-2>=<value-2>
      ...

Similar rules as for 'labelsTemplate' apply, i.e.

1. In case of matchAny is specified, the template is executed separately
   against each individual matchFeatures matcher.
2. 'vars' field has priority over 'varsTemplate'
2021-11-25 12:50:47 +02:00
Markus Lehtonen
f75303ce43 pkg/apis/nfd: add variables to rule spec and support backreferences
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).

If referencing rules accross multiple configs/CRDs care must be taken
with the ordering. Processing order of rules in nfd-worker:

1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
   in alphabetical order. Subdirectories are processed by reading their
   files in alphabetical order.
3. Custom rules from main nfd-worker.conf

In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).

This patch also adds new 'vars' fields to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rules.matched' feature. This may by desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.

An example setup:

  - name: "kernel feature"
    labels:
      kernel-feature:
    matchFeatures:
      - feature: kernel.version
        matchExpressions:
          major: {op: Gt, value: ["4"]}

  - name: "intermediate var feature"
    vars:
      nolabel-feature: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchExpressions:
          AVX512F: {op: Exists}
      - feature: pci.device
        matchExpressions:
          vendor: {op: In, value: ["8086"]}
          device: {op: In, value: ["1234", "1235"]}

  - name: top-level-feature
    matchFeatures:
      - feature: rule.matched
        matchExpressions:
          kernel-feature: "true"
          nolabel-feature: "true"
2021-11-25 12:50:47 +02:00
Kubernetes Prow Robot
da484b7bd3
Merge pull request #550 from marquiz/devel/custom-templating
Templating of custom label names
2021-11-23 12:02:51 -08:00
Markus Lehtonen
c8d73666d6 pkg/apis/nfd: support label name templating
Support templating of label names in feature rules. It is available both
in NodeFeatureRule CRs and in custom rule configuration of nfd-worker.

This patch adds a new 'labelsTemplate' field to the rule spec, making it
possible to dynamically generate multiple labels per rule based on the
matched features. The feature relies on the golang "text/template"
package.  When expanded, the template must contain labels in a raw
<key>[=<value>] format (where 'value' defaults to "true"), separated by
newlines i.e.:

  - name: <rule-name>
    labelsTemplate: |
      <label-1>[=<value-1>]
      <label-2>[=<value-2>]
      ...

All the matched features of 'matchFeatures' directives are available for
templating engine in a nested data structure that can be described in
yaml as:

.
  <domain-1>:
      <key-feature-1>:
        - Name: <matched-key>
        - ...

      <value-feature-1:
        - Name: <matched-key>
          Value: <matched-value>
        - ...

      <instance-feature-1>:
        - <attribute-1-name>: <attribute-1-value>
          <attribute-2-name>: <attribute-2-value>
          ...
        - ...

  <domain-2>:
     ...

That is, the per-feature data available for matching depends on the type
of feature that was matched:

- "key features": only 'Name' is available
- "value features": 'Name' and 'Value' can be used
- "instance features": all attributes of the matched instance are
   available

NOTE: In case of matchAny is specified, the template is executed
separately against each individual matchFeatures matcher and the
eventual set of labels is a superset of all these expansions.  Consider
the following:

  - name: <name>
    labelsTemplate: <template>
    matchAny:
      - matchFeatures: <matcher#1>
      - matchFeatures: <matcher#2>
    matchFeatures: <matcher#3>

In the example above (assuming the overall result is a match) the
template would be executed on matcher#1 and/or matcher#2 (depending on
whether both or only one of them match), and finally on matcher#3, and
all the labels from these separate expansions would be created (i.e. the
end result would be a union of all the individual expansions).

NOTE 2: The 'labels' field has priority over 'labelsTemplate', i.e.
labels specified in the 'labels' field will override any labels
originating from the 'labelsTemplate' field.

A special case of an empty match expression set matches everything (i.e.
matches/returns all existing keys/values). This makes it simpler to
write templates that run over all values. Also, makes it possible to
later implement support for templates that run over all _keys_ of a
feature.

Some example configurations:

  - name: "my-pci-template-features"
    labelsTemplate: |
      {{ range .pci.device }}intel-{{ .class }}-{{ .device }}=present
      {{ end }}
    matchFeatures:
      - feature: pci.device
        matchExpressions:
          class: {op: InRegexp, value: ["^06"]}
          vendor: ["8086"]

  - name: "my-system-template-features"
    labelsTemplate: |
      {{ range .system.osrelease }}system-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: system.osRelease
        matchExpressions:
          ID: {op: Exists}
          VERSION_ID.major: {op: Exists}

Imaginative template pipelines are possible, of course, but care must be
taken in order to produce understandable and maintainable rule sets.
2021-11-23 21:03:22 +02:00
Markus Lehtonen
c3da439d21 source/memory: implement FeatureSource
Separate feature discovery and creation of feature labels.

Generalize the discovery of nvdimm devices so that they can be matched
in custom label rules in a similar fashion as pci and usb devices.
Available attributes for matching nvdimm devices are limited to:

- devtype
- mode

For numa we now detect the number of numa nodes which can be matched
agains in custom label rules.

Labels created by the memory feature source are unchanged. The new
features being detected are available in custom rules only.

Example custom rule:

  - name: "my memory rule"
    labels:
      my-memory-feature: "true"
    matchFeatures:
      - feature: memory.numa
        matchExpressions:
          "node_count": {op: Gt, value: ["3"]}
      - feature: memory.nv
        matchExpressions:
          "devtype" {op: In, value: ["nd_dax"]}

Also, add minimalist unit test.
2021-11-23 15:08:15 +02:00
Markus Lehtonen
9a02b544a2 source/network: implement FeatureSource
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that network devices can be matched in custom
label rules in a similar fashion as pci and usb devices. Available
attributes for matching are:

- operstate
- speed
- sriov_numvfs
- sriov_totalvfs

Labels created by the network feature source are unchanged. The new
features being detected are available in custom rules only.

Example custom rule:

  - name: "my network rule"
    labels:
      my-network-feature: "true"
    matchFeatures:
      - feature: network.device
        matchExpressions:
          "operstate": { op: In, value: ["up"] }
          "sriov_numvfs": { op: Gt, value: ["9"] }

Also, add minimalist unit test.
2021-11-23 10:05:38 +02:00
Kubernetes Prow Robot
99d3251c42
Merge pull request #649 from marquiz/devel/storage-feature-source
source/storage: implement FeatureSource
2021-11-22 11:31:32 -08:00
Markus Lehtonen
e6e32a88c3 nfd-master: implement controller for NodeFeatureRule CRs
Implement a simple controller stub that operates on NodeFeatureRule
objects. The controller does not yet have any functionality other than
logging changes in the (NodeFeatureRule) objecs it is watching.

Also update the documentation on the -no-publish flag to match the new
functionality.
2021-11-22 16:57:42 +02:00
Kubernetes Prow Robot
882320f523
Merge pull request #608 from marquiz/devel/deployment-base
deployment: clean up base/topologyupdater-daemonset
2021-11-18 09:13:02 -08:00
Markus Lehtonen
999628418b source/storage: implement FeatureSource
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that block devices can be matched in custom
label rules in a similar fashion as pci and usb devices. This extends
the discovery to other block queue attributes than 'rotational': now we
also detect 'dax', 'nr_zones' and 'zoned'.

Labels created by the storage feature source are unchanged. The new
features being detected are available in custom rules only.

Example custom rules:

  - name: "my block rule 1"
    labels:
      my-block-feature-1: "true"
    matchFeatures:
      - feature: storage.block
          "rotational": {op: In, value: ["0"]}

  - name: "my block rule 2"
    labels:
      my-block-feature-2: "true"
    matchFeatures:
      - feature: storage.block
          "zoned": {op: In, value: [“host-aware”, “host-managed”]}

Also, add minimalist unit test.
2021-11-18 14:58:33 +02:00
Markus Lehtonen
b96b86bc6c pkg/apis/nfd: drop excess field from the CRD
Drop stale leftover "LabelsTemplate" field from the rule spec.
2021-11-17 16:40:28 +02:00
Markus Lehtonen
c3e2315834 pkg/apis/nfd: specify CRD for custom labeling rules
Add a cluster-scoped Custom Resource Definition for specifying labeling
rules. Nodes (node features, node objects) are cluster-level objects and
thus the natural and encouraged setup is to only have one NFD deployment
per cluster - the set of underlying features of the node stays the same
independent of how many parallel NFD deployments you have. Our extension
points (hooks, feature files and now CRs) can be be used by multiple
actors (depending on us) simultaneously. Having the CRD cluster-scoped
hopefully drives deployments in this direction. It also should make
deployment of vendor-specific labeling rules easy as there is no need to
worry about the namespace.

This patch virtually replicates the source.custom.FeatureSpec in a CRD
API (located in the pkg/apis/nfd/v1alpha1 package) with the notable
exception that "MatchOn" legacy rules are not supported. Legacy rules
are left out in order to keep the CRD simple and clean.

The duplicate functionality in source/custom will be dropped by upcoming
patches.

This patch utilizes controller-gen (from sigs.k8s.io/controller-tools)
for generating the CRD and deepcopy methods. Code can be (re-)generated
with "make generate". Install controller-gen with:

  go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0

Update kustomize and helm deployments to deploy the CRD.
2021-11-17 13:40:23 +02:00
Swati Sehgal
b444ef95a8 NFD-Topology-Updater: Bump NRT API to version v0.0.12
The NodeResourceTopology API has been made cluster
scoped as in the current context a CR corresponds to
a Node and since Node is a cluster scoped resource it
makes sense to make NRT cluster scoped as well.

Ref: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/18
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-11-16 13:28:23 +00:00
Markus Lehtonen
e342076a5e deployment: clean up base/topologyupdater-daemonset
The base should really have the very bare minimum. Remove all redundant
(at default-value) args and move the others to the specific
topologyupdater kustomize component. This also makes these settings
re-usable in user-specific overlays (that are not based on
topologyupdater-daemonset).
2021-10-06 21:42:31 +03:00
Wei Zhang
4b1e9d7211
deployment: drop the topology updater job 2021-10-06 10:28:37 +08:00
Swati Sehgal
a2c066dc0d topologyupdater: manifests: topologyupdater deployment files
- create an overlay for deployment of all components
- create an overlay for just topologyupdater deployment (to be deployed in
  conjunction with the default overlay)
- create a separate overlay for deployment of master and topologyupdater-job

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-09-21 10:48:10 +01:00
Markus Lehtonen
a3b2d97513 kustomize: fix broken master-worker-combined base
Got broken unnoticed with the addition of liveness and readiness probes.
2021-08-19 23:22:28 +03:00
Carlos Eduardo Arango Gutierrez
dece85b394
Add livenessProbe via grpc to nfd-master
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-08-18 10:23:10 -05:00
Markus Lehtonen
1f8a6d7819 kustomize: add standard-combined overlay
Replicates nfd-daemonset-combined.yaml.template.

In addition to the overlay we need to add a separate set of patches
under components/common in order to handle the double-container pod.
2021-08-18 15:10:25 +03:00
Markus Lehtonen
787ebfe441 kustomize: add Job example deployment
Add a new base kustomization for worker Job and an overlay stitching up
the complete deployment. Replaces nfd-worker-job.yaml.template.
2021-08-18 15:10:25 +03:00