1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-15 17:50:49 +00:00
Commit graph

242 commits

Author SHA1 Message Date
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
Jose Luis Ojosnegros Manchón
1a687cb286 topology-updater: Refactor Scan to expand response
We are gonna add new data to Scan response so better introduce a new
ScanResponse struct as Scan return value to make it easier.
2023-02-22 09:56:28 +01:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
Kubernetes Prow Robot
38cc370e69
Merge pull request #1054 from PiotrProkop/use-new-nrt-api
Advertise TopologyManger policy and scope as Attributes in NRT api v1alpha2
2023-02-15 01:12:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
PiotrProkop
f76fc5bf6b Read Kubelet configuration the same way as Kubelet to apply default values
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-02-15 09:27:25 +01:00
Ville Pihlava
b1c6b229fe Add discovery duration logging. 2023-02-13 12:55:57 +02:00
pprokop
5484babcb1 Advertise TopologyManger policy and scope as Attributes
Signed-off-by: pprokop <pprokop@nvidia.com>
2023-02-10 12:03:11 +01:00
Kubernetes Prow Robot
ac271b3c29
Merge pull request #1050 from VillePihlava/interval-fix
Change nfd-worker to use Ticker instead of After.
2023-02-09 07:54:22 -08:00
Ville Pihlava
2101cb20e4 Change nfd-worker to use Ticker instead of After. 2023-02-09 17:14:39 +02:00
Jose Luis Ojosnegros Manchón
2967f3307a nrt-api: move from v1alpha1 to v1alpha2 2023-02-09 12:29:54 +01:00
Carlos Eduardo Arango Gutierrez
9b3171bce2
nfd-master: always start gRPC server
Don't register gRPC LabelServer when using the NodeFeature option, only
turn the gRPC server on for Health and Readiness probes.
2023-01-16 19:33:15 +01:00
Kubernetes Prow Robot
ea921a8b14
Merge pull request #1024 from PiotrProkop/nrt-garbage-collector
Add NRT garbage collector
2023-01-11 01:59:44 -08:00
PiotrProkop
59afae50ba Add NodeResourceTopology garbage collector
NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes.
As of now node-feature-discovery daemons are used to advertise those
resources but there is no service responsible for removing obsolete
objects(without corresponding Kubernetes node).

This patch adds new daemon called nfd-topology-gc which removes old
NRTs.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:21 +01:00
PiotrProkop
1bae2867e2 Release v0.0.13 of NodeResourceTopology API added missing TopologyManagerPolicy.
Expose new policies:
* RestrictedContainerLevel
* RestrictedPodLevel
* BestEffortContainerLevel
* BestEffortPodLevel

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-09 16:02:12 +01:00
Kubernetes Prow Robot
8eb6640754
Merge pull request #1020 from marquiz/devel/worker-refactor
worker: move code
2022-12-27 00:45:34 -08:00
Kubernetes Prow Robot
e97b2c1579
Merge pull request #1017 from marquiz/devel/nfd-api-optional-fields
apis/nfd: make all fields in NodeFeatureSpec optional
2022-12-27 00:45:28 -08:00
Markus Lehtonen
1026d91d12 worker: move code
Simplify code bu dropping the unnecessary base client package.
2022-12-23 11:38:21 +02:00
Markus Lehtonen
0283f68702 topology-updater: move code
Move and rename the Go package. It has nothing to do with NFD gRPC
client anymore so move it out of the nfd-client package.
2022-12-23 11:37:46 +02:00
Markus Lehtonen
aa97105854 Add common utility function for getting node name 2022-12-23 09:50:15 +02:00
Markus Lehtonen
dfda9bccad apis/nfd: update auto-generated code 2022-12-22 17:58:20 +02:00
Markus Lehtonen
a4fc15a424 apis/nfd: make all fields in NodeFeatureSpec optional
Don't require features to be specified. The creator possibly only wants
to create labels or only some types of features. No need to specify
empty structs for the unused fields.
2022-12-22 17:53:42 +02:00
Markus Lehtonen
f5ae3fe2c7 Simplify usage of ObjectMeta fields
No need to explicitly spell out ObjectMeta as it's embedded in the
object types.
2022-12-19 17:40:10 +02:00
Kubernetes Prow Robot
28a5daa338
Merge pull request #999 from marquiz/fixes/nodefeature-missing
nfd-master: update node if no NodeFeature objects are present
2022-12-19 00:39:44 -08:00
Markus Lehtonen
4c955ad72c nfd-master: update node if no NodeFeature objects are present
Correctly handle the case where no NodeFeature objects exist for certain
node (and NodeFeature API has been enabled with
-enable-nodefeature-api). In this case all the labels should be removed.
2022-12-19 10:22:04 +02:00
Markus Lehtonen
b9c09e6674 nfd-master: update all nodes at startup when NodeFeature API enabled
We want to always update all nodes at startup. Without this patch we
don't get any update event from the controller if no NodeFeature or
NodeFeatureRule objects exist in the cluster. Thus all nodes would stay
untouched whereas we really want to remove all labels from all nodes in
this case.
2022-12-14 21:49:50 +02:00
Kubernetes Prow Robot
d1b314842c
Merge pull request #989 from marquiz/devel/nodefeature-multi-object
nfd-master: handle multiple NodeFeature objects
2022-12-14 07:51:34 -08:00
Markus Lehtonen
740e3af681 nfd-master: implement ratelimiter for nfd api updates
Implement a naive ratelimiter for node update events originating from
the nfd API. We might get a ton of events in short interval. The
simplest example is startup when we get a separate Add event for every
NodeFeature and NodeFeatureRule object. Without rate limiting we
run "update all nodes" separately for each NodeFeatureRule object, plus,
we would run "update node X" separately for each NodeFeature object
targeting node X. This is a huge amount of wasted work because in
principle just running "update all nodes" once should be enough.
2022-12-14 15:45:43 +02:00
Markus Lehtonen
79ed747be8 nfd-master: handle multiple NodeFeature objects
Implement handling of multiple NodeFeature objects by merging all
objects (targeting a certain node) into one before processing the data.
This patch implements MergeInto() methods for all required data types.

With support for multiple NodeFeature objects per node, The "nfd api
workflow" can be easily demonstrated and tested from the command line.
Creating the folloiwing object (assuming node-n exists in the cluster):

    apiVersion: nfd.k8s-sigs.io/v1alpha1
    kind: NodeFeature
    metadata:
      labels:
        nfd.node.kubernetes.io/node-name: node-n
      name: my-features-for-node-n
    spec:
      # Features for NodeFeatureRule matching
      features:
        flags:
          vendor.domain-a:
            elements:
              feature-x: {}
        attributes:
          vendor.domain-b:
            elements:
              feature-y: "foo"
              feature-z: "123"
        instances:
          vendor.domain-c:
            elements:
            - attributes:
                name: "elem-1"
                vendor: "acme"
            - attributes:
                name: "elem-2"
                vendor: "acme"
      # Labels to be created
      labels:
        vendor-feature.enabled: "true"
        vendor-setting.value: "100"

will create two feature labes:

    feature.node.kubernetes.io/vendor-feature.enabled: "true"
    feature.node.kubernetes.io/vendor-setting.value: "100"

In addition it will advertise hidden/raw features that can be used for
custom rules in NodeFeatureRule objects. Now, creating a NodeFeatureRule
object:

    apiVersion: nfd.k8s-sigs.io/v1alpha1
    kind: NodeFeatureRule
    metadata:
      name: my-rule
    spec:
      rules:
        - name: "my feature rule"
          labels:
            "my-feature": "true"
          matchFeatures:
            - feature: vendor.domain-a
              matchExpressions:
                feature-x: {op: Exists}
            - feature: vendor.domain-c
              matchExpressions:
                vendor: {op: In, value: ["acme"]}

will match the features in the NodeFeature object above and cause one
more label to be created:

    feature.node.kubernetes.io/my-feature: "true"
2022-12-14 15:44:52 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defgault to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
d1c91e129a apis/nfd: update auto-generated code 2022-12-14 07:31:28 +02:00
Markus Lehtonen
59ebff46c9 apis/nfd: add CRD for communicating node features
Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node
features over K8s api objects instead of gRPC. The new resource is
namespaced which will help the management of multiple NodeFeature
objects per node. This aims at enabling 3rd party detectors for custom
features.

In addition to communicating raw features the NodeFeature object also
has a field for directly requesting labels that should be applied on the
node object.

Rename the crd deployment file to nfd-api-crds.yaml so that it matches
the new content of the file. Also, rename the Helm subdir for CRDs to
match the expected chart directory structure.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
079655b42c nfd-master: add error checking for CRD controller creation 2022-12-14 00:27:27 +02:00
Feruzjon Muyassarov
b296bdf0b3 update test functions according to upstream deprecated/removed methods
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-13 12:12:50 +02:00
Kubernetes Prow Robot
733fb5deaa
Merge pull request #984 from marquiz/devel/worker-namespace
nfd-worker: detect the namespace it is running in
2022-12-09 07:10:11 -08:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also update deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Markus Lehtonen
87b92f88ca nfd-worker: detect the namespace it is running in
Implement detection of kubernetes namespace by reading file
/var/run/secrets/kubernetes.io/serviceaccount/namespace

Aa a fallback (if the file is not accessible) we take namespace from
KUBERNETES_NAMESPACE environment variable. This is useful for e.g.
testing and development where you might run nfd-worker directly from the
command line on a host system.
2022-12-08 10:34:52 +02:00
Feruzjon Muyassarov
2bdf427b89 nfd-master logic update for setting node taints
This commits extends NFD master code to support adding node taints
from NodeFeatureRule CR. We also introduce a new annotation for
taints which helps to identify if the taint set on node is owned
by NFD or not. When user deletes the taint entry from
NodeFeatureRule CR, NFD will remove the taint from the node. But
to avoid accidental deletion of taints not owned by the NFD, it
needs to know the owner. Keeping track of NFD set taints in the
annotation can be used during the filtering of the owner. Also
enable-taints flag is added to allow users opt in/out for node
tainting feature. The flag takes precedence over taints defined
in NodeFeatureRule CR. In other words, if enbale-taints is set to
false(disabled) and user still defines taints on the CR, NFD will
ignore those taints and skip them from setting on the node.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:25:00 +02:00
Feruzjon Muyassarov
532e1193ce Add taints field to NodeFeatureRule CR spec
Extend NodeFeatureRule Spec with taints field to allow users to
specify the list of the taints they want to be set on the node if
rule matches.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:25:00 +02:00
Markus Lehtonen
eb8e29c80a nfd-worker: drop deprecated command line flags
Drop the following flags that were deprecated already in v0.8.0:

-sleep-interval  (replaced by core.sleepInterval config file option)
-label-whitelist (replaced by core.labelWhiteList config file option)
-sources         (replaced by -label-sources flag)
2022-11-23 22:33:51 +02:00
Talor Itzhak
5b0788ced4 topology-updater: introduce exclude-list
The exclude-list allows to filter specific resource accounting
from NRT's objects per node basis.

The CRs created by the topology-updater are used by the scheduler-plugin
as a source of truth for making scheduling decisions.
As such, this feature allows to hide specific information
from the scheduler, which in turn
will affect the scheduling decision.
A common use case is when user would like to perform scheduling
decisions which are based on a specific resource.
In that case, we can exclude all the other resources
which we don't want the scheduler to exemine.

The exclude-list is provided to the topology-updater via a ConfigMap.
Resource type's names specified in the list should match the names
as shown here: https://pkg.go.dev/k8s.io/api/core/v1#ResourceName

This is a resurrection of an old work started here:
https://github.com/kubernetes-sigs/node-feature-discovery/pull/545

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2022-11-21 14:08:25 +02:00
Garrybest
3ec1b94020 get kubelet config from configz
Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-11-08 23:52:35 +08:00
Feruzjon Muyassarov
7ea0e0b0a7 Add argument to updateNodeFeatures method to pass client from caller
This commit adds an argument to updateNodeFeatures method for receiving
client argument, which currently gets initialized within the method
itself. This is a minor improvement for https://github.com/kubernetes-sigs/node-feature-discovery/pull/910.

Ref:https://github.com/kubernetes-sigs/node-feature-discovery/pull/910#discussion_r1012703631

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-11-06 22:37:11 +02:00
Markus Lehtonen
7c24b50f74 apis/nfd: fix NodeFeatureRule templating
Fix handling of templates that got broken in
b907d07d7e when "flattening" the internal
data structure of features. That happened because the golang
text/template format uses dots to reference fields of a struct /
elements of a map (i.e. 'foo.bar' means that 'bar' must be a sub-element
of foo). Thus, using dots in our feature names (e.g. 'cpu.cpuid') means
that that hierarchy must be reflected in the data structure that is fed
to the templating engine. Thus, for templates we're now stuck stuck with
two level hierarchy. It doesn't really matter for now as all our
features follow that naming patter. We might be able to overcome this
limitation e.g.  by using reflect but that's left as a future exercise.
2022-10-25 23:37:27 +03:00
Kubernetes Prow Robot
a65ee959b9
Merge pull request #925 from marquiz/devel/feature-api-flatten
apis/nfd: flatten the structure of features data type
2022-10-24 01:14:26 -07:00
Francesco Romani
700d9e215c topology-updater: continue looping on scan error
Scanning podresources can temporarily fail; the previous code was
mistakenly not rearming the loop condition when this occurred,
effectively stopping the monitoring.

Rather, we should always pool and bail out on unrecoverable
error or when asked to stop.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-20 10:08:13 +02:00
Markus Lehtonen
9ea787bc99 apis/nfd: update auto-generated code
Re-generate after the latest API change. Involves renaming the crd spec
files.
2022-10-18 18:41:53 +03:00
Markus Lehtonen
b907d07d7e apis/nfd: flatten the structure of features data type
Flatten the data structure that stores features, dropping the "domain"
level from the data model. That extra level of hierarchy brought little
benefit but just caused some extra complexity, instead. The new
structure nicely matches what we have in the NodeFeatureRule object (the
matchFeatures field of uses the same flat structure with the "feature"
field having a value <domain>.<feature>, e.g. "kernel.version").

This is pre-work for introducing a new "node feature" CRD that contains
the raw feature data. It makes the life of both users and developers
easier when both CRDs, plus our internal code, handle feature data in a
similar flat structure.
2022-10-18 18:37:28 +03:00