node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-15 17:50:49 +00:00

Author	SHA1	Message	Date
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater: compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in the NRT object.	2023-02-22 10:22:50 +01:00
Jose Luis Ojosnegros Manchón	1a687cb286	topology-updater: Refactor Scan to expand response We are going to add new data to the Scan response, so introduce a new ScanResponse struct as the Scan return value to make that easier.	2023-02-22 09:56:28 +01:00
Kubernetes Prow Robot	a92614c292	Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard feat: add deny-label-ns flag which supports wildcard	2023-02-15 03:42:25 -08:00
Kubernetes Prow Robot	38cc370e69	Merge pull request #1054 from PiotrProkop/use-new-nrt-api Advertise TopologyManager policy and scope as Attributes in NRT api v1alpha2	2023-02-15 01:12:25 -08:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
PiotrProkop	f76fc5bf6b	Read Kubelet configuration the same way as Kubelet to apply default values Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-02-15 09:27:25 +01:00
Ville Pihlava	b1c6b229fe	Add discovery duration logging.	2023-02-13 12:55:57 +02:00
pprokop	5484babcb1	Advertise TopologyManager policy and scope as Attributes Signed-off-by: pprokop <pprokop@nvidia.com>	2023-02-10 12:03:11 +01:00
Kubernetes Prow Robot	ac271b3c29	Merge pull request #1050 from VillePihlava/interval-fix Change nfd-worker to use Ticker instead of After.	2023-02-09 07:54:22 -08:00
Ville Pihlava	2101cb20e4	Change nfd-worker to use Ticker instead of After.	2023-02-09 17:14:39 +02:00
Jose Luis Ojosnegros Manchón	2967f3307a	nrt-api: move from v1alpha1 to v1alpha2	2023-02-09 12:29:54 +01:00
Carlos Eduardo Arango Gutierrez	9b3171bce2	nfd-master: always start gRPC server Don't register gRPC LabelServer when using the NodeFeature option, only turn the gRPC server on for Health and Readiness probes.	2023-01-16 19:33:15 +01:00
Kubernetes Prow Robot	ea921a8b14	Merge pull request #1024 from PiotrProkop/nrt-garbage-collector Add NRT garbage collector	2023-01-11 01:59:44 -08:00
PiotrProkop	59afae50ba	Add NodeResourceTopology garbage collector The NodeResourceTopology (aka NRT) custom resource is used to enable NUMA-aware scheduling in Kubernetes. As of now node-feature-discovery daemons are used to advertise those resources, but there is no service responsible for removing obsolete objects (those without a corresponding Kubernetes node). This patch adds a new daemon called nfd-topology-gc which removes old NRTs. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-11 10:15:21 +01:00
PiotrProkop	1bae2867e2	Release `v0.0.13` of NodeResourceTopology API added missing TopologyManagerPolicy. Expose new policies: * RestrictedContainerLevel * RestrictedPodLevel * BestEffortContainerLevel * BestEffortPodLevel Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-09 16:02:12 +01:00
Kubernetes Prow Robot	8eb6640754	Merge pull request #1020 from marquiz/devel/worker-refactor worker: move code	2022-12-27 00:45:34 -08:00
Kubernetes Prow Robot	e97b2c1579	Merge pull request #1017 from marquiz/devel/nfd-api-optional-fields apis/nfd: make all fields in NodeFeatureSpec optional	2022-12-27 00:45:28 -08:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code by dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	0283f68702	topology-updater: move code Move and rename the Go package. It has nothing to do with NFD gRPC client anymore so move it out of the nfd-client package.	2022-12-23 11:37:46 +02:00
Markus Lehtonen	aa97105854	Add common utility function for getting node name	2022-12-23 09:50:15 +02:00
Markus Lehtonen	dfda9bccad	apis/nfd: update auto-generated code	2022-12-22 17:58:20 +02:00
Markus Lehtonen	a4fc15a424	apis/nfd: make all fields in NodeFeatureSpec optional Don't require features to be specified. The creator possibly only wants to create labels or only some types of features. No need to specify empty structs for the unused fields.	2022-12-22 17:53:42 +02:00
Markus Lehtonen	f5ae3fe2c7	Simplify usage of ObjectMeta fields No need to explicitly spell out ObjectMeta as it's embedded in the object types.	2022-12-19 17:40:10 +02:00
Kubernetes Prow Robot	28a5daa338	Merge pull request #999 from marquiz/fixes/nodefeature-missing nfd-master: update node if no NodeFeature objects are present	2022-12-19 00:39:44 -08:00
Markus Lehtonen	4c955ad72c	nfd-master: update node if no NodeFeature objects are present Correctly handle the case where no NodeFeature objects exist for certain node (and NodeFeature API has been enabled with -enable-nodefeature-api). In this case all the labels should be removed.	2022-12-19 10:22:04 +02:00
Markus Lehtonen	b9c09e6674	nfd-master: update all nodes at startup when NodeFeature API enabled We want to always update all nodes at startup. Without this patch we don't get any update event from the controller if no NodeFeature or NodeFeatureRule objects exist in the cluster. Thus all nodes would stay untouched whereas we really want to remove all labels from all nodes in this case.	2022-12-14 21:49:50 +02:00
Kubernetes Prow Robot	d1b314842c	Merge pull request #989 from marquiz/devel/nodefeature-multi-object nfd-master: handle multiple NodeFeature objects	2022-12-14 07:51:34 -08:00
Markus Lehtonen	740e3af681	nfd-master: implement ratelimiter for nfd api updates Implement a naive ratelimiter for node update events originating from the nfd API. We might get a ton of events in short interval. The simplest example is startup when we get a separate Add event for every NodeFeature and NodeFeatureRule object. Without rate limiting we run "update all nodes" separately for each NodeFeatureRule object, plus, we would run "update node X" separately for each NodeFeature object targeting node X. This is a huge amount of wasted work because in principle just running "update all nodes" once should be enough.	2022-12-14 15:45:43 +02:00
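The saving described in the rate-limiter commit above can be pictured with a small sketch. This is not the nfd-master implementation (which, per the commit message, limits updates over a time interval); it only shows how a burst of events collapses into far less work. The `event`, `plan`, and `coalesce` names are hypothetical, and an empty node name is assumed here to mean "update all nodes".

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical stand-in for an update request coming from the
// NFD API. An empty node name means "update all nodes", e.g. after a
// NodeFeatureRule change (an assumption of this sketch).
type event struct {
	node string
}

// plan is the work that remains after coalescing a burst of events.
type plan struct {
	updateAll bool
	nodes     []string // sorted and de-duplicated; empty when updateAll is set
}

// coalesce collapses a burst of events into a single "update all nodes" run
// if any event asks for it, and otherwise into one update per distinct node.
func coalesce(events []event) plan {
	seen := map[string]struct{}{}
	for _, e := range events {
		if e.node == "" {
			// One "update all" covers every other pending event.
			return plan{updateAll: true}
		}
		seen[e.node] = struct{}{}
	}
	nodes := make([]string, 0, len(seen))
	for n := range seen {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	return plan{nodes: nodes}
}

func main() {
	burst := []event{{node: "node-a"}, {node: "node-b"}, {node: "node-a"}, {node: ""}}
	fmt.Printf("%+v\n", coalesce(burst))     // whole burst, includes an update-all event
	fmt.Printf("%+v\n", coalesce(burst[:3])) // only per-node events
}
```

A real rate limiter would collect events for a short interval before calling something like `coalesce`; the interval handling is left out to keep the sketch deterministic.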
Markus Lehtonen	79ed747be8	nfd-master: handle multiple NodeFeature objects Implement handling of multiple NodeFeature objects by merging all objects (targeting a certain node) into one before processing the data. This patch implements MergeInto() methods for all required data types. With support for multiple NodeFeature objects per node, the "nfd api workflow" can be easily demonstrated and tested from the command line. Creating the following object (assuming node-n exists in the cluster): apiVersion: nfd.k8s-sigs.io/v1alpha1 kind: NodeFeature metadata: labels: nfd.node.kubernetes.io/node-name: node-n name: my-features-for-node-n spec: # Features for NodeFeatureRule matching features: flags: vendor.domain-a: elements: feature-x: {} attributes: vendor.domain-b: elements: feature-y: "foo" feature-z: "123" instances: vendor.domain-c: elements: - attributes: name: "elem-1" vendor: "acme" - attributes: name: "elem-2" vendor: "acme" # Labels to be created labels: vendor-feature.enabled: "true" vendor-setting.value: "100" will create two feature labels: feature.node.kubernetes.io/vendor-feature.enabled: "true" feature.node.kubernetes.io/vendor-setting.value: "100" In addition it will advertise hidden/raw features that can be used for custom rules in NodeFeatureRule objects. Now, creating a NodeFeatureRule object: apiVersion: nfd.k8s-sigs.io/v1alpha1 kind: NodeFeatureRule metadata: name: my-rule spec: rules: - name: "my feature rule" labels: "my-feature": "true" matchFeatures: - feature: vendor.domain-a matchExpressions: feature-x: {op: Exists} - feature: vendor.domain-c matchExpressions: vendor: {op: In, value: ["acme"]} will match the features in the NodeFeature object above and cause one more label to be created: feature.node.kubernetes.io/my-feature: "true"	2022-12-14 15:44:52 +02:00
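The merge step described in the commit above might look roughly like the following. This is a minimal sketch, not the NFD code: the `features` type is a simplified, hypothetical stand-in for the real API types, `mergeInto` is a free function rather than the `MergeInto()` methods the patch adds, and "the later object wins on conflicting keys" is an assumption.

```go
package main

import "fmt"

// features is a simplified, hypothetical stand-in for the data carried by
// one NodeFeature object; the real NFD API types are richer.
type features struct {
	Flags      map[string]struct{}
	Attributes map[string]string
	Labels     map[string]string
}

// mergeInto copies everything from src into dst. On conflicting keys the
// value from src wins (an assumption of this sketch).
func mergeInto(dst *features, src features) {
	if dst.Flags == nil {
		dst.Flags = map[string]struct{}{}
	}
	if dst.Attributes == nil {
		dst.Attributes = map[string]string{}
	}
	if dst.Labels == nil {
		dst.Labels = map[string]string{}
	}
	for k := range src.Flags {
		dst.Flags[k] = struct{}{}
	}
	for k, v := range src.Attributes {
		dst.Attributes[k] = v
	}
	for k, v := range src.Labels {
		dst.Labels[k] = v
	}
}

func main() {
	// Two objects targeting the same node are merged into one before
	// the combined data is processed.
	merged := features{}
	mergeInto(&merged, features{
		Flags:  map[string]struct{}{"feature-x": {}},
		Labels: map[string]string{"vendor-feature.enabled": "true"},
	})
	mergeInto(&merged, features{
		Attributes: map[string]string{"feature-y": "foo"},
		Labels:     map[string]string{"vendor-setting.value": "100"},
	})
	fmt.Println(len(merged.Flags), len(merged.Attributes), len(merged.Labels))
}
```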
Markus Lehtonen	9f0806593d	nfd-master: rename -featurerules-controller flag to -crd-controller Deprecate the '-featurerules-controller' command line flag as the name does not describe the functionality anymore: in practice it controls the CRD controller handling both NodeFeature and NodeFeatureRule objects. The patch introduces a duplicate, more generally named, flag '-crd-controller'. A warning is printed in the log if '-featurerules-controller' flag is encountered.	2022-12-14 10:23:45 +02:00
Markus Lehtonen	6ddd87e465	nfd-master: support NodeFeature objects Add initial support for handling NodeFeature objects. With this patch nfd-master watches NodeFeature objects in all namespaces and reacts to changes in any of these. The node which a certain NodeFeature object affects is determined by the "nfd.node.kubernetes.io/node-name" annotation of the object. When a NodeFeature object targeting certain node is changed, nfd-master needs to process all other objects targeting the same node, too, because there may be dependencies between them. Add a new command line flag for selecting between gRPC and NodeFeature CRD API as the source of feature requests. Enabling NodeFeature API disables the gRPC interface. -enable-nodefeature-api enable NodeFeature CRD API for incoming feature requests, will disable the gRPC interface (defaults to false) It is not possible to serve gRPC and watch NodeFeature objects at the same time. This is deliberate to avoid labeling races e.g. by nfd-worker sending gRPC requests but NodeFeature objects in the cluster "overriding" those changes (labels from the gRPC requests will get overridden when NodeFeature objects are processed).	2022-12-14 07:31:28 +02:00
Markus Lehtonen	237494463b	nfd-worker: support creating NodeFeatures object Support the new NodeFeatures object of the NFD CRD api. Add two new command line options to nfd-worker: -kubeconfig specifies the kubeconfig to use for connecting to the k8s api (defaults to empty which implies in-cluster config) -enable-nodefeature-api enable the NodeFeature CRD API for communicating node features to nfd-master, will also automatically disable gRPC (defaults to false) No config file option for selecting the API is available as there should be no need for dynamically selecting between gRPC and CRD. The nfd-master configuration must be changed in tandem and it is safer (and avoids awkward configuration races) to configure the whole NFD deployment at once. Default behavior of nfd-worker is not changed i.e. NodeFeatures object creation is not enabled by default (but must be enabled with the command line flag). The patch also updates the kustomize and Helm deployment, adding RBAC rules for nfd-worker and updating the example worker configuration.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	d1c91e129a	apis/nfd: update auto-generated code	2022-12-14 07:31:28 +02:00
Markus Lehtonen	59ebff46c9	apis/nfd: add CRD for communicating node features Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node features over K8s api objects instead of gRPC. The new resource is namespaced which will help the management of multiple NodeFeature objects per node. This aims at enabling 3rd party detectors for custom features. In addition to communicating raw features the NodeFeature object also has a field for directly requesting labels that should be applied on the node object. Rename the crd deployment file to nfd-api-crds.yaml so that it matches the new content of the file. Also, rename the Helm subdir for CRDs to match the expected chart directory structure.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	079655b42c	nfd-master: add error checking for CRD controller creation	2022-12-14 00:27:27 +02:00
Feruzjon Muyassarov	b296bdf0b3	update test functions according to upstream deprecated/removed methods Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-13 12:12:50 +02:00
Kubernetes Prow Robot	733fb5deaa	Merge pull request #984 from marquiz/devel/worker-namespace nfd-worker: detect the namespace it is running in	2022-12-09 07:10:11 -08:00
Markus Lehtonen	f13ed2d91c	nfd-topology-updater: update NodeResourceTopology objects directly Drop the gRPC communication to nfd-master and connect to the Kubernetes API server directly when updating NodeResourceTopology objects. Topology-updater already has a connection to the API server for listing Pods so this is not that dramatic a change. It also simplifies the code a lot as there is no need for the NFD gRPC client and no need for managing TLS certs/keys. This change aligns nfd-topology-updater with the future direction of nfd-worker where the gRPC API is being dropped and replaced by a CRD-based API. This patch also updates deployment files and documentation to reflect this change.	2022-12-08 11:03:22 +02:00
Markus Lehtonen	87b92f88ca	nfd-worker: detect the namespace it is running in Implement detection of the Kubernetes namespace by reading the file /var/run/secrets/kubernetes.io/serviceaccount/namespace. As a fallback (if the file is not accessible) we take the namespace from the KUBERNETES_NAMESPACE environment variable. This is useful for e.g. testing and development where you might run nfd-worker directly from the command line on a host system.	2022-12-08 10:34:52 +02:00
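The lookup order in the commit above (service account file first, environment variable second) can be sketched as follows. The file path and the KUBERNETES_NAMESPACE variable come from the commit message; the `detectNamespace` function, its signature, and its error text are hypothetical and not taken from nfd-worker.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// namespaceFile is the path named in the commit message.
const namespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// detectNamespace reads the namespace from path. If the file cannot be read
// it falls back to the KUBERNETES_NAMESPACE variable, looked up via getenv.
// Passing getenv as a parameter is a choice of this sketch so that the
// fallback can be exercised without touching the process environment.
func detectNamespace(path string, getenv func(string) string) (string, error) {
	data, err := os.ReadFile(path)
	if err == nil {
		return strings.TrimSpace(string(data)), nil
	}
	if ns := getenv("KUBERNETES_NAMESPACE"); ns != "" {
		return ns, nil
	}
	return "", errors.New("namespace not found in " + path + " or KUBERNETES_NAMESPACE")
}

func main() {
	ns, err := detectNamespace(namespaceFile, os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("namespace:", ns)
}
```

Outside a cluster, running this with `KUBERNETES_NAMESPACE=dev` set in the environment takes the fallback path, which is the development scenario the commit mentions.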
Feruzjon Muyassarov	2bdf427b89	nfd-master logic update for setting node taints This commit extends NFD master code to support adding node taints from NodeFeatureRule CR. We also introduce a new annotation for taints which helps to identify if the taint set on node is owned by NFD or not. When user deletes the taint entry from NodeFeatureRule CR, NFD will remove the taint from the node. But to avoid accidental deletion of taints not owned by the NFD, it needs to know the owner. Keeping track of NFD set taints in the annotation can be used during the filtering of the owner. Also enable-taints flag is added to allow users opt in/out for node tainting feature. The flag takes precedence over taints defined in NodeFeatureRule CR. In other words, if enable-taints is set to false (disabled) and user still defines taints on the CR, NFD will ignore those taints and skip them from setting on the node. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-02 17:25:00 +02:00
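The ownership tracking described in the commit above can be illustrated with a small sketch. It is not the nfd-master code: taints are represented as plain strings, the annotation is assumed to be a comma-separated list of NFD-owned taints, and `reconcileTaints` is a hypothetical name. The enable-taints flag is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// reconcileTaints computes the taints a node should carry.
//   - current: taints presently on the node
//   - ownedAnnotation: comma-separated taints that NFD set earlier
//     (the encoding is an assumption of this sketch)
//   - desired: taints requested by the currently matching rules
//
// Taints owned by NFD but no longer desired are dropped, taints owned by
// someone else are kept, and the returned annotation records the new set
// of NFD-owned taints.
func reconcileTaints(current []string, ownedAnnotation string, desired []string) ([]string, string) {
	owned := map[string]bool{}
	for _, t := range strings.Split(ownedAnnotation, ",") {
		if t != "" {
			owned[t] = true
		}
	}
	want := map[string]bool{}
	for _, t := range desired {
		want[t] = true
	}

	var result []string
	for _, t := range current {
		// Keep only foreign taints here; desired ones are appended below,
		// which also avoids duplicates.
		if !owned[t] && !want[t] {
			result = append(result, t)
		}
	}
	result = append(result, desired...)
	return result, strings.Join(desired, ",")
}

func main() {
	taints, annotation := reconcileTaints(
		[]string{"team/reserved=db:NoSchedule", "example.com/old=true:NoExecute"},
		"example.com/old=true:NoExecute",
		[]string{"example.com/new=true:NoSchedule"},
	)
	fmt.Println(taints)
	fmt.Println(annotation)
}
```

In the example, the taint recorded in the annotation is removed because no rule requests it any more, while the taint that NFD never set is left untouched.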
Feruzjon Muyassarov	532e1193ce	Add taints field to NodeFeatureRule CR spec Extend NodeFeatureRule Spec with taints field to allow users to specify the list of the taints they want to be set on the node if rule matches. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-02 17:25:00 +02:00
Markus Lehtonen	eb8e29c80a	nfd-worker: drop deprecated command line flags Drop the following flags that were deprecated already in v0.8.0: -sleep-interval (replaced by core.sleepInterval config file option) -label-whitelist (replaced by core.labelWhiteList config file option) -sources (replaced by -label-sources flag)	2022-11-23 22:33:51 +02:00
Talor Itzhak	5b0788ced4	topology-updater: introduce exclude-list The exclude-list allows filtering specific resource accounting from NRT objects on a per-node basis. The CRs created by the topology-updater are used by the scheduler-plugin as a source of truth for making scheduling decisions. As such, this feature allows hiding specific information from the scheduler, which in turn will affect the scheduling decision. A common use case is when a user would like to perform scheduling decisions which are based on a specific resource. In that case, we can exclude all the other resources which we don't want the scheduler to examine. The exclude-list is provided to the topology-updater via a ConfigMap. Resource type names specified in the list should match the names as shown here: https://pkg.go.dev/k8s.io/api/core/v1#ResourceName This is a resurrection of an old work started here: https://github.com/kubernetes-sigs/node-feature-discovery/pull/545 Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2022-11-21 14:08:25 +02:00
Garrybest	3ec1b94020	get kubelet config from configz Signed-off-by: Garrybest <garrybest@foxmail.com>	2022-11-08 23:52:35 +08:00
Feruzjon Muyassarov	7ea0e0b0a7	Add argument to updateNodeFeatures method to pass client from caller This commit adds an argument to updateNodeFeatures method for receiving client argument, which currently gets initialized within the method itself. This is a minor improvement for https://github.com/kubernetes-sigs/node-feature-discovery/pull/910. Ref:https://github.com/kubernetes-sigs/node-feature-discovery/pull/910#discussion_r1012703631 Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-11-06 22:37:11 +02:00
Markus Lehtonen	7c24b50f74	apis/nfd: fix NodeFeatureRule templating Fix handling of templates that got broken in `b907d07d7e` when "flattening" the internal data structure of features. That happened because the golang text/template format uses dots to reference fields of a struct / elements of a map (i.e. 'foo.bar' means that 'bar' must be a sub-element of foo). Thus, using dots in our feature names (e.g. 'cpu.cpuid') means that that hierarchy must be reflected in the data structure that is fed to the templating engine. Thus, for templates we're now stuck with a two-level hierarchy. It doesn't really matter for now as all our features follow that naming pattern. We might be able to overcome this limitation e.g. by using reflect but that's left as a future exercise.	2022-10-25 23:37:27 +03:00
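The text/template behavior that the commit above refers to can be reproduced in isolation. The sketch below is not NFD code; the `render` helper and the sample value "AVX2" are hypothetical. It sets the `missingkey=error` option so that the failing lookup surfaces as an error instead of being rendered silently.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes tmpl against data and fails on missing map keys.
func render(tmpl string, data interface{}) (string, error) {
	t, err := template.New("label").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Two-level structure: ".cpu" resolves to a map, ".cpuid" to a key in it.
	nested := map[string]map[string]string{"cpu": {"cpuid": "AVX2"}}
	// Flat structure: the dotted name is a single key, which the template
	// syntax ".cpu.cpuid" cannot address.
	flat := map[string]string{"cpu.cpuid": "AVX2"}

	out, err := render("{{ .cpu.cpuid }}", nested)
	fmt.Println("nested:", out, err)

	_, err = render("{{ .cpu.cpuid }}", flat)
	fmt.Println("flat:", err)
}
```

With the nested map the template resolves; with the flat map the engine looks for a key named "cpu", does not find one, and returns an error. This is why the data fed to the templating engine has to mirror the dotted hierarchy.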
Kubernetes Prow Robot	a65ee959b9	Merge pull request #925 from marquiz/devel/feature-api-flatten apis/nfd: flatten the structure of features data type	2022-10-24 01:14:26 -07:00
Francesco Romani	700d9e215c	topology-updater: continue looping on scan error Scanning podresources can temporarily fail; the previous code was mistakenly not rearming the loop condition when this occurred, effectively stopping the monitoring. Rather, we should always poll and bail out only on unrecoverable error or when asked to stop. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-10-20 10:08:13 +02:00
Markus Lehtonen	9ea787bc99	apis/nfd: update auto-generated code Re-generate after the latest API change. Involves renaming the crd spec files.	2022-10-18 18:41:53 +03:00
Markus Lehtonen	b907d07d7e	apis/nfd: flatten the structure of features data type Flatten the data structure that stores features, dropping the "domain" level from the data model. That extra level of hierarchy brought little benefit but just caused some extra complexity, instead. The new structure nicely matches what we have in the NodeFeatureRule object (the matchFeatures field uses the same flat structure with the "feature" field having a value <domain>.<feature>, e.g. "kernel.version"). This is pre-work for introducing a new "node feature" CRD that contains the raw feature data. It makes the life of both users and developers easier when both CRDs, plus our internal code, handle feature data in a similar flat structure.	2022-10-18 18:37:28 +03:00
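The flattening described in the commit above amounts to replacing a two-level lookup (domain, then feature) with a single key of the form `<domain>.<feature>`. A minimal sketch, assuming string-valued features and a hypothetical `flatten` helper (the real NFD data types are more involved):

```go
package main

import "fmt"

// flatten turns a domain -> feature -> value map into a flat map keyed by
// "<domain>.<feature>", the same form used by the "feature" field of
// matchFeatures (e.g. "kernel.version").
func flatten(nested map[string]map[string]string) map[string]string {
	flat := map[string]string{}
	for domain, feats := range nested {
		for name, value := range feats {
			flat[domain+"."+name] = value
		}
	}
	return flat
}

func main() {
	// Sample values are made up for illustration.
	nested := map[string]map[string]string{
		"kernel": {"version": "5.15.0"},
		"cpu":    {"model": "example"},
	}
	flat := flatten(nested)
	fmt.Println(flat["kernel.version"], flat["cpu.model"])
}
```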

242 commits