Implements three metrics for nfd-gc:
- nfd_gc_build_info: version information of nfd-gc.
- nfd_gc_objects_deleted_total: total number of NodeFeature and
NodeResourceTopology objects deleted by nfd-gc.
- nfd_gc_object_delete_failures_total: number of errors encountered when
deleting NodeFeature and NodeResourceTopology objects.
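For illustration, a scrape of the nfd-gc metrics endpoint could look
roughly like this (a sketch in Prometheus text format; the version string
and sample values are illustrative, not taken from the code):

  # TYPE nfd_gc_build_info gauge
  nfd_gc_build_info{version="v0.13.0"} 1
  # TYPE nfd_gc_objects_deleted_total counter
  nfd_gc_objects_deleted_total 3
  # TYPE nfd_gc_object_delete_failures_total counter
  nfd_gc_object_delete_failures_total 0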
Kubernetes 1.23 introduced native health probes for gRPC, which can
replace the grpc_health_probe utility. This commit stops baking the
grpc_health_probe binary into the image and updates the related health
checks to use the Kubernetes-native gRPC probes.
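With the native probe, a container health check can be declared roughly
as follows (a sketch; the port number is an assumption, not taken from
this commit):

  livenessProbe:
    grpc:
      port: 8080
  readinessProbe:
    grpc:
      port: 8080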
Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
Rename the old "topology-gc" to just "gc". Simplify the setup a bit by
including the RBAC rules in the "gc" base.
Note: we don't enable nfd-gc in the default overlay yet, as the
NodeFeature API isn't enabled (gc is not needed).
Expose metrics via prometheus.monitoring.coreos.com/v1
The exposed metrics are
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Number of node updates done by nfd-master. |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD. |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node. |
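For example, a ServiceMonitor object for scraping nfd-master could look
roughly like this (a sketch; the label selector and port name are
assumptions, not taken from this commit):

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: nfd-master
  spec:
    selector:
      matchLabels:
        app: nfd-master
    endpoints:
      - port: metrics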
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
This allows NFD-master to be run in an active-passive fashion when
running multiple instances of NFD-master, preventing multiple components
from updating the same custom resources.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Update controller-gen tool from sigs.k8s.io/controller-tools to the
latest release.
Also, bump goimports from golang.org/x/tools to the latest version.
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.
There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).
This patch enables the management of extended resources via the
NodeFeatureRule API: matching resources are recorded in a node
annotation and patched into node.status.capacity and
node.status.allocatable.
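As an illustration, a rule advertising an extended resource could look
roughly like this (a sketch; the 'extendedResources' field name and the
resource name are assumptions for illustration):

  - name: "my extended resource rule"
    extendedResources:
      vendor.example/my-resource: "1"
    matchFeatures:
      - feature: cpu.cpuid
        matchExpressions:
          AVX512F: {op: Exists}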
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Similar to nfd-worker, this PR adds dynamic run-time configurability
through a config file to nfd-master. A JSON or YAML configuration file
is watched with fsnotify in order to react to changes in the config
file. As a result, logging parameters, allowed namespaces, extended
resources, label whitelisting, and denied namespaces can be controlled
dynamically.
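A configuration file could look roughly like this (a sketch; the option
names are illustrative of the areas listed above, not authoritative):

  # nfd-master.conf (illustrative)
  extraLabelNs: ["vendor.example.com"]
  denyLabelNs: ["example.com"]
  resourceLabels: []
  labelWhiteList: ""
  klog:
    v: "2"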
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
The NodeResourceTopology (aka NRT) custom resource is used to enable NUMA-aware scheduling in Kubernetes.
As of now, node-feature-discovery daemons are used to advertise those
resources, but there is no service responsible for removing obsolete
objects (ones without a corresponding Kubernetes node).
This patch adds a new daemon called nfd-topology-gc which removes old
NRTs.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting a certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.
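For illustration, the object-to-node association could look like this
(a sketch; only the metadata relevant to targeting is shown):

  apiVersion: nfd.k8s-sigs.io/v1alpha1
  kind: NodeFeature
  metadata:
    name: my-features-for-node-1
    annotations:
      nfd.node.kubernetes.io/node-name: node-1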
Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.
-enable-nodefeature-api enable NodeFeature CRD API for incoming
feature requests, will disable the gRPC
interface (defaults to false)
It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
Support the new NodeFeature object of the NFD CRD API. Add two new
command line options to nfd-worker:
-kubeconfig specifies the kubeconfig to use for
connecting k8s api (defaults to empty which
implies in-cluster config)
-enable-nodefeature-api enable the NodeFeature CRD API for
communicating node features to nfd-master,
will also automatically disable gRPC
(defaults to false)
No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem, and it is safer (and
avoids awkward configuration races) to configure the whole NFD deployment
at once.
Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).
The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node
features over K8s api objects instead of gRPC. The new resource is
namespaced which will help the management of multiple NodeFeature
objects per node. This aims at enabling 3rd party detectors for custom
features.
In addition to communicating raw features the NodeFeature object also
has a field for directly requesting labels that should be applied on the
node object.
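As a sketch of the shape of the new object (field contents here are
illustrative, not copied from the API definition):

  apiVersion: nfd.k8s-sigs.io/v1alpha1
  kind: NodeFeature
  metadata:
    name: node-1-my-detector
    namespace: my-detector-ns
  spec:
    # raw features, to be evaluated against labeling rules
    features:
      flags:
        cpu.cpuid:
          elements:
            AVX512F: {}
    # labels requested directly on the node object
    labels:
      vendor.example/my-feature: "true"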
Rename the crd deployment file to nfd-api-crds.yaml so that it matches
the new content of the file. Also, rename the Helm subdir for CRDs to
match the expected chart directory structure.
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has a connection to the API server for listing
Pods, so this is not that dramatic a change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also updates deployment files and documentation to reflect
this change.
Extend the NodeFeatureRule spec with a taints field, allowing users to
specify a list of taints to be set on the node if the rule matches.
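For example, a rule tainting matching nodes could look like this (a
sketch; the taint key is hypothetical):

  - name: "my taint rule"
    taints:
      - key: "vendor.example/special-node"
        value: "true"
        effect: NoSchedule
    matchFeatures:
      - feature: cpu.cpuid
        matchExpressions:
          AVX512F: {op: Exists}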
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
This patch adds a kubebuilder marker to add a short name nfr for
NodeFeatureRule CRD.
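With this, e.g. 'kubectl get nfr' works as a shorthand for
'kubectl get nodefeaturerules'.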
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
Support templating of var names in a similar manner as labels. Add
support for a new 'varsTemplate' field to the feature rule spec which is
treated similarly to the 'labelsTemplate' field. The value of the field
is processed through the golang "text/template" template engine and the
expanded value must contain variables in <key>=<value> format, separated
by newlines i.e.:
  - name: <rule-name>
    varsTemplate: |
      <var-1>=<value-1>
      <var-2>=<value-2>
      ...

Similar rules as for 'labelsTemplate' apply, i.e.
1. If matchAny is specified, the template is executed separately against
   each individual matchFeatures matcher.
2. The 'vars' field has priority over 'varsTemplate'.
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).
If referencing rules across multiple configs/CRDs, care must be taken
with the ordering. Processing order of rules in nfd-worker:
1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
in alphabetical order. Subdirectories are processed by reading their
files in alphabetical order.
3. Custom rules from main nfd-worker.conf
In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).
This patch also adds a new 'vars' field to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rule.matched' feature. This may be desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.
An example setup:
- name: "kernel feature"
labels:
kernel-feature:
matchFeatures:
- feature: kernel.version
matchExpressions:
major: {op: Gt, value: ["4"]}
- name: "intermediate var feature"
vars:
nolabel-feature: "true"
matchFeatures:
- feature: cpu.cpuid
matchExpressions:
AVX512F: {op: Exists}
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["8086"]}
device: {op: In, value: ["1234", "1235"]}
- name: top-level-feature
matchFeatures:
- feature: rule.matched
matchExpressions:
kernel-feature: "true"
nolabel-feature: "true"
Support templating of label names in feature rules. It is available both
in NodeFeatureRule CRs and in custom rule configuration of nfd-worker.
This patch adds a new 'labelsTemplate' field to the rule spec, making it
possible to dynamically generate multiple labels per rule based on the
matched features. The feature relies on the golang "text/template"
package. When expanded, the template must contain labels in a raw
<key>[=<value>] format (where 'value' defaults to "true"), separated by
newlines i.e.:
  - name: <rule-name>
    labelsTemplate: |
      <label-1>[=<value-1>]
      <label-2>[=<value-2>]
      ...
All the matched features of 'matchFeatures' directives are available to
the templating engine in a nested data structure that can be described
in YAML as:
  .
  <domain-1>:
    <key-feature-1>:
      - Name: <matched-key>
      - ...
    <value-feature-1>:
      - Name: <matched-key>
        Value: <matched-value>
      - ...
    <instance-feature-1>:
      - <attribute-1-name>: <attribute-1-value>
        <attribute-2-name>: <attribute-2-value>
        ...
      - ...
  <domain-2>:
    ...
That is, the per-feature data available for templating depends on the type
of feature that was matched:
- "key features": only 'Name' is available
- "value features": 'Name' and 'Value' can be used
- "instance features": all attributes of the matched instance are
available
NOTE: If matchAny is specified, the template is executed separately
against each individual matchFeatures matcher and the eventual set of
labels is a superset of all these expansions. Consider the following:
  - name: <name>
    labelsTemplate: <template>
    matchAny:
      - matchFeatures: <matcher#1>
      - matchFeatures: <matcher#2>
    matchFeatures: <matcher#3>
In the example above (assuming the overall result is a match) the
template would be executed on matcher#1 and/or matcher#2 (depending on
whether both or only one of them match), and finally on matcher#3, and
all the labels from these separate expansions would be created (i.e. the
end result would be a union of all the individual expansions).
NOTE 2: The 'labels' field has priority over 'labelsTemplate', i.e.
labels specified in the 'labels' field will override any labels
originating from the 'labelsTemplate' field.
A special case of an empty match expression set matches everything (i.e.
matches/returns all existing keys/values). This makes it simpler to
write templates that run over all values. It also makes it possible to
later implement support for templates that run over all _keys_ of a
feature.
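For instance, a rule relying on an empty expression set to pull in all
key-value pairs of a feature could look roughly like this (illustrative;
the label prefix is hypothetical):

  - name: "all-kconfig-values"
    labelsTemplate: |
      {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
      {{ end }}
    matchFeatures:
      - feature: kernel.config
        matchExpressions: {}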
Some example configurations:
- name: "my-pci-template-features"
labelsTemplate: |
{{ range .pci.device }}intel-{{ .class }}-{{ .device }}=present
{{ end }}
matchFeatures:
- feature: pci.device
matchExpressions:
class: {op: InRegexp, value: ["^06"]}
vendor: ["8086"]
- name: "my-system-template-features"
labelsTemplate: |
{{ range .system.osrelease }}system-{{ .Name }}={{ .Value }}
{{ end }}
matchFeatures:
- feature: system.osRelease
matchExpressions:
ID: {op: Exists}
VERSION_ID.major: {op: Exists}
Imaginative template pipelines are possible, of course, but care must be
taken in order to produce understandable and maintainable rule sets.
Separate feature discovery and creation of feature labels.
Generalize the discovery of nvdimm devices so that they can be matched
in custom label rules in a similar fashion as pci and usb devices.
Available attributes for matching nvdimm devices are limited to:
- devtype
- mode
For numa we now detect the number of numa nodes, which can be matched
against in custom label rules.
Labels created by the memory feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rule:
- name: "my memory rule"
labels:
my-memory-feature: "true"
matchFeatures:
- feature: memory.numa
matchExpressions:
"node_count": {op: Gt, value: ["3"]}
- feature: memory.nv
matchExpressions:
"devtype" {op: In, value: ["nd_dax"]}
Also, add minimalist unit test.
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that network devices can be matched in custom
label rules in a similar fashion as pci and usb devices. Available
attributes for matching are:
- operstate
- speed
- sriov_numvfs
- sriov_totalvfs
Labels created by the network feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rule:
- name: "my network rule"
labels:
my-network-feature: "true"
matchFeatures:
- feature: network.device
matchExpressions:
"operstate": { op: In, value: ["up"] }
"sriov_numvfs": { op: Gt, value: ["9"] }
Also, add minimalist unit test.
Implement a simple controller stub that operates on NodeFeatureRule
objects. The controller does not yet have any functionality other than
logging changes in the (NodeFeatureRule) objects it is watching.
Also update the documentation on the -no-publish flag to match the new
functionality.
Separate feature discovery and creation of feature labels. Generalize
the feature discovery so that block devices can be matched in custom
label rules in a similar fashion as pci and usb devices. This extends
the discovery to other block queue attributes than 'rotational': now we
also detect 'dax', 'nr_zones' and 'zoned'.
Labels created by the storage feature source are unchanged. The new
features being detected are available in custom rules only.
Example custom rules:
- name: "my block rule 1"
labels:
my-block-feature-1: "true"
matchFeatures:
- feature: storage.block
"rotational": {op: In, value: ["0"]}
- name: "my block rule 2"
labels:
my-block-feature-2: "true"
matchFeatures:
- feature: storage.block
"zoned": {op: In, value: [“host-aware”, “host-managed”]}
Also, add minimalist unit test.
Add a cluster-scoped Custom Resource Definition for specifying labeling
rules. Nodes (node features, node objects) are cluster-level objects and
thus the natural and encouraged setup is to only have one NFD deployment
per cluster - the set of underlying features of the node stays the same
independent of how many parallel NFD deployments you have. Our extension
points (hooks, feature files and now CRs) can be used by multiple
actors (depending on us) simultaneously. Having the CRD cluster-scoped
hopefully drives deployments in this direction. It also should make
deployment of vendor-specific labeling rules easy as there is no need to
worry about the namespace.
This patch virtually replicates the source.custom.FeatureSpec in a CRD
API (located in the pkg/apis/nfd/v1alpha1 package) with the notable
exception that "MatchOn" legacy rules are not supported. Legacy rules
are left out in order to keep the CRD simple and clean.
The duplicate functionality in source/custom will be dropped by upcoming
patches.
This patch utilizes controller-gen (from sigs.k8s.io/controller-tools)
for generating the CRD and deepcopy methods. Code can be (re-)generated
with "make generate". Install controller-gen with:
go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0
Update kustomize and helm deployments to deploy the CRD.
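An example CR using the new API could look roughly like this (a sketch
mirroring the custom rule format; the label name is hypothetical):

  apiVersion: nfd.k8s-sigs.io/v1alpha1
  kind: NodeFeatureRule
  metadata:
    name: my-rule
  spec:
    rules:
      - name: "my example rule"
        labels:
          vendor.example/my-feature: "true"
        matchFeatures:
          - feature: kernel.loadedmodule
            matchExpressions:
              e1000: {op: Exists}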
The NodeResourceTopology API has been made cluster-scoped: in the
current context a CR corresponds to a Node, and since Node is a
cluster-scoped resource it makes sense to make NRT cluster-scoped as
well.
Ref: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/18
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
The base should really have the very bare minimum. Remove all redundant
(at default-value) args and move the others to the specific
topologyupdater kustomize component. This also makes these settings
re-usable in user-specific overlays (that are not based on
topologyupdater-daemonset).
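For example, a user overlay could then pull in the component on top of
its own base (a sketch; the paths are assumptions):

  # kustomization.yaml (illustrative)
  apiVersion: kustomize.config.k8s.io/v1beta1
  kind: Kustomization
  resources:
    - ../../base
  components:
    - ../../components/topologyupdater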
- create an overlay for deployment of all components
- create an overlay for just topologyupdater deployment (to be deployed in
conjunction with the default overlay)
- create a separate overlay for deployment of master and topologyupdater-job
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Replicates nfd-daemonset-combined.yaml.template.
In addition to the overlay we need to add a separate set of patches
under components/common in order to handle the double-container pod.