node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-15 17:50:49 +00:00

Author	SHA1	Message	Date
Carlos Eduardo Arango Gutierrez	75f0a14f2a	helm: add priorityClassName option Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-02-15 16:29:33 +01:00
Kubernetes Prow Robot	5c99ae8343	Merge pull request #1560 from leemingeer/master nfd-topology-updater add pods fingerprint by default	2024-01-29 00:34:44 -08:00
leemingeer	b6d8ce7a5a	nfd-topology-updater add pods fingerprint by default	2024-01-26 17:55:34 +08:00
Markus Lehtonen	8adb4b38da	deployment/helm: don't deploy topology-updater conf unnecessarily Only deploy the topology-updater config if topology-updater itself (the daemon) is deployed.	2024-01-25 16:15:58 +02:00
Kubernetes Prow Robot	4501bedd61	Merge pull request #1535 from marquiz/devel/grpc-probe nfd-master: run a separate gRPC health server	2024-01-05 15:24:28 +01:00
Markus Lehtonen	a053efda64	nfd-master: run a separate gRPC health server This patch separates the gRPC health server from the deprecated gRPC server (disabled by default, replaced by the NodeFeature CRD API) used for node labeling requests. The new health server runs on hardcoded TCP port number 8082. The main motivation for this change is to make the Kubernetes' built-in gRPC liveness probes to function if TLS is enabled (as they don't support TLS). The health server itself is a naive implementation (as it was before), basically only checking that nfd-master has started and hasn't crashed. The patch adds a TODO note to improve the functionality.	2024-01-04 13:58:26 +02:00
Markus Lehtonen	09b5af74de	deployment/kustomize: drop the sample cert-manager overlay Drop the deprecated and broken sample overlay. This was an example for enabling TLS with cert-manager. However, the overlay has been broken (and useless) since NodeFeature API was enabled by default - and gRPC disabled - in v0.14.	2024-01-03 21:13:15 +02:00
Markus Lehtonen	889fffd7d4	helm: add post-delete hook that cleans up the node This patch adds a post-delete hook to the Helm chart that runs "nfd-master --prune" in the cluster. This cleans up the node of labels, annotations, taints and extended resources that were created by NFD.	2023-12-29 15:36:41 +02:00
Markus Lehtonen	9846dede43	deployment/kustomize: enable nfd-gc in the default overlay	2023-12-21 21:30:14 +02:00
Markus Lehtonen	84fa1ed6e1	Document the NodeFeatureRule samples and move them under deployment dir	2023-12-15 13:43:26 +02:00
Markus Lehtonen	fe412a54b9	apis/nfd: add matchName field in feature matcher terms Extend the format of feature matcher terms (the elements of the array specified under the matchFeatures field) with a new matchName field. The value of this field is an expression that is evaluated against the names of feature elements instead of their values (values are matched with the matchExpressions field, instead). The matchName field is useful e.g. in template rules for creating per-feature-element labels based on feature names (instead of values) and in non-template rules for checking if (at least) one of certain feature element names is present. If both matchExpressions and matchName are specified for a certain feature matcher term, they both must match in order to get an overall match. Also, in this case the list of matched features (used in templating) is the union of the results from matchExpressions and matchName. An example of creating an "avx512" label if any AVX512* CPUID feature is present: - name: "avx wildcard rule" labels: avx512: "true" matchFeatures: - feature: cpu.cpuid matchName: {op: InRegexp, value: ["^AVX512"]} An example of a template rule creating a dynamic set of labels based on the existence of certain kconfig options: - name: "kconfig template rule" labelsTemplate: | {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }} {{ end }} matchFeatures: - feature: kernel.config matchName: {op: In, value: ["SWAP", "X86", "ARM"]} NOTE: this patch changes the corner case of nil/null match expressions with instance features (i.e. "matchExpressions: null"). Previously, we returned all instances for templating but now a nil match expression is not evaluated and no instances for templating are returned.	2023-12-15 11:32:23 +02:00
Markus Lehtonen	0bc1b6c28f	apis/nfd: drop creation helper functions Drop the creation helper functions as one step in an effort to tidy up the api package. These functions were not much used outside unit tests anyway, the static rules of the nfd-worker custom feature source being the only exception (and if those happened to be invalid we'd catch that e.g. in the e2e-tests).	2023-12-14 15:54:51 +02:00
Kubernetes Prow Robot	04c4725dd1	Merge pull request #1491 from marquiz/devel/nf-owner-ref nfd-worker: set owner reference in NodeFeature objects	2023-12-08 14:51:47 +01:00
Kubernetes Prow Robot	aa26bbf964	Merge pull request #1494 from marquiz/devel/master-service deployment/kustomize: drop nfd-master service	2023-12-08 14:31:07 +01:00
Markus Lehtonen	34574f4211	nfd-worker: set owner reference in NodeFeature objects This patch creates an owner-dependent relationship between the nfd-worker pod and the NodeFeature object that it creates. With this change the orphaned NodeFeature object(s) get automatically garbage-collected when the nfd-worker pod goes away, without the need for manual clean-up actions.	2023-12-08 14:57:31 +02:00
Markus Lehtonen	9624d182ab	deployment/kustomize: drop nfd-master service Not needed anymore as we're not relying on gRPC anymore.	2023-12-08 14:53:23 +02:00
Markus Lehtonen	53f5967555	deployment/kustomize: drop default-combined overlay The "combined" overlay, deploying nfd-master and nfd-worker in the same pod (with a daemonset) doesn't make sense anymore as we have enabled NodeFeature API. There is no direct communication between nfd-master and nfd-worker anymore. Moreover, the combined deployment can be seen as broken as there is one NodeFeature controller (i.e. nfd-master) on each node, causing them to race against each other, all processing all NodeFeature objects.	2023-12-08 14:42:31 +02:00
Markus Lehtonen	1d012a28cd	Option to stop implicitly adding default prefix to names Add new autoDefaultNs (default is "true") config option to nfd-master. Setting the config option to false stops NFD from automatically adding the "feature.node.kubernetes.io/" prefix to labels, annotations and extended resources. Taints are not affected as for them no prefix is automatically added. The user-visible part of enabling the option change is that NodeFeatureRules, local feature files, hooks and configuration of the "custom" source may need to be altered (if the auto-prefixing is relied on). For now, the config option defaults to "true", meaning no change in default behavior. However, the intent is to change the default to "false" in a future release, deprecating the option and eventually removing it (forcing it to "false"). The goal of stopping doing "auto-prefixing" is to simplify the operation (of nfd and users). Make the naming more straightforward and easier to understand and debug (kind of WYSIWYG), eliminating peculiar corner cases: 1. Make validation simpler and unambiguous 2. Remove "overloading" of names, i.e. the mapping of two values to the same actual name. E.g. previously something like labels: feature.node.kubernetes.io/foo: bar foo: baz Could actually result in node label: feature.node.kubernetes.io/foo: baz 3. Make the processing/usage of the "rule.matched" and "local.labels" feature in NodeFeatureRules unambiguous and more understandable. E.g. previously you could have node label "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule you'd need to use the unprefixed name "local-foo" or the fully prefixed name, depending on what was specified in the feature file (or hook) on the node(s). NOTE: setting autoDefaultNs to false is a breaking change for users who rely on automatic prefixing with the default feature.node.kubernetes.io/ namespace. 
NodeFeatureRules, feature files, hooks and custom rules (configuration of the "custom" source of nfd-worker) will need to be altered. Unprefixed labels, annotations and extended resources will be denied by nfd-master.	2023-11-24 12:48:20 +02:00
Carlos Eduardo Arango Gutierrez	c0063be4f4	Discover node features as annotations Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: bebc <mchf1990212@gmail.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-10-25 19:58:58 +02:00
AhmedGrati	d27eb0ac6d	feat: add parameters in helm to disable/enable nfd-master and nfd-worker Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-10-11 10:50:32 +01:00
Markus Lehtonen	98c3b0750d	nfd-gc: add metrics Implements three metrics for nfd-gc: - nfd_gc_build_info: version information of nfd-gc. - nfd_gc_objects_deleted_total: total number of NodeFeature and NodeResourceTopology objects deleted by nfd-gc. - nfd_gc_object_delete_failures_total: number of errors encountered when deleting NodeFeature and NodeResourceTopology objects.	2023-10-09 13:39:28 +00:00
Shiva Krishna, Merla	6237b821f6	Fix serviceaccount handling for nfd-gc to be consistent with others Signed-off-by: Shiva Krishna, Merla <smerla@nvidia.com>	2023-10-04 15:17:32 -07:00
Carlos Eduardo Arango Gutierrez	dd8d7f6725	Helm - service to be only deployed when needed (#1389) * Helm - service to be only deployed when needed Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * Update deployment/helm/node-feature-discovery/templates/service.yaml Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> --------- Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-10-04 16:00:43 +02:00
Carlos Eduardo Arango Gutierrez	3543aa22ce	Helm - Move remaining gRPC related flags to conditional Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-10-03 07:32:52 +02:00
Muyassarov, Feruzjon	06036a62ce	Replace gRPC health probe utility with k8s built-in health probe Kubernetes 1.23 has introduced native health probes for gRPC which can replace grpc_health_probe utility. This commit removes baking in grpc_health_probe binary into the image and updates related health checks to use k8s native gRPC. Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>	2023-09-20 12:25:36 +03:00
Kubernetes Prow Robot	8cdedf92fd	Merge pull request #1365 from marquiz/devel/helm-fix-nf-api deployment/helm: fix handling of enableNodeFeatureApi parameter	2023-09-19 05:33:08 -07:00
Markus Lehtonen	8b207cae1f	deployment/helm: fix handling of enableNodeFeatureApi parameter	2023-09-19 14:18:03 +03:00
Markus Lehtonen	759143ea3c	deployment/helm: fix namespace of nfd-worker role and rolebinding Put nfd-worker role and rolebinding in the correct namespace if namespaceOverride parameter is used.	2023-09-19 13:53:19 +03:00
Kubernetes Prow Robot	2e6a202218	Merge pull request #1331 from andrewjamesbrown/ajb/chart_annotations Helm: conditionally add annotations if defined	2023-09-07 01:20:59 -07:00
AhmedGrati	a5624cc8ca	chore: update config file in helm deployment Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-09-06 16:05:02 +01:00
Andrew Brown	b2f292bc7a	Add annotations to all deployments+daemonsets	2023-09-06 09:39:01 -04:00
Andrew Brown	1bb5b87d4a	Conditionally add annotations if defined	2023-09-05 19:37:56 -04:00
Carlos Eduardo Arango Gutierrez	04e954a7c3	Enable NodeFeature API by default Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-09-05 20:21:31 +02:00
Markus Lehtonen	ceb672bde0	deployment/helm: support nfd-gc Rename files and parameters. Drop the container security context parameters from the Helm chart. There should be no reason to run the nfd-gc with other than the minimal privileges. Also updates the documentation.	2023-08-23 10:56:12 +03:00
Markus Lehtonen	6cf29bd8ef	deployment/kustomize: support nfd-gc Rename the old "topology-gc" to just "gc". Simplify the setup a bit by including the RBAC rules in the "gc" base. Note: we don't enable nfd-gc in the default overlay, yet, as the NodeFeature API isn't enabled (gc is not needed).	2023-08-23 10:56:12 +03:00
Markus Lehtonen	01c08d67b6	Rename nfd-topology-gc to nfd-gc This is preparation for making it a generic garbage collector for all nfd-managed api objects.	2023-08-21 21:46:11 +03:00
Markus Lehtonen	06b333db1e	nfd-topology-updater: add metrics support For now, add only one metric, a counter for the errors occurring while scanning pod resources on the node.	2023-08-04 16:48:37 +03:00
Markus Lehtonen	7e375ad1f0	generate: bump tools to their latest versions Bump tools versions and re-auto-generate files.	2023-07-27 14:29:48 +03:00
Pat Riehecky	0523257d1a	Add optional labels to the podmonitor Signed-off-by: Pat Riehecky <riehecky@fnal.gov>	2023-07-21 10:03:50 -05:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are | Metric | Type | Meaning | | --------------- | ---------------- | ---------------- | | `nfd_master_build_info` | Gauge | Version from which nfd-master was built. | | `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. | | `nfd_updated_nodes` | Counter | Time taken to label a node | | `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD | | `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node | Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
adrianc	904f3739a3	fix typo in helm chart else statement of crd-controller should also refer to crd-controller flag. Signed-off-by: adrianc <adrianc@nvidia.com>	2023-07-02 18:01:31 +03:00
Kubernetes Prow Robot	407a610e0c	Merge pull request #1182 from fmuyassarov/disable-hooks-by-default hooks: disable hooks by default from v0.14	2023-06-22 04:43:40 -07:00
Dipankar Das	ebac4a25e7	Removal of the bases field as it is deprecated by kustomize Signed-off-by: Dipankar Das <dipankardas0115@gmail.com>	2023-06-09 12:49:24 +05:30
Muyassarov, Feruzjon	19527be924	hooks: disable hooks by default We have deprecated hooks in v0.12.0 but kept it enabled by default. Starting from v0.14 we are starting to disable it by default and plan to fully remove it in the near future. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2023-06-07 13:04:23 +03:00
Markus Lehtonen	457fc8483b	deployment/kustomize: use a named port for nfd gRPC service	2023-06-06 21:00:42 +03:00
Hairong Chen	e8a00ba7da	cpu: Discover TDX guests based on cpuid information NFD already has the capability to discover whether baremetal / host machines support Intel TDX. Now, the next step is to add support for discovering whether a node is TDX protected (as in, a virtual machine started using Intel TDX). In order to do so, we've decided to go for a new `cpu-security.tdx` property, called `protected` (`cpu-security.tdx.protected`). Signed-off-by: Hairong Chen <hairong.chen@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-05 11:06:28 +02:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spinning a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding nodes names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
Kubernetes Prow Robot	70d5ef477f	Merge pull request #1219 from PiotrProkop/leader-elect Add leader election for nfd-master	2023-05-22 00:36:21 -07:00
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Markus Lehtonen	1200fd05c5	topology-updater: use node IP in the default configz URI Use a separate NODE_ADDRESS environment variable in the default value of -kubelet-config-uri (instead of NODE_NAME that was previously used). Also change the kustomize and Helm deployments to set this variable to node IP address. This should make the default deployment more robust, making it work in scenarios where node name does not resolve to the node ip, e.g. nodename != hostname.	2023-05-05 13:29:51 +03:00
Kubernetes Prow Robot	cd45baef8d	Merge pull request #1211 from marquiz/devel/helm deployment/helm: improve handling of topologyUpdater.kubeletStateFiles	2023-05-05 00:17:13 -07:00
Kubernetes Prow Robot	68370f861c	Merge pull request #1213 from marquiz/devel/helm-3 deployment/helm: use dedicated serviceaccount for topology-updater	2023-05-05 00:09:20 -07:00
Markus Lehtonen	526aab87cf	deployment/helm: use dedicated serviceaccount for topology-updater Change the configuration so that, by default, we use a dedicated serviceaccount for topology-updater (similar to topology-gc, nfd-master and nfd-worker). Fix the templates so that the serviceaccount and clusterrolebinding are only created when topology-updater is enabled (clusterrole was already handled this way). This patch also correctly documents the default value of rbac.create parameter of topology-updater and topology-gc.	2023-05-05 08:30:21 +03:00
Markus Lehtonen	9c2f268fd2	deployment/helm: improve handling of topologyUpdater.kubeletStateFiles Make it possible to disable kubelet state tracking with --set topologyUpdater.kubeletStateFiles="" as the documentation suggests. Also, fix the documentation regarding the default value of topologyUpdater.kubeletStateFiles parameter.	2023-05-04 15:01:19 +03:00
Markus Lehtonen	5891df6917	deployment/helm: avoid overlapping mount paths on topology-updater Mount kubelet podresources socket on an independent path, not under the kubelet state directory. Otherwise container creation may fail on mount creation if topologyUpdater.kubeletPodResourcesSockPath and/or topologyUpdater.kubeletConfigPath Helm parameters are specified in a certain way.	2023-05-04 14:17:08 +03:00
Kubernetes Prow Robot	11db6bd37d	Merge pull request #1208 from marquiz/devel/kubelet-mounts deployment/kustomize: drop pod-resources mount for topology-updater	2023-05-04 02:02:42 -07:00
Markus Lehtonen	efabbe04ae	deployment/helm: fix default for kubeletStateDir parameter This parameter is a path in the host system, not a mount path inside the container.	2023-05-04 11:48:18 +03:00
Markus Lehtonen	c8a722b7c3	deployment/kustomize: drop pod-resources mount for topology-updater This mount is redundant as it's already included in the kubelet state files (/var/lib/kubelet) mount.	2023-05-04 11:06:55 +03:00
Markus Lehtonen	b016def8a3	helm: fix mount for nfd-master config Volume/mount setup for the ConfigMap was erroneously inside conditionals so it was not mounted unless TLS was enabled.	2023-05-02 10:06:21 +03:00
Kubernetes Prow Robot	2356223ffc	Merge pull request #1139 from AhmedGrati/feat-configure-master-resync feat: add master resync period configurability	2023-04-24 03:49:02 -07:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Markus Lehtonen	f4de7ed8ee	deployment/kustomize: add master config to prune overlay Otherwise pods error out with failed mount of nfd-master-conf ConfigMap.	2023-04-20 20:38:36 +03:00
Markus Lehtonen	a5ec646c48	generate: update controller-gen to v0.11.3 Update controller-gen tool from sigs.k8s.io/controller-tools to the latest release. Also, bump goimports from golang.org/x/tools to the latest version.	2023-04-19 12:48:12 +03:00
Kubernetes Prow Robot	8d71ed6755	Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods feat: support builtin kernel mods	2023-04-13 10:30:40 -07:00
AhmedGrati	109caa1f28	feat: support builtin kernel mods This PR adds the combination of dynamic and builtin kernel modules into one feature called `kernel.enabledmodule`. It's a superset of the `kernel.loadedmodule` feature. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-13 10:19:24 +01:00
Fabiano Fidêncio	250aea4741	Create extended resources with NodeFeatureRule Add support for management of Extended Resources via the NodeFeatureRule CRD API. There are usage scenarios where users want to advertise features as extended resources instead of labels (or annotations). This patch enables the discovery of extended resources, via annotation and patch of node.status.capacity and node.status.allocatable. By using the NodeFeatureRule API. Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
AhmedGrati	02b3b7c7e0	feat: add enableTaints to helm chart Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-03-21 10:49:24 +01:00
Kubernetes Prow Robot	13f92faa77	Merge pull request #1031 from k8stopologyawareschedwg/reactive_updates topology-updater: reactive updates	2023-03-17 10:13:17 -07:00
Talor Itzhak	91daff3b59	deployment/helm: update helm charts Adding kubelet state directory mount Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-16 11:51:45 +02:00
Carlos Eduardo Arango Gutierrez	355807f98c	kustomize: trim prune overlay Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-03-15 20:36:45 +01:00
Talor Itzhak	8afd819132	deployment/topology-updater: add mount for kubelet state dir This mount is needed for watching the state files Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:43:13 +02:00
Kubernetes Prow Robot	37504109d6	Merge pull request #1080 from marquiz/devel/deploy-topology-updater deployment: fixes for mounting kubelet config	2023-03-09 09:50:02 -08:00
Markus Lehtonen	ed8a87b131	helm: fix handling of topologyUpdater.kubeletConfigPath By default we use the configz API endpoint so no mounts are needed.	2023-03-09 17:49:31 +02:00
Markus Lehtonen	33a1e3d114	kustomize: drop mount for kubelet config in topology-updater We use the configz endpoint nowadays.	2023-03-09 17:48:56 +02:00
Markus Lehtonen	40644aab60	helm: create topology-updater RBAC rules by default Create RBAC rules if topology-updater is enabled. Previously installing with topologyUpdater.enable=true (without topologyUpdater.rbac.create=true) resulted in a CrashLoopBackOff as RBAC was missing.	2023-03-09 16:16:09 +02:00
Markus Lehtonen	40d7139257	helm: fix topology-updater rbac clusterrole Access to nodes/proxy resource was accidentally given to nfd-master (which really doesn't need it), not topology-updater.	2023-03-09 16:15:03 +02:00
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater:compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in NRT object.	2023-02-22 10:22:50 +01:00
Kubernetes Prow Robot	a92614c292	Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard feat: add deny-label-ns flag which supports wildcard	2023-02-15 03:42:25 -08:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
Jose Luis Ojosnegros Manchón	d1d1eda0d2	nrt-api: Update to v0.1.0 to use v1alpha2	2023-02-09 12:03:18 +01:00
Kubernetes Prow Robot	94ab0ddd3d	Merge pull request #1045 from AhmedGrati/feat-disable-service-links-nfd-master deployment: disable service links in NFD master pod	2023-02-06 08:55:01 -08:00
AhmedGrati	07d5ffe4b8	helm: make master port configurable Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-01 10:03:06 +01:00
AhmedGrati	743c877ad8	deployment: disable service links in NFD master pod Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-01-27 16:55:18 +01:00
Carlos Eduardo Arango Gutierrez	1c095f5e8e	docs: Fix link for Helm docs	2023-01-17 15:23:30 +01:00
PiotrProkop	59afae50ba	Add NodeResourceTopology garbage collector NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes. As of now node-feature-discovery daemons are used to advertise those resources but there is no service responsible for removing obsolete objects(without corresponding Kubernetes node). This patch adds new daemon called nfd-topology-gc which removes old NRTs. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-11 10:15:21 +01:00
Markus Lehtonen	dfda9bccad	apis/nfd: update auto-generated code	2022-12-22 17:58:20 +02:00
Markus Lehtonen	59a2757115	Use single-dash format for nfd cmdline flags Use the "single-dash" version of nfd command line flags in deployment files and e2e-tests. No impact in functionality, just aligns with documentation and other parts of the codebase.	2022-12-21 15:00:49 +02:00
Markus Lehtonen	9f0806593d	nfd-master: rename -featurerules-controller flag to -crd-controller Deprecate the '-featurerules-controller' command line flag as the name does not describe the functionality anymore: in practice it controls the CRD controller handling both NodeFeature and NodeFeatureRule objects. The patch introduces a duplicate, more generally named, flag '-crd-controller'. A warning is printed in the log if '-featurerules-controller' flag is encountered.	2022-12-14 10:23:45 +02:00
Markus Lehtonen	6ddd87e465	nfd-master: support NodeFeature objects Add initial support for handling NodeFeature objects. With this patch nfd-master watches NodeFeature objects in all namespaces and reacts to changes in any of these. The node which a certain NodeFeature object affects is determined by the "nfd.node.kubernetes.io/node-name" annotation of the object. When a NodeFeature object targeting certain node is changed, nfd-master needs to process all other objects targeting the same node, too, because there may be dependencies between them. Add a new command line flag for selecting between gRPC and NodeFeature CRD API as the source of feature requests. Enabling NodeFeature API disables the gRPC interface. -enable-nodefeature-api enable NodeFeature CRD API for incoming feature requests, will disable the gRPC interface (defaults to false) It is not possible to serve gRPC and watch NodeFeature objects at the same time. This is deliberate to avoid labeling races e.g. by nfd-worker sending gRPC requests but NodeFeature objects in the cluster "overriding" those changes (labels from the gRPC requests will get overridden when NodeFeature objects are processed).	2022-12-14 07:31:28 +02:00
Markus Lehtonen	237494463b	nfd-worker: support creating NodeFeatures object Support the new NodeFeatures object of the NFD CRD api. Add two new command line options to nfd-worker: -kubeconfig specifies the kubeconfig to use for connecting k8s api (defaults to empty which implies in-cluster config) -enable-nodefeature-api enable the NodeFeature CRD API for communicating node features to nfd-master, will also automatically disable gRPC (defaults to false) No config file option for selecting the API is available as there should be no need for dynamically selecting between gRPC and CRD. The nfd-master configuration must be changed in tandem and it is safer (and avoids awkward configuration races) to configure the whole NFD deployment at once. Default behavior of nfd-worker is not changed i.e. NodeFeatures object creation is not enabled by default (but must be enabled with the command line flag). The patch also updates the kustomize and Helm deployment, adding RBAC rules for nfd-worker and updating the example worker configuration.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	d1c91e129a	apis/nfd: update auto-generated code	2022-12-14 07:31:28 +02:00
Markus Lehtonen	59ebff46c9	apis/nfd: add CRD for communicating node features Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node features over K8s api objects instead of gRPC. The new resource is namespaced which will help the management of multiple NodeFeature objects per node. This aims at enabling 3rd party detectors for custom features. In addition to communicating raw features the NodeFeature object also has a field for directly requesting labels that should be applied on the node object. Rename the crd deployment file to nfd-api-crds.yaml so that it matches the new content of the file. Also, rename the Helm subdir for CRDs to match the expected chart directory structure.	2022-12-14 07:31:28 +02:00
Kubernetes Prow Robot	776a8c335c	Merge pull request #980 from marquiz/devel/topology-updater nfd-topology-updater: update NodeResourceTopology objects directly	2022-12-08 01:44:22 -08:00
Markus Lehtonen	f13ed2d91c	nfd-topology-updater: update NodeResourceTopology objects directly Drop the gRPC communication to nfd-master and connect to the Kubernetes API server directly when updating NodeResourceTopology objects. Topology-updater already has connection to the API server for listing Pods so this is not that dramatic change. It also simplifies the code a lot as there is no need for the NFD gRPC client and no need for managing TLS certs/keys. This change aligns nfd-topology-updater with the future direction of nfd-worker where the gRPC API is being dropped and replaced by a CRD-based API. This patch also update deployment files and documentation to reflect this change.	2022-12-08 11:03:22 +02:00
Kubernetes Prow Robot	f0ca0ffb5d	Merge pull request #979 from marquiz/fixes/helm-topology-updater helm: fix mount name of topology-updater config	2022-12-07 05:28:40 -08:00
Kubernetes Prow Robot	66a4ce9488	Merge pull request #981 from tariq1890/svc-selector nfd-master svc should select only nfd-master pods	2022-12-07 04:10:37 -08:00
Kubernetes Prow Robot	9f68f6c93a	Merge pull request #910 from fmuyassarov/taint/feruz Allow optionally setting node taints defined on the NodeFeatureRule CR	2022-12-06 07:28:37 -08:00
Tariq Ibrahim	153815fa56	nfd-master svc should select only nfd-master pods	2022-12-05 17:45:26 -08:00
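
The name "overloading" corner case described in commit 1d012a28cd (the autoDefaultNs option) can be illustrated with a small sketch. This is not NFD's code, which is written in Go. The `resolve_labels` helper and its "last entry wins" ordering are assumptions made only for illustration. Only the `feature.node.kubernetes.io/` prefix and the `foo`/`bar`/`baz` example come from the commit message.

```python
# Illustrative model only: resolve_labels is a hypothetical helper, not an NFD API.
DEFAULT_NS = "feature.node.kubernetes.io/"


def resolve_labels(labels, auto_default_ns=True):
    """Return (accepted, denied) for a mapping of requested label names to values.

    With auto_default_ns=True, names without a namespace get DEFAULT_NS prepended.
    With auto_default_ns=False, such names are denied instead.
    """
    accepted = {}
    denied = []
    for name, value in labels.items():
        if "/" not in name:
            if not auto_default_ns:
                denied.append(name)
                continue
            name = DEFAULT_NS + name
        # Assumption: a later entry overwrites an earlier one with the same
        # resolved name. This is where the "overloading" shows up.
        accepted[name] = value
    return accepted, denied


rule_labels = {"feature.node.kubernetes.io/foo": "bar", "foo": "baz"}

with_prefixing = resolve_labels(rule_labels, auto_default_ns=True)
without_prefixing = resolve_labels(rule_labels, auto_default_ns=False)

print(with_prefixing)
print(without_prefixing)
```

In this model, auto-prefixing lets two distinct keys collapse into one node label, whereas with prefixing disabled the unprefixed `foo` is denied and the explicitly prefixed label keeps its value.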

236 commits