node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	b3d6282d2c	api/nfd: document all undocumented fields in the types	2024-05-23 23:49:49 +03:00
Carlos Eduardo Arango Gutierrez	47c054e1db	Add NodeFeatureGroup CRD The NodeFeatureGroup is an NFD-specific custom resource that is designed for grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup objects in the cluster and updates the status of the NodeFeatureGroup object with the list of nodes that match the feature group rules. The NodeFeatureGroup rules follow the same syntax as the NodeFeatureRule rules. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-05-23 16:34:08 +02:00
Markus Lehtonen	560bd11d85	Re-add -enable-nodefeature-api cmdline flag Bring back the -enable-nodefeature-api command line flag and the corresponding enableNodeFeatureApi helm config value that were removed without deprecation when the NodeFeatureAPI feature gate was introduced. The thinking behind this change is to not break existing users (without warning) unless totally unavoidable. Now the -enable-nodefeature-api flag is marked as deprecated and slated for removal in NFD v0.17. The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag work together so that the NodeFeature API is disabled (gRPC is enabled, instead) if either of them is set to false. This patch selectively reverts parts of `06c4733bc5`.	2024-05-16 10:53:49 +03:00
Kubernetes Prow Robot	391865bbb2	Merge pull request #1651 from cmontemuino/doc-resource-limits docs: document trade-offs in memory configuration	2024-04-25 06:41:29 -07:00
Kubernetes Prow Robot	af8a41cc02	Merge pull request #1639 from TessaIO/chore-add-prometheus-pod-monitor-interval chore/deploy: make interval property in PodMonitor configurable	2024-04-05 03:03:26 -07:00
Carlos M	cc53b604c5	chore: include suggestions from code review Co-authored-by: Carlos Eduardo Arango Gutierrez <arangogutierrez@gmail.com>	2024-04-05 10:01:08 +02:00
Oleg Zhurakivskyy	f2e9557a2d	nfd-topology-updater: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-04-03 13:15:54 +03:00
cmontemuino	54b01a2576	docs: document trade-offs in memory configuration Problem: memory requests and limits were set for the `master` process in PR #1631. They do not follow best practices for setting those values, but the intention was to provide default values for a wide variety of clusters, including small ones. Solution: provide solid documentation about the problems that might happen in production environments when `resource.memory.requests << resource.memory.limits`. Add a link to relevant external sources, which include the advice from Tim Hockin: > Always set memory limit == request Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>	2024-04-02 19:01:50 +02:00
Kubernetes Prow Robot	7938e81c33	Merge pull request #1631 from TessaIO/chore-add-resources-limits-and-requests chore/deployment: add resources requests and limits for helm and Kustomize	2024-04-02 02:03:59 -07:00
TessaIO	74153e11b5	chore/deploy: make interval property in PodMonitor configurable Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-26 08:36:52 +01:00
TessaIO	d02414cf61	chore/deployment: add resources requests and limits for helm and Kustomize Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-22 14:27:44 +01:00
Markus Lehtonen	9b3d273a18	helm: fix invalid name of host-swaps volume	2024-03-20 21:15:02 +02:00
Kubernetes Prow Robot	0ad5e50f24	Merge pull request #1609 from ozhuraki/worker-health nfd-worker: Add liveness probe	2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot	7df0f17f68	Merge pull request #1602 from ozhuraki/nrt-owner-ref Add owner reference to NRT object	2024-03-19 01:12:59 -07:00
Kubernetes Prow Robot	797fada92e	Merge pull request #1585 from kannon92/add-swap-support add swap support in nfd	2024-03-18 04:19:48 -07:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy	c662265a47	topology-updater: Add owner reference to NRT object Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-15 16:36:27 +02:00
Markus Lehtonen	a562a6188a	Update auto-generated code	2024-03-11 12:18:32 +02:00
Allen Mun	8bd52594ab	add ability to use a custom issuer	2024-02-27 12:14:43 -05:00
Kevin Hannon	187f65f94e	Add swap support in nfd	2024-02-19 10:20:56 -05:00
Carlos Eduardo Arango Gutierrez	75f0a14f2a	helm: add priorityClassName option Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-02-15 16:29:33 +01:00
Kubernetes Prow Robot	5c99ae8343	Merge pull request #1560 from leemingeer/master nfd-topology-updater add pods fingerprint by default	2024-01-29 00:34:44 -08:00
leemingeer	b6d8ce7a5a	nfd-topology-updater add pods fingerprint by default	2024-01-26 17:55:34 +08:00
Markus Lehtonen	8adb4b38da	deployment/helm: don't deploy topology-updater conf unnecessarily Only deploy the topology-updater config if topology-updater itself (the daemon) is deployed.	2024-01-25 16:15:58 +02:00
Kubernetes Prow Robot	4501bedd61	Merge pull request #1535 from marquiz/devel/grpc-probe nfd-master: run a separate gRPC health server	2024-01-05 15:24:28 +01:00
Markus Lehtonen	a053efda64	nfd-master: run a separate gRPC health server This patch separates the gRPC health server from the deprecated gRPC server (disabled by default, replaced by the NodeFeature CRD API) used for node labeling requests. The new health server runs on hardcoded TCP port number 8082. The main motivation for this change is to make Kubernetes' built-in gRPC liveness probes work when TLS is enabled (as they don't support TLS). The health server itself is a naive implementation (as it was before), basically only checking that nfd-master has started and hasn't crashed. The patch adds a TODO note to improve the functionality.	2024-01-04 13:58:26 +02:00
Markus Lehtonen	09b5af74de	deployment/kustomize: drop the sample cert-manager overlay Drop the deprecated and broken sample overlay. This was an example for enabling TLS with cert-manager. However, the overlay has been broken (and useless) since NodeFeature API was enabled by default - and gRPC disabled - in v0.14.	2024-01-03 21:13:15 +02:00
Markus Lehtonen	889fffd7d4	helm: add post-delete hook that cleans up the node This patch adds a post-delete hook to the Helm chart that runs "nfd-master --prune" in the cluster. This cleans up the node of labels, annotations, taints and extended resources that were created by NFD.	2023-12-29 15:36:41 +02:00
Markus Lehtonen	9846dede43	deployment/kustomize: enable nfd-gc in the default overlay	2023-12-21 21:30:14 +02:00
Markus Lehtonen	84fa1ed6e1	Document the NodeFeatureRule samples and move them under deployment dir	2023-12-15 13:43:26 +02:00
Markus Lehtonen	fe412a54b9	apis/nfd: add matchName field in feature matcher terms Extend the format of feature matcher terms (the elements of the array specified under the matchFeatures field) with a new matchName field. The value of this field is an expression that is evaluated against the names of feature elements instead of their values (values are matched with the matchExpressions field, instead). The matchName field is useful e.g. in template rules for creating per-feature-element labels based on feature names (instead of values) and in non-template rules for checking if (at least) one of certain feature element names is present. If both matchExpressions and matchName are specified for a feature matcher term, they both must match in order to get an overall match. Also, in this case the list of matched features (used in templating) is the union of the results from matchExpressions and matchName. An example of creating an "avx512" label if any AVX512* CPUID feature is present: - name: "avx wildcard rule" labels: avx512: "true" matchFeatures: - feature: cpu.cpuid matchName: {op: InRegexp, value: ["^AVX512"]} An example of a template rule creating a dynamic set of labels based on the existence of certain kconfig options: - name: "kconfig template rule" labelsTemplate: | {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }} {{ end }} matchFeatures: - feature: kernel.config matchName: {op: In, value: ["SWAP", "X86", "ARM"]} NOTE: this patch changes the corner case of nil/null match expressions with instance features (i.e. "matchExpressions: null"). Previously, we returned all instances for templating but now a nil match expression is not evaluated and no instances for templating are returned.	2023-12-15 11:32:23 +02:00
Markus Lehtonen	0bc1b6c28f	apis/nfd: drop creation helper functions Drop the creation helper functions as one step in an effort to tidy up the api package. These functions were not much used outside unit tests anyway, the static rules of the nfd-worker custom feature source being the only exception (and if those happened to be invalid we'd catch that e.g. in the e2e-tests).	2023-12-14 15:54:51 +02:00
Kubernetes Prow Robot	04c4725dd1	Merge pull request #1491 from marquiz/devel/nf-owner-ref nfd-worker: set owner reference in NodeFeature objects	2023-12-08 14:51:47 +01:00
Kubernetes Prow Robot	aa26bbf964	Merge pull request #1494 from marquiz/devel/master-service deployment/kustomize: drop nfd-master service	2023-12-08 14:31:07 +01:00
Markus Lehtonen	34574f4211	nfd-worker: set owner reference in NodeFeature objects This patch creates an owner-dependent relationship between the nfd-worker pod and the NodeFeature object that it creates. With this change the orphaned NodeFeature object(s) get automatically garbage-collected when the nfd-worker pod goes away, without the need for manual clean-up actions.	2023-12-08 14:57:31 +02:00
Markus Lehtonen	9624d182ab	deployment/kustomize: drop nfd-master service Not needed anymore as we're not relying on gRPC anymore.	2023-12-08 14:53:23 +02:00
Markus Lehtonen	53f5967555	deployment/kustomize: drop default-combined overlay The "combined" overlay, deploying nfd-master and nfd-worker in the same pod (with a daemonset), doesn't make sense anymore as we have enabled NodeFeature API. There is no direct communication between nfd-master and nfd-worker anymore. Moreover, the combined deployment can be seen as broken as there is one NodeFeature controller (i.e. nfd-master) on each node, causing them to race against each other, all processing all NodeFeature objects.	2023-12-08 14:42:31 +02:00
Markus Lehtonen	1d012a28cd	Option to stop implicitly adding default prefix to names Add new autoDefaultNs (default is "true") config option to nfd-master. Setting the config option to false stops NFD from automatically adding the "feature.node.kubernetes.io/" prefix to labels, annotations and extended resources. Taints are not affected as for them no prefix is automatically added. The user-visible effect of changing the option is that NodeFeatureRules, local feature files, hooks and configuration of the "custom" source may need to be altered (if the auto-prefixing is relied on). For now, the config option defaults to "true", meaning no change in default behavior. However, the intent is to change the default to "false" in a future release, deprecating the option and eventually removing it (forcing it to "false"). The goal of stopping "auto-prefixing" is to simplify the operation (of nfd and users). Make the naming more straightforward and easier to understand and debug (kind of WYSIWYG), eliminating peculiar corner cases: 1. Make validation simpler and unambiguous 2. Remove "overloading" of names, i.e. the mapping of two values to the same actual name. E.g. previously something like labels: feature.node.kubernetes.io/foo: bar foo: baz Could actually result in node label: feature.node.kubernetes.io/foo: baz 3. Make the processing/usage of the "rule.matched" and "local.labels" feature in NodeFeatureRules unambiguous and more understandable. E.g. previously you could have node label "feature.node.kubernetes.io/local-foo: bar" but in the NodeFeatureRule you'd need to use the unprefixed name "local-foo" or the fully prefixed name, depending on what was specified in the feature file (or hook) on the node(s). NOTE: setting autoDefaultNs to false is a breaking change for users who rely on automatic prefixing with the default feature.node.kubernetes.io/ namespace. NodeFeatureRules, feature files, hooks and custom rules (configuration of the "custom" source of nfd-worker) will need to be altered. Unprefixed labels, annotations and extended resources will be denied by nfd-master.	2023-11-24 12:48:20 +02:00
Carlos Eduardo Arango Gutierrez	c0063be4f4	Discover node features as annotations Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: bebc <mchf1990212@gmail.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-10-25 19:58:58 +02:00
AhmedGrati	d27eb0ac6d	feat: add parameters in helm to disable/enable nfd-master and nfd-worker Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-10-11 10:50:32 +01:00
Markus Lehtonen	98c3b0750d	nfd-gc: add metrics Implements three metrics for nfd-gc: - nfd_gc_build_info: version information of nfd-gc. - nfd_gc_objects_deleted_total: total number of NodeFeature and NodeResourceTopology objects deleted by nfd-gc. - nfd_gc_object_delete_failures_total: number of errors encountered when deleting NodeFeature and NodeResourceTopology objects.	2023-10-09 13:39:28 +00:00
Shiva Krishna, Merla	6237b821f6	Fix serviceaccount handling for nfd-gc to be consistent with others Signed-off-by: Shiva Krishna, Merla <smerla@nvidia.com>	2023-10-04 15:17:32 -07:00
Carlos Eduardo Arango Gutierrez	dd8d7f6725	Helm - service to be only deployed when needed (#1389) * Helm - service to be only deployed when needed Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> * Update deployment/helm/node-feature-discovery/templates/service.yaml Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> --------- Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-10-04 16:00:43 +02:00
Carlos Eduardo Arango Gutierrez	3543aa22ce	Helm - Move remaining gRPC related flags to conditional Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-10-03 07:32:52 +02:00
Muyassarov, Feruzjon	06036a62ce	Replace gRPC health probe utility with k8s built-in health probe Kubernetes 1.23 has introduced native health probes for gRPC which can replace grpc_health_probe utility. This commit removes baking in grpc_health_probe binary into the image and updates related health checks to use k8s native gRPC. Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>	2023-09-20 12:25:36 +03:00
Kubernetes Prow Robot	8cdedf92fd	Merge pull request #1365 from marquiz/devel/helm-fix-nf-api deployment/helm: fix handling of enableNodeFeatureApi parameter	2023-09-19 05:33:08 -07:00
Markus Lehtonen	8b207cae1f	deployment/helm: fix handling of enableNodeFeatureApi parameter	2023-09-19 14:18:03 +03:00
Markus Lehtonen	759143ea3c	deployment/helm: fix namespace of nfd-worker role and rolebinding Put nfd-worker role and rolebinding in the correct namespace if namespaceOverride parameter is used.	2023-09-19 13:53:19 +03:00
Kubernetes Prow Robot	2e6a202218	Merge pull request #1331 from andrewjamesbrown/ajb/chart_annotations Helm: conditionally add annotations if defined	2023-09-07 01:20:59 -07:00
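
The auto-prefixing behavior described in commit 1d012a28cd can be illustrated with a small sketch. This is not NFD's implementation (NFD is written in Go). The helper names resolve_name and resolve_labels are hypothetical, and the rule that a name counts as "prefixed" when it contains a "/" is an assumption made for illustration only.

```python
# Hypothetical sketch of the autoDefaultNs behavior described in the commit
# message; not taken from the NFD source code.
DEFAULT_NS = "feature.node.kubernetes.io/"


def resolve_name(name: str, auto_default_ns: bool = True) -> str:
    """Return the name a label would end up with on the node."""
    if "/" in name:
        # Assumption: a name containing "/" already carries a prefix.
        return name
    if auto_default_ns:
        # Current default: the default prefix is added implicitly.
        return DEFAULT_NS + name
    # With autoDefaultNs=false, unprefixed names are denied.
    raise ValueError(f"unprefixed name {name!r} denied (autoDefaultNs is false)")


def resolve_labels(labels: dict, auto_default_ns: bool = True) -> dict:
    """Resolve every label name; later entries overwrite earlier ones."""
    resolved = {}
    for name, value in labels.items():
        resolved[resolve_name(name, auto_default_ns)] = value
    return resolved


if __name__ == "__main__":
    labels = {"feature.node.kubernetes.io/foo": "bar", "foo": "baz"}
    # Both names collapse onto the same prefixed label when auto-prefixing
    # is on, so only one value survives.
    print(resolve_labels(labels, auto_default_ns=True))
```

With auto-prefixing enabled, the two input names map to the same node label, which is the "overloading" corner case the commit message sets out to remove. With auto_default_ns=False the unprefixed "foo" is rejected instead of being silently merged.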

207 commits