node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	fb6484fb8d	deployment: add startupProbe for nfd-master This patch mitigates inadvertent termination of nfd-master pods by the liveness probe on big clusters. With a recent change nfd-master started to wait (block) for informer caches to sync before starting the main loop. Consequently, this change also made the gRPC health endpoint not respond until the caches have been synced. In big clusters, syncing the NodeFeature object cache takes a long time as the objects are big and there's (at least) one per node in the cluster. Thus, in big clusters, the liveness probe kicks in and kills the nfd-master pod before it's ready.	2024-12-12 20:00:49 +02:00
Markus Lehtonen	45f49d574a	nfd-master: drop resourceLabels Drop the resourceLabels config file option and the corresponding -resource-labels command line flag. They were deprecated in NFD v0.13 so it's time to let them go. NodeFeatureRule(s) should be used to manage ERs, instead.	2024-11-07 15:16:52 +02:00
Carlos Eduardo Arango Gutierrez	62f4eddce6	Drop support for hooks Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-11-04 14:50:07 +01:00
Markus Lehtonen	403ad6cd7c	Update auto-generated code Run make generate after updating generator tools.	2024-10-30 12:25:16 +02:00
Carlos Eduardo Arango Gutierrez	0bd82cf82a	Drop NFD gRPC API Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-10-29 15:15:18 +01:00
Kubernetes Prow Robot	fd2893e2a5	Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions feat/nfd-master: configure CR restrictions	2024-10-24 12:20:54 +01:00
Tobias Giese	52c2fc6498	Add separate helm values for the liveness and readiness probes Signed-off-by: Tobias Giese <tgiese@nvidia.com>	2024-10-18 12:54:42 +02:00
Tobias Giese	2af06bc722	Template exposed health port in helm chart Signed-off-by: Tobias Giese <tgiese@nvidia.com>	2024-10-14 09:52:55 +02:00
Tobias Giese	53ddf081da	Add parameter to configure health endpoint port Signed-off-by: Tobias Giese <tgiese@nvidia.com>	2024-09-24 15:15:50 +02:00
Tobias Giese	af0592b87c	Add helm values to configure hostNetwork and additional env vars We have to run our NFD workers in the host network. Also we need additional env variables such as KUBERNETES_SERVICE_HOST and _PORT. To achieve this we can simply add generic helm values. The default behavior is not changed. Signed-off-by: Tobias Giese <tgiese@nvidia.com>	2024-09-18 17:58:59 +02:00
Markus Lehtonen	e14596716a	helm: rename args to extraArgs in values.yaml Fixes an omission in `843fc9307d`.	2024-09-18 18:11:38 +03:00
Markus Lehtonen	843fc9307d	helm: rename args chart value to extraArgs The "args" value is not yet part of any release so this is not a breaking change.	2024-09-18 17:47:36 +03:00
AhmedGrati	28b40c90b8	deploy: add CR restrictions to the helm config Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com> Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>	2024-09-16 16:02:42 +02:00
Markus Lehtonen	02b6b7395c	Drop dynamic run-time reconfiguration Simplify the code and reduce possible error scenarios by dropping fsnotify-based reconfiguration from nfd-master and nfd-worker. Also eliminates repeated re-configuration in scenarios where kubelet continuously touches (every minute) the mounted file (configmap) on the filesystem. Also modifies the Helm and kustomize deployments so that nfd-master, nfd-worker and nfd-topology-updater pods are restarted on configmap updates. In kustomize, the slight downside of this is that the name of the config map(s) depends on the content, so every time a user customizes the config data, the old unused configmap will be left and must be garbage-collected manually.	2024-08-21 12:46:36 +03:00
Omer Aplatony	b2222e2c8c	helm: add configurable liveness&readiness probes for master topology-updater and worker Signed-off-by: Omer Aplatony <omerap12@gmail.com>	2024-07-22 21:54:25 +03:00
Rouke Broersma	1230d607ac	Helm: Add revision history limit for worker daemonset (#1797 ) * Helm: Add revision history limit for worker daemonset Signed-off-by: Rouke Broersma <mobrockers@gmail.com> * Helm: Add revision history limit for topology updater daemonset Signed-off-by: Rouke Broersma <mobrockers@gmail.com> * chore: tidy table columns --------- Signed-off-by: Rouke Broersma <mobrockers@gmail.com>	2024-07-18 05:31:49 -07:00
Markus Lehtonen	fe6a1ac3d9	helm: drop trailing whitespace from values.yaml	2024-07-16 09:41:26 +03:00
Kubernetes Prow Robot	25ffe9c178	Merge pull request #1782 from omerap12/issue_1759 Helm: Add revision history limit for master replica	2024-07-15 01:09:09 -07:00
Omer Aplatony	920306cba8	Add revision history limit for master replica and for garbage collector Signed-off-by: Omer Aplatony <omerap12@gmail.com>	2024-07-12 18:20:38 +03:00
Markus Lehtonen	a269bf4d25	Drop the -enable-nodefeature-api flag Was marked to be removed in v0.17.	2024-07-10 15:20:07 +03:00
Kubernetes Prow Robot	393af96a88	Merge pull request #1755 from ArangoGutierrez/1752 Use worker DS OwnerReference for NF's	2024-07-09 06:33:07 -07:00
Kubernetes Prow Robot	d2456e181a	Merge pull request #1726 from marquiz/devel/helm-cmdline-args deployment/helm: enable specifying additional cmdline args	2024-07-09 02:09:52 -07:00
Carlos Eduardo Arango Gutierrez	5d3ee1c51f	Use worker DS OwnerReference for NF's Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-07-04 13:53:24 +02:00
Tariq Ibrahim	8e1907f53f	ensure post-delete-job's service account matches ref in job spec Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>	2024-06-17 14:09:26 -07:00
budimanjojo	3d62382cd1	helm: remove defaults CPU limits Signed-off-by: budimanjojo <budimanjojo@gmail.com>	2024-05-30 11:55:34 +07:00
Markus Lehtonen	a088de7333	deployment/helm: enable specifying additional cmdline args	2024-05-28 20:09:08 +03:00
Kubernetes Prow Robot	4136a69545	Merge pull request #1715 from marquiz/devel/avx10-deprecate source/cpu: disable AVX10 label	2024-05-24 04:53:59 -07:00
Markus Lehtonen	ece6076dd4	source/cpu: disable AVX10 label Disable AVX10 as unnecessary, as AVX10_LEVEL is better suited for checking AVX10 compatibility. There is not yet any hardware with the feature, so disabling it shouldn't cause problems for users.	2024-05-24 13:50:46 +03:00
Markus Lehtonen	fa2f008d18	cpu: advertise AVX10 version Add new cpuid label "feature.node.kubernetes.io/cpu-cpuid.AVX10_VERSION" that advertises the supported version of AVX10 vector ISA. Correspondingly, the patch adds AVX10_VERSION to the "cpu.cpuid" feature for NodeFeatureRules to consume. This makes cpu.cpuid on amd64 architecture a "multi-type" feature in that it contains "flags" and potentially also "attributes" (the only cpuid attribute so far is the AVX10_VERSION).	2024-05-24 13:48:20 +03:00
Markus Lehtonen	b3d6282d2c	api/nfd: document all undocumented fields in the types	2024-05-23 23:49:49 +03:00
Carlos Eduardo Arango Gutierrez	47c054e1db	Add NodeFeatureGroup CRD The NodeFeatureGroup is an NFD-specific custom resource that is designed for grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup objects in the cluster and updates the status of the NodeFeatureGroup object with the list of nodes that match the feature group rules. The NodeFeatureGroup rules follow the same syntax as the NodeFeatureRule rules. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-05-23 16:34:08 +02:00
Markus Lehtonen	560bd11d85	Re-add -enable-nodefeature-api cmdline flag Bring back the -enable-nodefeature-api command line flag and the corresponding enableNodeFeatureApi helm config value that were removed without deprecation when the NodeFeatureAPI feature gate was introduced. The thinking behind this change is to not break existing users (without warning) unless totally unavoidable. Now the -enable-nodefeature-api flag is marked as deprecated and slated for removal in NFD v0.17. The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag work together so that the NodeFeature API is disabled (gRPC is enabled, instead) if either of them is set to false. This patch selectively reverts parts of `06c4733bc5`.	2024-05-16 10:53:49 +03:00
Kubernetes Prow Robot	391865bbb2	Merge pull request #1651 from cmontemuino/doc-resource-limits docs: document trade-offs in memory configuration	2024-04-25 06:41:29 -07:00
Kubernetes Prow Robot	af8a41cc02	Merge pull request #1639 from TessaIO/chore-add-prometheus-pod-monitor-interval chore/deploy: make interval property in PodMonitor configurable	2024-04-05 03:03:26 -07:00
Carlos M	cc53b604c5	chore: include suggestions from code review Co-authored-by: Carlos Eduardo Arango Gutierrez <arangogutierrez@gmail.com>	2024-04-05 10:01:08 +02:00
Oleg Zhurakivskyy	f2e9557a2d	nfd-topology-updater: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-04-03 13:15:54 +03:00
cmontemuino	54b01a2576	docs: document trade-offs in memory configuration Problem: memory requests and limits have been set for the `master` process in PR #1631. They do not follow best practices for setting those values, but the intention was to provide default values for a wide variety of clusters, including small ones. Solution: provide solid documentation about the problems that might happen in production environments when `resource.memory.requests << resource.memory.limits`. Add a link to relevant external sources, which include the advice from Tim Hockin: > Always set memory limit == request Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>	2024-04-02 19:01:50 +02:00
Kubernetes Prow Robot	7938e81c33	Merge pull request #1631 from TessaIO/chore-add-resources-limits-and-requests chore/deployment: add resources requests and limits for helm and Kustomize	2024-04-02 02:03:59 -07:00
TessaIO	74153e11b5	chore/deploy: make interval property in PodMonitor configurable Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-26 08:36:52 +01:00
TessaIO	d02414cf61	chore/deployment: add resources requests and limits for helm and Kustomize Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-22 14:27:44 +01:00
Markus Lehtonen	9b3d273a18	helm: fix invalid name of host-swaps volume	2024-03-20 21:15:02 +02:00
Kubernetes Prow Robot	0ad5e50f24	Merge pull request #1609 from ozhuraki/worker-health nfd-worker: Add liveness probe	2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot	7df0f17f68	Merge pull request #1602 from ozhuraki/nrt-owner-ref Add owner reference to NRT object	2024-03-19 01:12:59 -07:00
Kubernetes Prow Robot	797fada92e	Merge pull request #1585 from kannon92/add-swap-support add swap support in nfd	2024-03-18 04:19:48 -07:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy	c662265a47	topology-updater: Add owner reference to NRT object Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-15 16:36:27 +02:00
Markus Lehtonen	a562a6188a	Update auto-generated code	2024-03-11 12:18:32 +02:00
Allen Mun	8bd52594ab	add ability to use a custom issuer	2024-02-27 12:14:43 -05:00
Kevin Hannon	187f65f94e	Add swap support in nfd	2024-02-19 10:20:56 -05:00


236 commits