1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-17 22:08:33 +00:00
Commit graph

253 commits

Author SHA1 Message Date
Kubernetes Prow Robot
a6d9f76eca
Merge pull request #2072 from fmuyassarov/grpc-removal
Drop gRPC startup probe
2025-03-04 06:13:43 -08:00
Feruzjon Muyassarov
e9ab6a0424
Drop gRPC startup probe
Remove gRPC startup probe from master following gRPC logging removal.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@est.tech>
2025-03-04 14:19:35 +02:00
Carlos Eduardo Arango Gutierrez
80056a8c73
Remove deprecated toleration/affinity value
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-03-04 11:19:10 +01:00
Kubernetes Prow Robot
797c66ffaa
Merge pull request #1947 from marquiz/devel/health-topology-updater
nfd-topology-updater: serve metrics and healthz on the same port
2025-02-17 02:02:22 -08:00
Markus Lehtonen
0b0aed2318 nfd-master: serve metrics and healthz endpoint on the same port
Changes the gRPC health endpoint to plain http. At the same time starts
serving both the metrics and healthz endpoints on a single port.
Replaces the -metrics and -grpc-health command line flags with a single
-port flag.

Changes the Helm and kustomize deployments correspondingly.
2025-02-04 10:30:58 +02:00
Markus Lehtonen
0b9a8cf120 nfd-topology-updater: serve metrics and healthz on the same port
Changes the gRPC health endpoint to plain http. At the same time starts
serving both the metrics and healthz endpoints on a single port.
Replaces the -metrics and -grpc-health command line flags with a single
-port flag.

Changes the Helm and kustomize deployments correspondingly.
2025-02-04 10:30:26 +02:00
Markus Lehtonen
850c544522 nfd-worker: add healthz endpoint 2025-01-31 07:52:13 +02:00
Markus Lehtonen
4959a13a07 nfd-worker: replace --metrics with --port
Use a single port for serving http. In addition to metrics we will have
the healthz endpoint.
2025-01-31 07:52:12 +02:00
Markus Lehtonen
25914ec06e nfd-worker: drop the gRPC health port
To be replaced with plain http.
2025-01-31 07:50:00 +02:00
Markus Lehtonen
23ebd39ef1 helm: fix usage of worker.extraArgs 2025-01-29 13:20:35 +02:00
killianmuldoon
0ff1987d2a
Make DNS Policy configurable in helm chart (#2025)
* Make DNS Policy configurable in helm chart

Signed-off-by: Killian Muldoon <kmuldoon@nvidia.com>

* Add documentation for helm dnsPolicy values

Signed-off-by: Killian Muldoon <kmuldoon@nvidia.com>

* Reformat tables in helm docs

Signed-off-by: Killian Muldoon <kmuldoon@nvidia.com>

---------

Signed-off-by: Killian Muldoon <kmuldoon@nvidia.com>
2025-01-23 01:42:58 -08:00
adrianc
3f012c2d5a
Add support running with OwnerReferencesPermissionEnforcement
when OwnerReferencesPermissionEnforcement validating webhook is
enabled additional permissions are required to set/update owner ref
field. NFD worker sets/updates NodeFeature owner ref field to
the worker pod and owning daemonset.

owner reference can only be updated if the worker has delete permissions
for NodeFeatures.

if owner reference has blockOwnerDeletion (as the case for the daemonset
owner reference) then it requires update permissions to the finalizers
of the owner, to avoid this, we set blockOwnerDeleteion to false for all
owners referenced from NFD worker pod when setting/updating NodeFeature
owner ref.

Signed-off-by: adrianc <adrianc@nvidia.com>
2025-01-08 13:44:30 +02:00
Markus Lehtonen
bcb493ec96 Update autogenerated code
This encompasses a lot of changes because of the recent bump to
Kubernetes v1.32 (the code-generator version was bumped, too).
2024-12-18 12:30:46 +02:00
Kubernetes Prow Robot
3e87c97ac2
Merge pull request #1976 from marquiz/devel/grpc-api-cleanup
Cleanup for NodeFeature API being GA
2024-12-13 15:14:26 +01:00
Markus Lehtonen
fc103a6028 Cleanup for NodeFeature API being GA
Drop references to the gRPC API and don't suggest that NodeFeatureAPI
could be disabled.

Also update the developer guide for instructions running nfd components
outside the cluster.
2024-12-13 15:40:46 +02:00
Kubernetes Prow Robot
caaac59eba
Merge pull request #1860 from ozhuraki/no-owner-refs
nfd-worker: Add an option to disable setting the owner references
2024-12-13 13:12:26 +01:00
Markus Lehtonen
fb6484fb8d deployment: add startupProbe for nfd-master
This patch mitigates inadvertent termination of nfd-master pods by the
liveness probe on big clusters.

With a recent change nfd-master started to wait (block) for informer
caches to sync before starting the main loop. Consequently, this change
also made the gRPC health enpoint to not respond until the caches have
been synced. In big clusters the syncing the NodeFeature object cache
takes a long time as the objects are big and there's (at least) one per
each node in the cluster. Thus, in big clusters, the liveness probe
kicks in and kills the nfd-master pod before it's ready.
2024-12-12 20:00:49 +02:00
Oleg Zhurakivskyy
20ef877ab1 nfd-worker: Add an option to disable setting the owner references
In some cases it's desirable to control automatic garbage collection
of NodeFeature object.

Add an option to disable setting the owner references to Pod
for NodeFeature object.

Closes: 1817

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-11-28 16:50:10 +02:00
Markus Lehtonen
45f49d574a nfd-master: drop resourceLabels
Drop the resourceLabels config file option and the corresponding
-resource-labels command line flag. They were deprecated in NFD v0.13 so
it's time to let them go. NodeFeatureRule(s) should be used to manage
ERs, instead.
2024-11-07 15:16:52 +02:00
Carlos Eduardo Arango Gutierrez
62f4eddce6
Drop support for hooks
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-11-04 14:50:07 +01:00
Markus Lehtonen
403ad6cd7c Update auto-generated code
Run make generate after updating generator tools.
2024-10-30 12:25:16 +02:00
Carlos Eduardo Arango Gutierrez
0bd82cf82a
Drop NFD gRPC API
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-29 15:15:18 +01:00
Kubernetes Prow Robot
fd2893e2a5
Merge pull request #1592 from AhmedThresh/feat-configure-cr-restrictions
feat/nfd-master: configure CR restrictions
2024-10-24 12:20:54 +01:00
Tobias Giese
52c2fc6498
Add separate helm values for the liveness and readiness probes
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-10-18 12:54:42 +02:00
Tobias Giese
2af06bc722
Template exposed health port in helm chart
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-10-14 09:52:55 +02:00
Tobias Giese
53ddf081da
Add parameter to configure health endpoint port
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-09-24 15:15:50 +02:00
Tobias Giese
af0592b87c
Add helm values to configure hostNetwork and additional env vars
We have to run our NFD workers in the host network.
Also we need additional env variables such as KUBERNETES_SERVICE_HOST and _PORT.
To achieve this we can simply add generic helm values. The default behavior is not changed.

Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-09-18 17:58:59 +02:00
Markus Lehtonen
e14596716a helm: rename args to extraArgs in values.yaml
Fixes an omission in 843fc9307d.
2024-09-18 18:11:38 +03:00
Markus Lehtonen
843fc9307d helm: rename args chart value to extraArgs
The "args" value is not yet part of any release so this is not a
breaking change.
2024-09-18 17:47:36 +03:00
AhmedGrati
28b40c90b8 deploy: add CR restrictions to the helm config
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Signed-off-by: AhmedThresh <ahmed.grati@insat.ucar.tn>
2024-09-16 16:02:42 +02:00
Markus Lehtonen
02b6b7395c Drop dynamic run-time reconfiguration
Simplify the code and reduce possible error scenarios by dropping
fsnotify-based reconfiguration from nfd-master and nfd-worker. Also
eliminates repeated re-configuration in scenarios where kubelet
continuosly touches the (every minute) mounted file (configmap) on the
filesystem.

Also modifies the Helm and kustomize deployments so that nfd-master,
nfd-worker and nfd-topology-updater pods are restarted on configmap
updates. In kustomize, the slght downside of this is the name of the
config map(s) depends on the content, so every time a user customizes
the config data, the old unused configmap will be left and must be
garbage-collected manually.
2024-08-21 12:46:36 +03:00
Omer Aplatony
b2222e2c8c helm: add configurable liveness&readiness probes for master topology-updater and worker
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2024-07-22 21:54:25 +03:00
Rouke Broersma
1230d607ac
Helm: Add revision history limit for worker daemonset (#1797)
* Helm: Add revision history limit for worker daemonset

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>

* Helm: Add revision history limit for topology updater daemonset

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>

* chore: tidy table columns

---------

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>
2024-07-18 05:31:49 -07:00
Markus Lehtonen
fe6a1ac3d9 helm: drop trailing whitespace from values.yaml 2024-07-16 09:41:26 +03:00
Kubernetes Prow Robot
25ffe9c178
Merge pull request #1782 from omerap12/issue_1759
Helm: Add revision history limit for master replica
2024-07-15 01:09:09 -07:00
Omer Aplatony
920306cba8 Add revision history limit for master replica and for garbage collector
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2024-07-12 18:20:38 +03:00
Markus Lehtonen
a269bf4d25 Drop the -enable-nodefeature-api flag
Was marked to be removed in v0.17.
2024-07-10 15:20:07 +03:00
Kubernetes Prow Robot
393af96a88
Merge pull request #1755 from ArangoGutierrez/1752
Use worker DS OwnerReference for NF's
2024-07-09 06:33:07 -07:00
Kubernetes Prow Robot
d2456e181a
Merge pull request #1726 from marquiz/devel/helm-cmdline-args
deployment/helm: enable specifying additional cmdline args
2024-07-09 02:09:52 -07:00
Carlos Eduardo Arango Gutierrez
5d3ee1c51f
Use worker DS OwnerReference for NF's
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-04 13:53:24 +02:00
Tariq Ibrahim
8e1907f53f
ensure post-delete-job's service account matches ref in job spec
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-06-17 14:09:26 -07:00
budimanjojo
3d62382cd1
helm: remove defaults CPU limits
Signed-off-by: budimanjojo <budimanjojo@gmail.com>
2024-05-30 11:55:34 +07:00
Markus Lehtonen
a088de7333 deployment/helm: enable specifying additional cmdline args 2024-05-28 20:09:08 +03:00
Kubernetes Prow Robot
4136a69545
Merge pull request #1715 from marquiz/devel/avx10-deprecate
source/cpu: disable AVX10 label
2024-05-24 04:53:59 -07:00
Markus Lehtonen
ece6076dd4 source/cpu: disable AVX10 label
Disable AVX10 as unnecessary as AVX10_LEVEL is better suited for
checking AVX10 compatibility. There is not yet any hardware with the
feature so disabling it shouldn't cause problems for users.
2024-05-24 13:50:46 +03:00
Markus Lehtonen
fa2f008d18 cpu: advertise AVX10 version
Add new cpuid label "feature.node.kubernetes.io/cpu-cpuid.AVX10_VERSION"
that advertises the supported version of AVX10 vector ISA.
Correspondingly, the patch adds AVX10_VERSION to the "cpu.cpuid" feature
for NodeFeatureRules to consume.

This makes cpu.cpuid on amd64 architecture a "multi-type" feature in
that it contains "flags" and potentially also "attributes" (the only
cpuid attribute so far is the AVX10_VERSION).
2024-05-24 13:48:20 +03:00
Markus Lehtonen
b3d6282d2c api/nfd: document all undocumented fields in the types 2024-05-23 23:49:49 +03:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Markus Lehtonen
560bd11d85 Re-add -enable-nodefeature-api cmdline flag
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.

The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.

This patch selectively reverts parts of
06c4733bc5.
2024-05-16 10:53:49 +03:00
Kubernetes Prow Robot
391865bbb2
Merge pull request #1651 from cmontemuino/doc-resource-limits
docs: document trade-offs in memory configuration
2024-04-25 06:41:29 -07:00