mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

11 commits

Author SHA1 Message Date
Markus Lehtonen
fb6484fb8d deployment: add startupProbe for nfd-master
This patch mitigates inadvertent termination of nfd-master pods by the
liveness probe in big clusters.

With a recent change, nfd-master started to wait (block) for informer
caches to sync before starting the main loop. Consequently, this change
also made the gRPC health endpoint unresponsive until the caches have
been synced. In big clusters, syncing the NodeFeature object cache
takes a long time, as the objects are big and there is (at least) one
for each node in the cluster. Thus, in big clusters, the liveness probe
kicks in and kills the nfd-master pod before it's ready.
2024-12-12 20:00:49 +02:00
TessaIO
d02414cf61 chore/deployment: add resources requests and limits for helm and Kustomize
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-22 14:27:44 +01:00
Markus Lehtonen
a053efda64 nfd-master: run a separate gRPC health server
This patch separates the gRPC health server from the deprecated gRPC
server (disabled by default, replaced by the NodeFeature CRD API) used
for node labeling requests. The new health server runs on hardcoded TCP
port number 8082.

The main motivation for this change is to make Kubernetes' built-in
gRPC liveness probes work when TLS is enabled (as they don't
support TLS).

The health server itself is a naive implementation (as it was before),
basically only checking that nfd-master has started and hasn't crashed.
The patch adds a TODO note to improve the functionality.
2024-01-04 13:58:26 +02:00
Markus Lehtonen
9624d182ab deployment/kustomize: drop nfd-master service
Not needed anymore, as we no longer rely on gRPC.
2023-12-08 14:53:23 +02:00
Muyassarov, Feruzjon
06036a62ce Replace gRPC health probe utility with k8s built-in health probe
Kubernetes 1.23 introduced native gRPC health probes, which can
replace the grpc_health_probe utility. This commit stops baking the
grpc_health_probe binary into the image and updates the related
health checks to use the Kubernetes-native gRPC probe.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-09-20 12:25:36 +03:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2 Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are:

| Metric | Type | Meaning |
| --- | --- | --- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built |
| `nfd_updated_nodes` | Counter | Number of nodes updated (labeled) |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
Markus Lehtonen
457fc8483b deployment/kustomize: use a named port for nfd gRPC service
2023-06-06 21:00:42 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to nfd-worker, this PR adds support for dynamic run-time
configuration of nfd-master through a config file.

The configuration is read from a JSON or YAML file, and fsnotify is
used to watch the file for changes. As a result, logging parameters,
allowed namespaces, extended resources, label whitelisting, and denied
namespaces can all be controlled dynamically.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
743c877ad8 deployment: disable service links in NFD master pod
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-01-27 16:55:18 +01:00
Carlos Eduardo Arango Gutierrez
dece85b394 Add livenessProbe via grpc to nfd-master
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-08-18 10:23:10 -05:00
Markus Lehtonen
8117c099a3 deployment: add kustomize base
Add a kustomize overlay named "default" that virtually replicates the
deployment templates for nfd-master and the nfd-worker daemonset
(nfd-master.yaml.template and nfd-worker-daemonset.yaml.template).

We split the resources into multiple bases (rbac, master and
worker-daemonset) so that relevant parts are re-usable in
other deployment scenarios added later (e.g. "one-shot job", and
"combined daemonset").

This patch adds one component (components/common) that does the
required kustomization for the example deployment.
2021-08-18 14:05:57 +03:00