---
title: Helm
layout: default
sort: 3
---

Deployment with Helm

{: .no_toc}

Table of contents

{: .no_toc .text-delta}

1. TOC
{:toc}

The Node Feature Discovery Helm chart allows you to easily deploy and manage NFD.

NOTE: NFD is not ideal for other Helm charts to depend on, as that may result in multiple parallel NFD deployments in the same cluster, which is not fully supported by the NFD Helm chart.

Prerequisites

The Helm package manager must be installed.

Deployment

To install the latest stable version:

export NFD_NS=node-feature-discovery
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo update
helm install nfd/node-feature-discovery --namespace $NFD_NS --create-namespace --generate-name

To install the latest development version, clone the NFD Git repository and install from there:

git clone https://github.com/kubernetes-sigs/node-feature-discovery/
cd node-feature-discovery/deployment/helm
export NFD_NS=node-feature-discovery
helm install node-feature-discovery ./node-feature-discovery/ --namespace $NFD_NS --create-namespace

See the configuration section below for instructions on how to alter the deployment parameters.

In order to deploy the minimal image you need to override the image tag:

helm install node-feature-discovery ./node-feature-discovery/ --set image.tag={{ site.release }}-minimal --namespace $NFD_NS --create-namespace

Configuration

You can override the defaults in values.yaml by providing a file with custom values:

export NFD_NS=node-feature-discovery
helm install nfd/node-feature-discovery -f <path/to/custom/values.yaml> --namespace $NFD_NS --create-namespace --generate-name
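As a rough sketch, a custom values file might be created as follows. The file name `nfd-custom-values.yaml` and the secret name `my-registry-secret` are placeholders chosen for illustration. The keys are taken from the chart parameter tables later in this document, and the `name:` entry format for `imagePullSecrets` is assumed to follow the usual Kubernetes convention.

```shell
# Write a hypothetical custom values file (the file name is an assumption).
cat > nfd-custom-values.yaml <<'EOF'
image:
  pullPolicy: IfNotPresent
imagePullSecrets:
  - name: my-registry-secret  # placeholder secret name
master:
  replicaCount: 2
EOF

# Print the file for review before handing it to helm.
cat nfd-custom-values.yaml
```

The resulting file can then be passed to `helm install` with `-f nfd-custom-values.yaml` in place of the `<path/to/custom/values.yaml>` placeholder.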

To specify each parameter separately, pass them to the helm install command with --set:

export NFD_NS=node-feature-discovery
helm install nfd/node-feature-discovery --set nameOverride=NFDinstance --set master.replicaCount=2 --namespace $NFD_NS --create-namespace --generate-name
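Each dotted name given to `--set` corresponds to a nested key in a values file. The following is a minimal sketch of a file equivalent to the two `--set` flags above; the file name `nfd-set-equivalent.yaml` is hypothetical.

```shell
# Same overrides as: --set nameOverride=NFDinstance --set master.replicaCount=2
cat > nfd-set-equivalent.yaml <<'EOF'
nameOverride: NFDinstance
master:
  replicaCount: 2
EOF

# Show the nested form of the dotted parameter names.
cat nfd-set-equivalent.yaml
```

A values file tends to be easier to review and keep under version control once more than a handful of parameters are overridden.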

Uninstalling the chart

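To uninstall the node-feature-discovery release (if you installed with --generate-name, substitute the generated release name, which helm list --namespace $NFD_NS shows):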

export NFD_NS=node-feature-discovery
helm uninstall node-feature-discovery --namespace $NFD_NS

The command removes all the Kubernetes components associated with the chart and deletes the release.

Chart parameters

To tailor the deployment of Node Feature Discovery to your cluster's needs, the chart provides the following parameters.

General parameters

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `image.repository` | string | `{{ site.container_image split: ":"` | |
| `image.tag` | string | `{{ site.release }}` | NFD image tag |
| `image.pullPolicy` | string | `Always` | Image pull policy |
| `imagePullSecrets` | list | `[]` | ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. If specified, these secrets will be passed to individual puller implementations for them to use. For example, in the case of docker, only DockerConfig type secrets are honored. |
| `nameOverride` | string | | Override the name of the chart |
| `fullnameOverride` | string | | Override a default fully qualified app name |
| `tls.enable` | bool | `false` | Specifies whether to use TLS for communications between components |
| `tls.certManager` | bool | `false` | If enabled, requires cert-manager to be installed and will automatically create the required TLS certificates |
| `enableNodeFeatureApi` | bool | `false` | Enable the NodeFeature CRD API for communicating node features. This will automatically disable the gRPC communication. |
| `prometheus.enable` | bool | `false` | Specifies whether to expose metrics using prometheus operator |

Metrics are exposed using the Prometheus Operator APIs when prometheus.enable is set to true (it is false by default). To expose metrics this way, the Prometheus Operator must be installed in your cluster.
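A values file that turns on metrics exposure could be sketched as below. The file name `nfd-metrics-values.yaml` is an assumption, and the two port settings simply restate the documented default of 8081 to show where they would be changed.

```shell
# Hypothetical values file enabling metrics via the Prometheus Operator.
cat > nfd-metrics-values.yaml <<'EOF'
prometheus:
  enable: true
master:
  metricsPort: 8081  # documented default, repeated for visibility
worker:
  metricsPort: 8081  # documented default, repeated for visibility
EOF

# Print the file for review.
cat nfd-metrics-values.yaml
```

The file would be applied with `-f nfd-metrics-values.yaml` on `helm install`, on a cluster where the Prometheus Operator is already present.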

Master pod parameters

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `master.*` | dict | | NFD master deployment configuration |
| `master.port` | integer | | Specifies the TCP port that nfd-master listens on for incoming requests. |
| `master.metricsPort` | integer | `8081` | Port on which to expose metrics from components to prometheus operator |
| `master.instance` | string | | Instance name. Used to separate annotation namespaces for multiple parallel deployments |
| `master.resyncPeriod` | string | | NFD API controller resync period. |
| `master.extraLabelNs` | array | `[]` | List of allowed extra label namespaces |
| `master.resourceLabels` | array | `[]` | List of labels to be registered as extended resources |
| `master.enableTaints` | bool | `false` | Specifies whether to enable or disable node tainting |
| `master.crdController` | bool | `null` | Specifies whether the NFD CRD API controller is enabled. If not set, the controller will be enabled if `master.instance` is empty. |
| `master.featureRulesController` | bool | `null` | DEPRECATED: use `master.crdController` instead |
| `master.replicaCount` | integer | `1` | Number of desired pods. This is a pointer to distinguish between explicit zero and not specified |
| `master.podSecurityContext` | dict | `{}` | PodSecurityContext holds pod-level security attributes and common container settings |
| `master.securityContext` | dict | `{}` | Container security settings |
| `master.serviceAccount.create` | bool | `true` | Specifies whether a service account should be created |
| `master.serviceAccount.annotations` | dict | `{}` | Annotations to add to the service account |
| `master.serviceAccount.name` | string | | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
| `master.rbac.create` | bool | `true` | Specifies whether to create RBAC configuration for nfd-master |
| `master.service.type` | string | `ClusterIP` | NFD master service type |
| `master.service.port` | integer | `8080` | NFD master service port |
| `master.resources` | dict | `{}` | NFD master pod resources management |
| `master.nodeSelector` | dict | `{}` | NFD master pod node selector |
| `master.tolerations` | dict | *Scheduling to master node is disabled* | NFD master pod tolerations |
| `master.annotations` | dict | `{}` | NFD master pod annotations |
| `master.affinity` | dict | | NFD master pod required node affinity |
| `master.deploymentAnnotations` | dict | `{}` | NFD master deployment annotations |
| `master.nfdApiParallelism` | integer | `10` | Specifies the maximum number of concurrent node updates. |
| `master.config` | dict | | NFD master configuration |
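To illustrate how a few of the master parameters above fit together, here is a small sketch of a values file. The file name `nfd-master-values.yaml` and the label namespace `example.com` are hypothetical, and the chosen numbers are arbitrary examples, not recommendations.

```shell
# Hypothetical values file overriding a few nfd-master parameters.
cat > nfd-master-values.yaml <<'EOF'
master:
  replicaCount: 2
  enableTaints: true
  extraLabelNs:
    - example.com        # placeholder label namespace
  nfdApiParallelism: 5   # lower than the documented default of 10
EOF

# Print the file for review.
cat nfd-master-values.yaml
```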

Worker pod parameters

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `worker.*` | dict | | NFD worker daemonset configuration |
| `worker.metricsPort` | integer | `8081` | Port on which to expose metrics from components to prometheus operator |
| `worker.config` | dict | | NFD worker configuration |
| `worker.podSecurityContext` | dict | `{}` | PodSecurityContext holds pod-level security attributes and common container settings |
| `worker.securityContext` | dict | `{}` | Container security settings |
| `worker.serviceAccount.create` | bool | `true` | Specifies whether a service account for nfd-worker should be created |
| `worker.serviceAccount.annotations` | dict | `{}` | Annotations to add to the service account for nfd-worker |
| `worker.serviceAccount.name` | string | | The name of the service account to use for nfd-worker. If not set and create is true, a name is generated using the fullname template (suffixed with `-worker`) |
| `worker.rbac.create` | bool | `true` | Specifies whether to create RBAC configuration for nfd-worker |
| `worker.mountUsrSrc` | bool | `false` | Specifies whether to allow users to mount the hostpath `/usr/src`. Does not work on systems without `/usr/src` AND a read-only `/usr` |
| `worker.resources` | dict | `{}` | NFD worker pod resources management |
| `worker.nodeSelector` | dict | `{}` | NFD worker pod node selector |
| `worker.tolerations` | dict | `{}` | NFD worker pod node tolerations |
| `worker.priorityClassName` | string | | NFD worker pod priority class |
| `worker.annotations` | dict | `{}` | NFD worker pod annotations |
| `worker.daemonsetAnnotations` | dict | `{}` | NFD worker daemonset annotations |
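As an example of the worker parameters above, the sketch below restricts nfd-worker to Linux nodes and adds a pod annotation. The file name `nfd-worker-values.yaml` and the annotation key `example.com/owner` are hypothetical; `kubernetes.io/os` is the standard Kubernetes node label.

```shell
# Hypothetical values file overriding a few nfd-worker parameters.
cat > nfd-worker-values.yaml <<'EOF'
worker:
  mountUsrSrc: true
  nodeSelector:
    kubernetes.io/os: linux
  annotations:
    example.com/owner: platform-team  # placeholder annotation
EOF

# Print the file for review.
cat nfd-worker-values.yaml
```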

Topology updater parameters

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `topologyUpdater.*` | dict | | NFD Topology Updater configuration |
| `topologyUpdater.enable` | bool | `false` | Specifies whether the NFD Topology Updater should be created |
| `topologyUpdater.createCRDs` | bool | `false` | Specifies whether the NFD Topology Updater CRDs should be created |
| `topologyUpdater.serviceAccount.create` | bool | `true` | Specifies whether the service account for topology updater should be created |
| `topologyUpdater.serviceAccount.annotations` | dict | `{}` | Annotations to add to the service account for topology updater |
| `topologyUpdater.serviceAccount.name` | string | | The name of the service account for topology updater to use. If not set and create is true, a name is generated using the fullname template and `-topology-updater` suffix |
| `topologyUpdater.rbac.create` | bool | `true` | Specifies whether to create RBAC configuration for topology updater |
| `topologyUpdater.kubeletConfigPath` | string | `""` | Specifies the kubelet config host path |
| `topologyUpdater.kubeletPodResourcesSockPath` | string | `""` | Specifies the kubelet sock path to read pod resources |
| `topologyUpdater.updateInterval` | string | `60s` | Time to sleep between CR updates. Non-positive value implies no CR update. |
| `topologyUpdater.watchNamespace` | string | `*` | Namespace to watch pods, `*` for all namespaces |
| `topologyUpdater.podSecurityContext` | dict | `{}` | PodSecurityContext holds pod-level security attributes and common container settings |
| `topologyUpdater.securityContext` | dict | `{}` | Container security settings |
| `topologyUpdater.resources` | dict | `{}` | Topology updater pod resources management |
| `topologyUpdater.nodeSelector` | dict | `{}` | Topology updater pod node selector |
| `topologyUpdater.tolerations` | dict | `{}` | Topology updater pod node tolerations |
| `topologyUpdater.annotations` | dict | `{}` | Topology updater pod annotations |
| `topologyUpdater.affinity` | dict | `{}` | Topology updater pod affinity |
| `topologyUpdater.config` | dict | | Topology updater configuration |
| `topologyUpdater.podSetFingerprint` | bool | `false` | Enables computing and reporting of the pod fingerprint in NRT objects. |
| `topologyUpdater.kubeletStateDir` | string | `/var/lib/kubelet` | Specifies kubelet state directory path for watching state and checkpoint files. Empty value disables kubelet state tracking. |
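Since the topology updater is disabled by default, enabling it takes an explicit override. The following is a sketch only: the file name `nfd-topology-values.yaml` is hypothetical and the `30s` interval is an arbitrary example. Note that `*` needs quoting in YAML, because an unquoted `*` starts an alias.

```shell
# Hypothetical values file enabling the NFD Topology Updater.
cat > nfd-topology-values.yaml <<'EOF'
topologyUpdater:
  enable: true
  createCRDs: true
  updateInterval: 30s
  watchNamespace: "*"  # quoted: a bare * is YAML alias syntax
EOF

# Print the file for review.
cat nfd-topology-values.yaml
```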

Topology garbage collector parameters

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `topologyGC.*` | dict | | NFD Topology Garbage Collector configuration |
| `topologyGC.enable` | bool | `true` | Specifies whether the NFD Topology Garbage Collector should be created |
| `topologyGC.serviceAccount.create` | bool | `true` | Specifies whether the service account for topology garbage collector should be created |
| `topologyGC.serviceAccount.annotations` | dict | `{}` | Annotations to add to the service account for topology garbage collector |
| `topologyGC.serviceAccount.name` | string | | The name of the service account for topology garbage collector to use. If not set and create is true, a name is generated using the fullname template and `-topology-gc` suffix |
| `topologyGC.rbac.create` | bool | `true` | Specifies whether to create RBAC configuration for topology garbage collector |
| `topologyGC.interval` | string | `1h` | Time between periodic garbage collector runs |
| `topologyGC.podSecurityContext` | dict | `{}` | PodSecurityContext holds pod-level security attributes and common container settings |
| `topologyGC.securityContext` | dict | `{}` | Container security settings |
| `topologyGC.resources` | dict | `{}` | Topology garbage collector pod resources management |
| `topologyGC.nodeSelector` | dict | `{}` | Topology garbage collector pod node selector |
| `topologyGC.tolerations` | dict | `{}` | Topology garbage collector pod node tolerations |
| `topologyGC.annotations` | dict | `{}` | Topology garbage collector pod annotations |
| `topologyGC.affinity` | dict | `{}` | Topology garbage collector pod affinity |
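The garbage collector interval can be shortened in the same way as the other parameters. This is a minimal sketch under assumptions: the file name `nfd-topology-gc-values.yaml` is hypothetical, and `30m` assumes the interval accepts duration strings in the same style as the documented default `1h`.

```shell
# Hypothetical values file tuning the topology garbage collector.
cat > nfd-topology-gc-values.yaml <<'EOF'
topologyGC:
  enable: true
  interval: 30m  # assumed duration format, mirroring the default "1h"
EOF

# Print the file for review.
cat nfd-topology-gc-values.yaml
```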