mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Merge pull request #526 from k8stopologyawareschedwg/topology-updater-documentation

Documentation capturing enablement of NFD-Topology-Updater in NFD
Kubernetes Prow Robot 2021-10-29 04:54:50 -07:00 committed by GitHub
commit 347b16daea
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 526 additions and 9 deletions


@ -184,6 +184,8 @@ Usage of nfd-master:
Comma separated list of labels to be exposed as extended resources.
-verify-node-name
Verify worker node name against the worker's TLS certificate. Only takes effect when TLS authentication has been enabled.
-nrt-namespace
Namespace in which Node Resource Topology CR are created. Ensure that the namespace specified already exists
-version
Print version and exit.
```
@ -242,6 +244,95 @@ stand-alone directly with `docker run`. See the
[default deployment](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/components/common/worker-mounts.yaml)
for up-to-date information about the required volume mounts.
### NFD-Topology-Updater
In order to run nfd-topology-updater as a "stand-alone" container against your
standalone nfd-master you need to run them in the same network namespace:
```bash
$ docker run --rm --network=container:nfd-test ${NFD_CONTAINER_IMAGE} nfd-topology-updater
2019/02/01 14:48:56 Node Feature Discovery Topology Updater <NFD_VERSION>
...
```
If you just want to try out feature discovery without connecting to nfd-master,
pass the `-no-publish` flag to nfd-topology-updater.
Command line flags of nfd-topology-updater:
```bash
$ docker run --rm ${NFD_CONTAINER_IMAGE} nfd-topology-updater -help
Usage of nfd-topology-updater:
-add_dir_header
If true, adds the file directory to the header of the log messages
-alsologtostderr
log to standard error as well as files
-ca-file string
Root certificate for verifying connections
-cert-file string
Certificate used for authenticating connections
-key-file string
Private key matching -cert-file
-kubeconfig string
Kube config file.
-kubelet-config-file string
Kubelet config file path. (default "/host-var/lib/kubelet/config.yaml")
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-log_file string
If non-empty, use this log file
-log_file_max_size uint
Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
-logtostderr
log to standard error instead of files (default true)
-no-publish
Do not publish discovered features to the cluster-local Kubernetes API server.
-one_output
If true, only write logs to their native severity level (vs also writing to each lower severity level)
-oneshot
Update once and exit
-podresources-socket string
Pod Resource Socket path to use. (default "/host-var/lib/kubelet/pod-resources/kubelet.sock")
-server string
NFD server address to connecto to. (default "localhost:8080")
-server-name-override string
Hostname expected from server certificate, useful in testing
-skip_headers
If true, avoid header prefixes in the log messages
-skip_log_headers
If true, avoid headers when opening log files
-sleep-interval duration
Time to sleep between CR updates. Non-positive value implies no CR updatation (i.e. infinite sleep). [Default: 60s] (default 1m0s)
-stderrthreshold value
logs at or above this threshold go to stderr (default 2)
-v value
number for the log level verbosity
-version
Print version and exit.
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
-watch-namespace string
Namespace to watch pods (for testing/debugging purpose). Use * for all namespaces. (default "*")
```
NOTE:
NFD topology updater needs certain directories and/or files from the
host mounted inside the NFD container. Thus, you need to provide Docker with the
correct `--volume` options in order for them to work correctly when run
stand-alone directly with `docker run`. See the
[template spec](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/components/topology-updater/topologyupdater-mounts.yaml)
for up-to-date information about the required volume mounts.
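As a rough sketch of the note above, the helper below prints (without running) a `docker run` command line with a `--volume` option. The function name `topology_updater_docker_cmd` is hypothetical, and the single mount of the host's `/var/lib/kubelet` at `/host-var/lib/kubelet` is an assumption inferred from the default values of `-kubelet-config-file` and `-podresources-socket`. It may not cover every required mount; the linked template spec remains the authoritative list.

```bash
#!/usr/bin/env bash
# Print (do not run) a docker command line for a stand-alone
# nfd-topology-updater container.
# ASSUMPTION: mounting the host kubelet directory at /host-var/lib/kubelet
# is enough for the default -kubelet-config-file and -podresources-socket
# paths; check the template spec for the full set of mounts.
topology_updater_docker_cmd() {
  local image="$1"                            # container image to run
  local kubelet_dir="${2:-/var/lib/kubelet}"  # kubelet directory on the host
  printf 'docker run --rm --network=container:nfd-test'
  printf ' --volume %s:/host-var/lib/kubelet' "$kubelet_dir"
  printf ' %s nfd-topology-updater\n' "$image"
}

# Example: show the command for the image held in NFD_CONTAINER_IMAGE.
topology_updater_docker_cmd "${NFD_CONTAINER_IMAGE:-my-nfd-image}"
```

Printing the command first makes it easy to review the mounts before executing it by hand.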
[PodResource API][podresource-api] is a prerequisite for nfd-topology-updater.
Prior to Kubernetes v1.23, the `kubelet` must be started with the following flag:
`--feature-gates=KubeletPodResourcesGetAllocatable=true`.
Starting with Kubernetes v1.23, `GetAllocatableResources` is enabled by default
through the `KubeletPodResourcesGetAllocatable` [feature gate][feature-gate].
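The version rule above can be expressed as a small helper. This is a minimal sketch: the function name `kubelet_gate_flag` is hypothetical, and it assumes a plain `major.minor[.patch]` version string with an optional leading `v`.

```bash
#!/usr/bin/env bash
# Print the kubelet flag needed for nfd-topology-updater, if any, for a
# given Kubernetes version such as "v1.22.3" or "1.23.0".
# ASSUMPTION: the version is a plain major.minor[.patch] string.
kubelet_gate_flag() {
  local version="${1#v}"        # strip an optional leading "v"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # minor number only
  if [ "$major" -eq 1 ] && [ "$minor" -lt 23 ]; then
    echo "--feature-gates=KubeletPodResourcesGetAllocatable=true"
  fi
}

kubelet_gate_flag "v1.22.3"   # prints the feature-gate flag
kubelet_gate_flag "1.23.0"    # prints nothing: enabled by default
```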
## Documentation
All documentation resides under the
@ -271,4 +362,6 @@ make site-build
This will generate html documentation under `docs/_site/`.
<!-- Links -->
[e2e-config-sample]: https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/test/e2e/e2e-test-config.example.yaml
[e2e-config-sample]: https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/test/e2e/e2e-test-config.exapmle.yaml
[podresource-api]: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources
[feature-gate]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates


@ -0,0 +1,197 @@
---
title: "Topology Updater Cmdline Reference"
layout: default
sort: 5
---
# NFD-Topology-Updater Commandline Flags
{: .no_toc }
## Table of Contents
{: .no_toc .text-delta }
1. TOC
{:toc}
---
To quickly view available command line flags execute `nfd-topology-updater -help`.
In a docker container:
```bash
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-topology-updater -help
```
### -h, -help
Print usage and exit.
### -version
Print version and exit.
### -server
The `-server` flag specifies the address of the nfd-master endpoint to
connect to.
Default: localhost:8080
Example:
```bash
nfd-topology-updater -server=nfd-master.nfd.svc.cluster.local:443
```
### -ca-file
The `-ca-file` is one of the three flags (together with `-cert-file` and
`-key-file`) controlling the mutual TLS authentication on the topology-updater side.
This flag specifies the TLS root certificate that is used for verifying the
authenticity of nfd-master.
Default: *empty*
Note: Must be specified together with `-cert-file` and `-key-file`
Example:
```bash
nfd-topology-updater -ca-file=/opt/nfd/ca.crt -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key
```
### -cert-file
The `-cert-file` is one of the three flags (together with `-ca-file` and
`-key-file`) controlling mutual TLS authentication on the topology-updater
side. This flag specifies the TLS certificate presented for authenticating
outgoing requests.
Default: *empty*
Note: Must be specified together with `-ca-file` and `-key-file`
Example:
```bash
nfd-topology-updater -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key -ca-file=/opt/nfd/ca.crt
```
### -key-file
The `-key-file` is one of the three flags (together with `-ca-file` and
`-cert-file`) controlling the mutual TLS authentication on the topology-updater
side. This flag specifies the private key corresponding to the given certificate
file (`-cert-file`) that is used for authenticating outgoing requests.
Default: *empty*
Note: Must be specified together with `-cert-file` and `-ca-file`
Example:
```bash
nfd-topology-updater -key-file=/opt/nfd/updater.key -cert-file=/opt/nfd/updater.crt -ca-file=/opt/nfd/ca.crt
```
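Because `-ca-file`, `-cert-file` and `-key-file` must be specified together, a wrapper script could check the combination before launching the daemon. The following is only a sketch; `tls_flags_consistent` is a hypothetical helper and not part of NFD.

```bash
#!/usr/bin/env bash
# Succeed (exit status 0) when the three TLS file arguments are either
# all set or all empty; fail (exit status 1) for any partial combination.
tls_flags_consistent() {
  local ca="$1" cert="$2" key="$3"
  if [ -z "$ca" ] && [ -z "$cert" ] && [ -z "$key" ]; then
    return 0   # TLS not configured at all
  fi
  if [ -n "$ca" ] && [ -n "$cert" ] && [ -n "$key" ]; then
    return 0   # full mutual TLS configuration
  fi
  return 1     # partial configuration
}

# Example: the key file is missing, so the check reports a problem.
if tls_flags_consistent "/opt/nfd/ca.crt" "/opt/nfd/updater.crt" ""; then
  echo "TLS flags are consistent"
else
  echo "specify -ca-file, -cert-file and -key-file together" >&2
fi
```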
### -server-name-override
The `-server-name-override` flag specifies the common name (CN) to
expect from the nfd-master TLS certificate. This flag is mostly intended for
development and debugging purposes.
Default: *empty*
Example:
```bash
nfd-topology-updater -server-name-override=localhost
```
### -no-publish
The `-no-publish` flag disables all communication with the nfd-master, making
it a "dry-run" flag for nfd-topology-updater. NFD-Topology-Updater runs
resource hardware topology detection normally, but no CR requests are sent to
nfd-master.
Default: *false*
Example:
```bash
nfd-topology-updater -no-publish
```
### -oneshot
The `-oneshot` flag causes nfd-topology-updater to exit after one pass of
resource hardware topology detection.
Default: *false*
Example:
```bash
nfd-topology-updater -oneshot -no-publish
```
### -sleep-interval
The `-sleep-interval` specifies the interval between resource hardware
topology re-examination (and CR updates). A non-positive value implies
infinite sleep interval, i.e. no re-detection is done.
Default: 60s
Example:
```bash
nfd-topology-updater -sleep-interval=1h
```
### -watch-namespace
The `-watch-namespace` flag restricts resource hardware topology examination
to the pods running in the specified namespace. Pods running in other
namespaces are not considered during resource accounting. This is particularly
useful for testing and debugging. A value of "*" means that pods in all
namespaces are considered during the accounting process.
Default: "*"
Example:
```bash
nfd-topology-updater -watch-namespace=rte
```
### -kubelet-config-file
The `-kubelet-config-file` specifies the path to the Kubelet's configuration
file.
Default: /host-var/lib/kubelet/config.yaml
Example:
```bash
nfd-topology-updater -kubelet-config-file=/var/lib/kubelet/config.yaml
```
### -podresources-socket
The `-podresources-socket` specifies the path to the Unix socket where kubelet
exports a gRPC service to enable discovery of in-use CPUs and devices, and to
provide metadata for them.
Default: /host-var/lib/kubelet/pod-resources/kubelet.sock
Example:
```bash
nfd-topology-updater -podresources-socket=/var/lib/kubelet/pod-resources/kubelet.sock
```


@ -96,7 +96,11 @@ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deplo
```
This will required RBAC rules and deploy nfd-master (as a deployment) and
nfd-worker (as a daemonset) in the `node-feature-discovery` namespace.
nfd-worker (as daemonset) in the `node-feature-discovery` namespace.
**NOTE:** nfd-topology-updater is not deployed as part of the `default` overlay.
Please refer to the [Master Worker Topologyupdater](#master-worker-topologyupdater)
and [Topologyupdater](#topology-updater) below.
Alternatively you can clone the repository and customize the deployment by
creating your own overlays. For example, to deploy the [minimal](#minimal)
@ -115,6 +119,10 @@ scenarios under
see [Master-worker pod](#master-worker-pod) below
- [`default-job`](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/overlays/default-job):
see [Worker one-shot](#worker-one-shot) below
- [`master-worker-topologyupdater`](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/overlays/master-worker-topologyupdater):
see [Master Worker Topologyupdater](#master-worker-topologyupdater) below
- [`topologyupdater`](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/overlays/topologyupdater):
see [Topology Updater](#topology-updater) below
- [`prune`](https://github.com/kubernetes-sigs/node-feature-discovery/blob/{{site.release}}/deployment/overlays/prune):
clean up the cluster after uninstallation, see
[Removing feature labels](#removing-feature-labels)
@ -138,10 +146,14 @@ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deplo
```
This creates a DaemonSet runs both nfd-worker and nfd-master in the same Pod.
This creates a DaemonSet that runs nfd-worker and nfd-master in the same Pod.
In this case no nfd-master is run on the master node(s), but, the worker nodes
are able to label themselves which may be desirable e.g. in single-node setups.
**NOTE:** nfd-topology-updater is not deployed by the default-combined overlay.
To enable nfd-topology-updater in this scenario, users must customize the
deployment themselves.
#### Worker one-shot
Feature discovery can alternatively be configured as a one-shot job.
@ -154,11 +166,44 @@ kubectl kustomize https://github.com/kubernetes-sigs/node-feature-discovery/depl
kubectl apply -f -
```
The example above launces as many jobs as there are non-master nodes. Note that
The example above launches as many jobs as there are non-master nodes. Note that
this approach does not guarantee running once on every node. For example,
tainted or non-ready nodes, or other Job scheduling issues, may cause some
node(s) to run extra job instance(s) to satisfy the request.
#### Master Worker Topologyupdater
NFD Master, NFD worker and NFD Topologyupdater can be configured to be deployed
as separate pods. The `master-worker-topologyupdater` overlay may be used to
achieve this:
```bash
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/master-worker-topologyupdater?ref={{ site.release }}
```
#### Topology Updater
In order to deploy just NFD master and NFD Topologyupdater (without nfd-worker)
use the `topologyupdater` overlay:
```bash
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/topologyupdater?ref={{ site.release }}
```
NFD Topologyupdater can be configured along with the `default` overlay
(which deploys NFD worker and NFD master) where all the software components
are deployed as separate pods. The `topologyupdater` overlay may be used
along with `default` overlay to achieve this:
```bash
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref={{ site.release }}
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/topologyupdater?ref={{ site.release }}
```
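The overlay URLs above differ only in the overlay name and the release reference, so they can be generated. In this sketch, `overlay_url` is a hypothetical helper and `v0.10.0` is a placeholder reference; substitute the release you actually deploy. The loop prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Build the kustomize URL for a given NFD overlay and git reference.
overlay_url() {
  local overlay="$1"   # e.g. default, topologyupdater
  local ref="$2"       # release tag or branch (placeholder below)
  echo "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/${overlay}?ref=${ref}"
}

# Print the commands that deploy the default overlay plus the
# topology updater as separate pods.
for overlay in default topologyupdater; do
  echo "kubectl apply -k $(overlay_url "$overlay" "v0.10.0")"
done
```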
### Deployment with Helm
The Node Feature Discovery Helm chart allows you to easily deploy and manage NFD.
@ -350,6 +395,21 @@ The worker configuration file is watched and re-read on every change which
provides a simple mechanism of dynamic run-time reconfiguration. See
[worker configuration](#worker-configuration) for more details.
### NFD-Topology-Updater
NFD-Topology-Updater is preferably run as a Kubernetes DaemonSet. This ensures
re-examination (and CR updates) at regular intervals, capturing changes in
the allocated resources and hence the allocatable resources on a per zone
basis. It makes sure that more CR instances are created as new nodes get
added to the cluster. Topology-Updater connects to the nfd-master service
to create CR instances corresponding to nodes.
When run as a daemonset, nodes are re-examined for the allocated resources
(to determine the information of the allocatable resources on a per zone basis
where a zone can be a NUMA node) at an interval specified using the
`-sleep-interval` option. The default sleep interval is 60s, which is
the value used when no `-sleep-interval` is specified.
### Communication security with TLS
NFD supports mutual TLS authentication between the nfd-master and nfd-worker


@ -19,10 +19,11 @@ This software enables node feature discovery for Kubernetes. It detects
hardware features available on each node in a Kubernetes cluster, and
advertises those features using node labels.
NFD consists of two software components:
NFD consists of three software components:
1. nfd-master
1. nfd-worker
1. nfd-topology-updater
## NFD-Master
@ -36,7 +37,17 @@ NFD-Worker is a daemon responsible for feature detection. It then communicates
the information to nfd-master which does the actual node labeling. One
instance of nfd-worker is supposed to be running on each node of the cluster,
## Feature discovery
## NFD-Topology-Updater
NFD-Topology-Updater is a daemon responsible for examining allocated
resources on a worker node to account for resources available to be allocated
to new pods on a per-zone basis (where a zone can be a NUMA node). It then
communicates the information to nfd-master which does the
[NodeResourceTopology CR](#noderesourcetopology-cr) creation corresponding
to all the nodes in the cluster. One instance of nfd-topology-updater is
supposed to be running on each node of the cluster.
## Feature Discovery
Feature discovery is divided into domain-specific feature sources:
@ -93,4 +104,49 @@ command line flag affects the annotation names
Unapplicable annotations are not created, i.e. for example master.version is
only created on nodes running nfd-master.
## NodeResourceTopology CR
When run with NFD-Topology-Updater, NFD creates CR instances corresponding to
node resource hardware topology such as:
```yaml
apiVersion: topology.node.k8s.io/v1alpha1
kind: NodeResourceTopology
metadata:
  name: node1
topologyPolicies: ["SingleNUMANodeContainerLevel"]
zones:
  - name: node-0
    type: Node
    resources:
      - name: cpu
        capacity: 20
        allocatable: 16
        available: 10
      - name: vendor/nic1
        capacity: 3
        allocatable: 3
        available: 3
  - name: node-1
    type: Node
    resources:
      - name: cpu
        capacity: 30
        allocatable: 30
        available: 15
      - name: vendor/nic2
        capacity: 6
        allocatable: 6
        available: 6
  - name: node-2
    type: Node
    resources:
      - name: cpu
        capacity: 30
        allocatable: 30
        available: 15
      - name: vendor/nic1
        capacity: 3
        allocatable: 3
        available: 3
```
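To illustrate how the per-zone figures in such a CR might be consumed, the sketch below sums the `available` value of one resource across all zones. It is a line-oriented `awk` filter over a manifest shaped like the example above, not a real YAML parser, and `sum_available` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sum the "available" value of one resource across all zones of a
# NodeResourceTopology manifest read from stdin.
# ASSUMPTION: each resource entry starts with "- name: <resource>" and is
# followed by an "available: <number>" line, as in the example manifest.
sum_available() {
  local resource="$1"
  awk -v want="$resource" '
    { sub(/^[ \t]+/, "") }                # ignore indentation
    /^- name:/    { current = $3 }        # start of a zone or resource entry
    /^available:/ { if (current == want) total += $2 }
    END           { print total + 0 }
  '
}

# Example with two zones: 10 + 15 available CPUs.
sum_available cpu <<'EOF'
zones:
  - name: node-0
    type: Node
    resources:
      - name: cpu
        capacity: 20
        allocatable: 16
        available: 10
  - name: node-1
    type: Node
    resources:
      - name: cpu
        capacity: 30
        allocatable: 30
        available: 15
EOF
# prints 25
```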


@ -19,14 +19,16 @@ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deplo
## Verify
Wait until NFD master and worker are running.
Wait until NFD master and NFD worker are running.
```bash
$ kubectl -n node-feature-discovery get ds,deploy
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nfd-worker 3 3 3 3 3 <none> 5s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nfd-worker 2 2 2 2 2 <none> 10s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nfd-master 1/1 1 1 17s
```
Check that NFD feature labels have been created
@ -71,3 +73,112 @@ $ kubectl get po feature-dependent-pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
feature-dependent-pod 1/1 Running 0 23s 10.36.0.4 node-2 <none> <none>
```
## Additional Optional Installation Steps
To deploy the nfd-master and nfd-topology-updater daemons, use the
`topologyupdater` overlay.
Deploying with kustomize creates a new namespace, service and required RBAC
rules, as well as the nfd-master and nfd-topology-updater daemons.
```bash
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/topologyupdater?ref={{ site.release }}
```
**NOTE:**
[PodResource API][podresource-api] is a prerequisite for nfd-topology-updater.
Prior to Kubernetes v1.23, the `kubelet` must be started with the following flag:
`--feature-gates=KubeletPodResourcesGetAllocatable=true`.
Starting with Kubernetes v1.23, `GetAllocatableResources` is enabled by default
through the `KubeletPodResourcesGetAllocatable` [feature gate][feature-gate].
## Verify
Wait until NFD master and NFD topologyupdater are running.
```bash
$ kubectl -n node-feature-discovery get ds,deploy
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nfd-topology-updater 2 2 2 2 2 <none> 5s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nfd-master 1/1 1 1 17s
```
Check that the NodeResourceTopology CR instances are created
```bash
$ kubectl get noderesourcetopologies.topology.node.k8s.io
NAME AGE
kind-control-plane 23s
kind-worker 23s
```
## Show the CR instances
```bash
$ kubectl describe noderesourcetopologies.topology.node.k8s.io kind-control-plane
Name: kind-control-plane
Namespace: default
Labels: <none>
Annotations: <none>
API Version: topology.node.k8s.io/v1alpha1
Kind: NodeResourceTopology
...
Topology Policies:
SingleNUMANodeContainerLevel
Zones:
Name: node-0
Costs:
node-0: 10
node-1: 20
Resources:
Name: Cpu
Allocatable: 3
Capacity: 3
Available: 3
Name: vendor/nic1
Allocatable: 2
Capacity: 2
Available: 2
Name: vendor/nic2
Allocatable: 2
Capacity: 2
Available: 2
Type: Node
Name: node-1
Costs:
node-0: 20
node-1: 10
Resources:
Name: Cpu
Allocatable: 4
Capacity: 4
Available: 4
Name: vendor/nic1
Allocatable: 2
Capacity: 2
Available: 2
Name: vendor/nic2
Allocatable: 2
Capacity: 2
Available: 2
Type: Node
Events: <none>
```
The CR instances created can be used to gain insight into the allocatable
resources along with the granularity of those resources at a per-zone level
(represented by node-0 and node-1 in the above example) or can be used by an
external entity (e.g. topology-aware scheduler plugin) to take an action based
on the gathered information.
<!-- Links -->
[podresource-api]: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources
[feature-gate]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates