mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Node feature discovery for Kubernetes
Markus Lehtonen a2068f7ce3 nfd-master: tweak list options for NodeFeature informer
Fix cache syncing problems on big clusters with thousands of NodeFeature
objects.

On the initial list (sync), the client-go cache reflector sets the
ResourceVersion to "0" (instead of leaving it empty). This causes
problems in the API server, with apiserver logs like:

E writers.go:122] apiserver was unable to write a JSON response: http:
                  Handler timeout
E status.go:71] apiserver received an error that is not an
                metav1.Status: &errors.errorString{s:"http: Handler timeout"}:
                http: Handler timeout

On the nfd-master side we see corresponding log snippets like:

W reflector.go:547] failed to list *v1alpha1.NodeFeature: stream error
                    when reading response body, may be caused by closed
                    connection. Please retry. Original error: stream
                    error: stream ID 1521; INTERNAL_ERROR; received from
                    peer
I trace.go:236] "Reflector ListAndWatch" name:*** (***) (total time:
                61126ms): ---"Objects listed" error:stream error when
                reading response body, may be caused by closed
                connection. Please retry. Original error: stream
                error: stream ID 1521; INTERNAL_ERROR; received from
                peer 61126ms (***)

Decreasing the page size (opts.Limit) has no effect on the timeouts.
However, setting ResourceVersion to an empty value seems to get paging
back on track, eliminating the timeouts.

TODO: investigate in Kubernetes upstream the root cause of the timeouts
with ResourceVersion="0".
2024-07-25 16:29:05 +03:00
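
The tweak described in the commit message above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual nfd-master code: `ListOptions` is a local stand-in for the `ResourceVersion` and `Limit` fields of `metav1.ListOptions`, and the function name `tweakListOptions` is hypothetical. It also assumes that only the "0" value is cleared and other resource versions are left untouched, which the commit message does not state explicitly.

```go
package main

import "fmt"

// ListOptions is a local stand-in (assumption) for the two fields of
// metav1.ListOptions that are relevant to the commit message.
type ListOptions struct {
	ResourceVersion string
	Limit           int64
}

// tweakListOptions (hypothetical name) clears the "0" resource version that
// the reflector sets on the initial list. Any other value is left as-is;
// that restriction is an assumption of this sketch.
func tweakListOptions(opts *ListOptions) {
	if opts.ResourceVersion == "0" {
		opts.ResourceVersion = ""
	}
}

func main() {
	// Initial sync: the reflector asks for ResourceVersion "0".
	initial := ListOptions{ResourceVersion: "0", Limit: 500}
	tweakListOptions(&initial)
	fmt.Printf("initial list: resourceVersion=%q limit=%d\n",
		initial.ResourceVersion, initial.Limit)

	// A later list at a specific resource version is not modified.
	later := ListOptions{ResourceVersion: "12345"}
	tweakListOptions(&later)
	fmt.Printf("later list: resourceVersion=%q\n", later.ResourceVersion)
}
```

In a real client-go setup, a function of this shape would typically be passed to the informer as its list-options tweak; the exact wiring in nfd-master is not shown here.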
.github Revert "build(deps): bump actions/checkout from 1 to 4" 2024-05-27 20:59:56 +03:00
api build(deps): bump k8s.io/kubernetes in the k8sio group 2024-07-22 09:41:19 +03:00
cmd Drop the -enable-nodefeature-api flag 2024-07-10 15:20:07 +03:00
demo demo: make demo runnable again 2020-09-10 17:09:53 +03:00
deployment helm: add configurable liveness&readiness probes for master topology-updater and worker 2024-07-22 21:54:25 +03:00
docs helm: add configurable liveness&readiness probes for master topology-updater and worker 2024-07-22 21:54:25 +03:00
enhancements/1186-spiffe-integration docs: add kep of spiffe integration 2024-01-18 15:09:10 +01:00
examples Add NodeFeatureGroup CRD 2024-05-23 16:34:08 +02:00
hack Add NodeFeatureGroup CRD 2024-05-23 16:34:08 +02:00
pkg nfd-master: tweak list options for NodeFeature informer 2024-07-25 16:29:05 +03:00
scripts scripts/test-infra: bump helm to v3.15.3 2024-07-18 08:51:38 +03:00
source fix: take into consideration possibility of having empty line in swap file 2024-07-11 22:02:39 +02:00
test feature-gates: mark NodeFeatureAPI as GA 2024-07-16 13:53:31 +03:00
.dockerignore dockerignore: cleanup 2023-12-08 14:48:02 +02:00
.gitignore gitignore: ignore codecov coverage report 2023-03-13 12:08:32 +02:00
cloudbuild.yaml cloudbuild: increase the image build timeout 2024-07-09 12:35:33 +03:00
code-of-conduct.md Update code-of-conduct.md 2017-12-20 14:12:51 -05:00
codecov.yml codecov: drop required minimum coverage ratio of at patch level 2023-04-28 17:00:14 +03:00
CONTRIBUTING.md Template project files 2016-07-22 22:13:48 -07:00
Dockerfile Dockerfile: cache go modules on build 2024-07-18 15:58:16 +03:00
Dockerfile_generator Dockerfile: cache go modules on build 2024-07-18 15:58:16 +03:00
go.mod build(deps): bump k8s.io/kubernetes in the k8sio group 2024-07-22 09:41:19 +03:00
go.sum build(deps): bump k8s.io/kubernetes in the k8sio group 2024-07-22 09:41:19 +03:00
LICENSE Template project files 2016-07-22 22:13:48 -07:00
Makefile build: specify buildx builder name everywhere 2024-04-26 17:02:02 +03:00
netlify.toml Add netlify configuration file 2022-09-16 00:47:49 +03:00
OWNERS replace AhmedGrati account with TessaIO as reviewer 2024-03-16 21:37:05 +01:00
README.md README: update to v0.16.3 2024-07-16 15:21:10 +03:00
SECURITY_CONTACTS Update SECURITY_CONTACTS 2020-11-19 15:10:27 -05:00
Tiltfile Update base image to Debian bullseye 2022-10-14 10:04:04 +03:00

Node Feature Discovery


Welcome to Node Feature Discovery, a Kubernetes add-on for detecting hardware features and system configuration!

See our Documentation for detailed instructions and reference.

Quick-start: the short-short version

$ kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.16.3"
  namespace/node-feature-discovery created
  customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
  customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
  serviceaccount/nfd-gc created
  serviceaccount/nfd-master created
  serviceaccount/nfd-worker created
  role.rbac.authorization.k8s.io/nfd-worker created
  clusterrole.rbac.authorization.k8s.io/nfd-gc created
  clusterrole.rbac.authorization.k8s.io/nfd-master created
  rolebinding.rbac.authorization.k8s.io/nfd-worker created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-gc created
  clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
  configmap/nfd-master-conf created
  configmap/nfd-worker-conf created
  deployment.apps/nfd-gc created
  deployment.apps/nfd-master created
  daemonset.apps/nfd-worker created

$ kubectl -n node-feature-discovery get all
  NAME                              READY   STATUS    RESTARTS   AGE
  pod/nfd-gc-565fc85d9b-94jpj       1/1     Running   0          18s
  pod/nfd-master-6796d89d7b-qccrq   1/1     Running   0          18s
  pod/nfd-worker-nwdp6              1/1     Running   0          18s
...

$ kubectl get no -o json | jq '.items[].metadata.labels'
  {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
    "feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...
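
To look only at the labels created by NFD, the JSON from `kubectl get no -o json` can be filtered on the `feature.node.kubernetes.io/` prefix seen in the output above. The following is a small sketch using only the Go standard library. The helper name `nfdLabels`, the node name `node-1`, and the embedded sample document are assumptions made for illustration; only the label keys and values are taken from the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// nfdPrefix is the label prefix visible in the quick-start output.
const nfdPrefix = "feature.node.kubernetes.io/"

// nodeList models only the parts of a node list that this sketch needs.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name   string            `json:"name"`
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
	} `json:"items"`
}

// nfdLabels (hypothetical helper) returns, per node name, the sorted
// "key=value" pairs whose key starts with nfdPrefix.
func nfdLabels(data []byte) (map[string][]string, error) {
	var list nodeList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, item := range list.Items {
		name := item.Metadata.Name
		for k, v := range item.Metadata.Labels {
			if strings.HasPrefix(k, nfdPrefix) {
				out[name] = append(out[name], k+"="+v)
			}
		}
		sort.Strings(out[name])
	}
	return out, nil
}

func main() {
	// Sample input; in practice this would be the output of
	// `kubectl get no -o json`. The node name is made up.
	sample := []byte(`{"items":[{"metadata":{"name":"node-1","labels":{
		"kubernetes.io/arch":"amd64",
		"kubernetes.io/os":"linux",
		"feature.node.kubernetes.io/cpu-cpuid.ADX":"true",
		"feature.node.kubernetes.io/cpu-cpuid.AESNI":"true"}}}]}`)

	labels, err := nfdLabels(sample)
	if err != nil {
		panic(err)
	}
	for node, kv := range labels {
		fmt.Println(node)
		for _, l := range kv {
			fmt.Println("  " + l)
		}
	}
}
```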