mirror of
https://github.com/kubernetes-sigs/node-feature-discovery.git
synced 2024-12-14 11:57:51 +00:00
Node feature discovery for Kubernetes
a2068f7ce3
Fix cache syncing problems on big clusters with thousands of NodeFeature objects. On the initial list (sync) the client-go cache reflector sets the ResourceVersion to "0" (instead of leaving it empty). This causes problems in the api server with (apiserver) logs like: E writers.go:122] apiserver was unable to write a JSON response: http: Handler timeout E status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout On the nfd-master side we see corresponding log snippets like: W reflector.go:547] failed to list *v1alpha1.NodeFeature: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1521; INTERNAL_ERROR; received from peer I trace.go:236] "Reflector ListAndWatch" name:*** (***) (total time: 61126ms): ---"Objects listed" error:stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1521; INTERNAL_ERROR; received from peer 61126ms (***) Decreasing the page size (opts.Limits) does not have any effect on the timeouts. However, setting ResourceVersion to an empty value seems to get the paging on its tracks, eliminating the timeouts. TODO: investigate in Kubernetes upstream the root cause of the timeouts with ResourceVersion="0". |
||
---|---|---|
.github | ||
api | ||
cmd | ||
demo | ||
deployment | ||
docs | ||
enhancements/1186-spiffe-integration | ||
examples | ||
hack | ||
pkg | ||
scripts | ||
source | ||
test | ||
.dockerignore | ||
.gitignore | ||
cloudbuild.yaml | ||
code-of-conduct.md | ||
codecov.yml | ||
CONTRIBUTING.md | ||
Dockerfile | ||
Dockerfile_generator | ||
go.mod | ||
go.sum | ||
LICENSE | ||
Makefile | ||
netlify.toml | ||
OWNERS | ||
README.md | ||
SECURITY_CONTACTS | ||
Tiltfile |
Node Feature Discovery
Welcome to Node Feature Discovery – a Kubernetes add-on for detecting hardware features and system configuration!
See our Documentation for detailed instructions and reference
Quick-start – the short-short version
$ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.16.3
namespace/node-feature-discovery created
customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
serviceaccount/nfd-gc created
serviceaccount/nfd-master created
serviceaccount/nfd-worker created
role.rbac.authorization.k8s.io/nfd-worker created
clusterrole.rbac.authorization.k8s.io/nfd-gc created
clusterrole.rbac.authorization.k8s.io/nfd-master created
rolebinding.rbac.authorization.k8s.io/nfd-worker created
clusterrolebinding.rbac.authorization.k8s.io/nfd-gc created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
configmap/nfd-master-conf created
configmap/nfd-worker-conf created
deployment.apps/nfd-gc created
deployment.apps/nfd-master created
daemonset.apps/nfd-worker created
$ kubectl -n node-feature-discovery get all
NAME READY STATUS RESTARTS AGE
pod/nfd-gc-565fc85d9b-94jpj 1/1 Running 0 18s
pod/nfd-master-6796d89d7b-qccrq 1/1 Running 0 18s
pod/nfd-worker-nwdp6 1/1 Running 0 18s
...
$ kubectl get no -o json | jq '.items[].metadata.labels'
{
"kubernetes.io/arch": "amd64",
"kubernetes.io/os": "linux",
"feature.node.kubernetes.io/cpu-cpuid.ADX": "true",
"feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...