Sharing the same client between updater threads virtually serializes
access, in practice making the effective parallelism close to 1.
With this patch, in my bench cluster of 300 nodes, the time taken by
updating all nodes drops from ~2 minutes to ~12 seconds (with the
default parallelism of 10 node updater threads). This demonstrates the
10-fold increased parallelism from ~1 to 10.
There might be other solutions that could be explored, e.g. caching
nodes with an indexer/lister but otoh nfd doesn't necessarily need/want
to watch every little change in each node. We only need to get the node
when something in our own CRDs change (we don't react to any changes in
the node object itself). Using multiple clients was the most obvious
choice to solve the problem for now.
This allows the nfd to run on arm devices like the Raspberry Pi > 2
with armv7 core or armv8 in 32Bit mode.
The build tests will only be run for linux/amd64 and linux/arm64.
Releases will be made for all Platforms.
Don't require that the annotation value must conform to the (strict)
requirements of label values. In the Kubernetes API annotation values do
not have other restrictions than that the total size (keys and values)
of _all_ annotations combined of an object must not exceed 256kB.
This patch sets a maximum size limit of 1kB for the value of a single
feature annotation created by NFD. This limit is rather arbitrary but
should be enough for the NFD usage scenarios (until proven wrong).
Prevents potential race between node-updater pool and the api-controller
when re-configuring nfd-master. Reconfiguration causes a new
api-controller instance to be created so nfd api lister might change in
the midst of processing a node update (if the pool was running). No
actual issues related to this have been identified but races (like this)
should still be avoided.
Don't try to be too smart when kubeconfig is needed. In practice, the
nfd-master really doesn't work anymore (with the NodeFeature API
enabled) without a kubeconfig set. This patch fixes crashes happening
when NoPublish is enabled, e.g. in listing all nodes in the nfd api
handler and in getting single node objects in the node updater pool.
This patch changes the kubeconfig parsing to happen at the creation of
the nfd-master instance. We don't need to do that at reconfigure time as
none of the dynamic config options affect it. Unit tests are adjusted,
accordingly.
This started as a small effort to simplify the usage of "ready" channel
in nfd-master. It extended into a wider simplification/unification of
the channel usage.
Change the handling of LabelWhiteList config option to use a pointer to
detect when the option is unset. This doesn't fix any detected crash but
is merely general improvement and stabilization, serving easier testing.
Also, use the regexp type from the core libs for the config struct -
dropping the unmasrhalling code for our custom regexp type - as the core
regexp now implements unmarshaller itself.
Prevent excess queries of node objects from the Kubernetes apiserver.
This significantly speeds up node updates (and reduces the load on the
apiserver) as the client-side throttling (which is good) does not bite
us that hard.
Problem: memory requests and limits has been set for `master` process in
PR #1631. It does not follow best practices for setting those values,
but the intention was provide default values for a wide variety of
clusters, including small ones.
Solution: provide solid documentation about the problems that might
happen in production environments when
`resource.memory.requests << resource.memory.limits`. Add a link to
relevant external sources, which includes the advise from Tim Hockin:
> Always set memory limit == request
Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>
Prevents races when (re-)starting the queue. There are no reports on
issues related to this (and I haven't come up with any actual failure
path in the current code) but better to be safe and follow the best
practices.
Prevents (rare) races on nfd-master reconfigurartion. Previously the
scheme was registered at nfd API controller creation/startup time. This
caused a race with some lister/informer goroutines of the previous
(stoppped) controller still running and accessing (reading) the sceme
while we were updating (writing) it.
APIVersion and Kind are empty in the returned namespace object
and need to be set explicitly.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>