node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	6b80f654d4	Merge pull request #1600 from ArangoGutierrez/e2e-not-k8s Move NFD api to a separate go mod	2024-04-09 02:06:06 -07:00
Markus Lehtonen	ba4cebb29e	nfd-master: stop node-updater pool before reconfiguring api-controller Prevents potential race between node-updater pool and the api-controller when re-configuring nfd-master. Reconfiguration causes a new api-controller instance to be created so nfd api lister might change in the midst of processing a node update (if the pool was running). No actual issues related to this have been identified but races (like this) should still be avoided.	2024-04-09 10:45:07 +03:00
Kubernetes Prow Robot	31a56acdd4	Merge pull request #1655 from marquiz/devel/master-no-publish nfd-master: parse kubeconfig even with NoPublish set	2024-04-08 05:14:21 -07:00
Markus Lehtonen	8709cccf71	nfd-master: parse kubeconfig even with NoPublish set Don't try to be too smart when kubeconfig is needed. In practice, the nfd-master really doesn't work anymore (with the NodeFeature API enabled) without a kubeconfig set. This patch fixes crashes happening when NoPublish is enabled, e.g. in listing all nodes in the nfd api handler and in getting single node objects in the node updater pool. This patch changes the kubeconfig parsing to happen at the creation of the nfd-master instance. We don't need to do that at reconfigure time as none of the dynamic config options affect it. Unit tests are adjusted accordingly.	2024-04-08 14:25:27 +03:00
Kubernetes Prow Robot	8f5830f3c6	Merge pull request #1658 from marquiz/devel/master-opts nfd-master: implement opts for modifying NfdMaster instance	2024-04-08 03:59:52 -07:00
Markus Lehtonen	fcb8d3cda4	nfd-master: implement opts for modifying NfdMaster instance This provides a more controlled way for setting up the NfdMaster instance for testing.	2024-04-05 20:21:19 +03:00
Kubernetes Prow Robot	199d665046	Merge pull request #1656 from marquiz/devel/channel-simplify Tidy up usage of channels for signaling	2024-04-05 07:51:34 -07:00
Carlos Eduardo Arango Gutierrez	3434557d7c	Move NFD api to a separate go mod Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-04-05 16:35:47 +02:00
Kubernetes Prow Robot	86c88f18f0	Merge pull request #1650 from marquiz/devel/readme Update readme to v0.15.4 release	2024-04-05 06:01:46 -07:00
Kubernetes Prow Robot	cb24f7c234	Merge pull request #1657 from marquiz/devel/master-label-whitelist nfd-master: prevent crash on empty config struct	2024-04-05 05:36:52 -07:00
Markus Lehtonen	26a80cf142	Tidy up usage of channels for signaling This started as a small effort to simplify the usage of "ready" channel in nfd-master. It extended into a wider simplification/unification of the channel usage.	2024-04-05 14:39:58 +03:00
Markus Lehtonen	b27676451a	nfd-master: prevent crash on empty config struct Change the handling of LabelWhiteList config option to use a pointer to detect when the option is unset. This doesn't fix any detected crash but is merely a general improvement and stabilization that makes testing easier. Also, use the regexp type from the core libs for the config struct (dropping the unmarshalling code for our custom regexp type), as the core regexp now implements the unmarshaller itself.	2024-04-05 14:19:44 +03:00
Kubernetes Prow Robot	ad96c301a4	Merge pull request #1642 from marquiz/devel/master-updater-pool-lock nfd-master: protect node updater pool queueing with a lock	2024-04-05 03:31:10 -07:00
Kubernetes Prow Robot	af8a41cc02	Merge pull request #1639 from TessaIO/chore-add-prometheus-pod-monitor-interval chore/deploy: make interval property in PodMonitor configurable	2024-04-05 03:03:26 -07:00
Carlos M	cc53b604c5	chore: include suggestions from code review Co-authored-by: Carlos Eduardo Arango Gutierrez <arangogutierrez@gmail.com>	2024-04-05 10:01:08 +02:00
Kubernetes Prow Robot	275e625c2a	Merge pull request #1652 from marquiz/devel/reuse-node nfd-master: get node object only once when updating node	2024-04-04 05:02:45 -07:00
Markus Lehtonen	44a5a5b4a8	nfd-master: get node object only once when updating node Prevent excess queries of node objects from the Kubernetes apiserver. This significantly speeds up node updates (and reduces the load on the apiserver) as the client-side throttling (which is good) does not bite us that hard.	2024-04-04 14:44:52 +03:00
Kubernetes Prow Robot	fcf819ad9f	Merge pull request #1643 from ozhuraki/topology-health nfd-topology-updater: Add liveness probe	2024-04-03 07:34:08 -07:00
Oleg Zhurakivskyy	f2e9557a2d	nfd-topology-updater: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-04-03 13:15:54 +03:00
cmontemuino	54b01a2576	docs: document trade-offs in memory configuration Problem: memory requests and limits have been set for the `master` process in PR #1631. This does not follow best practices for setting those values, but the intention was to provide default values for a wide variety of clusters, including small ones. Solution: provide solid documentation about the problems that might happen in production environments when `resource.memory.requests << resource.memory.limits`. Add a link to relevant external sources, which include the advice from Tim Hockin: > Always set memory limit == request Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>	2024-04-02 19:01:50 +02:00
Kubernetes Prow Robot	7938e81c33	Merge pull request #1631 from TessaIO/chore-add-resources-limits-and-requests chore/deployment: add resources requests and limits for helm and Kustomize	2024-04-02 02:03:59 -07:00
Markus Lehtonen	b02aa3eda8	Update readme to v0.15.4 release	2024-03-28 11:35:54 +02:00
Kubernetes Prow Robot	1696c6589e	Merge pull request #1641 from marquiz/devel/fix-master-crash nfd-master: do nfd API scheme registration in an init function	2024-03-27 11:54:14 -07:00
Markus Lehtonen	bce446c5b6	nfd-master: protect node updater pool queueing with a lock Prevents races when (re-)starting the queue. There are no reported issues related to this (and I haven't come up with any actual failure path in the current code) but better to be safe and follow the best practices.	2024-03-27 16:53:34 +02:00
Markus Lehtonen	c4e010eafd	nfd-master: do nfd API scheme registration in an init function Prevents (rare) races on nfd-master reconfiguration. Previously the scheme was registered at nfd API controller creation/startup time. This caused a race with some lister/informer goroutines of the previous (stopped) controller still running and accessing (reading) the scheme while we were updating (writing) it.	2024-03-27 15:26:16 +02:00
TessaIO	74153e11b5	chore/deploy: make interval property in PodMonitor configurable Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-26 08:36:52 +01:00
TessaIO	d02414cf61	chore/deployment: add resources requests and limits for helm and Kustomize Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-22 14:27:44 +01:00
Kubernetes Prow Robot	137f18b5b3	Merge pull request #1635 from marquiz/devel/helm-fix helm: fix invalid name of host-swaps volume	2024-03-20 23:26:51 -07:00
Kubernetes Prow Robot	2c4a3e5718	Merge pull request #1634 from ozhuraki/nrt-owner-reference-fix topology-updater: Set APIVersion, Kind in the OwnerReference explicitly	2024-03-20 12:45:41 -07:00
Markus Lehtonen	9b3d273a18	helm: fix invalid name of host-swaps volume	2024-03-20 21:15:02 +02:00
Oleg Zhurakivskyy	7bd27c757a	topology-updater: Set APIVersion, Kind in the OwnerReference explicitly APIVersion and Kind are empty in the returned namespace object and need to be set explicitly. Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-20 20:09:06 +02:00
Kubernetes Prow Robot	0ad5e50f24	Merge pull request #1609 from ozhuraki/worker-health nfd-worker: Add liveness probe	2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy	8b63d17af7	nfd-worker: Add liveness probe Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot	c4ff25de52	Merge pull request #1596 from marquiz/devel/master-infinite-retry nfd-master: retry node updates indefinitely	2024-03-19 04:00:50 -07:00
Kubernetes Prow Robot	7df0f17f68	Merge pull request #1602 from ozhuraki/nrt-owner-ref Add owner reference to NRT object	2024-03-19 01:12:59 -07:00
Kubernetes Prow Robot	869bb2044d	Merge pull request #1632 from marquiz/devel/fix-nodefeatureapi-feature-gate Remove references to -enable-nodefeature-api flag	2024-03-18 09:27:23 -07:00
Markus Lehtonen	e7f87de6df	nfd-master: retry node updates indefinitely Treat node updates like a reconciliation loop: keep retrying a node update for as long as it fails. A permanently failing node update likely indicates a bug in the nfd code (there should be no reason for it to fail forever) and it's better to clearly see it in the logs/metrics rather than giving up after a few retries.	2024-03-18 18:14:24 +02:00
Markus Lehtonen	6f891ce1d2	Remove references to -enable-nodefeature-api flag Fix documentation, code and e2e-tests.	2024-03-18 16:06:25 +02:00
Kubernetes Prow Robot	4790962123	Merge pull request #1595 from marquiz/devel/master-check-node-existence nfd-master: check if node exists before trying update	2024-03-18 04:19:57 -07:00
Kubernetes Prow Robot	797fada92e	Merge pull request #1585 from kannon92/add-swap-support add swap support in nfd	2024-03-18 04:19:48 -07:00
Kubernetes Prow Robot	35cc81969f	Merge pull request #1630 from TessaIO/replace-AhmedGrati-with-TessaIO replace AhmedGrati account with TessaIO as reviewer	2024-03-18 01:53:06 -07:00
TessaIO	7d1d3387be	replace AhmedGrati account with TessaIO as reviewer Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>	2024-03-16 21:37:05 +01:00
Kubernetes Prow Robot	013254404e	Merge pull request #1623 from ArangoGutierrez/featuregate Add FeatureGate framework to handle new features	2024-03-15 11:34:17 -07:00
Carlos Eduardo Arango Gutierrez	06c4733bc5	Add FeatureGate framework to handle new features Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy	c662265a47	topology-updater: Add owner reference to NRT object Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>	2024-03-15 16:36:27 +02:00
Kubernetes Prow Robot	fbc9a78568	Merge pull request #1628 from marquiz/devel/readme Update readme to v0.15.3 release	2024-03-15 05:58:39 -07:00
Markus Lehtonen	a0d47294f4	Update readme to v0.15.3 release	2024-03-15 11:11:52 +02:00
Kubernetes Prow Robot	52d4337004	Merge pull request #1615 from marquiz/devel/master-mem-leak nfd-master: fix memory leak in nfd api-controller	2024-03-14 08:21:33 -07:00
Kubernetes Prow Robot	e260c025b8	Merge pull request #1620 from ArangoGutierrez/tuleak Use close to signal stop channel in worker and topology-updater	2024-03-14 07:49:36 -07:00
Carlos Eduardo Arango Gutierrez	69dbfdfbc0	Use close to signal stop channel in worker and topology-updater Fix stop channel management in worker and topology-updater in case of multiple callers Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2024-03-14 15:28:39 +01:00


2360 commits