Markus Lehtonen
fcb8d3cda4
nfd-master: implement opts for modifying NfdMaster instance
...
This provides a more controlled way for setting up the NfdMaster
instance for testing.
2024-04-05 20:21:19 +03:00
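The "opts" mentioned in the commit above are commonly implemented in Go as functional options. The following is a minimal sketch of that pattern, not the actual nfd-master code: the `master` type, its fields, and the `WithNodeName` and `WithPort` options are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// master is a hypothetical stand-in for the NfdMaster instance.
type master struct {
	nodeName string
	port     int
}

// Option modifies a master while it is being constructed.
type Option func(*master)

// WithNodeName overrides the default node name (hypothetical option).
func WithNodeName(name string) Option {
	return func(m *master) { m.nodeName = name }
}

// WithPort overrides the default port (hypothetical option).
func WithPort(port int) Option {
	return func(m *master) { m.port = port }
}

// newMaster builds a master with defaults and then applies the options in order.
func newMaster(opts ...Option) *master {
	m := &master{nodeName: "default-node", port: 8080}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

func main() {
	// A test can set only the fields it cares about.
	m := newMaster(WithNodeName("test-node"))
	fmt.Println(m.nodeName, m.port)
}
```

A test would pass only the options it needs and rely on the defaults for everything else, which keeps setup code short and explicit.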
Kubernetes Prow Robot
199d665046
Merge pull request #1656 from marquiz/devel/channel-simplify
...
Tidy up usage of channels for signaling
2024-04-05 07:51:34 -07:00
Carlos Eduardo Arango Gutierrez
3434557d7c
Move NFD api to a separate go mod
...
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-04-05 16:35:47 +02:00
Kubernetes Prow Robot
86c88f18f0
Merge pull request #1650 from marquiz/devel/readme
...
Update readme to v0.15.4 release
2024-04-05 06:01:46 -07:00
Kubernetes Prow Robot
cb24f7c234
Merge pull request #1657 from marquiz/devel/master-label-whitelist
...
nfd-master: prevent crash on empty config struct
2024-04-05 05:36:52 -07:00
Markus Lehtonen
26a80cf142
Tidy up usage of channels for signaling
...
This started as a small effort to simplify the usage of "ready" channel
in nfd-master. It extended into a wider simplification/unification of
the channel usage.
2024-04-05 14:39:58 +03:00
Markus Lehtonen
b27676451a
nfd-master: prevent crash on empty config struct
...
Change the handling of LabelWhiteList config option to use a pointer to
detect when the option is unset. This doesn't fix any detected crash but
is merely a general improvement and stabilization that makes testing easier.
Also, use the regexp type from the core libs for the config struct -
dropping the unmarshalling code for our custom regexp type - as the core
regexp now implements unmarshaller itself.
2024-04-05 14:19:44 +03:00
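The pointer-based detection of an unset option described in the commit above can be sketched as follows. This is an illustration under assumptions, not the nfd-master code: the reduced `config` struct, `parseConfig`, and `labelAllowed` are hypothetical, JSON is used only to stay within the standard library, and the example relies on `regexp.Regexp` implementing `UnmarshalText`, which requires Go 1.21 or newer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// config is a hypothetical, reduced config struct. A nil LabelWhiteList
// means the option was not set at all.
type config struct {
	LabelWhiteList *regexp.Regexp `json:"labelWhiteList,omitempty"`
}

// parseConfig decodes the config; the standard regexp type unmarshals itself.
func parseConfig(data []byte) (*config, error) {
	c := &config{}
	if err := json.Unmarshal(data, c); err != nil {
		return nil, err
	}
	return c, nil
}

// labelAllowed treats a missing config or an unset whitelist as "allow all".
func labelAllowed(c *config, label string) bool {
	if c == nil || c.LabelWhiteList == nil {
		return true
	}
	return c.LabelWhiteList.MatchString(label)
}

func main() {
	for _, raw := range []string{`{}`, `{"labelWhiteList": "^cpu"}`} {
		c, err := parseConfig([]byte(raw))
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(raw, labelAllowed(c, "pci-0300.present"), labelAllowed(c, "cpu-model.family"))
	}
}
```

Because the nil check happens in one place, an empty config struct in a test behaves the same as a config file that omits the option.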
Kubernetes Prow Robot
ad96c301a4
Merge pull request #1642 from marquiz/devel/master-updater-pool-lock
...
nfd-master: protect node updater pool queueing with a lock
2024-04-05 03:31:10 -07:00
Kubernetes Prow Robot
af8a41cc02
Merge pull request #1639 from TessaIO/chore-add-prometheus-pod-monitor-interval
...
chore/deploy: make interval property in PodMonitor configurable
2024-04-05 03:03:26 -07:00
Carlos M
cc53b604c5
chore: include suggestions from code review
...
Co-authored-by: Carlos Eduardo Arango Gutierrez <arangogutierrez@gmail.com>
2024-04-05 10:01:08 +02:00
Kubernetes Prow Robot
275e625c2a
Merge pull request #1652 from marquiz/devel/reuse-node
...
nfd-master: get node object only once when updating node
2024-04-04 05:02:45 -07:00
Markus Lehtonen
44a5a5b4a8
nfd-master: get node object only once when updating node
...
Prevent excess queries of node objects from the Kubernetes apiserver.
This significantly speeds up node updates (and reduces the load on the
apiserver) as the client-side throttling (which is good) does not bite
us that hard.
2024-04-04 14:44:52 +03:00
Kubernetes Prow Robot
fcf819ad9f
Merge pull request #1643 from ozhuraki/topology-health
...
nfd-topology-updater: Add liveness probe
2024-04-03 07:34:08 -07:00
Oleg Zhurakivskyy
f2e9557a2d
nfd-topology-updater: Add liveness probe
...
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-04-03 13:15:54 +03:00
cmontemuino
54b01a2576
docs: document trade-offs in memory configuration
...
Problem: memory requests and limits have been set for the `master` process in
PR #1631. They do not follow best practices for setting those values,
but the intention was to provide default values for a wide variety of
clusters, including small ones.
Solution: provide solid documentation about the problems that might
happen in production environments when
`resource.memory.requests << resource.memory.limits`. Add a link to
relevant external sources, which include the advice from Tim Hockin:
> Always set memory limit == request
Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>
2024-04-02 19:01:50 +02:00
Kubernetes Prow Robot
7938e81c33
Merge pull request #1631 from TessaIO/chore-add-resources-limits-and-requests
...
chore/deployment: add resources requests and limits for helm and Kustomize
2024-04-02 02:03:59 -07:00
Markus Lehtonen
b02aa3eda8
Update readme to v0.15.4 release
2024-03-28 11:35:54 +02:00
Kubernetes Prow Robot
1696c6589e
Merge pull request #1641 from marquiz/devel/fix-master-crash
...
nfd-master: do nfd API scheme registration in an init function
2024-03-27 11:54:14 -07:00
Markus Lehtonen
bce446c5b6
nfd-master: protect node updater pool queueing with a lock
...
Prevents races when (re-)starting the queue. There are no reports on
issues related to this (and I haven't come up with any actual failure
path in the current code) but better to be safe and follow the best
practices.
2024-03-27 16:53:34 +02:00
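The idea of guarding the queue against concurrent (re-)starts, as described in the commit above, can be sketched like this. The `updaterPool` type and its methods are hypothetical simplifications, and the choice of `sync.RWMutex` is an assumption made for this example rather than a statement about the actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// updaterPool is a hypothetical, simplified node updater pool.
type updaterPool struct {
	mu    sync.RWMutex
	queue chan string
}

// start (re-)creates the queue. The write lock keeps producers from
// observing the queue while it is being swapped.
func (p *updaterPool) start() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.queue = make(chan string, 16)
}

// enqueue adds a node name; it reports false if the pool is not started
// or the queue is full.
func (p *updaterPool) enqueue(node string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.queue == nil {
		return false
	}
	select {
	case p.queue <- node:
		return true
	default:
		return false
	}
}

// pending returns the number of queued node names.
func (p *updaterPool) pending() int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return len(p.queue)
}

func main() {
	p := &updaterPool{}
	p.start()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.enqueue(fmt.Sprintf("node-%d", i))
		}(i)
	}
	p.start() // restart while producers may still be queueing
	wg.Wait()
	fmt.Println("pending after restart:", p.pending())
}
```

The number printed depends on scheduling; the point is only that the queue field is never read and replaced at the same time.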
Markus Lehtonen
c4e010eafd
nfd-master: do nfd API scheme registration in an init function
...
Prevents (rare) races on nfd-master reconfiguration. Previously the
scheme was registered at nfd API controller creation/startup time. This
caused a race with some lister/informer goroutines of the previous
(stopped) controller still running and accessing (reading) the scheme
while we were updating (writing) it.
2024-03-27 15:26:16 +02:00
TessaIO
74153e11b5
chore/deploy: make interval property in PodMonitor configurable
...
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-26 08:36:52 +01:00
TessaIO
d02414cf61
chore/deployment: add resources requests and limits for helm and Kustomize
...
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-22 14:27:44 +01:00
Kubernetes Prow Robot
137f18b5b3
Merge pull request #1635 from marquiz/devel/helm-fix
...
helm: fix invalid name of host-swaps volume
2024-03-20 23:26:51 -07:00
Kubernetes Prow Robot
2c4a3e5718
Merge pull request #1634 from ozhuraki/nrt-owner-reference-fix
...
topology-updater: Set APIVersion, Kind in the OwnerReference explicitly
2024-03-20 12:45:41 -07:00
Markus Lehtonen
9b3d273a18
helm: fix invalid name of host-swaps volume
2024-03-20 21:15:02 +02:00
Oleg Zhurakivskyy
7bd27c757a
topology-updater: Set APIVersion, Kind in the OwnerReference explicitly
...
APIVersion and Kind are empty in the returned namespace object
and need to be set explicitly.
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-20 20:09:06 +02:00
Kubernetes Prow Robot
0ad5e50f24
Merge pull request #1609 from ozhuraki/worker-health
...
nfd-worker: Add liveness probe
2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy
8b63d17af7
nfd-worker: Add liveness probe
...
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot
c4ff25de52
Merge pull request #1596 from marquiz/devel/master-infinite-retry
...
nfd-master: retry node updates indefinitely
2024-03-19 04:00:50 -07:00
Kubernetes Prow Robot
7df0f17f68
Merge pull request #1602 from ozhuraki/nrt-owner-ref
...
Add owner reference to NRT object
2024-03-19 01:12:59 -07:00
Kubernetes Prow Robot
869bb2044d
Merge pull request #1632 from marquiz/devel/fix-nodefeatureapi-feature-gate
...
Remove references to -enable-nodefeature-api flag
2024-03-18 09:27:23 -07:00
Markus Lehtonen
e7f87de6df
nfd-master: retry node updates indefinitely
...
Treat node updates like a reconciliation loop. Keep trying on node
update as long as it fails. Node update permafailing likely indicates a
bug in the nfd code (there should be no reason for it to fail forever)
and it's better to clearly see it in the logs/metrics rather than giving
up after a few retries.
2024-03-18 18:14:24 +02:00
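Treating node updates like a reconciliation loop, as the commit above describes, amounts to retrying until the update succeeds while logging each failure. A minimal sketch follows; `retryUntilSuccess`, `failNTimes`, and the fixed delay are hypothetical and only illustrate the control flow, not the queueing or backoff used by nfd-master.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntilSuccess calls update until it returns nil and reports how many
// attempts were needed. There is deliberately no retry limit.
func retryUntilSuccess(update func() error, delay time.Duration) int {
	attempts := 0
	for {
		attempts++
		err := update()
		if err == nil {
			return attempts
		}
		// Failures stay visible in the logs instead of being dropped.
		fmt.Printf("node update failed (attempt %d): %v\n", attempts, err)
		time.Sleep(delay)
	}
}

// failNTimes returns an update function that fails n times and then succeeds.
func failNTimes(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return errors.New("simulated update failure")
		}
		return nil
	}
}

func main() {
	attempts := retryUntilSuccess(failNTimes(2), 10*time.Millisecond)
	fmt.Println("succeeded after attempts:", attempts)
}
```

A permanently failing update would keep this loop running forever, which is the intended trade-off: the failure remains observable rather than being silently abandoned.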
Markus Lehtonen
6f891ce1d2
Remove references to -enable-nodefeature-api flag
...
Fix documentation, code and e2e-tests.
2024-03-18 16:06:25 +02:00
Kubernetes Prow Robot
4790962123
Merge pull request #1595 from marquiz/devel/master-check-node-existence
...
nfd-master: check if node exists before trying update
2024-03-18 04:19:57 -07:00
Kubernetes Prow Robot
797fada92e
Merge pull request #1585 from kannon92/add-swap-support
...
add swap support in nfd
2024-03-18 04:19:48 -07:00
Kubernetes Prow Robot
35cc81969f
Merge pull request #1630 from TessaIO/replace-AhmedGrati-with-TessaIO
...
replace AhmedGrati account with TessaIO as reviewer
2024-03-18 01:53:06 -07:00
TessaIO
7d1d3387be
replace AhmedGrati account with TessaIO as reviewer
...
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-16 21:37:05 +01:00
Kubernetes Prow Robot
013254404e
Merge pull request #1623 from ArangoGutierrez/featuregate
...
Add FeatureGate framework to handle new features
2024-03-15 11:34:17 -07:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
...
Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy
c662265a47
topology-updater: Add owner reference to NRT object
...
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-15 16:36:27 +02:00
Kubernetes Prow Robot
fbc9a78568
Merge pull request #1628 from marquiz/devel/readme
...
Update readme to v0.15.3 release
2024-03-15 05:58:39 -07:00
Markus Lehtonen
a0d47294f4
Update readme to v0.15.3 release
2024-03-15 11:11:52 +02:00
Kubernetes Prow Robot
52d4337004
Merge pull request #1615 from marquiz/devel/master-mem-leak
...
nfd-master: fix memory leak in nfd api-controller
2024-03-14 08:21:33 -07:00
Kubernetes Prow Robot
e260c025b8
Merge pull request #1620 from ArangoGutierrez/tuleak
...
Use close to signal stop channel in worker and topology-updater
2024-03-14 07:49:36 -07:00
Carlos Eduardo Arango Gutierrez
69dbfdfbc0
Use close to signal stop channel in worker and topology-updater
...
Fix stop channel management on Worker and T-updater in case of multiple callers
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-14 15:28:39 +01:00
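Closing a channel, rather than sending on it, lets any number of goroutines observe the stop signal. With several callers of Stop the close itself must happen only once, since closing an already closed channel panics. The sketch below guards the close with `sync.Once`; that guard, the `worker` type, and its methods are assumptions for illustration and not necessarily how the worker and topology-updater handle it.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical component with a stop channel.
type worker struct {
	stop     chan struct{}
	stopOnce sync.Once
}

func newWorker() *worker {
	return &worker{stop: make(chan struct{})}
}

// Stop may be called any number of times from any goroutine; the channel
// is closed exactly once.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.stop) })
}

// stopped reports whether Stop has been called, without blocking.
func (w *worker) stopped() bool {
	select {
	case <-w.stop:
		return true
	default:
		return false
	}
}

func main() {
	w := newWorker()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w.Stop() // multiple callers are safe
		}()
	}
	wg.Wait()
	fmt.Println("stopped:", w.stopped())
}
```

Every receiver of a closed channel unblocks immediately, so no caller has to know how many listeners exist.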
Kubernetes Prow Robot
e2e8878735
Merge pull request #1613 from kubernetes-sigs/dependabot/go_modules/google.golang.org/protobuf-1.33.0
...
build(deps): bump google.golang.org/protobuf from 1.32.0 to 1.33.0
2024-03-14 07:03:56 -07:00
Markus Lehtonen
70fd3757c4
nfd-master: fix memory leak in nfd api-controller
...
Fixes a memory leak that happened when stopping (and then re-starting)
the nfd api controller. The stop channel was not used properly which
caused the underlying informer to keep on running.
2024-03-14 15:39:10 +02:00
Markus Lehtonen
559d362ac3
go.mod: bump github.com/golang/protobuf to v1.5.4
2024-03-14 14:58:09 +02:00
dependabot[bot]
6a1910ecb2
build(deps): bump google.golang.org/protobuf from 1.32.0 to 1.33.0
...
Bumps google.golang.org/protobuf from 1.32.0 to 1.33.0.
---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-03-13 23:30:48 +00:00
Kubernetes Prow Robot
1ff7a9457b
Merge pull request #1612 from marquiz/devel/deperecate-crd-controller-flag
...
nfd-master: mark the -crd-controller flag as deprecated
2024-03-13 07:57:51 -07:00