mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
227 commits

Author SHA1 Message Date
Carlos Eduardo Arango Gutierrez
7e7ab403cf
Bump release to v0.16.6
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-29 15:08:02 +01:00
Tobias Giese
a8f2ab1607 Template exposed health port in helm chart
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
2024-10-14 08:35:01 +00:00
Tobias Giese
930a53e106 Add helm values to configure hostNetwork and additional env vars
We have to run our NFD workers in the host network.
Also we need additional env variables such as KUBERNETES_SERVICE_HOST and _PORT.
To achieve this we can simply add generic helm values. The default behavior is not changed.

Signed-off-by: Tobias Giese <tgiese@nvidia.com>
(cherry picked from commit af0592b87c)
2024-10-10 15:30:27 +03:00
Markus Lehtonen
175b16bbd1 Release v0.16.5 2024-10-09 15:11:54 +03:00
Tobias Giese
61760b2ab5 Add parameter to configure health endpoint port
Signed-off-by: Tobias Giese <tgiese@nvidia.com>
(cherry picked from commit 53ddf081da)
2024-10-09 12:07:33 +03:00
Markus Lehtonen
7111356d27 Release v0.16.4 2024-08-09 09:25:29 +03:00
Omer Aplatony
21b9b7a94d helm: add configurable liveness and readiness probes for master, topology-updater and worker
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
(cherry picked from commit b2222e2c8c)
2024-07-23 16:30:23 +03:00
Rouke Broersma
7f532809ac Helm: Add revision history limit for worker daemonset (#1797)
* Helm: Add revision history limit for worker daemonset

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>

* Helm: Add revision history limit for topology updater daemonset

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>

* chore: tidy table columns

---------

Signed-off-by: Rouke Broersma <mobrockers@gmail.com>
(cherry picked from commit 1230d607ac)
2024-07-18 16:53:26 +03:00
Kubernetes Prow Robot
6030c974d3
Merge pull request #1787 from marquiz/release-0.16
[release-0.16] Release v0.16.3
2024-07-16 01:40:54 -07:00
Omer Aplatony
59f5c64d43 Add revision history limit for master replica and for garbage collector
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
(cherry picked from commit 920306cba8)
2024-07-16 08:42:34 +03:00
Markus Lehtonen
e8f96a0315 Release v0.16.3 2024-07-12 08:20:24 +03:00
Carlos Eduardo Arango Gutierrez
fbc8b368c3
Release v0.16.2
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-10 11:47:06 +02:00
Carlos Eduardo Arango Gutierrez
283caf2d64 Use worker DS OwnerReference for NF's
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-09 13:34:24 +00:00
Carlos Eduardo Arango Gutierrez
27f4940473
Release v0.16.1
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-07-02 16:15:48 +02:00
Tariq Ibrahim
d93806e43f ensure post-delete-job's service account matches ref in job spec
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-06-27 19:12:23 +00:00
budimanjojo
c296eda0e4 helm: remove default CPU limits
Signed-off-by: budimanjojo <budimanjojo@gmail.com>
2024-06-07 08:13:47 +00:00
Markus Lehtonen
b98668ce22 Prepare v0.16
Generated with:

    ./hack/prepare-release.sh -g 1.22.3 v0.16.0
2024-05-27 15:16:59 +03:00
Kubernetes Prow Robot
4136a69545
Merge pull request #1715 from marquiz/devel/avx10-deprecate
source/cpu: disable AVX10 label
2024-05-24 04:53:59 -07:00
Markus Lehtonen
ece6076dd4 source/cpu: disable AVX10 label
Disable the AVX10 label as unnecessary, since AVX10_LEVEL is better
suited for checking AVX10 compatibility. No hardware with the feature
exists yet, so disabling it shouldn't cause problems for users.
2024-05-24 13:50:46 +03:00
Markus Lehtonen
fa2f008d18 cpu: advertise AVX10 version
Add new cpuid label "feature.node.kubernetes.io/cpu-cpuid.AVX10_VERSION"
that advertises the supported version of AVX10 vector ISA.
Correspondingly, the patch adds AVX10_VERSION to the "cpu.cpuid" feature
for NodeFeatureRules to consume.

This makes cpu.cpuid on amd64 architecture a "multi-type" feature in
that it contains "flags" and potentially also "attributes" (the only
cpuid attribute so far is the AVX10_VERSION).
2024-05-24 13:48:20 +03:00
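The flag/attribute split described in this commit can be sketched roughly as follows. This is a minimal illustration in Go, not NFD's actual implementation: the `cpuidFeatures` type and the `labelsFor` helper are hypothetical names, and the `"true"` value for flags, the sample flag `AVX2` and the sample version `1` are assumptions. Only the label prefix and the `AVX10_VERSION` name come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// labelPrefix is the cpuid label namespace quoted in the commit message.
const labelPrefix = "feature.node.kubernetes.io/cpu-cpuid."

// cpuidFeatures is a hypothetical "multi-type" feature: it carries both
// boolean flags and key-value attributes.
type cpuidFeatures struct {
	Flags      []string
	Attributes map[string]string
}

// labelsFor turns flags and attributes into node labels.
// Assumption: a flag is published with the value "true", while an
// attribute is published with its own value.
func labelsFor(f cpuidFeatures) map[string]string {
	labels := make(map[string]string)
	for _, flag := range f.Flags {
		labels[labelPrefix+flag] = "true"
	}
	for name, value := range f.Attributes {
		labels[labelPrefix+name] = value
	}
	return labels
}

func main() {
	labels := labelsFor(cpuidFeatures{
		Flags:      []string{"AVX2"},                        // sample flag (assumed)
		Attributes: map[string]string{"AVX10_VERSION": "1"}, // sample version (assumed)
	})

	// Sort the keys so the printed order is stable.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```

Keeping flags and attributes in separate fields mirrors the "multi-type" wording of the commit: a consumer can test for the presence of a flag or compare the value of an attribute without guessing which kind a given name is.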
Markus Lehtonen
b3d6282d2c api/nfd: document all undocumented fields in the types 2024-05-23 23:49:49 +03:00
Carlos Eduardo Arango Gutierrez
47c054e1db
Add NodeFeatureGroup CRD
The NodeFeatureGroup is an NFD-specific custom resource that is designed for
grouping nodes based on their features. NFD-Master watches for NodeFeatureGroup
objects in the cluster and updates the status of the NodeFeatureGroup object
with the list of nodes that match the feature group rules. The NodeFeatureGroup
rules follow the same syntax as the NodeFeatureRule rules.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-05-23 16:34:08 +02:00
Markus Lehtonen
560bd11d85 Re-add -enable-nodefeature-api cmdline flag
Bring back the -enable-nodefeature-api command line flag and the
corresponding enableNodeFeatureApi helm config value that were
removed without deprecation when the NodeFeatureAPI feature gate was
introduced. The thinking behind this change is to not break existing
users (without warning) unless totally unavoidable. Now the
-enable-nodefeature-api flag is marked as deprecated and slated for
removal in NFD v0.17.

The NodeFeatureAPI feature gate and the -enable-nodefeature-api flag
work together so that the NodeFeature API is disabled (gRPC is enabled,
instead) if either of them is set to false.

This patch selectively reverts parts of
06c4733bc5.
2024-05-16 10:53:49 +03:00
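The interaction between the feature gate and the deprecated flag described in this commit amounts to a logical AND. The sketch below is a hedged illustration in Go, not code from the NFD repository: the function `nodeFeatureAPIEnabled` and its parameter names are hypothetical, and it assumes that gRPC is used exactly when the NodeFeature API is disabled, as the commit message suggests.

```go
package main

import "fmt"

// nodeFeatureAPIEnabled reports whether the NodeFeature API is active.
// Per the commit message, the API is disabled if either the
// NodeFeatureAPI feature gate or the -enable-nodefeature-api flag is
// false, so both must be true for the API to be enabled.
func nodeFeatureAPIEnabled(featureGate, deprecatedFlag bool) bool {
	return featureGate && deprecatedFlag
}

func main() {
	// Print the full truth table for the two settings.
	for _, gate := range []bool{true, false} {
		for _, flag := range []bool{true, false} {
			enabled := nodeFeatureAPIEnabled(gate, flag)
			// Assumption: gRPC is the fallback whenever the API is off.
			fmt.Printf("gate=%t flag=%t -> NodeFeature API: %t, gRPC: %t\n",
				gate, flag, enabled, !enabled)
		}
	}
}
```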
Kubernetes Prow Robot
391865bbb2
Merge pull request #1651 from cmontemuino/doc-resource-limits
docs: document trade-offs in memory configuration
2024-04-25 06:41:29 -07:00
Kubernetes Prow Robot
af8a41cc02
Merge pull request #1639 from TessaIO/chore-add-prometheus-pod-monitor-interval
chore/deploy: make interval property in PodMonitor configurable
2024-04-05 03:03:26 -07:00
Carlos M
cc53b604c5
chore: include suggestions from code review
Co-authored-by: Carlos Eduardo Arango Gutierrez <arangogutierrez@gmail.com>
2024-04-05 10:01:08 +02:00
Oleg Zhurakivskyy
f2e9557a2d nfd-topology-updater: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-04-03 13:15:54 +03:00
cmontemuino
54b01a2576
docs: document trade-offs in memory configuration
Problem: memory requests and limits have been set for the `master` process
in PR #1631. They do not follow best practices for setting those values,
but the intention was to provide default values for a wide variety of
clusters, including small ones.

Solution: provide solid documentation about the problems that might
happen in production environments when
`resource.memory.requests << resource.memory.limits`. Add a link to
relevant external sources, which include the advice from Tim Hockin:
> Always set memory limit == request

Signed-off-by: cmontemuino <1761056+cmontemuino@users.noreply.github.com>
2024-04-02 19:01:50 +02:00
Kubernetes Prow Robot
7938e81c33
Merge pull request #1631 from TessaIO/chore-add-resources-limits-and-requests
chore/deployment: add resources requests and limits for helm and Kustomize
2024-04-02 02:03:59 -07:00
TessaIO
74153e11b5 chore/deploy: make interval property in PodMonitor configurable
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-26 08:36:52 +01:00
TessaIO
d02414cf61 chore/deployment: add resources requests and limits for helm and Kustomize
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
2024-03-22 14:27:44 +01:00
Markus Lehtonen
9b3d273a18 helm: fix invalid name of host-swaps volume 2024-03-20 21:15:02 +02:00
Kubernetes Prow Robot
0ad5e50f24
Merge pull request #1609 from ozhuraki/worker-health
nfd-worker: Add liveness probe
2024-03-19 06:57:23 -07:00
Oleg Zhurakivskyy
8b63d17af7 nfd-worker: Add liveness probe
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-19 15:34:53 +02:00
Kubernetes Prow Robot
7df0f17f68
Merge pull request #1602 from ozhuraki/nrt-owner-ref
Add owner reference to NRT object
2024-03-19 01:12:59 -07:00
Kubernetes Prow Robot
797fada92e
Merge pull request #1585 from kannon92/add-swap-support
add swap support in nfd
2024-03-18 04:19:48 -07:00
Carlos Eduardo Arango Gutierrez
06c4733bc5
Add FeatureGate framework to handle new features
Code inspired by https://github.com/kubernetes/component-base/tree/master/featuregate

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-03-15 19:11:32 +01:00
Oleg Zhurakivskyy
c662265a47 topology-updater: Add owner reference to NRT object
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
2024-03-15 16:36:27 +02:00
Markus Lehtonen
a562a6188a Update auto-generated code 2024-03-11 12:18:32 +02:00
Allen Mun
8bd52594ab add ability to use a custom issuer 2024-02-27 12:14:43 -05:00
Kevin Hannon
187f65f94e Add swap support in nfd 2024-02-19 10:20:56 -05:00
Carlos Eduardo Arango Gutierrez
75f0a14f2a
helm: add priorityClassName option
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-02-15 16:29:33 +01:00
Kubernetes Prow Robot
5c99ae8343
Merge pull request #1560 from leemingeer/master
nfd-topology-updater add pods fingerprint by default
2024-01-29 00:34:44 -08:00
leemingeer
b6d8ce7a5a nfd-topology-updater add pods fingerprint by default 2024-01-26 17:55:34 +08:00
Markus Lehtonen
8adb4b38da deployment/helm: don't deploy topology-updater conf unnecessarily
Only deploy the topology-updater config if topology-updater itself (the
daemon) is deployed.
2024-01-25 16:15:58 +02:00
Kubernetes Prow Robot
4501bedd61
Merge pull request #1535 from marquiz/devel/grpc-probe
nfd-master: run a separate gRPC health server
2024-01-05 15:24:28 +01:00
Markus Lehtonen
a053efda64 nfd-master: run a separate gRPC health server
This patch separates the gRPC health server from the deprecated gRPC
server (disabled by default, replaced by the NodeFeature CRD API) used
for node labeling requests. The new health server runs on hardcoded TCP
port number 8082.

The main motivation for this change is to make Kubernetes' built-in
gRPC liveness probes work when TLS is enabled (as they don't
support TLS).

The health server itself is a naive implementation (as it was before),
basically only checking that nfd-master has started and hasn't crashed.
The patch adds a TODO note to improve the functionality.
2024-01-04 13:58:26 +02:00
Markus Lehtonen
09b5af74de deployment/kustomize: drop the sample cert-manager overlay
Drop the deprecated and broken sample overlay. This was an example for
enabling TLS with cert-manager. However, the overlay has been broken
(and useless) since NodeFeature API was enabled by default - and gRPC
disabled - in v0.14.
2024-01-03 21:13:15 +02:00
Markus Lehtonen
889fffd7d4 helm: add post-delete hook that cleans up the node
This patch adds a post-delete hook to the Helm chart that runs
"nfd-master --prune" in the cluster. This cleans up the node of labels,
annotations, taints and extended resources that were created by NFD.
2023-12-29 15:36:41 +02:00
Markus Lehtonen
9846dede43 deployment/kustomize: enable nfd-gc in the default overlay 2023-12-21 21:30:14 +02:00