1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2025-03-06 08:47:04 +00:00
Commit graph

27 commits

Author SHA1 Message Date
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
0a8b514d67 docs: unify formatting of NOTEs 2023-08-03 15:36:56 +03:00
Pat Riehecky
0523257d1a Add optional labels to the podmonitor
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
2023-07-21 10:03:50 -05:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
Kubernetes Prow Robot
cd45baef8d
Merge pull request #1211 from marquiz/devel/helm
deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
2023-05-05 00:17:13 -07:00
Markus Lehtonen
526aab87cf deployment/helm: user dedicated serviceaccount for topology-updater
Change the configuration so that, by default, we use a dedicated
serviceaccount for topology-updater (similar to topology-gc, nfd-master
and nfd-worker).

Fix the templates so that the serviceaccount and clusterrolebinding are
only created when topology-updater is enabled (clusterrole was already
handled this way).

This patch also correctly documents the default value of rbac.create
parameter of topology-updater and topology-gc.
2023-05-05 08:30:21 +03:00
Markus Lehtonen
9c2f268fd2 deployment/helm: improve handling of topologyUpdater.kubeletStateFiles
Make it possible to disable kubelet state tracking with
--set topologyUpdater.kubeletStateFiles="" as the documentation
suggests.

Also, fix the documentation regarding the default value of
topologyUpdater.kubeletStateFiles parameter.
2023-05-04 15:01:19 +03:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
02b3b7c7e0 feat: add enableTaints to helm chart
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-21 10:49:24 +01:00
Talor Itzhak
727de56191 documentaion: document the reactive updates feature
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:12 +02:00
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
AhmedGrati
07d5ffe4b8 helm: make master port configurable
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-01 10:03:06 +01:00
Hiren Panchasara
bfbc47f55e docs: fix internal cross-page references by injecting .md 2023-01-16 20:53:36 -08:00
PiotrProkop
3143faf0ab Add documentation for topology garbage collector
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:38 +01:00
Markus Lehtonen
27c47bd088 docs: better document differences between deployment methods 2022-12-20 16:29:48 +02:00
Markus Lehtonen
3209c14bea docs: document NodeFeature API
Document the usage of the NodeFeature CRD API. Also re-organize the
documentation a bit, moving the description of NodeFeatureRule
controller from customization guide to nfd-master usage page.
2022-12-14 22:33:12 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
237494463b nfd-worker: support creating NodeFeatures object
Support the new NodeFeatures object of the NFD CRD api. Add two new
command line options to nfd-worker:

 -kubeconfig               specifies the kubeconfig to use for
                           connecting k8s api (defaults to empty which
                           implies in-cluster config)
 -enable-nodefeature-api   enable the NodeFeature CRD API for
                           communicating node features to nfd-master,
                           will also automatically disable gRPC
                           (defgault to false)

No config file option for selecting the API is available as there should
be no need for dynamically selecting between gRPC and CRD. The
nfd-master configuration must be changed in tandem and it is safer (and
avoid awkward configuration races) to configure the whole NFD deployment
at once.

Default behavior of nfd-worker is not changed i.e. NodeFeatures object
creation is not enabled by default (but must be enabled with the command
line flag).

The patch also updates the kustomize and Helm deployment, adding RBAC
rules for nfd-worker and updating the example worker configuration.
2022-12-14 07:31:28 +02:00
Markus Lehtonen
881ee13654 docs: remove non-existent nodeFeatureRule.createCRD parameter
This value was recently dropped.
2022-12-07 16:25:43 +02:00
Kubernetes Prow Robot
efc833d1c7
Merge pull request #970 from marquiz/docs/worker-helm-sa-params
docs: document helm chart params related to worker serviceaccount
2022-11-28 08:36:08 -08:00
Markus Lehtonen
d0a4cf7564 docs: document helm chart params related to worker serviceaccount 2022-11-28 18:07:17 +02:00
Markus Lehtonen
c1fa8b2f28 docs: revise topology-updater helm chart rbac parameters 2022-11-28 17:49:19 +02:00
Talor Itzhak
d495376f06 docs: topology-updater: update docs for exclude-list feature
Update the docs with explanations and examples
about the exclude-list feature.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2022-11-21 21:31:51 +02:00
Markus Lehtonen
6171c745a4 docs: restructure docs
Introduce two main sections "Deployment" and "Usage" and move "Developer
guide" to the top level, too. In particular, split the huge
deployment-and-usage file into multiple parts under the new main
sections. Move customization guide from "Advanced" to "Usage".
This patch also renames "Advanced" to "Reference" as only that is left
there is reference documentation.
2022-11-03 10:26:56 +02:00