node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	2e8da8849a	topology-gc: simplify listing of node objects Hopefully makes the code slightly more readable.	2023-08-21 09:13:41 +03:00
Markus Lehtonen	0b5e51bd35	topology-gc: refactor unit tests Remove a lot of boilerplate code by defining reusable functions. Also, test the Run() method instead of the callees of Run(), as it is the top-level functionality that was tested in practice (we don't have separate unit tests for the callee functions).	2023-08-21 09:10:24 +03:00
Kubernetes Prow Robot	4674bce27d	Merge pull request #1310 from marquiz/devel/refactor-gc-4 topology-gc: rename runGC to garbageCollect()	2023-08-18 11:26:34 -07:00
Kubernetes Prow Robot	f4cf4877f2	Merge pull request #1309 from marquiz/devel/refactor-gc-3 topology-gc: rename run()	2023-08-18 11:26:28 -07:00
Markus Lehtonen	ec51b29b3c	topology-gc: rename runGC to garbageCollect() One less function named run.	2023-08-18 17:57:05 +03:00
Markus Lehtonen	98b0b36b87	topology-gc: rename run() Too many run methods here.	2023-08-18 17:52:11 +03:00
Markus Lehtonen	108d603bdc	topology-gc: fix Stop The stop channel has multiple readers so we need to close it so that all of the readers get notified.	2023-08-18 17:46:54 +03:00
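The reasoning in this commit message rests on a general Go property: a value sent on a channel is received by exactly one reader, whereas closing the channel unblocks every reader. A minimal sketch of that behavior follows; `notifyAll` is a hypothetical helper written for illustration and is not taken from the topology-gc code.

```go
package main

import (
	"fmt"
	"sync"
)

// notifyAll starts n goroutines that all block on the same stop channel,
// closes the channel once, and returns how many readers were woken up.
// Hypothetical helper for illustration only.
func notifyAll(n int) int {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	notified := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-stop // a receive on a closed channel returns immediately
			mu.Lock()
			notified++
			mu.Unlock()
		}()
	}

	// close() acts as a broadcast: every pending and future receive returns.
	// A single send (stop <- struct{}{}) would wake up only one reader.
	close(stop)
	wg.Wait()
	return notified
}

func main() {
	fmt.Println("readers notified:", notifyAll(3)) // prints "readers notified: 3"
}
```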
Kubernetes Prow Robot	9d61b19454	Merge pull request #1287 from freelizhun/fix-empty-hugepages fix empty hugepages in some numa nodes caused no such file or directory errors	2023-08-08 02:50:16 -07:00
lizhun	a4ad3d4411	fix empty hugepages in some numa nodes caused no such file or directory error Signed-off-by: lizhun <lizhun@kylinos.cn>	2023-08-08 15:14:44 +08:00
Markus Lehtonen	5ad2294c14	metrics: add nfd_node_update_requests_total counter Add a counter for total number of node update/sync requests. In practice, this counts the number of gRPC requests received if the gRPC API is in use. If the NodeFeature API is enabled, this counts the requests initiated by the NFD API controller, i.e. updates triggered by changes in NodeFeature or NodeFeatureRule objects plus updates initiated by the controller resync period.	2023-08-07 09:37:29 +03:00
Markus Lehtonen	4b24cc1afa	metrics: counters for rejected labels, extended resources and taints Add counters for labels, extended resources and taints rejected/filtered out by nfd-master.	2023-08-07 09:37:29 +03:00
Markus Lehtonen	a8a29e6df2	metrics: add nfd_nodefeaturerule_processing_errors_total counter Add a counter for errors encountered when processing NodeFeatureRules. Another simple counter without any additional prometheus labels - nfd-master logs can provide further details.	2023-08-07 09:37:29 +03:00
Markus Lehtonen	b90f2c318e	metrics: add nfd_node_update_failures_total counter Add a new counter for tracking node update failures from nfd-master. This tracks both normal feature updates and the --prune sub-command. This is a simple counter without any additional labels - nfd-master logs can be used for further diagnostics.	2023-08-07 09:37:27 +03:00
Markus Lehtonen	06b333db1e	nfd-topology-updater: add metrics support For now, add only one metric, a counter for the errors occurring while scanning pod resources on the node.	2023-08-04 16:48:37 +03:00
Markus Lehtonen	039378c725	nfd-master: use term node update instead of labeling Rename symbols and reword log messages to correlate with the functionality (we may do other updates than just modify labels nowadays).	2023-08-01 16:42:34 +03:00
Markus Lehtonen	d8f167d8a9	nfd-master: remove one stale empty line	2023-08-01 16:38:32 +03:00
Kubernetes Prow Robot	c1cb63243b	Merge pull request #1288 from marquiz/devel/metrics Improve metrics	2023-07-31 10:38:39 -07:00
Markus Lehtonen	5091fef84b	metrics: improve feature discovery duration metric Rename the "NodeName" prometheus label to "node", aligning with common prometheus/kubernetes conventions. Also reconfigure the prometheus histogram buckets (now 10ms to 1s) to better match the expected sample range.	2023-07-31 19:45:22 +03:00
Markus Lehtonen	47f621d970	metrics: improve the node updates gauge Rename the metric, better describe what we're measuring and better comply with prometheus naming conventions. Also change it to represent actual updates of the node object on the Kubernetes apiserver.	2023-07-31 19:45:22 +03:00
Markus Lehtonen	945e7fcb3f	metrics: improve nfr processing time metric Change the metric from a simple gauge (that basically was a single value for the whole cluster) into a HistogramVec, aligning with the feature discovery duration metric in nfd-worker. This improved metric now has prometheus labels for the NFR name and node name, i.e. it is tracking per-NFR metric for each node being processed. Also, change the naming to better comply with prometheus suggested conventions.	2023-07-31 19:45:22 +03:00
Kubernetes Prow Robot	01ca8cb91d	Merge pull request #1284 from marquiz/devel/generator-deps generate: bump tools to their latest versions	2023-07-31 06:32:39 -07:00
Kubernetes Prow Robot	e0f10a81de	Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment Fix Topology Manager policy and scope not being updated after NRT creation	2023-07-28 00:25:54 -07:00
Markus Lehtonen	7e375ad1f0	generate: bump tools to their latest versions Bump tools versions and re-auto-generate files.	2023-07-27 14:29:48 +03:00
Kubernetes Prow Robot	77d869c4f7	Merge pull request #1242 from ArangoGutierrez/metrics Enable metrics via prometheus operator	2023-07-21 02:26:08 -07:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are \| Metric \| Type \| Meaning \| \| --------------- \| ---------------- \| ---------------- \| \| `nfd_master_build_info` \| Gauge \| Version from which nfd-master was built. \| \| `nfd_worker_build_info` \| Gauge \| Version from which nfd-worker was built. \| \| `nfd_updated_nodes` \| Counter \| Time taken to label a node \| \| `nfd_crd_processing_time` \| Gauge \| Time taken to process a NodeFeatureRule CRD \| \| `nfd_feature_discovery_duration_seconds` \| HistogramVec \| Time taken to discover features on a node \| Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
pprokop	6d98b6150b	Fix Topology Manager policy and scope not being updated properly NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist. This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated even if kubelet config was changed to use other TopologyManager policy and scope. Signed-off-by: pprokop <pprokop@nvidia.com>	2023-07-20 16:31:12 +02:00
AhmedGrati	8e55d78d85	test: add node updater pool unit tests Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-07-19 12:03:35 +01:00
Markus Lehtonen	dac45be28c	nfd-master: check for nil references in nfdAPIUpdateAllNodes Just a safeguard.	2023-07-17 17:49:44 +03:00
hang.jiang	698031fc2d	Stop ticker in time to avoid memory leak Not stopping the ticker when the function has completed causes a memory leak. Signed-off-by: hang.jiang <hang.jiang@daocloud.io>	2023-07-05 18:35:01 +08:00
guoguangwu	b946bcc0f5	nfd-master-internal_test.go rm pkg imported twice Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>	2023-06-21 16:53:55 +08:00
Kubernetes Prow Robot	306969a945	Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update feat: parallelize nodes update	2023-06-02 05:28:57 -07:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spinning up a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding the names of nodes that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
AhmedGrati	08b9c3486e	feat: support dynamic values for labels in the NodeFeatureRule This PR aims to support the dynamic values for labels in the NodeFeatureRule CRD, it would offer more flexible labeling for users. To achieve this, we check whether label value starts with "@", and if it's the case, we will get the value of the feature value, and update the value of the label with the feature value. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-31 23:30:26 +01:00
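The "@" lookup described in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions: `resolveLabelValue` and the flat `featureValues` map are hypothetical, and the real feature data in NFD is structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLabelValue returns the value to use for a label.
// A value starting with "@" is treated as a reference to a feature and is
// replaced by that feature's value; any other value is used literally.
// featureValues is a hypothetical flat lookup table keyed by feature name.
func resolveLabelValue(value string, featureValues map[string]string) (string, error) {
	if !strings.HasPrefix(value, "@") {
		return value, nil
	}
	key := strings.TrimPrefix(value, "@")
	v, ok := featureValues[key]
	if !ok {
		return "", fmt.Errorf("feature %q not found", key)
	}
	return v, nil
}

func main() {
	features := map[string]string{"cpu.model.family": "6"}

	v, err := resolveLabelValue("@cpu.model.family", features)
	fmt.Println(v, err) // prints "6 <nil>"

	v, err = resolveLabelValue("true", features)
	fmt.Println(v, err) // prints "true <nil>"
}
```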
Markus Lehtonen	bf670de68d	pkg/utils: migrate KlogDump to structured logging Drop the KlogDump helper in favor of klog.InfoS. However, that patch introduces a new DelayedDumper() helper to avoid processing (marshalling) of object unless really evaluated by the logging function.	2023-05-31 14:43:08 +03:00
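The idea of deferring the marshalling until the logger actually needs the text can be illustrated with a `fmt.Stringer` wrapper. This is a sketch of the general technique only: `lazyDump` and `logAt` are hypothetical names, JSON is an assumed output format, and none of this is the actual DelayedDumper() code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lazyDump wraps an object and marshals it only when String() is called.
// Hypothetical stand-in for a delayed dumper.
type lazyDump struct {
	obj interface{}
}

// String performs the (potentially expensive) marshalling on demand.
func (d lazyDump) String() string {
	b, err := json.Marshal(d.obj)
	if err != nil {
		return fmt.Sprintf("<marshal error: %v>", err)
	}
	return string(b)
}

// logAt mimics a verbosity-gated logger: the value is only rendered
// (and therefore only marshalled) when the log level is enabled.
func logAt(enabled bool, msg string, v fmt.Stringer) string {
	if !enabled {
		return "" // v.String() is never called on this path
	}
	return msg + " " + v.String()
}

func main() {
	cfg := map[string]int{"a": 1}
	fmt.Println(logAt(true, "config:", lazyDump{cfg})) // prints `config: {"a":1}`
	fmt.Printf("%q\n", logAt(false, "config:", lazyDump{cfg}))
}
```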
Markus Lehtonen	4947ebf336	pkg/util: migrate to structured logging The gRPC logging interface is not compatible with structured logging so grpcLogger is left intact.	2023-05-31 14:43:08 +03:00
Markus Lehtonen	64d5af016e	apis/nfd: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	6e3b181ab4	topology-updater: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	7be08f9e7f	nfd-worker: migrate to structured logging	2023-05-31 14:43:08 +03:00
Markus Lehtonen	8113d651c2	nfd-master: migrate to structured logging	2023-05-31 14:43:05 +03:00
Markus Lehtonen	2a3c7e4c93	nfd-master: add validation of label names and values Validate labels before trying to update the node. Makes us fail early and prevent useless retries in case invalid labels are tried.	2023-05-29 16:54:14 +03:00
Markus Lehtonen	1809c24314	nfd-master: use close for stop channel Simpler and more reliable (in case of multiple consumers) to just close the channel.	2023-05-24 16:51:48 +03:00
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Kubernetes Prow Robot	85073525c3	Merge pull request #1185 from AhmedGrati/fix-resync-period-functionality nfd-master: fix resync period config option	2023-05-02 11:14:16 -07:00
AhmedGrati	87c2d7e184	nfd-master: fix resync period config option This PR fixes the resync-period configuration option of the nfd-master. In fact, previously, changes were not reflected in the nfd-master at runtime. e2e tests are also implemented to make sure that the fix is already working as expected. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-02 13:17:01 +02:00
Markus Lehtonen	fb20388028	nfd-master: refactor filtering of taints	2023-04-28 18:13:54 +03:00
Markus Lehtonen	43ced0c1a1	nfd-master: refactor filtering of feature labels More consistent error messages. Also preparation for dynamic labels values (that '@' notation currently supported for extended resources).	2023-04-28 18:13:54 +03:00
Markus Lehtonen	6ca687fbef	nfd-master: refactor filtering of extended resources Simplify code a bit and get more consistent error messages (in addition to fixing some of those).	2023-04-28 18:13:54 +03:00
Markus Lehtonen	131325fb2c	nfd-master: refactor api-controller object handling Split out resolving of node name (of the node to be updated) into a separate function. Makes it possible to add unit tests. Also, do unconditional type casting in the handler functions – that shouldn't fail unless there is a really serious internal inconsistency in the codebase so it should be ok to panic.	2023-04-28 17:33:33 +03:00
Kubernetes Prow Robot	d84248bc7d	Merge pull request #1190 from marquiz/devel/api-unit-tests apis/nfd: add unit tests for Feature type	2023-04-26 23:32:15 -07:00
Markus Lehtonen	77011a775f	nfd-master: log node name when processing NodeFeatureRules	2023-04-26 07:22:30 +03:00
Markus Lehtonen	dda7b195ee	apis/nfd: add unit tests for Feature type	2023-04-25 19:40:35 +03:00
Kubernetes Prow Robot	54bd4c5d74	Merge pull request #1167 from PiotrProkop/fix-reactive-updates nfd-topology-updater: fix wrong kubelet_internal_checkpoint path and compare basename to full path	2023-04-24 04:41:01 -07:00
pprokop	5a9a12151c	nfd-topology-updater: fix kubelet state file notifier - kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins not /var/lib/kubelet fsWatcher doesn't watch dirs recursively - e.Name returned from fsWatcher events is a full path not a basename Signed-off-by: pprokop <pprokop@nvidia.com>	2023-04-24 13:21:56 +02:00
Kubernetes Prow Robot	2356223ffc	Merge pull request #1139 from AhmedGrati/feat-configure-master-resync feat: add master resync period configurability	2023-04-24 03:49:02 -07:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot	64fe26ed37	Merge pull request #1169 from ArangoGutierrez/i1168 nfd-master: reject malformed extended resource dynamic capacity assignment	2023-04-24 00:17:15 -07:00
Carlos Eduardo Arango Gutierrez	f5df7b658c	nfd-master: reject malformed extended resource dynamic capacity assignment Reject malformed extended resource dynamic capacity assignment capacity should be in the form of domain.feature.element, add logic at func filterExtendedResources to check if true or ignore ExtendedResource, logging as an error. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-22 08:43:50 +02:00
Kubernetes Prow Robot	d5bccda7c5	Merge pull request #1171 from ArangoGutierrez/foundon_typo pkg/nfd-master/nfd-master.go: Fix typo	2023-04-21 12:21:11 -07:00
Kubernetes Prow Robot	c2c1e18908	Merge pull request #1173 from marquiz/devel/fix-master nfd-master: fix a crash when processing NodeFeatureRules	2023-04-21 09:49:11 -07:00
Markus Lehtonen	9523f1e411	nfd-master: fix a crash when processing NodeFeatureRules Fix a bug where nfd-master with NodeFeature API enabled would crash when NodeFeatureRule objects were processed in the case where no NodeFeature objects existed. This was caused by trying to insert values into a non-initialized NodeFeatureSpec in the code. This patch adds two safety measures to prevent that from happening in the future. First, add a constructor function for the NodeFeatureSpec type, and second, check for uninitialized object in the function inserting new functions. TODO: add unit tests for the API helper functions.	2023-04-21 19:24:08 +03:00
Carlos Eduardo Arango Gutierrez	ae22031547	pkg/nfd-master/nfd-master.go: Fix typo Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-21 16:17:11 +02:00
Markus Lehtonen	37306662fe	nfd-master: don't create empty annotations Make the nfd.node.kubernetes.io/feature-labels and nfd.node.kubernetes.io/extended-resources annotations behave similarly to the taints annotation: only create the annotations if some labels or extended resources are created.	2023-04-21 16:14:17 +03:00
Markus Lehtonen	f0f6bbcf36	nfd-master: configure before prune Otherwise prune will crash because of uninitialized configuration.	2023-04-20 20:38:11 +03:00
Markus Lehtonen	32db081f3a	nfd-master: support noPublish with -prune Better this way than to crash which is what currently happens with this combination.	2023-04-19 15:58:06 +03:00
Markus Lehtonen	18f7bfa8e8	generate: update mockery to v2.25.1 Bump the vektra/mockery tool to the latest release.	2023-04-19 13:33:42 +03:00
Markus Lehtonen	117baac1a6	generate: update protoc to v22.3	2023-04-19 10:44:55 +03:00
Markus Lehtonen	ca7ed04a34	generate: update auto-generated code Re-run "make generate".	2023-04-19 09:49:17 +03:00
Markus Lehtonen	e2d5ba1a2b	pkg/podres: update mocked PodResourcesListerClient Update mocked implementation of k8s.io/kubelet/pkg/apis/podresources/v1.PodResourcesListerClient. The mocked implementation is moved to a separate "mocks" subpackage as it's for an external interface. This patch also adds code for auto-generation for the mocked interface.	2023-04-18 20:51:51 +03:00
Kubernetes Prow Robot	8d71ed6755	Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods feat: support builtin kernel mods	2023-04-13 10:30:40 -07:00
Markus Lehtonen	6b2d10753f	nfd-master: re-try on node update failures Change the NFD API handler to re-try on node update failures. Will work around transient failures, making sure that failed nodes (i.e. nodes that we failed to update) don't need to wait for the 1 hour resync period before being tried again.	2023-04-13 16:30:31 +03:00
AhmedGrati	109caa1f28	feat: support builtin kernel mods This PR adds the combination of dynamic and builtin kernel modules into one feature called `kernel.enabledmodule`. It's a superset of the `kernel.loadedmodule` feature. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-13 10:19:24 +01:00
Markus Lehtonen	70ac19ea66	nfd-master: increase controller resync period to 1 hour Increase the NFD API controller resync period from 5 minutes to 1 hour. The resync causes nfd-master to replay all NodeFeature and NodeFeatureRule objects, being effectively a "big hammer reset all" button. This should only be needed as an "insurance" to fix labels et al in case they have been manually tampered (outside NFD) and against certain bugs in nfd itself. NFD is not supposed to manage anything fast-changing so 1 hour should be enough. This change only affects behavior when the NodeFeature API has been enabled (with -enable-nodefeature-api).	2023-04-12 16:38:47 +03:00
Kubernetes Prow Robot	ad07829d0a	Merge pull request #1099 from ArangoGutierrez/extended_resources_v2 Create extended resources with NodeFeatureRule	2023-04-07 08:09:15 -07:00
Fabiano Fidêncio	250aea4741	Create extended resources with NodeFeatureRule Add support for management of Extended Resources via the NodeFeatureRule CRD API. There are usage scenarios where users want to advertise features as extended resources instead of labels (or annotations). This patch enables the discovery of extended resources, via annotation and patch of node.status.capacity and node.status.allocatable, by using the NodeFeatureRule API. Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-07 16:14:56 +02:00
Markus Lehtonen	f64c23968a	nfd-master: fix node update Update node status before node metadata. This fixes a problem where we lose track of NFD-managed extended resources in case patching node status fails. Previously we removed all labels and annotations (including the one listing our ERs) and only after that updated node status. If node status update failed we had lost the annotation but extended resources were still there, leaving them orphaned.	2023-04-06 22:04:35 +03:00
Markus Lehtonen	cc6c20ff5f	nfd-master: disallow unprefixed and kubernetes taints Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/" prefix. This is a precaution to protect the user from messing up with the "official" well-known taints from Kubernetes itself. The only exception is that the "nfd.node.kubernetes.io/" prefix is allowed. However, there is one allowed NFD-specific namespace (and its sub-namespaces) i.e. "feature.node.kubernetes.io" under the kubernetes.io domain that can be used for NFD-managed taints. Also disallow unprefixed taint keys. We don't add a default prefix to unprefixed taints (like we do for labels) from NodeFeatureRules. This is to prevent unpleasant surprises to users that need to manage matching tolerations for their workloads.	2023-04-06 16:12:37 +03:00
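The taint key rules listed in this commit message can be sketched as a small validation function. The wording about the allowed exceptions is somewhat ambiguous, so this is one possible reading: both "nfd.node.kubernetes.io" and "feature.node.kubernetes.io" (with its sub-namespaces) are accepted. `taintKeyAllowed` is a hypothetical name and this is not the nfd-master implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// taintKeyAllowed reports whether a taint key passes the rules described
// in the commit message (hypothetical sketch, one reading of the rules):
//   - unprefixed keys (no "<namespace>/" part) are rejected
//   - "kubernetes.io/" and "*.kubernetes.io/" prefixes are rejected
//   - except "nfd.node.kubernetes.io/" and "feature.node.kubernetes.io/"
//     (including sub-namespaces of the latter), which are accepted
func taintKeyAllowed(key string) bool {
	ns, name, found := strings.Cut(key, "/")
	if !found || ns == "" || name == "" {
		return false // unprefixed taint keys are not allowed
	}
	if ns == "nfd.node.kubernetes.io" ||
		ns == "feature.node.kubernetes.io" ||
		strings.HasSuffix(ns, ".feature.node.kubernetes.io") {
		return true
	}
	if ns == "kubernetes.io" || strings.HasSuffix(ns, ".kubernetes.io") {
		return false // protect the well-known Kubernetes taints
	}
	return true
}

func main() {
	for _, k := range []string{
		"example.com/gpu",
		"gpu",
		"node.kubernetes.io/unreachable",
		"feature.node.kubernetes.io/gpu",
	} {
		fmt.Printf("%-35s allowed=%v\n", k, taintKeyAllowed(k))
	}
}
```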
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
AhmedGrati	d0a6289c0f	chore: add debug dump of nfd worker configuration Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-03-18 00:49:07 +01:00
Kubernetes Prow Robot	13f92faa77	Merge pull request #1031 from k8stopologyawareschedwg/reactive_updates topology-updater: reactive updates	2023-03-17 10:13:17 -07:00
Talor Itzhak	5c6be580f4	reactive updates: add an option to disable the feature Access to the kubelet state directory may raise concerns in some setups, added an option to disable it. The feature is enabled by default. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-16 11:53:16 +02:00
Kubernetes Prow Robot	a06e44ef0b	Merge pull request #1083 from fmuyassarov/mockery codegen: fix code-generation	2023-03-15 06:46:16 -07:00
Markus Lehtonen	4a8fc811be	pkg/utils: add UnmarshalJSON method to StringSetVal Make it possible to specify values in yaml as an array like conf: - foo - bar Instead of unwieldy map like conf: foo: bar:	2023-03-14 10:53:24 +02:00
Talor Itzhak	8924213d14	topology-updater: make it possible to disable sleep-interval Especially convenient for testing purposes and completely harmless Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:43:17 +02:00
Talor Itzhak	1c12876815	topology-updater: log event type that triggered update Specify the event type as part of the log message. In order to reduce the log volume, make it V4 Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	7b248ecae2	topology-updater: update CRs when notified When a message is received via the channel, the main loop updates the `NodeResourceTopology` objects. The notifier will send a message via the channel if: 1. It reached the sleep timeout. 2. It detected a change in Kubelet state files Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	175e0c81aa	topology-updater: add kubelet-state-dir flag On different Kubernetes flavors like OpenShift for example, the Kubelet state directory path is different. Make it configurable for maximum flexibility. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Talor Itzhak	0f65b87329	kubeletnotifier: introduce kubeletnotifier package Enabling reactive update for nfd-topology-updater by detecting changes in Kubelet state/checkpoint files, and signaling to the main loop to update the NodeResourceTopology objects. This has high value when scaling is an issue. Having multiple pods deployed between two periodic updates might result in incorrect resource accounting in the NRT CRs. Example: Time Interval = 5s t0 - New update sent to NRT CRs t1 - Schedule guaranteed podA t2 - Schedule guaranteed podB time elapsed between t0-t2 < 5 seconds, IOW the update on t0 is the recent update. In t2 the resource accounting reflected by NRT is not aligned with the actual accounting because NRT CRs don't reflect the change that happened in t1. With this reactive update feature we expect an update to be triggered between t1 and t2 so the NRT objects will reflect a more accurate picture. There still might be a scenario when the updates aren't fast enough, but this is an additional future planned optimization. The notifier has two event types: 1. Time based - keeping the old behavior, trigger an update per interval. 2. FS event - trigger an update when Kubelet state/checkpoint files are modified. Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2023-03-12 12:37:24 +02:00
Muyassarov, Feruzjon	e3a856b405	update re-generated code with make-generate results Update generated code based on the results of re-running make generate. Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>	2023-03-11 22:15:11 +02:00
Jose Luis Ojosnegros Manchón	b340d112a8	topology-updater: compute pod set fingerprint Add an option to compute the fingerprint of the current pod set on each node. Report this new fingerprint using an attribute in NRT object.	2023-02-22 10:22:50 +01:00
Jose Luis Ojosnegros Manchón	1a687cb286	topology-updater: Refactor Scan to expand response We are gonna add new data to Scan response so better introduce a new ScanResponse struct as Scan return value to make it easier.	2023-02-22 09:56:28 +01:00
Kubernetes Prow Robot	a92614c292	Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard feat: add deny-label-ns flag which supports wildcard	2023-02-15 03:42:25 -08:00
Kubernetes Prow Robot	38cc370e69	Merge pull request #1054 from PiotrProkop/use-new-nrt-api Advertise TopologyManger policy and scope as Attributes in NRT api v1alpha2	2023-02-15 01:12:25 -08:00
AhmedGrati	b499799364	feat: add deny-label-ns flag which supports wildcard Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-02-15 09:47:00 +01:00
PiotrProkop	f76fc5bf6b	Read Kubelet configuration the same way as Kubelet to apply default values Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-02-15 09:27:25 +01:00
Ville Pihlava	b1c6b229fe	Add discovery duration logging.	2023-02-13 12:55:57 +02:00
pprokop	5484babcb1	Advertise TopologyManger policy and scope as Attributes Signed-off-by: pprokop <pprokop@nvidia.com>	2023-02-10 12:03:11 +01:00
Kubernetes Prow Robot	ac271b3c29	Merge pull request #1050 from VillePihlava/interval-fix Change nfd-worker to use Ticker instead of After.	2023-02-09 07:54:22 -08:00
Ville Pihlava	2101cb20e4	Change nfd-worker to use Ticker instead of After.	2023-02-09 17:14:39 +02:00
Jose Luis Ojosnegros Manchón	2967f3307a	nrt-api: move from v1alpha1 to v1alpha2	2023-02-09 12:29:54 +01:00
Carlos Eduardo Arango Gutierrez	9b3171bce2	nfd-master: always start gRPC server Don't register gRPC LabelServer when using the NodeFeature option, only turn the gRPC server on for Health and Readiness probes.	2023-01-16 19:33:15 +01:00
Kubernetes Prow Robot	ea921a8b14	Merge pull request #1024 from PiotrProkop/nrt-garbage-collector Add NRT garbage collector	2023-01-11 01:59:44 -08:00
PiotrProkop	59afae50ba	Add NodeResourceTopology garbage collector NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes. As of now node-feature-discovery daemons are used to advertise those resources but there is no service responsible for removing obsolete objects(without corresponding Kubernetes node). This patch adds new daemon called nfd-topology-gc which removes old NRTs. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-11 10:15:21 +01:00
PiotrProkop	1bae2867e2	Release `v0.0.13` of NodeResourceTopology API added missing TopologyManagerPolicy. Expose new policies: * RestrictedContainerLevel * RestrictedPodLevel * BestEffortContainerLevel * BestEffortPodLevel Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-01-09 16:02:12 +01:00
Kubernetes Prow Robot	8eb6640754	Merge pull request #1020 from marquiz/devel/worker-refactor worker: move code	2022-12-27 00:45:34 -08:00
Kubernetes Prow Robot	e97b2c1579	Merge pull request #1017 from marquiz/devel/nfd-api-optional-fields apis/nfd: make all fields in NodeFeatureSpec optional	2022-12-27 00:45:28 -08:00
Markus Lehtonen	1026d91d12	worker: move code Simplify code by dropping the unnecessary base client package.	2022-12-23 11:38:21 +02:00
Markus Lehtonen	0283f68702	topology-updater: move code Move and rename the Go package. It has nothing to do with NFD gRPC client anymore so move it out of the nfd-client package.	2022-12-23 11:37:46 +02:00
Markus Lehtonen	aa97105854	Add common utility function for getting node name	2022-12-23 09:50:15 +02:00
Markus Lehtonen	dfda9bccad	apis/nfd: update auto-generated code	2022-12-22 17:58:20 +02:00
Markus Lehtonen	a4fc15a424	apis/nfd: make all fields in NodeFeatureSpec optional Don't require features to be specified. The creator possibly only wants to create labels or only some types of features. No need to specify empty structs for the unused fields.	2022-12-22 17:53:42 +02:00
Markus Lehtonen	f5ae3fe2c7	Simplify usage of ObjectMeta fields No need to explicitly spell out ObjectMeta as it's embedded in the object types.	2022-12-19 17:40:10 +02:00
Kubernetes Prow Robot	28a5daa338	Merge pull request #999 from marquiz/fixes/nodefeature-missing nfd-master: update node if no NodeFeature objects are present	2022-12-19 00:39:44 -08:00
Markus Lehtonen	4c955ad72c	nfd-master: update node if no NodeFeature objects are present Correctly handle the case where no NodeFeature objects exist for certain node (and NodeFeature API has been enabled with -enable-nodefeature-api). In this case all the labels should be removed.	2022-12-19 10:22:04 +02:00
Markus Lehtonen	b9c09e6674	nfd-master: update all nodes at startup when NodeFeature API enabled We want to always update all nodes at startup. Without this patch we don't get any update event from the controller if no NodeFeature or NodeFeatureRule objects exist in the cluster. Thus all nodes would stay untouched whereas we really want to remove all labels from all nodes in this case.	2022-12-14 21:49:50 +02:00
Kubernetes Prow Robot	d1b314842c	Merge pull request #989 from marquiz/devel/nodefeature-multi-object nfd-master: handle multiple NodeFeature objects	2022-12-14 07:51:34 -08:00
Markus Lehtonen	740e3af681	nfd-master: implement ratelimiter for nfd api updates Implement a naive ratelimiter for node update events originating from the nfd API. We might get a ton of events in short interval. The simplest example is startup when we get a separate Add event for every NodeFeature and NodeFeatureRule object. Without rate limiting we run "update all nodes" separately for each NodeFeatureRule object, plus, we would run "update node X" separately for each NodeFeature object targeting node X. This is a huge amount of wasted work because in principle just running "update all nodes" once should be enough.	2022-12-14 15:45:43 +02:00
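The saving described in this commit message comes from collapsing many queued events into the smallest equivalent set of updates. The sketch below illustrates only that coalescing step, under the assumption that events are collected over some interval first; `updateRequest` and `coalesce` are hypothetical and the actual rate limiter in nfd-master may work differently.

```go
package main

import (
	"fmt"
	"sort"
)

// updateRequest is a hypothetical representation of one pending event.
type updateRequest struct {
	allNodes bool   // true when every node must be updated (e.g. a NodeFeatureRule changed)
	node     string // target node when allNodes is false (e.g. a NodeFeature changed)
}

// coalesce reduces a batch of pending requests to the minimal work:
// a single "update all nodes" if any request asks for it, otherwise
// one update per distinct node name (returned in sorted order).
func coalesce(reqs []updateRequest) (updateAll bool, nodes []string) {
	seen := map[string]struct{}{}
	for _, r := range reqs {
		if r.allNodes {
			// One full update subsumes every per-node update.
			return true, nil
		}
		seen[r.node] = struct{}{}
	}
	nodes = make([]string, 0, len(seen))
	for n := range seen {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	return false, nodes
}

func main() {
	batch := []updateRequest{{node: "node-a"}, {node: "node-b"}, {node: "node-a"}}
	all, nodes := coalesce(batch)
	fmt.Println(all, nodes) // prints "false [node-a node-b]"

	batch = append(batch, updateRequest{allNodes: true})
	all, nodes = coalesce(batch)
	fmt.Println(all, nodes) // prints "true []"
}
```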
Markus Lehtonen	79ed747be8	nfd-master: handle multiple NodeFeature objects Implement handling of multiple NodeFeature objects by merging all objects (targeting a certain node) into one before processing the data. This patch implements MergeInto() methods for all required data types. With support for multiple NodeFeature objects per node, the "nfd api workflow" can be easily demonstrated and tested from the command line. Creating the following object (assuming node-n exists in the cluster): apiVersion: nfd.k8s-sigs.io/v1alpha1 kind: NodeFeature metadata: labels: nfd.node.kubernetes.io/node-name: node-n name: my-features-for-node-n spec: # Features for NodeFeatureRule matching features: flags: vendor.domain-a: elements: feature-x: {} attributes: vendor.domain-b: elements: feature-y: "foo" feature-z: "123" instances: vendor.domain-c: elements: - attributes: name: "elem-1" vendor: "acme" - attributes: name: "elem-2" vendor: "acme" # Labels to be created labels: vendor-feature.enabled: "true" vendor-setting.value: "100" will create two feature labels: feature.node.kubernetes.io/vendor-feature.enabled: "true" feature.node.kubernetes.io/vendor-setting.value: "100" In addition it will advertise hidden/raw features that can be used for custom rules in NodeFeatureRule objects. Now, creating a NodeFeatureRule object: apiVersion: nfd.k8s-sigs.io/v1alpha1 kind: NodeFeatureRule metadata: name: my-rule spec: rules: - name: "my feature rule" labels: "my-feature": "true" matchFeatures: - feature: vendor.domain-a matchExpressions: feature-x: {op: Exists} - feature: vendor.domain-c matchExpressions: vendor: {op: In, value: ["acme"]} will match the features in the NodeFeature object above and cause one more label to be created: feature.node.kubernetes.io/my-feature: "true"	2022-12-14 15:44:52 +02:00
Markus Lehtonen	9f0806593d	nfd-master: rename -featurerules-controller flag to -crd-controller Deprecate the '-featurerules-controller' command line flag as the name does not describe the functionality anymore: in practice it controls the CRD controller handling both NodeFeature and NodeFeatureRule objects. The patch introduces a duplicate, more generally named, flag '-crd-controller'. A warning is printed in the log if '-featurerules-controller' flag is encountered.	2022-12-14 10:23:45 +02:00
Markus Lehtonen	6ddd87e465	nfd-master: support NodeFeature objects Add initial support for handling NodeFeature objects. With this patch nfd-master watches NodeFeature objects in all namespaces and reacts to changes in any of these. The node which a certain NodeFeature object affects is determined by the "nfd.node.kubernetes.io/node-name" annotation of the object. When a NodeFeature object targeting certain node is changed, nfd-master needs to process all other objects targeting the same node, too, because there may be dependencies between them. Add a new command line flag for selecting between gRPC and NodeFeature CRD API as the source of feature requests. Enabling NodeFeature API disables the gRPC interface. -enable-nodefeature-api enable NodeFeature CRD API for incoming feature requests, will disable the gRPC interface (defaults to false) It is not possible to serve gRPC and watch NodeFeature objects at the same time. This is deliberate to avoid labeling races e.g. by nfd-worker sending gRPC requests but NodeFeature objects in the cluster "overriding" those changes (labels from the gRPC requests will get overridden when NodeFeature objects are processed).	2022-12-14 07:31:28 +02:00
Markus Lehtonen	237494463b	nfd-worker: support creating NodeFeatures object Support the new NodeFeatures object of the NFD CRD api. Add two new command line options to nfd-worker: -kubeconfig specifies the kubeconfig to use for connecting k8s api (defaults to empty which implies in-cluster config) -enable-nodefeature-api enable the NodeFeature CRD API for communicating node features to nfd-master, will also automatically disable gRPC (defaults to false) No config file option for selecting the API is available as there should be no need for dynamically selecting between gRPC and CRD. The nfd-master configuration must be changed in tandem and it is safer (and avoids awkward configuration races) to configure the whole NFD deployment at once. Default behavior of nfd-worker is not changed i.e. NodeFeatures object creation is not enabled by default (but must be enabled with the command line flag). The patch also updates the kustomize and Helm deployment, adding RBAC rules for nfd-worker and updating the example worker configuration.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	d1c91e129a	apis/nfd: update auto-generated code	2022-12-14 07:31:28 +02:00
Markus Lehtonen	59ebff46c9	apis/nfd: add CRD for communicating node features Add a new NodeFeature CRD to the nfd Kubernetes API to communicate node features over K8s api objects instead of gRPC. The new resource is namespaced which will help the management of multiple NodeFeature objects per node. This aims at enabling 3rd party detectors for custom features. In addition to communicating raw features the NodeFeature object also has a field for directly requesting labels that should be applied on the node object. Rename the crd deployment file to nfd-api-crds.yaml so that it matches the new content of the file. Also, rename the Helm subdir for CRDs to match the expected chart directory structure.	2022-12-14 07:31:28 +02:00
Markus Lehtonen	079655b42c	nfd-master: add error checking for CRD controller creation	2022-12-14 00:27:27 +02:00
Feruzjon Muyassarov	b296bdf0b3	update test functions according to upstream deprecated/removed methods Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-13 12:12:50 +02:00
Kubernetes Prow Robot	733fb5deaa	Merge pull request #984 from marquiz/devel/worker-namespace nfd-worker: detect the namespace it is running in	2022-12-09 07:10:11 -08:00
Markus Lehtonen	f13ed2d91c	nfd-topology-updater: update NodeResourceTopology objects directly Drop the gRPC communication to nfd-master and connect to the Kubernetes API server directly when updating NodeResourceTopology objects. Topology-updater already has a connection to the API server for listing Pods so this is not that dramatic a change. It also simplifies the code a lot as there is no need for the NFD gRPC client and no need for managing TLS certs/keys. This change aligns nfd-topology-updater with the future direction of nfd-worker where the gRPC API is being dropped and replaced by a CRD-based API. This patch also updates deployment files and documentation to reflect this change.	2022-12-08 11:03:22 +02:00
Markus Lehtonen	87b92f88ca	nfd-worker: detect the namespace it is running in Implement detection of kubernetes namespace by reading file /var/run/secrets/kubernetes.io/serviceaccount/namespace As a fallback (if the file is not accessible) we take namespace from KUBERNETES_NAMESPACE environment variable. This is useful for e.g. testing and development where you might run nfd-worker directly from the command line on a host system.	2022-12-08 10:34:52 +02:00
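The namespace detection described in 87b92f88ca can be pictured roughly as follows. This is a minimal sketch, not the code from the commit: the function name `detectNamespace` and the injected `getenv` parameter are hypothetical, chosen so the fallback can be exercised without touching the real environment. Only the file path and the KUBERNETES_NAMESPACE variable name come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Path of the service account namespace file, as named in the commit message.
const namespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

// detectNamespace (hypothetical name) returns the contents of the namespace
// file at path, trimmed of surrounding whitespace. If the file cannot be
// read, it falls back to the KUBERNETES_NAMESPACE environment variable,
// looked up through the supplied getenv function.
func detectNamespace(path string, getenv func(string) string) string {
	data, err := os.ReadFile(path)
	if err != nil {
		// File not accessible, e.g. when running outside a pod.
		return getenv("KUBERNETES_NAMESPACE")
	}
	return strings.TrimSpace(string(data))
}

func main() {
	fmt.Println(detectNamespace(namespaceFile, os.Getenv))
}
```

Passing `os.Getenv` gives the behavior described in the commit message; a test can pass a stub function instead.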
Feruzjon Muyassarov	2bdf427b89	nfd-master logic update for setting node taints This commit extends NFD master code to support adding node taints from NodeFeatureRule CR. We also introduce a new annotation for taints which helps to identify if the taint set on node is owned by NFD or not. When user deletes the taint entry from NodeFeatureRule CR, NFD will remove the taint from the node. But to avoid accidental deletion of taints not owned by the NFD, it needs to know the owner. Keeping track of NFD set taints in the annotation can be used during the filtering of the owner. Also enable-taints flag is added to allow users opt in/out for node tainting feature. The flag takes precedence over taints defined in NodeFeatureRule CR. In other words, if enable-taints is set to false (disabled) and user still defines taints on the CR, NFD will ignore those taints and skip them from setting on the node. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-02 17:25:00 +02:00
Feruzjon Muyassarov	532e1193ce	Add taints field to NodeFeatureRule CR spec Extend NodeFeatureRule Spec with taints field to allow users to specify the list of the taints they want to be set on the node if rule matches. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-12-02 17:25:00 +02:00
Markus Lehtonen	eb8e29c80a	nfd-worker: drop deprecated command line flags Drop the following flags that were deprecated already in v0.8.0: -sleep-interval (replaced by core.sleepInterval config file option) -label-whitelist (replaced by core.labelWhiteList config file option) -sources (replaced by -label-sources flag)	2022-11-23 22:33:51 +02:00
Talor Itzhak	5b0788ced4	topology-updater: introduce exclude-list The exclude-list allows to filter specific resource accounting from NRT's objects per node basis. The CRs created by the topology-updater are used by the scheduler-plugin as a source of truth for making scheduling decisions. As such, this feature allows to hide specific information from the scheduler, which in turn will affect the scheduling decision. A common use case is when user would like to perform scheduling decisions which are based on a specific resource. In that case, we can exclude all the other resources which we don't want the scheduler to examine. The exclude-list is provided to the topology-updater via a ConfigMap. Resource type's names specified in the list should match the names as shown here: https://pkg.go.dev/k8s.io/api/core/v1#ResourceName This is a resurrection of an old work started here: https://github.com/kubernetes-sigs/node-feature-discovery/pull/545 Signed-off-by: Talor Itzhak <titzhak@redhat.com>	2022-11-21 14:08:25 +02:00
Garrybest	3ec1b94020	get kubelet config from configz Signed-off-by: Garrybest <garrybest@foxmail.com>	2022-11-08 23:52:35 +08:00
Feruzjon Muyassarov	7ea0e0b0a7	Add argument to updateNodeFeatures method to pass client from caller This commit adds an argument to updateNodeFeatures method for receiving client argument, which currently gets initialized within the method itself. This is a minor improvement for https://github.com/kubernetes-sigs/node-feature-discovery/pull/910. Ref:https://github.com/kubernetes-sigs/node-feature-discovery/pull/910#discussion_r1012703631 Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-11-06 22:37:11 +02:00
Markus Lehtonen	7c24b50f74	apis/nfd: fix NodeFeatureRule templating Fix handling of templates that got broken in `b907d07d7e` when "flattening" the internal data structure of features. That happened because the golang text/template format uses dots to reference fields of a struct / elements of a map (i.e. 'foo.bar' means that 'bar' must be a sub-element of foo). Thus, using dots in our feature names (e.g. 'cpu.cpuid') means that that hierarchy must be reflected in the data structure that is fed to the templating engine. Thus, for templates we're now stuck with two level hierarchy. It doesn't really matter for now as all our features follow that naming pattern. We might be able to overcome this limitation e.g. by using reflect but that's left as a future exercise.	2022-10-25 23:37:27 +03:00
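The text/template behavior behind 7c24b50f74 can be reproduced in isolation. The sketch below is illustrative only and does not use NFD's real data types: the helper `render` and the sample maps are hypothetical. It sets the template option `missingkey=error` so that a failed lookup surfaces as an error instead of an empty value.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render (hypothetical helper) executes tmpl against data. With
// "missingkey=error" a map lookup for an absent key fails the execution.
func render(tmpl string, data interface{}) (string, error) {
	t, err := template.New("demo").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Two-level data: ".cpu" resolves to a map, ".cpuid" to an element of it.
	nested := map[string]map[string]string{"cpu": {"cpuid": "AVX512F"}}
	out, err := render("{{.cpu.cpuid}}", nested)
	fmt.Printf("nested: %q, err: %v\n", out, err)

	// Flat data: the key "cpu.cpuid" exists, but the template looks up
	// "cpu" first, which is not a key of the flat map.
	flat := map[string]string{"cpu.cpuid": "AVX512F"}
	out, err = render("{{.cpu.cpuid}}", flat)
	fmt.Printf("flat: %q, err: %v\n", out, err)
}
```

The nested map renders the value, while the flat map fails on the lookup of `cpu`, which is why the data fed to the templating engine has to mirror the dotted feature names.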
Kubernetes Prow Robot	a65ee959b9	Merge pull request #925 from marquiz/devel/feature-api-flatten apis/nfd: flatten the structure of features data type	2022-10-24 01:14:26 -07:00
Francesco Romani	700d9e215c	topology-updater: continue looping on scan error Scanning podresources can temporarily fail; the previous code was mistakenly not rearming the loop condition when this occurred, effectively stopping the monitoring. Rather, we should always poll and bail out on unrecoverable error or when asked to stop. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-10-20 10:08:13 +02:00
Markus Lehtonen	9ea787bc99	apis/nfd: update auto-generated code Re-generate after the latest API change. Involves renaming the crd spec files.	2022-10-18 18:41:53 +03:00
Markus Lehtonen	b907d07d7e	apis/nfd: flatten the structure of features data type Flatten the data structure that stores features, dropping the "domain" level from the data model. That extra level of hierarchy brought little benefit but just caused some extra complexity, instead. The new structure nicely matches what we have in the NodeFeatureRule object (the matchFeatures field uses the same flat structure with the "feature" field having a value <domain>.<feature>, e.g. "kernel.version"). This is pre-work for introducing a new "node feature" CRD that contains the raw feature data. It makes the life of both users and developers easier when both CRDs, plus our internal code, handle feature data in a similar flat structure.	2022-10-18 18:37:28 +03:00
Markus Lehtonen	c3caf687c8	apis/nfd: update autogenerated code Update and migrate auto-generated code after removing pkg/api/feature.	2022-10-15 07:42:20 +03:00
Markus Lehtonen	0e1d4a9046	apis/nfd: migrate pkg/api/feature Move the previously-protobuf-only internal "feature api" over to the public "nfd api" package. This is in preparation for introducing a new CRD API for communicating features. This patch carries no functional changes. Just moving code around.	2022-10-15 07:42:20 +03:00
Feruzjon Muyassarov	71434a1392	Standardize "k8s.io/api/core/v1" package short name Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-10-15 02:22:41 +03:00
Feruzjon Muyassarov	e79f09deb2	Error strings should not be capitalized Error strings should not be capitalized (ST1005) & remove the redundancy from array, slice or map composite literals. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-10-14 15:43:18 +03:00
Kubernetes Prow Robot	b06f2f7c8b	Merge pull request #911 from marquiz/devel/master-grpc-refactor nfd-master: refactor gRPC into a separate method	2022-10-12 03:23:00 -07:00
Markus Lehtonen	06bd6c0609	nfd-worker: refactor gRPC connection logic Make the NoPublish config flag a more direct control point for whether to publish features. This patch is pre-work for adding support for other clients (upcoming new CRD API) in nfd-worker.	2022-10-11 17:02:33 +03:00
Markus Lehtonen	edcaf9a3bb	nfd-master: refactor gRPC into a separate method Refactor the code so that the initialization and running of the gRPC server is done in a separate function. The goal is to make the code more maintainable in terms of disabling (and eventually removing) the gRPC functionality in the future.	2022-10-07 14:45:44 +03:00
Markus Lehtonen	a00cdc2b61	pkg/utils: move hostpath helpers from source to utils Refactor the code, moving the hostpath helper functionality to new "pkg/utils/hostpath" package. This breaks odd-ish dependency "pkg/utils" -> "source".	2022-10-06 14:28:24 +03:00
Kubernetes Prow Robot	4097198848	Merge pull request #908 from marquiz/devel/type-rename pkg/api/feature: rename types	2022-10-06 01:59:51 -07:00
Markus Lehtonen	7f806e8c45	pkg/api/feature: update auto-generated code Complete the previous renaming.	2022-10-06 11:25:01 +03:00
Markus Lehtonen	abdbd420d1	pkg/api/feature: rename types Sync type names with NFD documentation. Aims at making the codebase easier to follow.	2022-10-06 11:25:01 +03:00
Markus Lehtonen	c1e6b41e56	apis/nfd: move annotation and label consts from nfd-master Move consts related to NFD annotations and labels from nfd-master to the api. Makes them more logically accessible for clients.	2022-10-06 11:23:56 +03:00
Kubernetes Prow Robot	906aad6717	Merge pull request #906 from marquiz/devel/master-controller-rename nfd-master: rename crd controller	2022-10-06 01:19:52 -07:00
Markus Lehtonen	658ffaa6a5	nfd-master: rename crd controller Prepare for adding support for other nfd api objects. Just rename file and some symbols, no functional changes.	2022-10-04 20:23:24 +03:00
Markus Lehtonen	11fd19fb7a	nfd-worker: rename some symbols Some renames in preparation for adding support for NFD CRD API client. I.e. a second client in addition to the existing gRPC client.	2022-10-04 17:18:25 +03:00
Kubernetes Prow Robot	dcc02b9787	Merge pull request #901 from fmuyassarov/add-shortname Set shortName for NodeFeatureRule CRD	2022-09-29 03:50:54 -07:00
Feruzjon Muyassarov	60f270d40d	Set shortName for NodeFeatureRule CRD This patch adds a kubebuilder marker to add a short name nfr for NodeFeatureRule CRD. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2022-09-28 12:18:49 +03:00
Markus Lehtonen	8b652ab8ec	nfd-master: log if node was modified (or not) Be a bit more verbose about what is happening.	2022-09-21 14:23:37 +03:00
Markus Lehtonen	389a3d4e2e	nfd-master: drop cleanup of ancient incubator labels Remove the cleanup code that removes ancient NFD labels with the node.alpha.kubernetes-incubator.io/ prefix. This label namespace was deprecated/dropped already in v0.4.0 so it should be safe to drop this code.	2022-09-20 19:56:58 +03:00
Markus Lehtonen	ffa35427cd	nfd-client: don't use deprecated grpc.WithInsecure() Replace deprecated grpc.WithInsecure() with grpc.WithTransportCredentials and insecure.NewCredentials(). Makes golangci-lint pass muster.	2022-09-09 11:07:22 +03:00
Markus Lehtonen	12e859d50c	Drop deprecated io/ioutil package Makes golangci-lint happy.	2022-09-08 14:26:02 +03:00
Markus Lehtonen	98228d2069	Update auto-generated artefacts Latest gofmt changes and update to go v1.19 induce some changes in the generated files.	2022-09-08 12:45:20 +03:00
Markus Lehtonen	2bbfe3edc8	Run gofmt Golang v1.19 was not happy with our code comments.	2022-09-08 12:43:15 +03:00
Kubernetes Prow Robot	4e6a718dfe	Merge pull request #865 from stek29/fix-864 Fix templates for NodeFeatureRule with MatchAny	2022-08-23 09:55:24 -07:00
Viktor Oreshkin	6fd12a2da7	apis/nfd: fix templates with MatchAny only Signed-off-by: Viktor Oreshkin <imselfish@stek29.rocks>	2022-08-23 18:00:44 +03:00
Markus Lehtonen	2c92e1dcff	logging: do not use %w with klog.Errorf It is not recognized (and does not work like with fmt.Errorf) so use %v instead.	2022-08-22 14:39:52 +03:00
Viktor Oreshkin	4375e08e39	apis/nfd: add more tests for templates test that NodeFeatureRule templates work with empty MatchFeatures, but with MatchAny. this test would fail, highlighting an issue which is fixed in next commit. see #864. Signed-off-by: Viktor Oreshkin <imselfish@stek29.rocks>	2022-08-22 02:27:55 +03:00
Markus Lehtonen	889e4c1351	nfd-master: more fixes to log messages Use correct name for the CR (NodeFeatureRule) object. Also, the resource is cluster-scoped so don't print the namespace.	2022-08-17 10:07:26 +03:00
Markus Lehtonen	f5ee836bbf	nfd-master: fix incorrect log messages in crd controller	2022-08-16 16:39:27 +03:00
Markus Lehtonen	38e763e36c	Refresh auto-generated files	2022-08-10 14:24:33 +03:00
Markus Lehtonen	345e9bf72c	apis/nfd: revert the type hack Revert the hack that was a workaround for issues with k8s deepcopy-gen. New deepcopy-gen is able to generate code correctly without issues so this is not needed anymore. Also, removing this hack solves issues with object validation when creating NodeFeatureRules programmatically with nfd go-client. This is needed later with NodeFeatureRules e2e-tests. Logically reverts `f3cc109f99`.	2022-08-10 14:24:33 +03:00
Markus Lehtonen	ac3030ce48	Re-generate files Refresh auto-generated files using the new containerized approach.	2022-08-10 09:47:23 +03:00
Markus Lehtonen	b7658c25fd	generate: update mockery to latest version In order to be able to run it on Go v1.18.	2022-08-10 09:47:23 +03:00
Markus Lehtonen	136c036d4d	Drop the iommu source It was deprecated in v0.10.0.	2022-06-14 15:00:29 +03:00
Markus Lehtonen	36341bf4c7	apis/nfd: empty match expression set returns no features for templates This patch changes a rare corner case of custom label rules with an empty set of matchexpressions. The patch removes a special case where an empty match expression set matched everything and returned all feature elements for templates to consume. With this patch the match expression set logically evaluates all expressions in the set and returns all matches - if there are no expressions there are no matches and no matched features are returned. However, the overall match result (determining if "non-template" labels will be created) in this special case will be "true" as before as none of the zero match expressions failed. The former behavior was somewhat illogical and counterintuitive: having 1 to N expressions matched and returned 1 to N features (at most), but, having 0 expressions always matched everything and returned all features. This was some leftover proof-of-concept functionality (for some possible future extensions) that should have been removed before merging.	2022-03-24 11:43:42 +02:00
Tuomas Katila	2ceafe83b7	topologyupdater: Prevent crash with incorrect node id It's possible for device plugins to advertise non-existent numa node ids that cause topology updater to crash. Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>	2022-03-15 11:16:02 +02:00
Dipto Chakrabarty	19a57789ad	Additional Lint Fixes in Codebase (#779) * fix comments and conditionals to fix lint issues * more linter fixes and spelling fixes * fix linter issues based on feedback	2022-03-02 17:12:46 -08:00
Markus Lehtonen	f9b4ba87a8	tls: require min TLS version 1.3 Deny deprecated TLS versions (1.0 and 1.1). We don't really expect other clients than NFD itself so we can just request the latest version.	2022-02-25 10:08:37 +02:00
Kubernetes Prow Robot	0c330b1a35	Merge pull request #736 from marquiz/devel/grpc-stop nfd-master: do graceful stop of gRPC server	2022-01-21 03:05:59 -08:00
Markus Lehtonen	e53d053475	nfd-master: do graceful stop of gRPC server	2022-01-21 12:03:07 +02:00
Markus Lehtonen	e95a4dd460	nfd-master: print gRPC server error correctly	2022-01-21 11:56:28 +02:00
Mohammed Naser	cf1bc4a34d	Increase timeout in test setups This patch increases the timeout when setting up the NFD master to 5 seconds instead of 1 second to allow for running tests in slow environments.	2022-01-20 18:59:30 -05:00
Dipto Chakrabarty	755294184c	Fix GoLinter Issues in the files (#711) * fix linter issues for few files * fix linter issue of exported const Name should have comment or be unexported * fix name lint issue and resolve lints * add changes to comments	2022-01-18 23:12:06 -08:00
Markus Lehtonen	838a375f85	source/iommu: deprecate and disable by default Deprecate the iommu source and disable it by default.	2021-12-20 10:21:29 +02:00
Markus Lehtonen	a6eddbab4f	source: rename TestSource to SupplementalSource Just widen the scope in terms of naming, to cover deprecated and/or experimental sources too, for example.	2021-12-20 10:05:00 +02:00
Markus Lehtonen	bf01875368	nfd-worker: drop 'custom-' prefix from matchFeatures custom rules Do not prefix label names from the new matchFeatures/matchAny custom rules with "custom-". We want to have the same result (set of labels) from a rule independent of whether it has been specified in worker config or in a NodeFeatureRule CRs. Legacy matchOn rules (not available in NodeFeatureRule CRs) are intact, i.e. still prefixed, in order to retain backwards compatibility.	2021-12-09 21:52:40 +02:00
Markus Lehtonen	82e14300a4	source/fake: implement FeatureSource Makes it possible to create fake features for custom rules, enabling testing.	2021-12-07 10:34:41 +02:00
Markus Lehtonen	58e1461d90	nfd-worker: add -feature-sources command line flag Allows controlling (enable/disable) the "raw" feature detection. Especially useful for development and testing.	2021-12-03 09:42:35 +02:00
Markus Lehtonen	df6909ed5e	nfd-worker: add core.featureSources config option Add a configuration option for controlling the enabled "raw" feature sources. This is useful e.g. in testing and development, plus it also allows fully shutting down discovery of features that are not needed in a deployment. Supplements core.labelSources which controls the enablement of label sources.	2021-12-03 09:42:35 +02:00
Markus Lehtonen	2c3a4d1588	nfd-worker: rename nfdWorker.enabledSources to labelSources Refactoring ahead of adding new config option for feature sources.	2021-12-02 21:08:46 +02:00
Markus Lehtonen	8cd58af613	nfd-worker: disable sources more easily Make it easier to disable single sources by prefixing the source name with a dash ('-') in the core.sources config option (or -sources cmdline flag).	2021-12-02 10:36:51 +02:00
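The dash-prefix convention from 8cd58af613 could be resolved along these lines. This is a hedged sketch under stated assumptions, not the nfd-worker implementation: the function `resolveSources`, the in-order processing of entries, and the handling of an `all` keyword are assumptions made for illustration. Only the '-' prefix for disabling a single source is taken from the commit message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolveSources (hypothetical) turns a source spec such as
// ["all", "-pci"] into a sorted list of enabled source names.
// Assumptions: entries are processed in order, "all" enables every
// known source, and a leading '-' disables the named source.
func resolveSources(known, spec []string) []string {
	enabled := map[string]bool{}
	for _, name := range spec {
		switch {
		case name == "all":
			for _, k := range known {
				enabled[k] = true
			}
		case strings.HasPrefix(name, "-"):
			delete(enabled, strings.TrimPrefix(name, "-"))
		default:
			enabled[name] = true
		}
	}
	out := make([]string, 0, len(enabled))
	for name := range enabled {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	known := []string{"cpu", "kernel", "pci"}
	fmt.Println(resolveSources(known, []string{"all", "-pci"}))
}
```

With this scheme a deployment can keep everything enabled and switch off one source, instead of enumerating all the sources it wants to keep.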
Markus Lehtonen	773280de65	nfd-worker: provide deprecated core.sources config option Provide backwards compatibility via a deprecated 'core.sources' config file option. This will override 'core.labelSources'. A warning is printed in the log if this option is detected.	2021-12-01 17:11:49 +02:00
Markus Lehtonen	ad9c7dfa1e	nfd-worker: rename config option 'sources' to 'labelSources' The goal is to make the name more descriptive. Also keeping in mind a possible future addition of a 'featureSources' option (or similar) for controlling the feature discovery.	2021-12-01 17:11:49 +02:00
Kubernetes Prow Robot	86bfe74cd7	Merge pull request #671 from marquiz/fixes/single-dash-flags Use single-dash format of cmdline flags	2021-12-01 06:45:15 -08:00
Markus Lehtonen	1765a37c6a	pkg/apis/nfd: drop unnecessary else statements	2021-12-01 10:55:50 +02:00
Markus Lehtonen	3f225be081	pkg/apis/nfd: use consistent receiver name for methods of templateHelper	2021-12-01 10:51:47 +02:00
Markus Lehtonen	d07400206f	pkg/apis/nfd/v1alpha1: document exported symbols Add missing comments and fix some existing ones.	2021-12-01 10:46:56 +02:00
Markus Lehtonen	c4f7ab0abe	pkg/api/feature: document exported functions	2021-12-01 10:30:17 +02:00
Markus Lehtonen	a57a25f63c	Use single-dash format of cmdline flags Use the single-dash (i.e. '-option' instead of '--option') format consistently across log messages and documentation. This is the format that was mostly used, already, and shown by command line help of the binaries, for example.	2021-11-25 18:03:54 +02:00
Markus Lehtonen	b648d005e1	pkg/apis/nfd: support templating of "vars" Support templating of var names in a similar manner as labels. Add support for a new 'varsTemplate' field to the feature rule spec which is treated similarly to the 'labelsTemplate' field. The value of the field is processed through the golang "text/template" template engine and the expanded value must contain variables in <key>=<value> format, separated by newlines i.e.: - name: <rule-name> varsTemplate: | <label-1>=<value-1> <label-2>=<value-2> ... Similar rules as for 'labelsTemplate' apply, i.e. 1. In case matchAny is specified, the template is executed separately against each individual matchFeatures matcher. 2. 'vars' field has priority over 'varsTemplate'	2021-11-25 12:50:47 +02:00
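The expansion step described in b648d005e1 (run the template, then read newline-separated `<key>=<value>` pairs) might look roughly like this. It is a sketch only: the function name `expandVars`, the sample template, and the sample data are hypothetical, and the real code works on NFD's matched-feature data rather than a plain string slice.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// expandVars (hypothetical) executes tmpl with text/template and parses the
// output as newline-separated "<key>=<value>" pairs. Blank lines are
// skipped; a line without '=' is reported as an error.
func expandVars(tmpl string, data interface{}) (map[string]string, error) {
	t, err := template.New("vars").Parse(tmpl)
	if err != nil {
		return nil, err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return nil, err
	}
	vars := map[string]string{}
	for _, line := range strings.Split(sb.String(), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid line %q: expected <key>=<value>", line)
		}
		vars[parts[0]] = parts[1]
	}
	return vars, nil
}

func main() {
	// Hypothetical input: one variable per matched element name.
	tmpl := "{{range .}}dev-{{.}}=present\n{{end}}"
	vars, err := expandVars(tmpl, []string{"a", "b"})
	fmt.Println(vars, err)
}
```

Splitting on the first '=' only keeps values that themselves contain '=' intact.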
Markus Lehtonen	f75303ce43	pkg/apis/nfd: add variables to rule spec and support backreferences Support backreferencing of output values from previous rules. Enables complex rule setups where custom features are further combined together to form even more sophisticated higher level labels. The labels created by preceding rules are available as a special 'rule.matched' feature (for matchFeatures to use). If referencing rules across multiple configs/CRDs care must be taken with the ordering. Processing order of rules in nfd-worker: 1. Static rules 2. Files from /etc/kubernetes/node-feature-discovery/custom.d/ in alphabetical order. Subdirectories are processed by reading their files in alphabetical order. 3. Custom rules from main nfd-worker.conf In nfd-master, NodeFeatureRule objects are processed in alphabetical order (based on their metadata.name). This patch also adds new 'vars' fields to the rule spec. Like 'labels', it is a map of key-value pairs but no labels are generated from these. The values specified in 'vars' are only added for backreferencing into the 'rule.matched' feature. This may be desired in schemes where the output of certain rules is only used as intermediate variables for other rules and no labels out of these are wanted. An example setup: - name: "kernel feature" labels: kernel-feature: matchFeatures: - feature: kernel.version matchExpressions: major: {op: Gt, value: ["4"]} - name: "intermediate var feature" vars: nolabel-feature: "true" matchFeatures: - feature: cpu.cpuid matchExpressions: AVX512F: {op: Exists} - feature: pci.device matchExpressions: vendor: {op: In, value: ["8086"]} device: {op: In, value: ["1234", "1235"]} - name: top-level-feature matchFeatures: - feature: rule.matched matchExpressions: kernel-feature: "true" nolabel-feature: "true"	2021-11-25 12:50:47 +02:00

481 commits