node-feature-discovery

mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00

Author	SHA1	Message	Date
Markus Lehtonen	a1406767a9	docs: align metrics documentation with latest changes on naming Also change table formatting and fix one incorrect description.	2023-08-01 15:53:06 +03:00
Kubernetes Prow Robot	65b7216313	Merge pull request #1283 from marquiz/docs/deprecation-policy docs: deprecation policy for Helm chart params	2023-07-25 10:46:06 -07:00
Kubernetes Prow Robot	463a737b82	Merge pull request #1277 from marquiz/docs/k8s-compat docs: describe supported Kubernetes versions	2023-07-25 08:54:06 -07:00
Markus Lehtonen	b1328b3166	docs: describe supported Kubernetes versions	2023-07-25 17:40:06 +03:00
Markus Lehtonen	b72b537261	docs: deprecation policy for Helm chart params	2023-07-24 14:06:30 +03:00
Pat Riehecky	0523257d1a	Add optional labels to the podmonitor Signed-off-by: Pat Riehecky <riehecky@fnal.gov>	2023-07-21 10:03:50 -05:00
Kubernetes Prow Robot	c9f3550237	Merge pull request #1280 from marquiz/docs/tocs docs: remove useless TOCs	2023-07-21 06:50:15 -07:00
Kubernetes Prow Robot	ebbea564a8	Merge pull request #1278 from marquiz/docs/fixes docs: fix toc of topology-updater and topology-gc reference	2023-07-21 06:50:08 -07:00
Markus Lehtonen	312ef308d1	docs: remove useless TOCs Drop table of contents from short pages where it is only cluttering the page.	2023-07-21 16:35:12 +03:00
Markus Lehtonen	f825812229	docs: document version and deprecation policy	2023-07-21 16:28:38 +03:00
Markus Lehtonen	d4d6963473	docs: fix toc of topology-updater and topology-gc reference Exclude the main title from the TOC (with the empty line the "no_toc" directive took no effect).	2023-07-21 15:41:59 +03:00
Carlos Eduardo Arango Gutierrez	e3aedd33e2	Enable metrics via prometheus operator Expose metrics via prometheus.monitoring.coreos.com/v1 The exposed metrics are \| Metric \| Type \| Meaning \| \| --------------- \| ---------------- \| ---------------- \| \| `nfd_master_build_info` \| Gauge \| Version from which nfd-master was built. \| \| `nfd_worker_build_info` \| Gauge \| Version from which nfd-worker was built. \| \| `nfd_updated_nodes` \| Counter \| Time taken to label a node \| \| `nfd_crd_processing_time` \| Gauge \| Time taken to process a NodeFeatureRule CRD \| \| `nfd_feature_discovery_duration_seconds` \| HistogramVec \| Time taken to discover features on a node \| Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-07-21 10:59:52 +02:00
Kubernetes Prow Robot	407a610e0c	Merge pull request #1182 from fmuyassarov/disable-hooks-by-default hooks: disable hooks by default from v0.14	2023-06-22 04:43:40 -07:00
Carlos Eduardo Arango Gutierrez	563cc862de	Docs: Fix typo on customization-guide Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-06-09 10:23:33 +02:00
Muyassarov, Feruzjon	19527be924	hooks: disable hooks by default We have deprecated hooks in v0.12.0 but kept it enabled by default. Starting from v0.14 we are starting to disable it by default and plan to fully remove it in the near future. Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>	2023-06-07 13:04:23 +03:00
Simon Jürgensmeyer	307a865465	Fix missing apostrophe for jq	2023-06-07 09:53:02 +02:00
Hairong Chen	e8a00ba7da	cpu: Discover TDX guests based on cpuid information NFD already has the capability to discover whether baremetal / host machines support Intel TDX. Now, the next step is to add support for discovering whether a node is TDX protected (as in, a virtual machine started using Intel TDX). In order to do so, we've decided to go for a new `cpu-security.tdx` property, called `protected` (`cpu-security.tdx.protected`). Signed-off-by: Hairong Chen <hairong.chen@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-06-05 11:06:28 +02:00
Kubernetes Prow Robot	306969a945	Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update feat: parallelize nodes update	2023-06-02 05:28:57 -07:00
AhmedGrati	b3cfe17392	feat: parallelize nodes update This PR aims to optimize the process of updating nodes with corresponding features. In fact, previously, we were updating nodes sequentially even though they are independent from each other. Therefore, we integrated new components: LabelersNodePool which is responsible for spinning a goroutine whenever there's a request for updating nodes, and a Workqueue which is responsible for holding node names that should be updated. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-06-02 11:41:50 +01:00
AhmedGrati	08b9c3486e	feat: support dynamic values for labels in the NodeFeatureRule This PR aims to support the dynamic values for labels in the NodeFeatureRule CRD, it would offer more flexible labeling for users. To achieve this, we check whether label value starts with "@", and if it's the case, we will get the value of the feature value, and update the value of the label with the feature value. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-05-31 23:30:26 +01:00
Kubernetes Prow Robot	d28a02c5cd	Merge pull request #1222 from vaibhav2107/kustomize-type Fixed typo in Header under deployment/kustomize.md	2023-05-22 00:42:21 -07:00
Kubernetes Prow Robot	70d5ef477f	Merge pull request #1219 from PiotrProkop/leader-elect Add leader election for nfd-master	2023-05-22 00:36:21 -07:00
vaibhav2107	9f7854479f	Fixed typo in Header under deployment/kustomize.md	2023-05-18 14:59:54 +05:30
PiotrProkop	272fd4784f	Add new flag enable-leader-election for nfd-master. It allows NFD-master to be run in active-passive way when running multiple instances of NFD-master to prevent multiple components from updating same custom resources. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-05-15 13:30:07 +02:00
Markus Lehtonen	1200fd05c5	topology-updater: use node IP in the default configz URI Use a separate NODE_ADDRESS environment variable in the default value of -kubelet-config-uri (instead of NODE_NAME that was previously used). Also change the kustomize and Helm deployments to set this variable to node IP address. This should make the default deployment more robust, making it work in scenarios where node name does not resolve to the node ip, e.g. nodename != hostname.	2023-05-05 13:29:51 +03:00
Kubernetes Prow Robot	cd45baef8d	Merge pull request #1211 from marquiz/devel/helm deployment/helm: improve handling of topologyUpdater.kubeletStateFiles	2023-05-05 00:17:13 -07:00
Markus Lehtonen	526aab87cf	deployment/helm: use dedicated serviceaccount for topology-updater Change the configuration so that, by default, we use a dedicated serviceaccount for topology-updater (similar to topology-gc, nfd-master and nfd-worker). Fix the templates so that the serviceaccount and clusterrolebinding are only created when topology-updater is enabled (clusterrole was already handled this way). This patch also correctly documents the default value of rbac.create parameter of topology-updater and topology-gc.	2023-05-05 08:30:21 +03:00
Markus Lehtonen	9c2f268fd2	deployment/helm: improve handling of topologyUpdater.kubeletStateFiles Make it possible to disable kubelet state tracking with --set topologyUpdater.kubeletStateFiles="" as the documentation suggests. Also, fix the documentation regarding the default value of topologyUpdater.kubeletStateFiles parameter.	2023-05-04 15:01:19 +03:00
Markus Lehtonen	9685d292a2	docs: add missing .md suffix to internal references Commit `bfbc47f55e` added a lot of those and this patch tries to cover all that we missed there. Having .md suffixes in references to internal files makes it convenient to browse the document locally, just as text files as the references work correctly.	2023-04-25 15:28:07 +03:00
Kubernetes Prow Robot	2356223ffc	Merge pull request #1139 from AhmedGrati/feat-configure-master-resync feat: add master resync period configurability	2023-04-24 03:49:02 -07:00
AhmedGrati	7917434d38	feat: add master resync period configurability This PR adds a config option for setting the NFD API controller resync period. The resync period is only activated when the NodeFeature API has been enabled (with -enable-nodefeature-api). Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-24 11:52:38 +02:00
Carlos Eduardo Arango Gutierrez	05ef5d4e9d	cpu: expose the total number of AMD SEV ASID and ES This patch adds SEV ASIDs and the related (but distinct) SEV Encrypted State (SEV-ES) IDs as two quantities to be exposed via extended resources. In a kernel built with CONFIG_CGROUP_MISC on a suitably equipped AMD CPU, the root control group will have a misc.capacity file that shows the number of available IDs in each category. The added extended resources are: - sev.asids - sev.encrypted_state_ids Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-17 19:34:39 +02:00
Mikko Ylinen	de1b69a8bf	cpu: make SGX EPC resource available to NodeFeatureRules Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2023-04-14 15:31:54 +03:00
Markus Lehtonen	3320c74472	source/cpu: don't create cpu-security.tdx.total_keys label Just have that as a feature for NodeFeatureRules to consume.	2023-04-14 13:33:13 +03:00
Kubernetes Prow Robot	84c348b69f	Merge pull request #1126 from marquiz/devel/er-deprecation nfd-master: deprecate the -resource-labels flag	2023-04-13 10:52:39 -07:00
Kubernetes Prow Robot	8d71ed6755	Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods feat: support builtin kernel mods	2023-04-13 10:30:40 -07:00
AhmedGrati	109caa1f28	feat: support builtin kernel mods This PR adds the combination of dynamic and builtin kernel modules into one feature called `kernel.enabledmodule`. It's a superset of the `kernel.loadedmodule` feature. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-13 10:19:24 +01:00
Markus Lehtonen	8511980bf4	nfd-master: deprecate the -resource-labels flag Mark the -resource-labels flag (and the corresponding resourceLabels config option) as deprecated. We now support managing extended resources via NodeFeatureRule objects. This kludge deserves to go, eventually.	2023-04-13 11:30:58 +03:00
Markus Lehtonen	dcbb3bc450	docs: add missing mentions of extended resources and taints A small update to fix some missing mentions of extended resources and taints as assets managed by NFD.	2023-04-11 20:38:21 +03:00
Kubernetes Prow Robot	ad07829d0a	Merge pull request #1099 from ArangoGutierrez/extended_resources_v2 Create extended resources with NodeFeatureRule	2023-04-07 08:09:15 -07:00
Fabiano Fidêncio	250aea4741	Create extended resources with NodeFeatureRule Add support for management of Extended Resources via the NodeFeatureRule CRD API. There are usage scenarios where users want to advertise features as extended resources instead of labels (or annotations). This patch enables the discovery of extended resources, via annotation and patch of node.status.capacity and node.status.allocatable. By using the NodeFeatureRule API. Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com> Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>	2023-04-07 16:14:56 +02:00
Kubernetes Prow Robot	6740224a13	Merge pull request #1100 from PiotrProkop/expose-L3-num-closid Advertise RDT L3 num_closid	2023-04-07 00:49:14 -07:00
Markus Lehtonen	cc6c20ff5f	nfd-master: disallow unprefixed and kubernetes taints Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/" prefix. This is a precaution to protect the user from messing up with the "official" well-known taints from Kubernetes itself. The only exception is that the "nfd.node.kubernetes.io/" prefix is allowed. However, there is one allowed NFD-specific namespace (and its sub-namespaces) i.e. "feature.node.kubernetes.io" under the kubernetes.io domain that can be used for NFD-managed taints. Also disallow unprefixed taint keys. We don't add a default prefix to unprefixed taints (like we do for labels) from NodeFeatureRules. This is to prevent unpleasant surprises to users that need to manage matching tolerations for their workloads.	2023-04-06 16:12:37 +03:00
PiotrProkop	0e78eba40e	Advertise RDT L3 num_closid Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2023-04-06 11:22:55 +02:00
Kubernetes Prow Robot	3c0c43b9be	Merge pull request #1114 from marquiz/devel/rdt-deprecate source/cpu: deprecate cpu-rdt.* labels	2023-04-05 06:21:40 -07:00
Kubernetes Prow Robot	193c552b33	Merge pull request #1084 from AhmedGrati/feat-add-master-config-file feat: add master config file	2023-04-04 10:41:40 -07:00
Markus Lehtonen	6cb5e99afa	source/cpu: deprecate cpu-rdt.* labels Document built-in RDT labels to be deprecated and removed in a future release. The plan is that the default built-in RDT labels would not be created anymore, but the RDT features would still be available for NodeFeatureRules to consume. The RDT labels are not very useful (they don't e.g indicate if the features are really enabled in kernel or if the resctrlfs is mounted).	2023-04-04 11:54:57 +03:00
AhmedGrati	3fff409f6d	Add master config file Similar to the nfd-worker, in this PR we want to support the dynamic run-time configurability through a config file for the nfd-master. We'll use a json or yaml configuration file along with the fsnotify in order to watch for changes in the config file. As a result, we're allowing dynamic control of logging params, allowed namespaces, extended resources, label whitelisting, and denied namespaces. Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>	2023-04-03 09:52:09 +01:00
Fabiano Fidêncio	10672e1bba	cpu: Expose the total number of keys for TDX The total amount of keys that can be used on a specific TDX system is exposed via the cgroups misc.capacity. See: ``` $ cat /sys/fs/cgroup/misc.capacity tdx 31 ``` The first step to properly manage the amount of keys present in a node is exposing it via the NFD, and that's exactly what this commit does. An example of how it ends up being exposed via the NFD: ``` $ kubectl get node 984fee00befb.jf.intel.com -o jsonpath='{.metadata.labels}' \| jq \| grep tdx.total_keys "feature.node.kubernetes.io/cpu-security.tdx.total_keys": "31", ``` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2023-03-31 09:12:26 +02:00
Carlos Eduardo Arango Gutierrez	7171cfd4eb	cpu: expose AMD SEV support Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>	2023-03-30 15:19:43 +02:00

276 commits