Fix the usage of IPv6 addresses for the default kubelet configz endpoint.
The default host:port we use for the kubelet configz endpoint is
${NODE_ADDRESS}:10250. Previously we errored out if NODE_ADDRESS was an
IPv6 address because we used an incorrect notation (without brackets):
an IPv6 address needs to be enclosed in brackets when a port is specified.
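A minimal Go sketch of the idea, assuming the endpoint URL is assembled from
NODE_ADDRESS and the port; net.JoinHostPort is used here for illustration and
the actual helper in the code may differ:

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	// NODE_ADDRESS may be an IPv4 or an IPv6 address.
	addr := os.Getenv("NODE_ADDRESS")

	// Broken notation: "fe80::1:10250" cannot be parsed as host:port.
	// broken := fmt.Sprintf("%s:%d", addr, 10250)

	// net.JoinHostPort adds the brackets for IPv6 addresses,
	// producing e.g. "https://[fe80::1]:10250/configz".
	uri := fmt.Sprintf("https://%s/configz", net.JoinHostPort(addr, "10250"))
	fmt.Println(uri)
}
```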
NFD only detects the Topology Manager policy and scope when the NRT object
doesn't exist yet. This means that the topologyManagerScope and
topologyManagerPolicy attributes won't be updated even if the kubelet config
is changed to use a different Topology Manager policy and scope.
Signed-off-by: pprokop <pprokop@nvidia.com>
Use a separate NODE_ADDRESS environment variable in the default value of
-kubelet-config-uri (instead of the previously used NODE_NAME).
Also change the kustomize and Helm deployments to set this variable to the
node IP address. This should make the default deployment more robust,
making it work in scenarios where the node name does not resolve to the
node IP, e.g. when nodename != hostname.
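For illustration, one way the deployments can set this variable to the node IP
is the downward API; the exact kustomize/Helm manifests may differ from this
sketch:

```yaml
env:
  - name: NODE_ADDRESS
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
```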
When a message is received via the channel,
the main loop updates the `NodeResourceTopology` objects.
The notifier sends a message via the channel if (see the sketch below):
1. It reached the sleep timeout.
2. It detected a change in the kubelet state files
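A rough Go sketch of that loop; the event names and the use of fsnotify for
watching the kubelet state files are assumptions for illustration, not the
exact implementation:

```go
package main

import (
	"fmt"
	"time"

	"github.com/fsnotify/fsnotify"
)

// Event tells the main loop why the NodeResourceTopology objects
// should be refreshed.
type Event int

const (
	IntervalBased Event = iota // sleep timeout reached
	FSUpdate                   // kubelet state files changed
)

// notify sends an Event on dest whenever the sleep interval expires or
// the watched kubelet state directory changes.
func notify(dest chan<- Event, sleepInterval time.Duration, watcher *fsnotify.Watcher) {
	ticker := time.NewTicker(sleepInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			dest <- IntervalBased
		case <-watcher.Events:
			dest <- FSUpdate
		}
	}
}

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		panic(err)
	}
	// The directory is illustrative; the follow-up change makes it configurable.
	if err := watcher.Add("/var/lib/kubelet"); err != nil {
		panic(err)
	}

	events := make(chan Event)
	go notify(events, 60*time.Second, watcher)

	// Main loop: each message would trigger a NodeResourceTopology update.
	for e := range events {
		fmt.Println("update triggered, reason:", e)
	}
}
```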
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
On different Kubernetes flavors, OpenShift for example,
the kubelet state directory path is different. Make it configurable
for maximum flexibility.
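For example, the path could be exposed as a command-line flag; the flag name
and default below are illustrative, not necessarily what this patch adds:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// OpenShift and other flavors can point this at their own layout.
	kubeletStateDir := flag.String("kubelet-state-dir",
		"/var/lib/kubelet", "path of the kubelet state directory to watch")
	flag.Parse()
	fmt.Println("watching", *kubeletStateDir)
}
```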
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has a connection to the API server for listing
Pods, so this is not that dramatic a change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.
This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.
This patch also updates the deployment files and documentation to reflect
this change.
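A rough sketch of what the direct API access could look like using client-go's
dynamic client; the group/version/resource of NodeResourceTopology and the
object layout below are assumptions, not the exact code:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config, the same credentials already used for listing Pods.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GVR for the NodeResourceTopology CRD.
	gvr := schema.GroupVersionResource{
		Group:    "topology.node.k8s.io",
		Version:  "v1alpha1",
		Resource: "noderesourcetopologies",
	}

	nrt := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "topology.node.k8s.io/v1alpha1",
		"kind":       "NodeResourceTopology",
		"metadata":   map[string]interface{}{"name": "node-1"},
		// zones, attributes, etc. would be filled in by the updater.
	}}

	created, err := client.Resource(gvr).Create(context.TODO(), nrt, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created", created.GetName())
}
```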
The exclude-list allows filtering specific resource accounting
from the NRT objects on a per-node basis.
The CRs created by the topology-updater are used by the scheduler-plugin
as a source of truth for making scheduling decisions.
As such, this feature allows hiding specific information
from the scheduler, which in turn
affects the scheduling decisions.
A common use case is when a user would like to make scheduling
decisions based on a specific resource.
In that case, we can exclude all the other resources
that we don't want the scheduler to examine.
The exclude-list is provided to the topology-updater via a ConfigMap.
Resource type names specified in the list should match the names
as shown here: https://pkg.go.dev/k8s.io/api/core/v1#ResourceName
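A hedged Go sketch of the idea; the exclude-list structure, the "*" wildcard
and the filtering helper are illustrative, not the exact implementation:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// excludeList maps a node name (or "*" for all nodes, as an assumption here)
// to the resource names left out of the NodeResourceTopology accounting.
var excludeList = map[string][]corev1.ResourceName{
	"*":      {"memory"},
	"node-a": {"hugepages-2Mi", "example.com/deviceA"},
}

// filterResources drops the excluded resource names for the given node.
func filterResources(node string, resources []corev1.ResourceName) []corev1.ResourceName {
	excluded := map[corev1.ResourceName]bool{}
	for _, r := range append(excludeList["*"], excludeList[node]...) {
		excluded[r] = true
	}
	var kept []corev1.ResourceName
	for _, r := range resources {
		if !excluded[r] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	fmt.Println(filterResources("node-a",
		[]corev1.ResourceName{"cpu", "memory", "hugepages-2Mi"}))
	// Output: [cpu]
}
```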
This is a resurrection of an old work started here:
https://github.com/kubernetes-sigs/node-feature-discovery/pull/545
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Refactor the code, moving the hostpath helper functionality to a new
"pkg/utils/hostpath" package. This breaks the odd-ish dependency
"pkg/utils" -> "source".
We should use the same flag set for both program and klog arguments.
Otherwise we won't be able to provide klog flags properly.
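A minimal sketch of registering klog flags on the program's flag set, assuming
klog/v2; the program flag is a placeholder:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// Register klog's flags (-v, -logtostderr, ...) on the same flag set
	// used for the program's own flags, so a single Parse handles both.
	klog.InitFlags(flag.CommandLine)

	verbose := flag.Bool("verbose", false, "example program flag")
	flag.Parse()

	klog.Infof("verbose=%v", *verbose)
}
```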
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
- Files obtained after running `make mock`
- Run `go get github.com/vektra/mockery` and make sure that
mockery is in your $PATH
- Run `make mock`
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
- This patch allows exposing Resource Hardware Topology information
through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
nfd-topology-updater in addition to the already existing software
components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
over gRPC, followed by the creation of CRs corresponding to the nodes
in the cluster, exposing the resource hardware topology information
of each node.
- Pin the Kubernetes dependency to one that includes the pod resources implementation
- This code is responsible for obtaining hardware information from the system
as well as pod resource information from the Pod Resource API in order to
determine the allocatable resource information for each NUMA zone. This
information, along with the costs for NUMA zones (obtained by reading NUMA
distances), is gathered by nfd-topology-updater running on all the nodes
of the cluster and propagated to the master in order to populate
that information in the CRs corresponding to the nodes.
- We use GHW facilities for obtaining system information like CPUs, topology,
NUMA distances etc.
- This also includes updates to the Makefile, Dockerfile and manifests for
deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
the configured Topology Manager scope in addition to the Topology Manager policy.
Based on the values of both attributes, a single string will be populated in the CRD.
The string value will be one of the following: {SingleNUMANodeContainerLevel,
SingleNUMANodePodLevel, BestEffort, Restricted, None} (a sketch of the mapping
follows below).
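A minimal sketch of how the two kubelet attributes could be combined into one
of those strings; the mapping logic here is an assumption for illustration,
not the exact code:

```go
package main

import "fmt"

// topologyManagerAttribute combines the kubelet's Topology Manager policy
// and scope into the single string stored in the CRD. Only the
// single-numa-node policy is scope-specific in this sketch.
func topologyManagerAttribute(policy, scope string) string {
	switch policy {
	case "single-numa-node":
		if scope == "pod" {
			return "SingleNUMANodePodLevel"
		}
		return "SingleNUMANodeContainerLevel"
	case "best-effort":
		return "BestEffort"
	case "restricted":
		return "Restricted"
	default:
		return "None"
	}
}

func main() {
	fmt.Println(topologyManagerAttribute("single-numa-node", "pod"))
	// Output: SingleNUMANodePodLevel
}
```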
Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>