mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

368 commits

Author SHA1 Message Date
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
039378c725 nfd-master: use term node update instead of labeling
Rename symbols and reword log messages to correlate with the
functionality (we may do other updates than just modify labels
nowadays).
2023-08-01 16:42:34 +03:00
Markus Lehtonen
d8f167d8a9 nfd-master: remove one stale empty line 2023-08-01 16:38:32 +03:00
Kubernetes Prow Robot
c1cb63243b
Merge pull request #1288 from marquiz/devel/metrics
Improve metrics
2023-07-31 10:38:39 -07:00
Markus Lehtonen
5091fef84b metrics: improve feature discovery duration metric
Rename the "NodeName" prometheus label to "node", aligning with
common prometheus/kubernetes conventions. Also reconfigure the
prometheus histogram buckets (now 10ms to 1s) to better match the
expected sample range.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
47f621d970 metrics: improve the node updates gauge
Rename the metric, better describe what we're measuring and better
comply with prometheus naming conventions. Also change it to represent
actual updates of the node object on the Kubernetes apiserver.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
945e7fcb3f metrics: improve nfr processing time metric
Change the metric from a simple gauge (that basically was a single value
for the whole cluster) into a HistogramVec, aligning with the feature
discovery duration metric in nfd-worker. This improved metric now has
prometheus labels for the NFR name and node name, i.e. it is tracking
per-NFR metric for each node being processed. Also, change the naming to
better comply with prometheus suggested conventions.
2023-07-31 19:45:22 +03:00
Kubernetes Prow Robot
01ca8cb91d
Merge pull request #1284 from marquiz/devel/generator-deps
generate: bump tools to their latest versions
2023-07-31 06:32:39 -07:00
Kubernetes Prow Robot
e0f10a81de
Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment
Fix Topology Manager policy and scope not being updated after NRT creation
2023-07-28 00:25:54 -07:00
Markus Lehtonen
7e375ad1f0 generate: bump tools to their latest versions
Bump tools versions and re-auto-generate files.
2023-07-27 14:29:48 +03:00
Kubernetes Prow Robot
77d869c4f7
Merge pull request #1242 from ArangoGutierrez/metrics
Enable metrics via prometheus operator
2023-07-21 02:26:08 -07:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Number of nodes updated |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
pprokop
6d98b6150b Fix Topology Manager policy and scope not being updated properly
NFD only detects the Topology Manager policy and scope when the NRT object doesn't exist.
This means that the topologyManagerScope and topologyManagerPolicy attributes aren't updated
even if the kubelet config is changed to use a different Topology Manager policy and scope.

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-07-20 16:31:12 +02:00
AhmedGrati
8e55d78d85 test: add node updater pool unit tests
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-07-19 12:03:35 +01:00
Markus Lehtonen
dac45be28c nfd-master: check for nil references in nfdAPIUpdateAllNodes
Just a safeguard.
2023-07-17 17:49:44 +03:00
hang.jiang
698031fc2d Stop ticker in time to avoid memory leak
Not stopping the ticker when the function completes causes a memory leak.

Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
2023-07-05 18:35:01 +08:00
guoguangwu
b946bcc0f5 nfd-master-internal_test.go rm pkg imported twice
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:53:55 +08:00
Kubernetes Prow Robot
306969a945
Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update
feat: parallelize nodes update
2023-06-02 05:28:57 -07:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. Previously, we updated nodes
sequentially even though they are independent of each other.
Therefore, we integrated new components: LabelersNodePool, which is
responsible for spinning up a goroutine whenever there's a request for
updating nodes, and a Workqueue, which is responsible for holding the names
of the nodes that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support dynamic values for labels in the
NodeFeatureRule CRD, offering more flexible labeling for users.
To achieve this, we check whether the label value starts with "@", and if
so, we look up the referenced feature and replace the label value
with the feature's value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
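A minimal sketch of the "@" lookup, assuming feature values are available in a flat map keyed by the full `domain.feature.element` reference (a simplification of NFD's feature model; `resolveLabelValue` is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLabelValue returns value unchanged unless it starts with "@".
// In that case the remainder is treated as a feature reference and looked
// up in features (a flat map; an assumption of this sketch).
func resolveLabelValue(value string, features map[string]string) (string, error) {
	if !strings.HasPrefix(value, "@") {
		return value, nil
	}
	ref := strings.TrimPrefix(value, "@")
	v, ok := features[ref]
	if !ok {
		return "", fmt.Errorf("feature %q not found", ref)
	}
	return v, nil
}
```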
Markus Lehtonen
bf670de68d pkg/utils: migrate KlogDump to structured logging
Drop the KlogDump helper in favor of klog.InfoS. However, this patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) an object unless it is actually evaluated by the logging function.
2023-05-31 14:43:08 +03:00
Markus Lehtonen
4947ebf336 pkg/util: migrate to structured logging
The gRPC logging interface is not compatible with structured logging, so
grpcLogger is left intact.
2023-05-31 14:43:08 +03:00
Markus Lehtonen
64d5af016e apis/nfd: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
6e3b181ab4 topology-updater: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
7be08f9e7f nfd-worker: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
8113d651c2 nfd-master: migrate to structured logging 2023-05-31 14:43:05 +03:00
Markus Lehtonen
2a3c7e4c93 nfd-master: add validation of label names and values
Validate labels before trying to update the node. This makes us fail early
and prevents useless retries when invalid labels are encountered.
2023-05-29 16:54:14 +03:00
Markus Lehtonen
1809c24314 nfd-master: use close for stop channel
Simpler and more reliable (in case of multiple consumers) to just close
the channel.
2023-05-24 16:51:48 +03:00
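Why closing beats sending: a receive on a closed channel returns immediately for every receiver, while a single send wakes only one. A small illustration with a hypothetical `startConsumers` helper:

```go
package main

import "sync"

// startConsumers launches n goroutines that block until stop is closed.
// Closing stop releases all of them at once; a single send
// (stop <- struct{}{}) would release only one. (Hypothetical helper.)
func startConsumers(n int, stop <-chan struct{}) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-stop // returns the zero value immediately once stop is closed
		}()
	}
	return &wg
}
```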
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in an active-passive way when running
multiple instances of NFD-master, to prevent multiple components
from updating the same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Kubernetes Prow Robot
85073525c3
Merge pull request #1185 from AhmedGrati/fix-resync-period-functionality
nfd-master: fix resync period config option
2023-05-02 11:14:16 -07:00
AhmedGrati
87c2d7e184 nfd-master: fix resync period config option
This PR fixes the resync-period configuration option of the nfd-master.
Previously, changes to it were not reflected in the nfd-master at
runtime. E2E tests are also added to make sure that the fix works
as expected.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-02 13:17:01 +02:00
Markus Lehtonen
fb20388028 nfd-master: refactor filtering of taints 2023-04-28 18:13:54 +03:00
Markus Lehtonen
43ced0c1a1 nfd-master: refactor filtering of feature labels
More consistent error messages. Also preparation for dynamic label
values (the '@' notation currently supported for extended resources).
2023-04-28 18:13:54 +03:00
Markus Lehtonen
6ca687fbef nfd-master: refactor filtering of extended resources
Simplify code a bit and get more consistent error messages (in addition
to fixing some of those).
2023-04-28 18:13:54 +03:00
Markus Lehtonen
131325fb2c nfd-master: refactor api-controller object handling
Split out resolving of node name (of the node to be updated) into a
separate function. This makes it possible to add unit tests. Also, do
unconditional type casting in the handler functions – that shouldn't
fail unless there is a really serious internal inconsistency in the
codebase so it should be ok to panic.
2023-04-28 17:33:33 +03:00
Kubernetes Prow Robot
d84248bc7d
Merge pull request #1190 from marquiz/devel/api-unit-tests
apis/nfd: add unit tests for Feature type
2023-04-26 23:32:15 -07:00
Markus Lehtonen
77011a775f nfd-master: log node name when processing NodeFeatureRules 2023-04-26 07:22:30 +03:00
Markus Lehtonen
dda7b195ee apis/nfd: add unit tests for Feature type 2023-04-25 19:40:35 +03:00
Kubernetes Prow Robot
54bd4c5d74
Merge pull request #1167 from PiotrProkop/fix-reactive-updates
nfd-topology-updater: fix wrong kubelet_internal_checkpoint path and compare basename to full path
2023-04-24 04:41:01 -07:00
pprokop
5a9a12151c nfd-topology-updater: fix kubelet state file notifier
- the kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins, not /var/lib/kubelet,
  and fsWatcher doesn't watch directories recursively
- e.Name returned from fsWatcher events is a full path, not a basename

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-04-24 13:21:56 +02:00
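A sketch of the second fix, assuming the watcher reports full paths in the event name (as fsnotify does): compare `filepath.Base` of the path against the watched file names. The helper and the one-entry set are illustrative.

```go
package main

import (
	"path/filepath"
)

// watchedStateFiles holds basenames whose changes should trigger an update.
// Only kubelet_internal_checkpoint comes from the commit; the set is illustrative.
var watchedStateFiles = map[string]bool{
	"kubelet_internal_checkpoint": true,
}

// isWatchedStateFile matches on the basename, because the event name is a
// full path such as /var/lib/kubelet/device-plugins/kubelet_internal_checkpoint.
func isWatchedStateFile(eventName string) bool {
	return watchedStateFiles[filepath.Base(eventName)]
}
```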
Kubernetes Prow Robot
2356223ffc
Merge pull request #1139 from AhmedGrati/feat-configure-master-resync
feat: add master resync period configurability
2023-04-24 03:49:02 -07:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot
64fe26ed37
Merge pull request #1169 from ArangoGutierrez/i1168
nfd-master: reject malformed extended resource dynamic capacity assignment
2023-04-24 00:17:15 -07:00
Carlos Eduardo Arango Gutierrez
f5df7b658c
nfd-master: reject malformed extended resource dynamic capacity assignment
Reject malformed dynamic capacity assignments for extended resources.
The capacity should be in the form domain.feature.element; add logic
in func filterExtendedResources to check this and otherwise ignore the
ExtendedResource, logging an error.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-22 08:43:50 +02:00
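A minimal sketch of the format check, assuming the reference is written as `@domain.feature.element` and that any further dots belong to the element name. `parseCapacityRef` is hypothetical; on error the caller would log and skip the extended resource, as the commit describes.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCapacityRef splits "@domain.feature.element" into its three parts.
// SplitN keeps any extra dots inside the element (an assumption).
func parseCapacityRef(value string) (domain, feature, element string, err error) {
	if !strings.HasPrefix(value, "@") {
		return "", "", "", fmt.Errorf("%q is not a dynamic reference", value)
	}
	parts := strings.SplitN(strings.TrimPrefix(value, "@"), ".", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed reference %q: expected @domain.feature.element", value)
	}
	return parts[0], parts[1], parts[2], nil
}
```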
Kubernetes Prow Robot
d5bccda7c5
Merge pull request #1171 from ArangoGutierrez/foundon_typo
pkg/nfd-master/nfd-master.go: Fix typo
2023-04-21 12:21:11 -07:00
Kubernetes Prow Robot
c2c1e18908
Merge pull request #1173 from marquiz/devel/fix-master
nfd-master: fix a crash when processing NodeFeatureRules
2023-04-21 09:49:11 -07:00
Markus Lehtonen
9523f1e411 nfd-master: fix a crash when processing NodeFeatureRules
Fix a bug where nfd-master with NodeFeature API enabled would crash
when NodeFeatureRule objects were processed in the case where no
NodeFeature objects existed. This was caused by trying to insert values
into a non-initialized NodeFeatureSpec in the code.

This patch adds two safety measures to prevent that from happening in
the future. First, add a constructor function for the NodeFeatureSpec
type, and second, check for an uninitialized object in the function
inserting new functions.

TODO: add unit tests for the API helper functions.
2023-04-21 19:24:08 +03:00
Carlos Eduardo Arango Gutierrez
ae22031547
pkg/nfd-master/nfd-master.go: Fix typo
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-21 16:17:11 +02:00
Markus Lehtonen
37306662fe nfd-master: don't create empty annotations
Make the nfd.node.kubernetes.io/feature-labels and
nfd.node.kubernetes.io/extended-resources annotations behave similarly to
the taints annotation: only create the annotations if some labels or
extended resources are created.
2023-04-21 16:14:17 +03:00
Markus Lehtonen
f0f6bbcf36 nfd-master: configure before prune
Otherwise prune will crash because of uninitialized configuration.
2023-04-20 20:38:11 +03:00
Markus Lehtonen
32db081f3a nfd-master: support noPublish with -prune
Better this than crashing, which is what currently happens with this
combination.
2023-04-19 15:58:06 +03:00
Markus Lehtonen
18f7bfa8e8 generate: update mockery to v2.25.1
Bump the vektra/mockery tool to the latest release.
2023-04-19 13:33:42 +03:00
Markus Lehtonen
117baac1a6 generate: update protoc to v22.3 2023-04-19 10:44:55 +03:00
Markus Lehtonen
ca7ed04a34 generate: update auto-generated code
Re-run "make generate".
2023-04-19 09:49:17 +03:00
Markus Lehtonen
e2d5ba1a2b pkg/podres: update mocked PodResourcesListerClient
Update mocked implementation of
k8s.io/kubelet/pkg/apis/podresources/v1.PodResourcesListerClient. The
mocked implementation is moved to a separate "mocks" subpackage as it's
for an external interface.

This patch also adds code for auto-generation for the mocked interface.
2023-04-18 20:51:51 +03:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
Markus Lehtonen
6b2d10753f nfd-master: re-try on node update failures
Change the NFD API handler to re-try on node update failures. Will work
around transient failures, making sure that failed nodes (i.e. nodes
that we failed to update) don't need to wait for the 1 hour resync
period before being tried again.
2023-04-13 16:30:31 +03:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Markus Lehtonen
70ac19ea66 nfd-master: increase controller resync period to 1 hour
Increase the NFD API controller resync period from 5 minutes to 1 hour.
The resync causes nfd-master to replay all NodeFeature and
NodeFeatureRule objects, being effectively a "big hammer reset all"
button. This should only be needed as an "insurance" to fix labels et al
in case they have been manually tampered with (outside NFD) and against
certain bugs in nfd itself. NFD is not supposed to manage anything
fast-changing so 1 hour should be enough.

This change only affects behavior when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
2023-04-12 16:38:47 +03:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables advertising extended resources through the
NodeFeatureRule API, by annotating the node and patching
node.status.capacity and node.status.allocatable.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Markus Lehtonen
f64c23968a nfd-master: fix node update
Update node status before node metadata. This fixes a problem where we
lose track of NFD-managed extended resources in case patching node
status fails. Previously we removed all labels and annotations
(including the one listing our ERs) and only after that updated node
status. If node status update failed we had lost the annotation but
extended resources were still there, leaving them orphaned.
2023-04-06 22:04:35 +03:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
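The taint-key rules above can be sketched as a single check. `validateTaintKey` is a hypothetical name, and only the prefixes named in the commit are special-cased:

```go
package main

import (
	"fmt"
	"strings"
)

// validateTaintKey rejects unprefixed keys and kubernetes.io prefixes,
// except nfd.node.kubernetes.io and feature.node.kubernetes.io
// (including sub-namespaces of the latter). Hypothetical sketch.
func validateTaintKey(key string) error {
	parts := strings.SplitN(key, "/", 2)
	if len(parts) != 2 || parts[0] == "" {
		return fmt.Errorf("taint key %q must be prefixed", key)
	}
	ns := parts[0]
	switch {
	case ns == "nfd.node.kubernetes.io":
		return nil
	case ns == "feature.node.kubernetes.io",
		strings.HasSuffix(ns, ".feature.node.kubernetes.io"):
		return nil
	case ns == "kubernetes.io", strings.HasSuffix(ns, ".kubernetes.io"):
		return fmt.Errorf("taint key %q uses a reserved kubernetes.io prefix", key)
	}
	return nil
}
```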
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a JSON or YAML configuration file along with fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
d0a6289c0f chore: add debug dump of nfd worker configuration
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-18 00:49:07 +01:00
Kubernetes Prow Robot
13f92faa77
Merge pull request #1031 from k8stopologyawareschedwg/reactive_updates
topology-updater: reactive updates
2023-03-17 10:13:17 -07:00
Talor Itzhak
5c6be580f4 reactive updates: add an option to disable the feature
Access to the kubelet state directory may raise concerns in some setups, so add an option to disable it.
The feature is enabled by default.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:16 +02:00
Kubernetes Prow Robot
a06e44ef0b
Merge pull request #1083 from fmuyassarov/mockery
codegen: fix code-generation
2023-03-15 06:46:16 -07:00
Markus Lehtonen
4a8fc811be pkg/utils: add UnmarshalJSON method to StringSetVal
Make it possible to specify values in yaml as an array like

  conf:
    - foo
    - bar

instead of an unwieldy map like

  conf:
    foo:
    bar:
2023-03-14 10:53:24 +02:00
Talor Itzhak
8924213d14 topology-updater: make it possible to disable sleep-interval
Especially convenient for testing purposes and
completely harmless

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:43:17 +02:00
Talor Itzhak
1c12876815 topology-updater: log event type that triggered update
Specify the event type as part of the log message.
In order to reduce the log volume, make it V4

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:37:24 +02:00
Talor Itzhak
7b248ecae2 topology-updater: update CRs when notified
When a message is received via the channel,
the main loop updates the `NodeResourceTopology` objects.

The notifier will send a message via the channel if:
1. It reached the sleep timeout.
2. It detected a change in Kubelet state files

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:37:24 +02:00
Talor Itzhak
175e0c81aa topology-updater: add kubelet-state-dir flag
On different Kubernetes flavors, like OpenShift for example,
the Kubelet state directory path is different. Make it configurable
for maximum flexibility.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:37:24 +02:00
Talor Itzhak
0f65b87329 kubeletnotifier: introduce kubeletnotifier package
Enabling reactive update for nfd-topology-updater
by detecting changes in Kubelet state/checkpoint files,
and signaling to the main loop to update the NodeResourceTopology
objects.

This has high value when scaling is an issue.
If multiple pods are deployed between two consecutive updates,
the NRT CRs may reflect incorrect resource accounting.
Example:
Time Interval = 5s
t0 - New update sent to NRT CRs
t1 - Schedule guaranteed podA
t2 - Schedule guaranteed podB
time elapsed between t0-t2 < 5 seconds,
i.e. the update at t0 is the most recent update.

At t2 the resource accounting reflected by NRT
is not aligned with the actual accounting because
the NRT CRs don't reflect the change that happened at t1.

With this reactive update feature we expect an update to be triggered
between t1 and t2, so the NRT objects will reflect a more accurate
picture.

There still might be scenarios where the updates
aren't fast enough; addressing them is a planned
future optimization.

The notifier has two event types:
1. Time based - keeping the old behavior, trigger
an update per interval.
2. FS event - trigger an update when Kubelet state/checkpoint files are modified.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-12 12:37:24 +02:00
Muyassarov, Feruzjon
e3a856b405 update re-generated code with make-generate results
Update generated code with the results of re-running make
generate.

Signed-off-by: Muyassarov, Feruzjon <feruzjon.muyassarov@intel.com>
2023-03-11 22:15:11 +02:00
Jose Luis Ojosnegros Manchón
b340d112a8 topology-updater:compute pod set fingerprint
Add an option to compute the fingerprint of the current pod set on each
node.

Report this new fingerprint using an attribute in NRT object.
2023-02-22 10:22:50 +01:00
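A minimal sketch of an order-independent pod set fingerprint: sort the `namespace/name` keys and hash them. FNV-64a and the hex output are illustrative choices; the fingerprint format NFD actually reports in the NRT attribute is not shown here.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// PodID identifies a pod on the node (hypothetical type for this sketch).
type PodID struct {
	Namespace string
	Name      string
}

// podSetFingerprint depends only on which pods are present, not on the
// order in which they were listed.
func podSetFingerprint(pods []PodID) string {
	keys := make([]string, 0, len(pods))
	for _, p := range pods {
		keys = append(keys, p.Namespace+"/"+p.Name)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator so "ab"+"c" hashes differently from "a"+"bc"
	}
	return fmt.Sprintf("%016x", h.Sum64())
}
```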
Jose Luis Ojosnegros Manchón
1a687cb286 topology-updater: Refactor Scan to expand response
We are going to add new data to the Scan response, so it's better to introduce a new
ScanResponse struct as the Scan return value to make that easier.
2023-02-22 09:56:28 +01:00
Kubernetes Prow Robot
a92614c292
Merge pull request #1051 from AhmedGrati/feat-add-deny-label-ns-with-wildcard
feat: add deny-label-ns flag which supports wildcard
2023-02-15 03:42:25 -08:00
Kubernetes Prow Robot
38cc370e69
Merge pull request #1054 from PiotrProkop/use-new-nrt-api
Advertise TopologyManager policy and scope as Attributes in NRT api v1alpha2
2023-02-15 01:12:25 -08:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
PiotrProkop
f76fc5bf6b Read Kubelet configuration the same way as Kubelet to apply default values
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-02-15 09:27:25 +01:00
Ville Pihlava
b1c6b229fe Add discovery duration logging. 2023-02-13 12:55:57 +02:00
pprokop
5484babcb1 Advertise TopologyManager policy and scope as Attributes
Signed-off-by: pprokop <pprokop@nvidia.com>
2023-02-10 12:03:11 +01:00
Kubernetes Prow Robot
ac271b3c29
Merge pull request #1050 from VillePihlava/interval-fix
Change nfd-worker to use Ticker instead of After.
2023-02-09 07:54:22 -08:00
Ville Pihlava
2101cb20e4 Change nfd-worker to use Ticker instead of After. 2023-02-09 17:14:39 +02:00
Jose Luis Ojosnegros Manchón
2967f3307a nrt-api: move from v1alpha1 to v1alpha2 2023-02-09 12:29:54 +01:00
Carlos Eduardo Arango Gutierrez
9b3171bce2
nfd-master: always start gRPC server
Don't register gRPC LabelServer when using the NodeFeature option, only
turn the gRPC server on for Health and Readiness probes.
2023-01-16 19:33:15 +01:00
Kubernetes Prow Robot
ea921a8b14
Merge pull request #1024 from PiotrProkop/nrt-garbage-collector
Add NRT garbage collector
2023-01-11 01:59:44 -08:00
PiotrProkop
59afae50ba Add NodeResourceTopology garbage collector
NodeResourceTopology(aka NRT) custom resource is used to enable NUMA aware Scheduling in Kubernetes.
As of now node-feature-discovery daemons are used to advertise those
resources but there is no service responsible for removing obsolete
objects (without a corresponding Kubernetes node).

This patch adds a new daemon called nfd-topology-gc, which removes old
NRTs.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-11 10:15:21 +01:00
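The core of such a collector is a set difference between existing nodes and NRT objects. A sketch, assuming each NRT object is named after its node; listing and deleting through the API server is left out and `staleNRTs` is hypothetical:

```go
package main

import "sort"

// staleNRTs returns, sorted, the NRT names that have no matching node.
func staleNRTs(nodes, nrts []string) []string {
	existing := make(map[string]struct{}, len(nodes))
	for _, n := range nodes {
		existing[n] = struct{}{}
	}
	var stale []string
	for _, name := range nrts {
		if _, ok := existing[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}
```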
PiotrProkop
1bae2867e2 Release v0.0.13 of the NodeResourceTopology API added the missing TopologyManagerPolicy values.
Expose new policies:
* RestrictedContainerLevel
* RestrictedPodLevel
* BestEffortContainerLevel
* BestEffortPodLevel

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-01-09 16:02:12 +01:00
Kubernetes Prow Robot
8eb6640754
Merge pull request #1020 from marquiz/devel/worker-refactor
worker: move code
2022-12-27 00:45:34 -08:00
Kubernetes Prow Robot
e97b2c1579
Merge pull request #1017 from marquiz/devel/nfd-api-optional-fields
apis/nfd: make all fields in NodeFeatureSpec optional
2022-12-27 00:45:28 -08:00
Markus Lehtonen
1026d91d12 worker: move code
Simplify code by dropping the unnecessary base client package.
2022-12-23 11:38:21 +02:00
Markus Lehtonen
0283f68702 topology-updater: move code
Move and rename the Go package. It has nothing to do with NFD gRPC
client anymore so move it out of the nfd-client package.
2022-12-23 11:37:46 +02:00
Markus Lehtonen
aa97105854 Add common utility function for getting node name 2022-12-23 09:50:15 +02:00
Markus Lehtonen
dfda9bccad apis/nfd: update auto-generated code 2022-12-22 17:58:20 +02:00
Markus Lehtonen
a4fc15a424 apis/nfd: make all fields in NodeFeatureSpec optional
Don't require features to be specified. The creator possibly only wants
to create labels or only some types of features. No need to specify
empty structs for the unused fields.
2022-12-22 17:53:42 +02:00
Markus Lehtonen
f5ae3fe2c7 Simplify usage of ObjectMeta fields
No need to explicitly spell out ObjectMeta as it's embedded in the
object types.
2022-12-19 17:40:10 +02:00
Kubernetes Prow Robot
28a5daa338
Merge pull request #999 from marquiz/fixes/nodefeature-missing
nfd-master: update node if no NodeFeature objects are present
2022-12-19 00:39:44 -08:00