1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

399 commits

Author SHA1 Message Date
Markus Lehtonen
5171ae0f90 Refactor metrics
Move common boilerplate code under pkg/utils.
2023-10-09 10:49:12 +03:00
Markus Lehtonen
1d8a83b045 nfd-master: stop creating NFD version annotations
We now have metrics for getting detailed information about the NFD
instances running. There should be no need to pollute the node object
with NFD version annotations.

One problem with the annotations also that they were incomplete in the
sense that they only covered nfd-master and nfd-worker but not
nfd-topology-updater or nfd-gc.

Also, there was a problem with stale annotations, giving misleading
information. E.g. there was no way to remove old/stale master.version
annotations if nfd-master was scheduled on another node where it was
previously running.
2023-10-05 14:53:29 +03:00
Markus Lehtonen
9ea0a1b420 nfd-master: correctly clean up annotations
Delete correct annotations if -instance is specified.
2023-10-05 11:10:06 +03:00
Markus Lehtonen
dbf00dcda6 apis/nfd: drop one stale comment line
Drop a leftover "docstring" comment that wasn't removed with the type it
refers to.
2023-09-27 14:23:12 +03:00
Markus Lehtonen
b09ce75c8e nfd-master: fix filtering of extended resources
Fix a bug in checking the allowed ".feature.node.kubernetes.io" ns
suffix for extended resources. Also update e2e-tests to cover this case.
2023-09-27 10:55:11 +03:00
AhmedGrati
7ab6314bdc chore: introduce a commong klog handling for cmd/nfd-*
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-07 22:38:15 +01:00
AhmedGrati
b0be40aa09 feat: add logging parameters in configuration file for nfd master
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-09-06 15:27:27 +01:00
Kubernetes Prow Robot
19520c079c
Merge pull request #1325 from ffromani/nfd-updater-fix-events
nfd-updater: events: enable timer-only flow
2023-09-04 05:47:49 -07:00
Francesco Romani
000c919071 nfd-updater: events: enable timer-only flow
The nfd-topology-updater has state-directories notification mechanism
enabled by default.
In theory, we can have only timer-based updates, but if the option
is given to disable the state-directories event source, then all
the update mechanism is mistakenly disabled, including the
timer-based updates.

The two updaters mechanism should be decoupled.
So this PR changes this to make sure we can enable just and only
the timer-based updates.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-09-04 13:05:50 +02:00
Kubernetes Prow Robot
f852c32a55
Merge pull request #1252 from AhmedGrati/test-add-updater-pool-unit-tests
test: add node updater pool unit tests
2023-09-01 07:34:32 -07:00
Kubernetes Prow Robot
e1f90a233b
Merge pull request #1305 from marquiz/devel/nf-gc
Garbage collection of NodeFeature objects
2023-08-28 02:59:42 -07:00
Kubernetes Prow Robot
6d95e59cd0
Merge pull request #1290 from marquiz/devel/metrics-new
metrics: additional metrics for nfd-master
2023-08-28 02:07:42 -07:00
Markus Lehtonen
e3415ec484 nfd-gc: support garbage collection of NodeFeatures
Hook into the same logic already exercised for NodeResourceTopology
objects: GC watches for node delete events and immediately drops stale
objects (NRT and now also NF). In addition there is a periodic resync to
catch any missed node deletes, once every hour by default.
2023-08-22 21:24:26 +03:00
Markus Lehtonen
01c08d67b6 Rename nfd-topology-gc to nfd-gc
This is preparation for making it a generic garbage collector for all
nfd-managed api objects.
2023-08-21 21:46:11 +03:00
Kubernetes Prow Robot
e0c477090b
Merge pull request #1311 from marquiz/devel/refactor-gc-5
topology-gc: simplify listing of node objects
2023-08-21 11:40:05 -07:00
Markus Lehtonen
f05b0e26ea topology-gc: move initial GC out of startNodeInformer()
Small refactor. Contextually this feels more like under periodicGC().
2023-08-21 10:11:46 +03:00
Kubernetes Prow Robot
a60502a313
Merge pull request #1307 from marquiz/devel/refactor-gc
topology-gc: refactor unit tests
2023-08-21 00:09:23 -07:00
Kubernetes Prow Robot
536f9d17d0
Merge pull request #1295 from marquiz/devel/topology-updater-metrics
nfd-topology-updater: add metrics support
2023-08-20 23:25:24 -07:00
Markus Lehtonen
2e8da8849a topology-gc: simplify listing of node objects
Hopefully makes the code slightly more readable.
2023-08-21 09:13:41 +03:00
Markus Lehtonen
0b5e51bd35 topology-gc: refactor unit tests
Remove a lot of boilerplate code by defining reusable functions.

Also, test the Run() method instead of the functions callees of Run() as
it is the top level functionality that was tested in practice (we don't
have separate unit tests for the callee functions).
2023-08-21 09:10:24 +03:00
Kubernetes Prow Robot
4674bce27d
Merge pull request #1310 from marquiz/devel/refactor-gc-4
topology-gc: rename runGC to garbageCollect()
2023-08-18 11:26:34 -07:00
Kubernetes Prow Robot
f4cf4877f2
Merge pull request #1309 from marquiz/devel/refactor-gc-3
topology-gc: rename run()
2023-08-18 11:26:28 -07:00
Markus Lehtonen
ec51b29b3c topology-gc: rename runGC to garbageCollect()
One less function named run.
2023-08-18 17:57:05 +03:00
Markus Lehtonen
98b0b36b87 topology-gc: rename run()
Too many run methods here.
2023-08-18 17:52:11 +03:00
Markus Lehtonen
108d603bdc topology-gc: fix Stop
The stop channel has multiple readers to we need to close it so that all
of the readers get notified.
2023-08-18 17:46:54 +03:00
Kubernetes Prow Robot
9d61b19454
Merge pull request #1287 from freelizhun/fix-empty-hugepages
fix empty hugepages in some numa nodes caused no such file or directory errors
2023-08-08 02:50:16 -07:00
lizhun
a4ad3d4411 fix empty hugepages in some numa nodes caused no such file or directory error
Signed-off-by: lizhun <lizhun@kylinos.cn>
2023-08-08 15:14:44 +08:00
Markus Lehtonen
5ad2294c14 metrics: add nfd_node_update_requests_total counter
Add a counter for total number of node update/sync requests. In
practice, this counts the number of gRPC requests received if the gRPC
API is in use. If the NodeFeature API is enabled, this counts the
requests initiated by the NFD API controller, i.e. updates triggered by
changes in NodeFeature or NodeFeatureRule objects plus updates initiated
by the controller resync period.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
4b24cc1afa metrics: counters for rejected labels, extended resources and taints
Add counters for labels, extended resources and taints rejected/filtered
out by nfd-master.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
a8a29e6df2 metrics: add nfd_nodefeaturerule_processing_errors_total counter
Add a counter for errors encountered when processing NodeFeatureRules.
Another simple counter without any additional prometheus labels -
nfd-master logs can provide further details.
2023-08-07 09:37:29 +03:00
Markus Lehtonen
b90f2c318e metrics: add nfd_node_update_failures_total counter
Add a new counter for tracking node update failures from nfd-master.
This tracks both normal feature updates and the --prune sub-command.
This is a simple counter without any additional labels - nfd-master logs
can be used for further diagnostics.
2023-08-07 09:37:27 +03:00
Markus Lehtonen
06b333db1e nfd-topology-updater: add metrics support
For now, add only one metric, a counter for the errors occurring while
scanning pod resources on the node.
2023-08-04 16:48:37 +03:00
Markus Lehtonen
039378c725 nfd-master: use term node update instead of labeling
Rename symbols and reword log messages to correlate with the
functionality (we may do other updates than just modify labels
nowadays).
2023-08-01 16:42:34 +03:00
Markus Lehtonen
d8f167d8a9 nfd-master: remove one stale empty line 2023-08-01 16:38:32 +03:00
Kubernetes Prow Robot
c1cb63243b
Merge pull request #1288 from marquiz/devel/metrics
Improve metrics
2023-07-31 10:38:39 -07:00
Markus Lehtonen
5091fef84b metrics: improve feature discovery duration metric
Rename the "NodeName" prometheus label to  "node", aligning  with
common prometheus/kubernetes conventions. Also reconfigure the
prometheus histogram buckets (now 10ms to 1s) to better match the
expected sample range.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
47f621d970 metrics: improve the node updates gauge
Rename the metric, better describe what we're measuring and better
comply with prometheus naming conventions. Also change it to represent
actual updates of the node object on the Kubernetes apiserver.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
945e7fcb3f metrics: improve nfr processing time metric
Change the metric from a simple gauge (that basically was a single value
for the whole cluster) into a HistogramVec, aligning with the feature
discovery duration metric in nfd-worker. This improved metric now has
prometheus labels for the NFR name and node name, i.e. it is tracking
per-NFR metric for each node being processed. Also, change the naming to
better comply with prometheus suggested conventions.
2023-07-31 19:45:22 +03:00
Kubernetes Prow Robot
01ca8cb91d
Merge pull request #1284 from marquiz/devel/generator-deps
generate: bump tools to their latest versions
2023-07-31 06:32:39 -07:00
Kubernetes Prow Robot
e0f10a81de
Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment
Fix Topology Manager policy and scope not being updated after NRT creation
2023-07-28 00:25:54 -07:00
Markus Lehtonen
7e375ad1f0 generate: bump tools to their latest versions
Bump tools versions and re-auto-generate files.
2023-07-27 14:29:48 +03:00
Kubernetes Prow Robot
77d869c4f7
Merge pull request #1242 from ArangoGutierrez/metrics
Enable metrics via prometheus operator
2023-07-21 02:26:08 -07:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
Expose metrics via prometheus.monitoring.coreos.com/v1

The exposed metrics are

| Metric        | Type | Meaning |
| --------------- | ---------------- | ---------------- |
|  `nfd_master_build_info`           | Gauge | Version from which nfd-master was built. |
|  `nfd_worker_build_info`           | Gauge | Version from which nfd-worker was built. |
|  `nfd_updated_nodes`           |  Counter | Time taken to label a node |
|  `nfd_crd_processing_time`          |  Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` |  HistogramVec | Time taken to discover features on a node |

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
pprokop
6d98b6150b Fix Topology Manager policy and scope not being updated properly
NFD is only detecting policy and scope of Topology Manager when NRT object doesn't exist.
This means that topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if kubelet config was changed to use other TopologyManager policy and scope.

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-07-20 16:31:12 +02:00
AhmedGrati
8e55d78d85 test: add node updater pool unit tests
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-07-19 12:03:35 +01:00
Markus Lehtonen
dac45be28c nfd-master: check for nil references in nfdAPIUpdateAllNodes
Just a safeguard.
2023-07-17 17:49:44 +03:00
hang.jiang
698031fc2d Stop ticker in time to avoid memory leak
Because it will cause memory leak if we do not stop ticker when the function has completed.

Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
2023-07-05 18:35:01 +08:00
guoguangwu
b946bcc0f5 nfd-master-internal_test.go rm pkg imported twice
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:53:55 +08:00
Kubernetes Prow Robot
306969a945
Merge pull request #1133 from AhmedGrati/feat-parallelize-nodes-update
feat: parallelize nodes update
2023-06-02 05:28:57 -07:00
AhmedGrati
b3cfe17392 feat: parallelize nodes update
This PR aims to optimize the process of updating nodes with
corresponding features. In fact, previously, we were updating nodes
sequentially even though they are independent from each other.
Therefore, we integrated new components: LabelersNodePool which is
responsible for spininng a goroutine whenever there's a request for
updating nodes, and a Workqueue which is responsible for holding nodes names
that should be updated.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-06-02 11:41:50 +01:00
AhmedGrati
08b9c3486e feat: support dynamic values for labels in the NodeFeatureRule
This PR aims to support the dynamic values for labels in the
NodeFeatureRule CRD, it would offer more flexible labeling for users.
To achieve this, we check whether label value starts with "@", and if
it's the case, we will get the value of the feature value, and update
the value of the label with the feature value.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-31 23:30:26 +01:00
Markus Lehtonen
bf670de68d pkg/utils: migrate KlogDump to structured logging
Drop the KlogDump helper in favor of klog.InfoS. However, that patch
introduces a new DelayedDumper() helper to avoid processing
(marshalling) of object unless really evaluated by the logging function.
2023-05-31 14:43:08 +03:00
Markus Lehtonen
4947ebf336 pkg/util: migrate to structured logging
We gRPC logging interface is not compatible with structured logging so
grpcLogger is left intact.
2023-05-31 14:43:08 +03:00
Markus Lehtonen
64d5af016e apis/nfd: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
6e3b181ab4 topology-updater: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
7be08f9e7f nfd-worker: migrate to structured logging 2023-05-31 14:43:08 +03:00
Markus Lehtonen
8113d651c2 nfd-master: migrate to structured logging 2023-05-31 14:43:05 +03:00
Markus Lehtonen
2a3c7e4c93 nfd-master: add validation of label names and values
Validate labels before trying to update the node. Makes us fail early
nad prevent useless retries in case invalid labels are tried.
2023-05-29 16:54:14 +03:00
Markus Lehtonen
1809c24314 nfd-master: use close for stop channel
Simpler and more reliable (in case of multiple consumers) to just close
the channel.
2023-05-24 16:51:48 +03:00
PiotrProkop
272fd4784f Add new flag enable-leader-election for nfd-master.
It allows NFD-master to be run in active-passive way when running
multiple instances of NFD-master to prevent multiple components
from updating same custom resources.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-05-15 13:30:07 +02:00
Kubernetes Prow Robot
85073525c3
Merge pull request #1185 from AhmedGrati/fix-resync-period-functionality
nfd-master: fix resync period config option
2023-05-02 11:14:16 -07:00
AhmedGrati
87c2d7e184 nfd-master: fix resync period config option
This PR fixes the resync-period configuration option of the nfd-master.
In fact, previously, changes were not reflected in the nfd-master at
runtime. e2e tests are also implemented to make sure that the fix is
already working as expected.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-05-02 13:17:01 +02:00
Markus Lehtonen
fb20388028 nfd-master: refactor filtering of taints 2023-04-28 18:13:54 +03:00
Markus Lehtonen
43ced0c1a1 nfd-master: refactor filtering of feature labels
More consistent error messages. Also preparation for dynamic labels
values (that '@' notation currently supported for extended resources).
2023-04-28 18:13:54 +03:00
Markus Lehtonen
6ca687fbef nfd-master: refactor filtering of extended resources
Simplify code a bit and get more consistent error messages (in addition
to fixing some of those).
2023-04-28 18:13:54 +03:00
Markus Lehtonen
131325fb2c nfd-master: refactor api-controller object handling
Split out resolving of node name (of the node to be updated) into a
separate function. Makes it possible to add unit tests. Also. do
unconditional type casting in the handler functions – that shouldn't
fail unless there is a really serious internal inconsistency in the
codebase so it should be ok to panic.
2023-04-28 17:33:33 +03:00
Kubernetes Prow Robot
d84248bc7d
Merge pull request #1190 from marquiz/devel/api-unit-tests
apis/nfd: add unit tests for Feature type
2023-04-26 23:32:15 -07:00
Markus Lehtonen
77011a775f nfd-master: log node name when processing NodeFeatureRules 2023-04-26 07:22:30 +03:00
Markus Lehtonen
dda7b195ee apis/nfd: add unit tests for Feature type 2023-04-25 19:40:35 +03:00
Kubernetes Prow Robot
54bd4c5d74
Merge pull request #1167 from PiotrProkop/fix-reactive-updates
nfd-topology-updater: fix wrong kubelet_internal_checkpoint path and compare basename to full path
2023-04-24 04:41:01 -07:00
pprokop
5a9a12151c nfd-topology-updater: fix kubelet state file notifier
- kubelet_internal_checkpoint file is in /var/lib/kubelet/device-plugins not /var/lib/kubelet
  fsWatcher doesn't watch dirs recursively
- e.Name returned from fsWatcher events is a full path not a basename

Signed-off-by: pprokop <pprokop@nvidia.com>
2023-04-24 13:21:56 +02:00
Kubernetes Prow Robot
2356223ffc
Merge pull request #1139 from AhmedGrati/feat-configure-master-resync
feat: add master resync period configurability
2023-04-24 03:49:02 -07:00
AhmedGrati
7917434d38 feat: add master resync period configurability
This PR adds a config option for setting the NFD API controller resync period.
The resync period is only activated when the NodeFeature API has been
enabled (with -enable-nodefeature-api).

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-24 11:52:38 +02:00
Kubernetes Prow Robot
64fe26ed37
Merge pull request #1169 from ArangoGutierrez/i1168
nfd-master: reject malformed extended resource dynamic capacity assignment
2023-04-24 00:17:15 -07:00
Carlos Eduardo Arango Gutierrez
f5df7b658c
nfd-master: reject malformed extended resource dynamic capacity assignment
Reject malformed extended resource dynamic capacity assignment
capacity should be in the form of domain.feature.element,
add logic at func filterExtendedResources to check if true or ignore
ExtendedResource, logging as an error.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-22 08:43:50 +02:00
Kubernetes Prow Robot
d5bccda7c5
Merge pull request #1171 from ArangoGutierrez/foundon_typo
pkg/nfd-master/nfd-master.go: Fix typo
2023-04-21 12:21:11 -07:00
Kubernetes Prow Robot
c2c1e18908
Merge pull request #1173 from marquiz/devel/fix-master
nfd-master: fix a crash when processing NodeFeatureRules
2023-04-21 09:49:11 -07:00
Markus Lehtonen
9523f1e411 nfd-master: fix a crash when processing NodeFeatureRules
Fix a a bug where nfd-master with NodeFeature API enabled would crash
when NodeFeatureRule objects were processed in the case where no
NodeFeature objects existed. This was caused by trying to insert values
into a non-initialized NodeFeatureSpec in the code.

This patch adds two safety measures to prevent that from happening in
the future. First, add a constructor function for the NodeFeatureSpec
type, and second, check for uninitialized object in the function
inserting new functions.

TODO: add unit tests for the API helper functions.
2023-04-21 19:24:08 +03:00
Carlos Eduardo Arango Gutierrez
ae22031547
pkg/nfd-master/nfd-master.go: Fix typo
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-21 16:17:11 +02:00
Markus Lehtonen
37306662fe nfd-master: don't create emtpy annotations
Make the nfd.node.kubernetes.io/feature-labels and
nfd.node.kubernetes.io/extended-resources annotations behave similary to
the taints annotation: only create the annotations if some labels or
extended resources are created.
2023-04-21 16:14:17 +03:00
Markus Lehtonen
f0f6bbcf36 nfd-master: configure before prune
Otherwise prune will crash because of uninitialized configuration.
2023-04-20 20:38:11 +03:00
Markus Lehtonen
32db081f3a nfd-master: support noPublish with -prune
Better this way than to crash which is what currently happens with this
combination.
2023-04-19 15:58:06 +03:00
Markus Lehtonen
18f7bfa8e8 generate: update mockery to v2.25.1
Bump the vektra/mockery tool to the latest release.
2023-04-19 13:33:42 +03:00
Markus Lehtonen
117baac1a6 generate: update protoc to v22.3 2023-04-19 10:44:55 +03:00
Markus Lehtonen
ca7ed04a34 generate: update auto-generated code
Re-run "make generate".
2023-04-19 09:49:17 +03:00
Markus Lehtonen
e2d5ba1a2b pkg/podres: update mocked PodResourcesListerClient
Update mocked implementation of
k8s.io/kubelet/pkg/apis/podresources/v1.PodResourcesListerClient. The
mocked implementation is moved to a separate "mocks" subpackage as it's
for an external interface.

This patch also adds code for auto-generation for the mocked interface.
2023-04-18 20:51:51 +03:00
Kubernetes Prow Robot
8d71ed6755
Merge pull request #1086 from AhmedGrati/feat-support-builtin-kernel-mods
feat: support builtin kernel mods
2023-04-13 10:30:40 -07:00
Markus Lehtonen
6b2d10753f nfd-master: re-try on node update failures
Change the NFD API handler to re-try on node update failures. Will work
around transient failures, making sure that failed nodes (i.e. nodes
that we failed to update) don't need to wait for the 1 hour resync
period before being tried again.
2023-04-13 16:30:31 +03:00
AhmedGrati
109caa1f28 feat: support builtin kernel mods
This PR adds the combination of dynamic and builtin kernel modules into
one feature called `kernel.enabledmodule`. It's a superset of the
`kernel.loadedmodule` feature.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-13 10:19:24 +01:00
Markus Lehtonen
70ac19ea66 nfd-master: increase controller resync period to 1 hour
Increase the NFD API controller resync period from 5 minutes to 1 hour.
The resync causes nfd-master to replay all NodeFeature and
NodeFeatureRule objects, being effectively a "big hammer reset all"
button. This should only be needed as an "insurance" to fix labels et al
in case they have been manually tampered (outside NFD) and against
certain bugs in nfd itself. NFD is not supposed to manage anything
fast-changing so 1 hour should be enough.

This change only affects behavior when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
2023-04-12 16:38:47 +03:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Markus Lehtonen
f64c23968a nfd-master: fix node update
Update node status before node metadata. This fixes a problem where we
lose track of NFD-managed extended resources in case patching node
status fails. Previously we removed all labels and annotations
(including the one listing our ERs) and only after that updated node
status. If node status update failed we had lost the annotation but
extended resources were still there, leaving them orphaned.
2023-04-06 22:04:35 +03:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
Kubernetes Prow Robot
193c552b33
Merge pull request #1084 from AhmedGrati/feat-add-master-config-file
feat: add master config file
2023-04-04 10:41:40 -07:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
d0a6289c0f chore: add debug dump of nfd worker configuration
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-03-18 00:49:07 +01:00
Kubernetes Prow Robot
13f92faa77
Merge pull request #1031 from k8stopologyawareschedwg/reactive_updates
topology-updater: reactive updates
2023-03-17 10:13:17 -07:00
Talor Itzhak
5c6be580f4 reactive updates: add an option to disable the feature
Access to the kubelet state directory may raise concerns in some setups, added an option to disable it.
The feature is enabled by default.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
2023-03-16 11:53:16 +02:00
Kubernetes Prow Robot
a06e44ef0b
Merge pull request #1083 from fmuyassarov/mockery
codegen: fix code-generation
2023-03-15 06:46:16 -07:00