Markus Lehtonen
039378c725
nfd-master: use term node update instead of labeling
...
Rename symbols and reword log messages to correlate with the
functionality (we may do other updates than just modify labels
nowadays).
2023-08-01 16:42:34 +03:00
Markus Lehtonen
d8f167d8a9
nfd-master: remove one stale empty line
2023-08-01 16:38:32 +03:00
Kubernetes Prow Robot
45dc46ab81
Merge pull request #1289 from marquiz/devel/metrics
...
docs: align metrics documentation with latest changes on naming
2023-08-01 06:20:39 -07:00
Markus Lehtonen
a1406767a9
docs: align metrics documentation with latest changes on naming
...
Also change table formatting and fix one incorrect description.
2023-08-01 15:53:06 +03:00
Kubernetes Prow Robot
c1cb63243b
Merge pull request #1288 from marquiz/devel/metrics
...
Improve metrics
2023-07-31 10:38:39 -07:00
Markus Lehtonen
5091fef84b
metrics: improve feature discovery duration metric
...
Rename the "NodeName" prometheus label to "node", aligning with
common prometheus/kubernetes conventions. Also reconfigure the
prometheus histogram buckets (now 10ms to 1s) to better match the
expected sample range.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
47f621d970
metrics: improve the node updates gauge
...
Rename the metric, better describe what we're measuring and better
comply with prometheus naming conventions. Also change it to represent
actual updates of the node object on the Kubernetes apiserver.
2023-07-31 19:45:22 +03:00
Markus Lehtonen
945e7fcb3f
metrics: improve nfr processing time metric
...
Change the metric from a simple gauge (that was basically a single value
for the whole cluster) into a HistogramVec, aligning with the feature
discovery duration metric in nfd-worker. The improved metric now has
prometheus labels for the NFR name and node name, i.e. it tracks a
per-NFR metric for each node being processed. Also, change the naming to
better comply with the conventions suggested by prometheus.
2023-07-31 19:45:22 +03:00
Kubernetes Prow Robot
01ca8cb91d
Merge pull request #1284 from marquiz/devel/generator-deps
...
generate: bump tools to their latest versions
2023-07-31 06:32:39 -07:00
Kubernetes Prow Robot
e0f10a81de
Merge pull request #1256 from PiotrProkop/fix-topo-updater-policy-and-scope-advertisment
...
Fix Topology Manager policy and scope not being updated after NRT creation
2023-07-28 00:25:54 -07:00
Markus Lehtonen
7e375ad1f0
generate: bump tools to their latest versions
...
Bump tools versions and re-auto-generate files.
2023-07-27 14:29:48 +03:00
Kubernetes Prow Robot
65b7216313
Merge pull request #1283 from marquiz/docs/deprecation-policy
...
docs: deprecation policy for Helm chart params
2023-07-25 10:46:06 -07:00
Kubernetes Prow Robot
463a737b82
Merge pull request #1277 from marquiz/docs/k8s-compat
...
docs: describe supported Kubernetes versions
2023-07-25 08:54:06 -07:00
Markus Lehtonen
b1328b3166
docs: describe supported Kubernetes versions
2023-07-25 17:40:06 +03:00
Markus Lehtonen
b72b537261
docs: deprecation policy for Helm chart params
2023-07-24 14:06:30 +03:00
Kubernetes Prow Robot
73bdaa2e89
Merge pull request #1282 from jcpunk/podmon-labels
...
Add optional labels to the podmonitor
2023-07-24 03:40:12 -07:00
Pat Riehecky
0523257d1a
Add optional labels to the podmonitor
...
Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
2023-07-21 10:03:50 -05:00
Kubernetes Prow Robot
c9f3550237
Merge pull request #1280 from marquiz/docs/tocs
...
docs: remove useless TOCs
2023-07-21 06:50:15 -07:00
Kubernetes Prow Robot
ebbea564a8
Merge pull request #1278 from marquiz/docs/fixes
...
docs: fix toc of topology-updater and topology-gc reference
2023-07-21 06:50:08 -07:00
Kubernetes Prow Robot
e195e8563f
Merge pull request #1279 from marquiz/docs/version-policy
...
docs: document version and deprecation policy
2023-07-21 06:44:08 -07:00
Markus Lehtonen
312ef308d1
docs: remove useless TOCs
...
Drop table of contents from short pages where it is only cluttering the
page.
2023-07-21 16:35:12 +03:00
Markus Lehtonen
f825812229
docs: document version and deprecation policy
2023-07-21 16:28:38 +03:00
Markus Lehtonen
d4d6963473
docs: fix toc of topology-updater and topology-gc reference
...
Exclude the main title from the TOC (with the empty line the "no_toc"
directive had no effect).
2023-07-21 15:41:59 +03:00
Kubernetes Prow Robot
5223d1f77f
Merge pull request #1276 from marquiz/devel/readme
...
README: update to v0.13.3
2023-07-21 03:22:09 -07:00
Markus Lehtonen
ad27cdcc83
README: update to v0.13.3
2023-07-21 13:14:46 +03:00
Kubernetes Prow Robot
77d869c4f7
Merge pull request #1242 from ArangoGutierrez/metrics
...
Enable metrics via prometheus operator
2023-07-21 02:26:08 -07:00
Carlos Eduardo Arango Gutierrez
e3aedd33e2
Enable metrics via prometheus operator
...
Expose metrics via prometheus.monitoring.coreos.com/v1.
The exposed metrics are:
| Metric | Type | Meaning |
| --------------- | ---------------- | ---------------- |
| `nfd_master_build_info` | Gauge | Version from which nfd-master was built. |
| `nfd_worker_build_info` | Gauge | Version from which nfd-worker was built. |
| `nfd_updated_nodes` | Counter | Number of nodes updated |
| `nfd_crd_processing_time` | Gauge | Time taken to process a NodeFeatureRule CRD |
| `nfd_feature_discovery_duration_seconds` | HistogramVec | Time taken to discover features on a node |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
2023-07-21 10:59:52 +02:00
Kubernetes Prow Robot
1868242169
Merge pull request #1274 from marquiz/devel/gh-templates
...
github: update assignees in new-release issue template
2023-07-21 00:04:07 -07:00
Markus Lehtonen
415c7981f3
github: update assignees in new-release issue template
...
Sync with OWNERS file.
2023-07-21 09:06:42 +03:00
pprokop
6d98b6150b
Fix Topology Manager policy and scope not being updated properly
...
NFD only detects the Topology Manager policy and scope when the NRT object doesn't exist.
This means that the topologyManagerScope and topologyManagerPolicy attributes won't be updated
even if the kubelet config is changed to use another Topology Manager policy and scope.
Signed-off-by: pprokop <pprokop@nvidia.com>
2023-07-20 16:31:12 +02:00
Kubernetes Prow Robot
195e7908f1
Merge pull request #1268 from marquiz/devel/deps
...
go.mod: update kubernetes to v1.27.4
2023-07-20 05:40:07 -07:00
Markus Lehtonen
045eb28dbe
go.mod: update kubernetes to v1.27.4
2023-07-20 14:29:03 +03:00
Kubernetes Prow Robot
fd0ba3f9d9
Merge pull request #1265 from fidencio/topic/cpu-misc-cgroups-take-cgroupsv1-into-account
...
cpu: Take cgroupsv1 into account when reading misc.capacity
2023-07-19 06:12:05 -07:00
AhmedGrati
8e55d78d85
test: add node updater pool unit tests
...
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-07-19 12:03:35 +01:00
Fabiano Fidêncio
7532ac3192
cpu: Add retrieveCgroupMiscCapacityValue() for legibility
...
Let's refactor part of getCgroupMiscCapacity() out into its own
retrieveCgroupMiscCapacityValue(), for the sake of legibility.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-19 12:03:27 +02:00
Fabiano Fidêncio
8ed5a2343f
cpu: Take cgroupsv1 into account when reading misc.capacity
...
We've only been considering cgroupsv2 when trying to read misc.capacity.
However, there are still many systems out there relying on
cgroupsv1.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-19 10:49:53 +02:00
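A minimal sketch of the idea behind the two cpu commits above, not the actual NFD code: try the cgroupsv2 location of misc.capacity first, then fall back to a cgroupsv1 location. The function names `parseMiscCapacity` and `readMiscCapacity`, the two file paths, and the `sgx_epc` resource name are assumptions made for illustration. The file is assumed to hold one `<resource> <value>` pair per line.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMiscCapacity scans misc.capacity-style content, assumed to hold one
// "<resource> <value>" pair per line, and returns the value for resource.
// Hypothetical helper, not the function from the commit.
func parseMiscCapacity(content, resource string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == resource {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("resource %q not found", resource)
}

// readMiscCapacity tries each candidate path in order and parses the first
// file that can be read. Hypothetical helper.
func readMiscCapacity(paths []string, resource string) (int64, error) {
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // file missing under this cgroup layout, try the next one
		}
		return parseMiscCapacity(string(data), resource)
	}
	return 0, fmt.Errorf("no misc.capacity file found")
}

func main() {
	// Assumed locations: unified hierarchy (v2) first, then the v1 misc controller.
	paths := []string{
		"/sys/fs/cgroup/misc.capacity",
		"/sys/fs/cgroup/misc/misc.capacity",
	}
	v, err := readMiscCapacity(paths, "sgx_epc")
	if err != nil {
		fmt.Println("misc capacity unavailable:", err)
		return
	}
	fmt.Println("sgx_epc capacity:", v)
}
```

Keeping the parsing separate from the path lookup means the parser can be tested against a string without touching the filesystem.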
Kubernetes Prow Robot
5f181cc6d0
Merge pull request #1258 from marquiz/fixes/nfd-master
...
nfd-master: check for nil references in nfdAPIUpdateAllNodes
2023-07-18 05:23:09 -07:00
Markus Lehtonen
dac45be28c
nfd-master: check for nil references in nfdAPIUpdateAllNodes
...
Just a safeguard.
2023-07-17 17:49:44 +03:00
Kubernetes Prow Robot
9a108c0505
Merge pull request #1255 from hangscer8/clean_ticker
...
Stop ticker in time to avoid memory leak
2023-07-06 01:59:03 -07:00
hang.jiang
698031fc2d
Stop ticker in time to avoid memory leak
...
Not stopping the ticker when the function has completed causes a memory leak.
Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
2023-07-05 18:35:01 +08:00
Kubernetes Prow Robot
f02d172d07
Merge pull request #1253 from adrianchiris/fix-typo-in-helm-template
...
fix typo in helm chart
2023-07-03 02:46:53 -07:00
adrianc
904f3739a3
fix typo in helm chart
...
else statement of crd-controller should
also refer to crd-controller flag.
Signed-off-by: adrianc <adrianc@nvidia.com>
2023-07-02 18:01:31 +03:00
Kubernetes Prow Robot
10bbc8f253
Merge pull request #1248 from testwill/pkg-import
...
Remove pkg's imported twice
2023-06-28 05:54:32 -07:00
guoguangwu
29118f67bb
fix: Drop the e2elog instead
...
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-25 09:44:08 +08:00
Kubernetes Prow Robot
407a610e0c
Merge pull request #1182 from fmuyassarov/disable-hooks-by-default
...
hooks: disable hooks by default from v0.14
2023-06-22 04:43:40 -07:00
guoguangwu
92482e45d8
node_feature_discovery_test.go rm pkg imported twice
...
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:55:25 +08:00
guoguangwu
b946bcc0f5
nfd-master-internal_test.go rm pkg imported twice
...
Signed-off-by: guoguangwu <guoguangwu@magic-shield.com>
2023-06-21 16:53:55 +08:00
Kubernetes Prow Robot
aa55cd5999
Merge pull request #1247 from ArangoGutierrez/fix_docs_typo
...
Docs: Fix typo on customization-guide
2023-06-09 01:38:13 -07:00
Carlos Eduardo Arango Gutierrez
563cc862de
Docs: Fix typo on customization-guide
...
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-06-09 10:23:33 +02:00
Kubernetes Prow Robot
ad1bf43d25
Merge pull request #1246 from dipankardas011/fix-depricated-use-of-base-in-kustomize
...
Removal of the bases field as it is deprecated by kustomize
2023-06-09 00:30:13 -07:00