1
0
Fork 0
mirror of https://github.com/kubernetes-sigs/node-feature-discovery.git synced 2024-12-14 11:57:51 +00:00
Commit graph

188 commits

Author SHA1 Message Date
Markus Lehtonen
6b2d10753f nfd-master: re-try on node update failures
Change the NFD API handler to re-try on node update failures. Will work
around transient failures, making sure that failed nodes (i.e. nodes
that we failed to update) don't need to wait for the 1 hour resync
period before being tried again.
2023-04-13 16:30:31 +03:00
Markus Lehtonen
70ac19ea66 nfd-master: increase controller resync period to 1 hour
Increase the NFD API controller resync period from 5 minutes to 1 hour.
The resync causes nfd-master to replay all NodeFeature and
NodeFeatureRule objects, being effectively a "big hammer reset all"
button. This should only be needed as an "insurance" to fix labels et al
in case they have been manually tampered (outside NFD) and against
certain bugs in nfd itself. NFD is not supposed to manage anything
fast-changing so 1 hour should be enough.

This change only affects behavior when the NodeFeature API has been
enabled (with -enable-nodefeature-api).
2023-04-12 16:38:47 +03:00
Kubernetes Prow Robot
ad07829d0a
Merge pull request #1099 from ArangoGutierrez/extended_resources_v2
Create extended resources with NodeFeatureRule
2023-04-07 08:09:15 -07:00
Fabiano Fidêncio
250aea4741
Create extended resources with NodeFeatureRule
Add support for management of Extended Resources via the
NodeFeatureRule CRD API.

There are usage scenarios where users want to advertise features
as extended resources instead of labels (or annotations).

This patch enables the discovery of extended resources, via annotation
and patch of node.status.capacity and node.status.allocatable. By using
the NodeFeatureRule API.

Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Markus Lehtonen <markus.lehtonen@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-07 16:14:56 +02:00
Markus Lehtonen
f64c23968a nfd-master: fix node update
Update node status before node metadata. This fixes a problem where we
lose track of NFD-managed extended resources in case patching node
status fails. Previously we removed all labels and annotations
(including the one listing our ERs) and only after that updated node
status. If node status update failed we had lost the annotation but
extended resources were still there, leaving them orphaned.
2023-04-06 22:04:35 +03:00
Markus Lehtonen
cc6c20ff5f nfd-master: disallow unprefixed and kubernetes taints
Disallow taints having a key with "kubernetes.io/" or "*.kubernetes.io/"
prefix. This is a precaution to protect the user from messing up with
the "official" well-known taints from Kubernetes itself. The only
exception is that the "nfd.node.kubernetes.io/" prefix is allowed.

However, there is one allowed NFD-specific namespace (and its
sub-namespaces) i.e. "feature.node.kubernetes.io" under the
kubernetes.io domain that can be used for NFD-managed taints.

Also disallow unprefixed taint keys. We don't add a default prefix to
unprefixed taints (like we do for labels) from NodeFeatureRules. This is
to prevent unpleasant surprises to users that need to manage matching
tolerations for their workloads.
2023-04-06 16:12:37 +03:00
AhmedGrati
3fff409f6d Add master config file
Similar to the nfd-worker, in this PR we want to support the
dynamic run-time configurability through a config file for the nfd-master.

We'll use a json or yaml configuration file along with the fsnotify in
order to watch for changes in the config file. As a result, we're
allowing dynamic control of logging params, allowed namespaces,
extended resources, label whitelisting, and denied namespaces.

Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-04-03 09:52:09 +01:00
AhmedGrati
b499799364 feat: add deny-label-ns flag which supports wildcard
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
2023-02-15 09:47:00 +01:00
Carlos Eduardo Arango Gutierrez
9b3171bce2
nfd-master: always start gRPC server
Don't register gRPC LabelServer when using the NodeFeature option, only
turn the gRPC server on for Health and Readiness probes.
2023-01-16 19:33:15 +01:00
Markus Lehtonen
aa97105854 Add common utility function for getting node name 2022-12-23 09:50:15 +02:00
Markus Lehtonen
f5ae3fe2c7 Simplify usage of ObjectMeta fields
No need to explicitly spell out ObjectMeta as it's embedded in the
object types.
2022-12-19 17:40:10 +02:00
Kubernetes Prow Robot
28a5daa338
Merge pull request #999 from marquiz/fixes/nodefeature-missing
nfd-master: update node if no NodeFeature objects are present
2022-12-19 00:39:44 -08:00
Markus Lehtonen
4c955ad72c nfd-master: update node if no NodeFeature objects are present
Correctly handle the case where no NodeFeature objects exist for certain
node (and NodeFeature API has been enabled with
-enable-nodefeature-api). In this case all the labels should be removed.
2022-12-19 10:22:04 +02:00
Markus Lehtonen
b9c09e6674 nfd-master: update all nodes at startup when NodeFeature API enabled
We want to always update all nodes at startup. Without this patch we
don't get any update event from the controller if no NodeFeature or
NodeFeatureRule objects exist in the cluster. Thus all nodes would stay
untouched whereas we really want to remove all labels from all nodes in
this case.
2022-12-14 21:49:50 +02:00
Kubernetes Prow Robot
d1b314842c
Merge pull request #989 from marquiz/devel/nodefeature-multi-object
nfd-master: handle multiple NodeFeature objects
2022-12-14 07:51:34 -08:00
Markus Lehtonen
740e3af681 nfd-master: implement ratelimiter for nfd api updates
Implement a naive ratelimiter for node update events originating from
the nfd API. We might get a ton of events in short interval. The
simplest example is startup when we get a separate Add event for every
NodeFeature and NodeFeatureRule object. Without rate limiting we
run "update all nodes" separately for each NodeFeatureRule object, plus,
we would run "update node X" separately for each NodeFeature object
targeting node X. This is a huge amount of wasted work because in
principle just running "update all nodes" once should be enough.
2022-12-14 15:45:43 +02:00
Markus Lehtonen
79ed747be8 nfd-master: handle multiple NodeFeature objects
Implement handling of multiple NodeFeature objects by merging all
objects (targeting a certain node) into one before processing the data.
This patch implements MergeInto() methods for all required data types.

With support for multiple NodeFeature objects per node, The "nfd api
workflow" can be easily demonstrated and tested from the command line.
Creating the folloiwing object (assuming node-n exists in the cluster):

    apiVersion: nfd.k8s-sigs.io/v1alpha1
    kind: NodeFeature
    metadata:
      labels:
        nfd.node.kubernetes.io/node-name: node-n
      name: my-features-for-node-n
    spec:
      # Features for NodeFeatureRule matching
      features:
        flags:
          vendor.domain-a:
            elements:
              feature-x: {}
        attributes:
          vendor.domain-b:
            elements:
              feature-y: "foo"
              feature-z: "123"
        instances:
          vendor.domain-c:
            elements:
            - attributes:
                name: "elem-1"
                vendor: "acme"
            - attributes:
                name: "elem-2"
                vendor: "acme"
      # Labels to be created
      labels:
        vendor-feature.enabled: "true"
        vendor-setting.value: "100"

will create two feature labes:

    feature.node.kubernetes.io/vendor-feature.enabled: "true"
    feature.node.kubernetes.io/vendor-setting.value: "100"

In addition it will advertise hidden/raw features that can be used for
custom rules in NodeFeatureRule objects. Now, creating a NodeFeatureRule
object:

    apiVersion: nfd.k8s-sigs.io/v1alpha1
    kind: NodeFeatureRule
    metadata:
      name: my-rule
    spec:
      rules:
        - name: "my feature rule"
          labels:
            "my-feature": "true"
          matchFeatures:
            - feature: vendor.domain-a
              matchExpressions:
                feature-x: {op: Exists}
            - feature: vendor.domain-c
              matchExpressions:
                vendor: {op: In, value: ["acme"]}

will match the features in the NodeFeature object above and cause one
more label to be created:

    feature.node.kubernetes.io/my-feature: "true"
2022-12-14 15:44:52 +02:00
Markus Lehtonen
9f0806593d nfd-master: rename -featurerules-controller flag to -crd-controller
Deprecate the '-featurerules-controller' command line flag as the name
does not describe the functionality anymore: in practice it controls the
CRD controller handling both NodeFeature and NodeFeatureRule objects.
The patch introduces a duplicate, more generally named, flag
'-crd-controller'. A warning is printed in the log if
'-featurerules-controller' flag is encountered.
2022-12-14 10:23:45 +02:00
Markus Lehtonen
6ddd87e465 nfd-master: support NodeFeature objects
Add initial support for handling NodeFeature objects. With this patch
nfd-master watches NodeFeature objects in all namespaces and reacts to
changes in any of these. The node which a certain NodeFeature object
affects is determined by the "nfd.node.kubernetes.io/node-name"
annotation of the object. When a NodeFeature object targeting certain
node is changed, nfd-master needs to process all other objects targeting
the same node, too, because there may be dependencies between them.

Add a new command line flag for selecting between gRPC and NodeFeature
CRD API as the source of feature requests. Enabling NodeFeature API
disables the gRPC interface.

 -enable-nodefeature-api   enable NodeFeature CRD API for incoming
                           feature requests, will disable the gRPC
                           interface (defaults to false)

It is not possible to serve gRPC and watch NodeFeature objects at the
same time. This is deliberate to avoid labeling races e.g. by nfd-worker
sending gRPC requests but NodeFeature objects in the cluster
"overriding" those changes (labels from the gRPC requests will get
overridden when NodeFeature objects are processed).
2022-12-14 07:31:28 +02:00
Markus Lehtonen
079655b42c nfd-master: add error checking for CRD controller creation 2022-12-14 00:27:27 +02:00
Feruzjon Muyassarov
b296bdf0b3 update test functions according to upstream deprecated/removed methods
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-13 12:12:50 +02:00
Markus Lehtonen
f13ed2d91c nfd-topology-updater: update NodeResourceTopology objects directly
Drop the gRPC communication to nfd-master and connect to the Kubernetes
API server directly when updating NodeResourceTopology objects.
Topology-updater already has connection to the API server for listing
Pods so this is not that dramatic change. It also simplifies the code
a lot as there is no need for the NFD gRPC client and no need for
managing TLS certs/keys.

This change aligns nfd-topology-updater with the future direction of
nfd-worker where the gRPC API is being dropped and replaced by a
CRD-based API.

This patch also update deployment files and documentation to reflect
this change.
2022-12-08 11:03:22 +02:00
Feruzjon Muyassarov
2bdf427b89 nfd-master logic update for setting node taints
This commits extends NFD master code to support adding node taints
from NodeFeatureRule CR. We also introduce a new annotation for
taints which helps to identify if the taint set on node is owned
by NFD or not. When user deletes the taint entry from
NodeFeatureRule CR, NFD will remove the taint from the node. But
to avoid accidental deletion of taints not owned by the NFD, it
needs to know the owner. Keeping track of NFD set taints in the
annotation can be used during the filtering of the owner. Also
enable-taints flag is added to allow users opt in/out for node
tainting feature. The flag takes precedence over taints defined
in NodeFeatureRule CR. In other words, if enbale-taints is set to
false(disabled) and user still defines taints on the CR, NFD will
ignore those taints and skip them from setting on the node.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-12-02 17:25:00 +02:00
Feruzjon Muyassarov
7ea0e0b0a7 Add argument to updateNodeFeatures method to pass client from caller
This commit adds an argument to updateNodeFeatures method for receiving
client argument, which currently gets initialized within the method
itself. This is a minor improvement for https://github.com/kubernetes-sigs/node-feature-discovery/pull/910.

Ref:https://github.com/kubernetes-sigs/node-feature-discovery/pull/910#discussion_r1012703631

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-11-06 22:37:11 +02:00
Markus Lehtonen
b907d07d7e apis/nfd: flatten the structure of features data type
Flatten the data structure that stores features, dropping the "domain"
level from the data model. That extra level of hierarchy brought little
benefit but just caused some extra complexity, instead. The new
structure nicely matches what we have in the NodeFeatureRule object (the
matchFeatures field of uses the same flat structure with the "feature"
field having a value <domain>.<feature>, e.g. "kernel.version").

This is pre-work for introducing a new "node feature" CRD that contains
the raw feature data. It makes the life of both users and developers
easier when both CRDs, plus our internal code, handle feature data in a
similar flat structure.
2022-10-18 18:37:28 +03:00
Markus Lehtonen
0e1d4a9046 apis/nfd: migrate pkg/api/feature
Move the previously-protobuf-only internal "feature api" over to the
public "nfd api" package. This is in preparation for introducing a new
CRD API for communicating features.

This patch carries no functional changes. Just moving code around.
2022-10-15 07:42:20 +03:00
Feruzjon Muyassarov
71434a1392 Standardize "k8s.io/api/core/v1" package short name
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@intel.com>
2022-10-15 02:22:41 +03:00
Markus Lehtonen
edcaf9a3bb nfd-master: refactor gRPC into a separate method
Refactor the code so that the initialization and running of the gRPC
server is done in a separate function. The goal is to make the code more
maintainable in terms of disabling (and eventually removing) the gRPC
functionality in the future.
2022-10-07 14:45:44 +03:00
Kubernetes Prow Robot
4097198848
Merge pull request #908 from marquiz/devel/type-rename
pkg/api/feature: rename types
2022-10-06 01:59:51 -07:00
Markus Lehtonen
abdbd420d1 pkg/api/feature: rename types
Sync type names with NFD documentation. Aims at making the codebase
easier to follow.
2022-10-06 11:25:01 +03:00
Markus Lehtonen
c1e6b41e56 apis/nfd: move annotation and label consts from nfd-master
Move consts related to NFD annotations and labels from nfd-master to the
api. Makes them more logically accessible for clients.
2022-10-06 11:23:56 +03:00
Markus Lehtonen
658ffaa6a5 nfd-master: rename crd controller
Prepare for adding support for other nfd api objects. Just rename file
and some symbols, no functional changes.
2022-10-04 20:23:24 +03:00
Markus Lehtonen
8b652ab8ec nfd-master: log if node was modified (or not)
Be a bit more verbose what is happning.
2022-09-21 14:23:37 +03:00
Markus Lehtonen
389a3d4e2e nfd-master: drop cleanup of ancient incubator labels
Remove the cleanup code that removes ancient NFD labels with the
node.alpha.kubernetes-incubator.io/ prefix. This label namespace was
deprecated/dropped already in v0.4.0 so it should be safe to drop this
code.
2022-09-20 19:56:58 +03:00
Markus Lehtonen
2c92e1dcff logging: do not use %w with klog.Errorf
It is not recognized (and does not work like with fmt.Errorf) so use %v
instead.
2022-08-22 14:39:52 +03:00
Markus Lehtonen
889e4c1351 nfd-master: more fixes to log messages
Use correct name for the CR (NodeFeatureRule) object. Also, the resource
is cluster-scoped so don't print the namespace.
2022-08-17 10:07:26 +03:00
Markus Lehtonen
f5ee836bbf nfd-master: fix incorrect log messages in crd controller 2022-08-16 16:39:27 +03:00
Dipto Chakrabarty
19a57789ad
Additional Lint Fixes in Codebase (#779)
* fix comments and conditonals to fix lint issues

* more linter fixes and spelling fixes

* fix linter issues based on feedback
2022-03-02 17:12:46 -08:00
Kubernetes Prow Robot
0c330b1a35
Merge pull request #736 from marquiz/devel/grpc-stop
nfd-master: do graceful stop of gRPC server
2022-01-21 03:05:59 -08:00
Markus Lehtonen
e53d053475 nfd-master: do graceful stop of gRPC server 2022-01-21 12:03:07 +02:00
Markus Lehtonen
e95a4dd460 nfd-master: print gRPC server error correctly 2022-01-21 11:56:28 +02:00
Kubernetes Prow Robot
86bfe74cd7 Merge pull request #671 from marquiz/fixes/single-dash-flags
Use single-dash format of cmdline flags
2021-12-01 06:45:15 -08:00
Markus Lehtonen
a57a25f63c Use single-dash format of cmdline flags
Use the single-dash (i.e. '-option' instead of '--option') format
consistently accross log messages and documentation. This is the format
that was mostly used, already, and shown by command line help of the
binaries, for example.
2021-11-25 18:03:54 +02:00
Markus Lehtonen
f75303ce43 pkg/apis/nfd: add variables to rule spec and support backreferences
Support backreferencing of output values from previous rules. Enables
complex rule setups where custom features are further combined together
to form even more sophisticated higher level labels. The labels created
by preceding rules are available as a special 'rule.matched' feature
(for matchFeatures to use).

If referencing rules accross multiple configs/CRDs care must be taken
with the ordering. Processing order of rules in nfd-worker:

1. Static rules
2. Files from /etc/kubernetes/node-feature-discovery/custom.d/
   in alphabetical order. Subdirectories are processed by reading their
   files in alphabetical order.
3. Custom rules from main nfd-worker.conf

In nfd-master, NodeFeatureRule objects are processed in alphabetical
order (based on their metadata.name).

This patch also adds new 'vars' fields to the rule spec. Like 'labels',
it is a map of key-value pairs but no labels are generated from these.
The values specified in 'vars' are only added for backreferencing into
the 'rules.matched' feature. This may by desired in schemes where the
output of certain rules is only used as intermediate variables for other
rules and no labels out of these are wanted.

An example setup:

  - name: "kernel feature"
    labels:
      kernel-feature:
    matchFeatures:
      - feature: kernel.version
        matchExpressions:
          major: {op: Gt, value: ["4"]}

  - name: "intermediate var feature"
    vars:
      nolabel-feature: "true"
    matchFeatures:
      - feature: cpu.cpuid
        matchExpressions:
          AVX512F: {op: Exists}
      - feature: pci.device
        matchExpressions:
          vendor: {op: In, value: ["8086"]}
          device: {op: In, value: ["1234", "1235"]}

  - name: top-level-feature
    matchFeatures:
      - feature: rule.matched
        matchExpressions:
          kernel-feature: "true"
          nolabel-feature: "true"
2021-11-25 12:50:47 +02:00
Markus Lehtonen
33fdf75190 nfd-master: process labeling rules from CRs
Enable Custom Resource based label creation in nfd-master. This extends
the previously implemented controller stub for watching NodeFeatureRule
objects. NFD-master watches NodeFeatureRule objects in the cluster and
processes the rules on every incoming labeling request from workers.
The functionality relies on the "raw features" (identical to how
nfd-worker handles custom rules) submitted by nfd-worker, making it
independent of the label source configuration of the worker. This means
that the labeling functions as expected even if all sources in the
worker are disabled.

NOTE: nfd-master is stateless and re-labeling only happens on the
reception of SetLabelsRequest from the worker – i.e. on intervals
specified by the core.sleepInterval configuration option (or
-sleep-interval cmdline flag) of each nfd-worker instance. This means
that modification/creation of NodeFeatureRule objects does not
automatically update the node labels. Instead, the changes only come
visible when workers send their labeling requests.
2021-11-23 09:18:07 +02:00
Markus Lehtonen
e8872462dc nfd-master: add -featurerules-controller flag
Add a new command line flag for disabling/enabling the controller for
NodeFeatureRule objects. In practice, disabling the controller disables
all labels generated from rules in NodeFeatureRule objects.
2021-11-22 16:57:42 +02:00
Markus Lehtonen
e6e32a88c3 nfd-master: implement controller for NodeFeatureRule CRs
Implement a simple controller stub that operates on NodeFeatureRule
objects. The controller does not yet have any functionality other than
logging changes in the (NodeFeatureRule) objecs it is watching.

Also update the documentation on the -no-publish flag to match the new
functionality.
2021-11-22 16:57:42 +02:00
Markus Lehtonen
237c4f7824 pkg/apihelpers: split out loading of kubeconfig to a separate function
Make kubeconfig loading and parsing re-usable for multiple clients.
2021-11-22 16:57:42 +02:00
Markus Lehtonen
47e7c47594 Send raw features over gRPC
Enable transfer of raw features between nfd-worker and nfd-master.
2021-11-16 17:32:28 +02:00
Swati Sehgal
b444ef95a8 NFD-Topology-Updater: Bump NRT API to version v0.0.12
The NodeResourceTopology API has been made cluster
scoped as in the current context a CR corresponds to
a Node and since Node is a cluster scoped resource it
makes sense to make NRT cluster scoped as well.

Ref: https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/18
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-11-16 13:28:23 +00:00
Markus Lehtonen
0b386981a6 pkg/nfd-master: fix linter errors in tests 2021-10-04 09:51:38 +03:00
Swati Sehgal
a311719d1e topologyupdater: Updates based on latest changes made to CRD API
There have been recent changes made to the noderesourcetopology API
storing the proto file generated using go-to-protobuf tool and
this code inports the proto generated in the API in the topology-updater.proto
The PRs corresponding to the changes are as follows:
https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/9
https://github.com/k8stopologyawareschedwg/noderesourcetopology-api/pull/13

Commands used to generate topology-updater.pb.go file:

go install github.com/golang/protobuf/protoc-gen-go@v1.4.3
go mod vendor
protoc --go_opt=paths=source_relative  --go_out=plugins=grpc:. pkg/topologyupdater/topology-updater.proto -I. -Ivendor

As part of implmentation of this patch, reserved (non-allocatable) CPUs
are evaluated by performing a difference between all the CPUs on a system
(determined by using ghw) and allocatable CPUs (determined by querying
GetAllocatableResources podResource API endpoint).

When aggregator creates the NUMA zones, it will skip the zone creation if
there are no allocatable resources. In this update we creates those missing
zone with zero allocatable/available resources so we won't have holes in the
array of reported zones.

Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-09-21 10:48:10 +01:00
Francesco Romani
b4c92e4eed topologyupdater: Bootstrap nfd-topology-updater in NFD
- This patch allows to expose Resource Hardware Topology information
  through CRDs in Node Feature Discovery.
- In order to do this we introduce another software component called
  nfd-topology-updater in addition to the already existing software
  components nfd-master and nfd-worker.
- nfd-master was enhanced to communicate with nfd-topology-updater
  over gRPC followed by creation of CRs corresponding to the nodes
  in the cluster exposing resource hardware topology information
  of that node.
- Pin kubernetes dependency to one that include pod resource implementation
- This code is responsible for obtaining hardware information from the system
  as well as pod resource information from the Pod Resource API in order to
  determine the allocatable resource information for each NUMA zone. This
  information along with Costs for NUMA zones (obtained by reading NUMA distances)
  is gathered by nfd-topology-updater running on all the nodes
  of the cluster and propagate NUMA zone costs to master in order to populate
  that information in the CRs corresponding to the nodes.
- We use GHW facilities for obtaining system information like CPUs, topology,
  NUMA distances etc.
- This also includes updates made to Makefile and Dockerfile and Manifests for
  deploying nfd-topology-updater.
- This patch includes unit tests
- As part of the Topology Aware Scheduling work, this patch captures
  the configured Topology manager scope in addition to the Topology manager policy.
  Based on the value of both attribues a single string will be populated to the CRD.
  The string value will be on of the following {SingleNUMANodeContainerLevel,
  SingleNUMANodePodLevel, BestEffort, Restricted, None}

Co-Authored-by: Artyom Lukianov <alukiano@redhat.com>
Co-Authored-by: Francesco Romani <fromani@redhat.com>
Co-Authored-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2021-09-21 10:47:39 +01:00
Kubernetes Prow Robot
189f86bec8
Merge pull request #548 from marquiz/devel/profile-ns
nfd-master: allow profile.node.kubernetes.io label ns
2021-08-27 07:24:04 -07:00
Carlos Eduardo Arango Gutierrez
dece85b394
Add livenessProbe via grpc to nfd-master
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-08-18 10:23:10 -05:00
Markus Lehtonen
55bd633425 nfd-master: allow profile.node.kubernetes.io label ns
Add a separate label namespace for profile labels, intended for
user-specified higher level "meta features". Also sub-namespaces of this
(i.e. <sub-ns>.profile.node.kubernetes.io) are allowed.
2021-08-10 19:39:59 +03:00
Markus Lehtonen
c3760fbbab nfd-master: rename LabelNs to FeatureLabelNs 2021-08-10 19:13:08 +03:00
Kubernetes Prow Robot
4a22a39928
Merge pull request #536 from marquiz/devel/label-sub-ns
nfd-master: allow sub-namespaces of the default label ns
2021-08-10 04:19:18 -07:00
Markus Lehtonen
eb666f521d nfd-master: allow sub-namespaces of the default label ns
Allow <sub-ns>.feature.node.kubernetes.io label namespaces. Makes it
possible to have e.g. vendor specific label ns without the need to user
-extra-label-ns.
2021-08-10 11:41:52 +03:00
Markus Lehtonen
a55783d533 Straighten wrinkles in lint fixes
Fix small mistakes that slipped through with lint fixes (in
1230945564).
2021-07-07 14:32:11 +03:00
Carlos Eduardo Arango Gutierrez
1230945564
make golint happy
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-06-14 12:27:58 -05:00
Carlos Eduardo Arango Gutierrez
894b7901ff
make gofmt happy by running gofmt -s
Signed-off-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
2021-06-14 12:24:44 -05:00
robertdavidsmith
77bd4e4cf6
Accept client certs based on SAN, not just CN (#514)
* first attempt at SAN-based VerifyNodeName

* Update docs on verify-node-name
2021-04-20 01:44:32 -07:00
Markus Lehtonen
e771a35a21 nfd-master: support certificate rotation
Add a helper/wrapper in pkg/utils to handle gRPC server-side certificate
rotation.
2021-03-09 14:40:04 +02:00
Markus Lehtonen
3ffb7b8fc5 nfd-master: switch to klog 2021-02-25 07:50:37 +02:00
Markus Lehtonen
47033db9c1 nfd-master: use flag for command line parsing 2021-02-24 12:06:16 +02:00
Markus Lehtonen
e52ec3480f nfd-master: implement --instance flag
This can be used to help running multiple parallel NFD deployments in
the same cluster. The flag changes the node annotation namespace to
<instance>.nfd.node.kubernetes.io allowing different nfd-master intances
to store metadata in separate annotations.
2021-02-10 13:48:31 +02:00
Markus Lehtonen
705687192d nfd-master: make updateNodeFeatures a method of nfdMaster 2021-02-10 13:46:59 +02:00
Markus Lehtonen
cdca6d667a nfd-master: make nodeName non-global 2021-02-10 13:46:59 +02:00
Markus Lehtonen
b146508e64 nfd-master: drop separate labelerServer type
Simplify code by changing nfdMaster to implement LabelerServer interface
by itself.
2021-02-10 13:46:59 +02:00
Markus Lehtonen
76b95b6c55 Replace improper usage of filepath.Join with path.Join
In JSON and kubernetes API object names we want to use slashes instead
of the OS dependent file path separator.
2021-02-10 12:54:31 +02:00
Markus Lehtonen
19b8f2cd3d nfd-master: more detailed unit testing of extended resources 2020-11-24 12:45:06 +02:00
Markus Lehtonen
d17743a0b9 nfd-master: handle label annotations in the same func
Handle both creation and parsing of the "feature-labels" and
"extended-resources" annotations in the function. I think this is more
logical to keep them together.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
95ff300d74 nfd-master: patch node object instead of rewriting it
When updating node labels and annotations use JSON patches instead of
doing a read-modify-write on the whole node object. Patching is already
being used in managing extended resources so some of the existing code
was re-usable.

This patch should mitigate the problem of node update failures caused by
race conditions (a change in the node object between our read and write)
resulting e.g. in errors/restarts in nfd worker pods.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
1ea301d272 nfd-master: change statusOp to a more generalized JSON patch
Generalize and rename 'statusOp' type to a more flexible 'JsonPatch'.
Move it to the apihelper package.
2020-11-24 12:45:06 +02:00
Markus Lehtonen
bb1e4c60fb nfd-master: use namespaced label and annotation names internally
For historical reasons the labels in the default nfd namespace have been
internally represented without the namespace part. I.e. instead of
"feature.node.kubernetes.io/foo" we just use "foo". NFD worker uses this
representation, too, both internally and over the gRPC requests. The
same scheme has been used for annotations.

This patch changes NFD master to use fully namespaced label and
annotation names internally. This hopefully makes the code a bit more
understandable. It also addresses some corner cases making the handling
of label names consistent, making it possible to use both "truncated"
and fully namespaced names over the gRPC interface (and in the
annotations).
2020-11-24 12:45:06 +02:00
Markus Lehtonen
458dd8dc58 nfd-master: add --kubeconfig flag
Useful with --prune and for development purposes.
2020-09-07 07:51:42 +03:00
Markus Lehtonen
4669770020 nfd-master: implement --prune flag
A new sub-command like flag for cleaning up a cluster. When --prune is
specified nfd-master removes all NFD related labels, annotations and
extended resources from all nodes of the cluster and exits.

This should help undeployment of NFD and be useful for development.
2020-09-07 07:51:42 +03:00
Markus Lehtonen
6869a99ceb nfd-master: fix one docstring 2020-09-07 07:51:42 +03:00
Markus Lehtonen
853609f721 nfd-master: lint fixes 2020-05-20 21:48:06 +03:00
Ukri Niemimuukko
903a939836 nfd-master: add extended resource support
This adds support for making selected labels extended resources.

Labels which have integer values, can be promoted to Kubernetes extended
resources by listing them to the added command line flag
`--resource-labels`. These labels won't then show in the node label
section, they will appear only as extended resources.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>
2020-03-19 13:19:22 +02:00
Markus Lehtonen
54eaf16871 nfd-master: export label and annotation prefixes
In order to be able to use the constants in end-to-end tests.
2020-02-27 14:21:00 +02:00
Jordan Jacobelli
40918827f6
Allow to change labels namespace
The aim here is to allow to override the default namespace
of NFD. The allowed namespaces are whitelisted.
See https://github.com/kubernetes-sigs/node-feature-discovery/issues/227

Signed-off-by: Jordan Jacobelli <jjacobelli@nvidia.com>
2019-05-09 13:17:52 -07:00
Markus Lehtonen
470cf8dff2 nfd-master: correct a mistake in unit tests
Annotations were not correctly checked when testing
mockServer.updateNodeFeatures().
2019-05-08 23:07:52 +03:00
Markus Lehtonen
7f43a3db4e nfd-master: fix --label-whitelist
Make the --label-whitelist effective. Previously, it was unused and had
no effect. Also, add simple unit test for that.
2019-05-08 23:07:52 +03:00
Markus Lehtonen
75a8f0c146 Refactor APIHelpers
Remove functionality that was not interacting with Kubernetes API.
Makes the architecture a bit simpler and simplifies testing.
2019-05-06 16:26:41 +03:00
Markus Lehtonen
35d26001e4 nfd-worker: extend unit test to cover 'main'
Also, adds new method WaitForReady() into NfdMaster.

In practice, this quite widely tests nfd-master, too, as the tests
create an instance of NfdMaster and verify that the communication
between master and worker works.
2019-05-06 16:26:41 +03:00
Markus Lehtonen
2de0a019a3 Move most of functionality in cmd/ to pkg/
Move most of the code under cmd/nfd-master and cmd/nfd-worker into new
packages pkg/nfd-master and pk/nfd-worker, respectively. Makes extending
unit tests to "main" functions easier.
2019-05-06 16:26:41 +03:00