Mirror of https://github.com/prometheus-operator/prometheus-operator.git
Merge pull request #114 from brancz/exposing-metrics
Documentation: exposing metrics

Commit: baf9688008
4 changed files with 83 additions and 24 deletions

@@ -51,12 +51,12 @@ A healthy node would be one that has joined the existing mesh network and has
 been communicated the state that it missed while that particular instance was
 down for the upgrade.
 
-Currently there is no way to tell whether an Alertmanger instance is healthy
+Currently there is no way to tell whether an Alertmanager instance is healthy
 under the above conditions. There are discussions of using vector clocks to
 resolve merges in the above mentioned situation, and ensure on a best effort
 basis that joining the network was successful.
 
-> Note that single instance Alertmanger setups will therefore not have zero
+> Note that single instance Alertmanager setups will therefore not have zero
 > downtime on deployments.
 
 The current implementation of rolling deployments simply decides based on the

Documentation/exposing-metrics.md (new file, 48 lines)

@@ -0,0 +1,48 @@

# Exposing Metrics

There are a number of
[applications](https://prometheus.io/docs/instrumenting/exporters/#directly-instrumented-software)
that are natively instrumented with Prometheus metrics; these applications
simply expose their metrics through an HTTP server.

The Prometheus developers and the community maintain [client
libraries](https://prometheus.io/docs/instrumenting/clientlibs/#client-libraries)
for various languages. If you want to monitor your own applications and
instrument them natively, chances are there is already a client library for
your language.

Not all software is natively instrumented with Prometheus metrics; however,
much of it does record metrics in some other form. For these kinds of
applications there are so-called
[exporters](https://prometheus.io/docs/instrumenting/exporters/#third-party-exporters).

Exporters can generally be divided into two categories:

* Instance exporters: These expose metrics about a single instance of an
  application, for example the HTTP requests that a single HTTP server has
  served. These exporters are deployed as a
  [side-car](http://blog.kubernetes.io/2015/06/the-distributed-system-toolkit-patterns.html)
  container in the same pod as the actual instance of the respective
  application (see the sketch after this list). A real-life example is the
  [`dnsmasq` metrics
  sidecar](https://github.com/kubernetes/dns/blob/master/docs/sidecar/README.md),
  which converts the proprietary metrics format communicated over the DNS
  protocol by dnsmasq to the Prometheus exposition format and exposes it on an
  HTTP server.

* Cluster-state exporters: These expose metrics about an entire system, which
  may be native to the environment the application constructs. For example,
  this could be the number of 3D objects in a game, or metrics about a
  Kubernetes deployment. These exporters are typically deployed as a normal
  Kubernetes deployment, but this can vary depending on the nature of the
  particular exporter. A real-life example is the
  [`kube-state-metrics`](https://github.com/kubernetes/kube-state-metrics)
  exporter, which exposes metrics about the cluster state of a Kubernetes
  cluster.
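
To make the side-car pattern concrete, below is a minimal sketch of a pod that
runs an application next to an exporter container. The image names, port
numbers, and label are hypothetical and only illustrate the shape of the
pattern:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  labels:
    app: example-app                  # hypothetical label, used later by a Service selector
spec:
  containers:
  - name: app
    image: example/app:1.0            # hypothetical application image
    ports:
    - name: http
      containerPort: 8080
  - name: metrics-exporter            # side-car translating the app's metrics
    image: example/app-exporter:1.0   # hypothetical exporter image
    ports:
    - name: metrics                   # Prometheus scrapes this port
      containerPort: 9102
```

Because both containers share the pod's network namespace, the exporter can
reach the application on `localhost` and re-expose its metrics in the
Prometheus exposition format on its own port.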

Lastly, in some cases it is not a viable option to expose metrics via an HTTP
server. For example, a `CronJob` may only run for a few seconds - not long
enough for Prometheus to be able to scrape the HTTP endpoint. The Pushgateway
was developed to be able to collect metrics in a scenario like that; however,
it is highly recommended not to use the Pushgateway if possible. Read more
about when to use the Pushgateway and alternative strategies here:
https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway .

@@ -111,6 +111,7 @@ it brought up as data sources in potential Grafana deployments.
 Prometheus instances are deployed with default values for requested and maximum
 resource usage of CPU and memory. This will be made configurable in the `Prometheus`
 TPR eventually.
 
 Prometheus comes with a variety of configuration flags for its storage engine that
 have to be tuned for better performance in large Prometheus servers. It will be the
 operator's job to tune those correctly to be aligned with the experienced load
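
For illustration of the kind of flags meant here: the Prometheus 1.x
local-storage engine is tuned through command-line flags on the Prometheus
container. A purely illustrative sketch, with assumed values that are not
recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-storage-tuning-example   # hypothetical standalone example
spec:
  containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v1.5.2   # version assumed for illustration
    args:
    - -storage.local.retention=168h          # keep one week of data
    - -storage.local.memory-chunks=1048576   # chunks held in memory; sized to the expected load
    ports:
    - name: web
      containerPort: 9090
```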

@@ -2,18 +2,41 @@
 
 The `ServiceMonitor` third party resource (TPR) allows to declaratively define
 how a dynamic set of services should be monitored. Which services are selected
-to be monitored with the desired configuration is defined using label selections.
-This allows to dynamically express monitoring without having to update additional
-configuration for services that follow known monitoring patterns.
-
-A service may expose one or more service ports, which are backed by a list
-of multiple endpoints that point to a pod in the common case.
-
-In the `endpoints` section of the TPR, we can configure which ports of these
-endpoints we want to scrape for metrics and with which paramters. For advanced use
-cases one may want to monitor ports of backing pods, which are not directly part
-of the service endpoints. This is also made possible by the Prometheus Operator.
+to be monitored with the desired configuration is defined using label
+selections. This allows an organization to introduce conventions around how
+metrics are exposed, and then, following these conventions, new services are
+automatically discovered without the need to reconfigure the system.
+
+## Design
+
+For Prometheus to monitor any application within Kubernetes, an `Endpoints`
+object needs to exist. `Endpoints` objects are essentially lists of IP
+addresses. Typically, an `Endpoints` object is populated by a `Service` object.
+A `Service` object discovers `Pod`s by a label selector and adds those to the
+`Endpoints` object.
+
+A `Service` may expose one or more service ports, which are backed by a list of
+multiple endpoints that point to a `Pod` in the common case. This is reflected
+in the respective `Endpoints` object as well.
+
+The `ServiceMonitor` object introduced by the Prometheus Operator in turn
+discovers those `Endpoints` objects and configures Prometheus to monitor those
+`Pod`s.
+
+The `endpoints` section of the `ServiceMonitorSpec` is used to configure which
+ports of these `Endpoints` are going to be scraped for metrics, and with which
+parameters. For advanced use cases one may want to monitor ports of backing
+`Pod`s which are not directly part of the service endpoints. Therefore, the
+endpoints specified in the `endpoints` section are used strictly as configured.
+
+> Note: `endpoints` (lowercase) is the TPR field, while `Endpoints`
+> (capitalized) is the Kubernetes object kind.
+
+While `ServiceMonitor`s must live in the same namespace as the `Prometheus`
+TPR, discovered targets may come from any namespace. This is important to allow
+cross-namespace monitoring use cases, e.g. for meta-monitoring. Using the
+`namespaceSelector` of the `ServiceMonitorSpec`, one can restrict the
+namespaces the `Endpoints` objects are allowed to be discovered from.
 
 ## Specification
 

@@ -61,16 +84,3 @@ of the service endpoints. This is also made possible by the Prometheus Operator.
 | any | Match any namespace | false | bool | false |
 | matchNames | Explicit list of namespace names to select | false | string array | |
 
-
-## Current state and roadmap
-
-### Namespaces
-
-While `ServiceMonitor`s must live in the same namespace as the `Prometheus` TPR,
-discovered targets may come from any namespace. This is important to allow cross-namespace
-monitoring use cases, e.g. for meta-monitoring.
-
-Currently, targets are always discovered from all namespaces. In the future, the
-`ServiceMonitor` should allow to restrict this to one or more namespaces.
-How such a configuration would look like, i.e. explicit namespaces, selection by labels,
-or both, and what the default behavior should be is still up for discussion.
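
To tie the `Service`, `Endpoints`, and `ServiceMonitor` objects from the diff
above together, here is a minimal sketch of how such a pairing might look. The
apiVersion, names, namespaces, and label values are assumptions chosen for
illustration and are not taken from the repository:

```yaml
# A Service whose label selector populates an Endpoints object with the
# application Pods; the port is named so it can be referenced by name.
apiVersion: v1
kind: Service
metadata:
  name: example-app            # hypothetical name
  namespace: default
  labels:
    app: example-app
spec:
  selector:
    app: example-app           # matches the Pods to be scraped
  ports:
  - name: metrics              # named port referenced by the ServiceMonitor
    port: 9102
---
# A ServiceMonitor selecting that Service by label and describing how the
# discovered Endpoints should be scraped.
apiVersion: monitoring.coreos.com/v1alpha1   # TPR group/version assumed for illustration
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring        # assumed to be the namespace of the Prometheus TPR
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    matchNames:
    - default                  # restrict discovery to the application namespace
  endpoints:
  - port: metrics              # scrape the named service port
    interval: 30s
```

With this in place, the `Service` selector populates an `Endpoints` object with
the matching `Pod` IPs, and Prometheus, configured from the `ServiceMonitor`,
scrapes each of those `Pod`s on the port named `metrics`.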